Commit Graph

2562 Commits

Author SHA1 Message Date
49f7eb69bf [Refactor] Refactor DeleteHandler and Cond module (2nd) (#5030)
* [Refactor] Refactor DeleteHandler and Cond module (#4925)

This patch mainly does the following refactoring:
- Use int64_t instead of int32_t for 'version' in DeleteHandler
- Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments
- Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid
- Use range loop to simplify code
- Reduce some compare operations in Cond::del_eval
- Improve some branch predictions in Reader
- Fix and improve some unit tests
2020-12-08 10:01:18 +08:00
2dbcb726ac [Bug] Fix bug that failed to write meta image of load job (#5029)
In #4863, we added userInfo to the load job, but the userInfo must be analyzed
so that it can be written to the image.
2020-12-08 10:00:42 +08:00
eb0cb04a70 Fix a core dump introduced by pr #5022 (#5032)
* fix a core dump caused by pr #5022
2020-12-08 10:00:07 +08:00
3bd56bd441 fix Get FE log file doc typo (#4985) 2020-12-08 07:07:13 +08:00
b9dabc3b5b [Enhance] Push down predicate on value column of unique table to base rowset (#5022) 2020-12-06 08:50:37 +08:00
6021d6fc7f [Performance Optimization] Remove push down conjuncts in olap scan node (#4999)
Push conjuncts down to the storage engine as much as possible.

The OLAP scan node does not need to filter data with the pushed-down conjuncts again.

fix #4986
2020-12-06 08:50:08 +08:00
b954dfd82d [Bug] Fix the bug that Largeint and Decimal JSON load failed. (#4983)
Use the JSON load parameter "num_as_string" to enable the flag kParseNumbersAsStringsFlag when parsing JSON data.
2020-12-06 08:49:30 +08:00
b1b99ae884 [Function] Support Decimal to calculate variance and standard deviation (#4959) 2020-12-06 08:49:01 +08:00
42dd821021 [Refactor] Private constructor for singleton (#4956) 2020-12-06 08:47:29 +08:00
c440aa07d1 Revert "[Refactor] Refactor DeleteHandler and Cond module (#4925)" (#5028)
This reverts commit 9c9992e0aa28ee85364eebf86a6675f1073e08fb.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-12-05 21:39:49 +08:00
c5f780305e [Repair] Add an option whether to allow the partition column to be NULL (#5013) 2020-12-05 14:58:32 +08:00
9c9992e0aa [Refactor] Refactor DeleteHandler and Cond module (#4925)
This patch mainly does the following refactoring:
- Use int64_t instead of int32_t for 'version' in DeleteHandler
- Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments
- Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid
- Use range loop to simplify code
- Reduce some compare operations in Cond::del_eval
- Improve some branch predictions in Reader
- Fix and improve some unit tests
2020-12-04 12:13:30 +08:00
1f236a5339 [BUG] Fix core when schema change (#5018) 2020-12-04 09:53:19 +08:00
8823f2d928 [Bug] Fix incorrect name of TaskWorkerPool (#5015)
'_task_worker_type' is not properly initialized when it is used to initialize '_name',
so '_name' is always 'TaskWorkerPool.CREATE_TABLE'. This patch fixes
this bug.
2020-12-04 09:30:23 +08:00
1ae6de7117 [Enhance] Add "statistics" meta table and fix some mysql compatibility problem (#4991)
1. Add metadata table 'statistics' to store index information;
2. In the header information returned by mysql, the data type length is returned according to the actual type.
2020-12-03 09:38:18 +08:00
bd558f1895 [Doris][Doris On ES] support prefix @ symbol for column name (#5006)
Support column names with a leading `@`, such as:

```
CREATE EXTERNAL TABLE `es_10` (
  `@k3` bigint(20) NULL COMMENT "",
  `@k1` boolean NULL COMMENT "",
  `@k2` varchar(20) NULL COMMENT ""
) ENGINE=ELASTICSEARCH
COMMENT "ELASTICSEARCH"
PROPERTIES (
"hosts" = "ip:port",
"user" = "root",
"password" = "",
"index" = "data_type_test",
"type" = "doc",
"transport" = "http"
); 
```
2020-12-03 09:33:49 +08:00
5215727b45 [Function] Let "str_to_date" return correct type (#5004)
The return type of str_to_date depends on whether the time part is included in the format.
If included, it is DATETIME, otherwise it is DATE.
If the format parameter is not constant, the return type will be DATETIME.
The above judgment has been completed in the FE query planning stage,
so here we directly set the value type to the return type set in the query plan.

For example:
For a table with one column, k1 varchar, and 2 rows:
    "%Y-%m-%d"
    "%Y-%m-%d %H:%i:%s"
Query:
    SELECT str_to_date("2020-09-01", k1) from tbl;
Result will be:
    2020-09-01 00:00:00
    2020-09-01 00:00:00

Query:
     SELECT str_to_date("2020-09-01", "%Y-%m-%d");
Return type is DATE

Query:
     SELECT str_to_date("2020-09-01", "%Y-%m-%d %H:%i:%s");
Return type is DATETIME
2020-12-03 09:33:26 +08:00
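The return-type rule described in the str_to_date commit above can be sketched as follows. This is a minimal illustration, not the Doris implementation: the class `StrToDateReturnType`, the method `resolve`, and the list of specifiers treated as a "time part" are all assumptions made for the example, and a `null` format stands in for a non-constant format argument.

```java
// Hypothetical helper, not Doris code: picks the return type of str_to_date
// from its format argument, following the rule stated in the commit message.
final class StrToDateReturnType {
    // Assumed set of MySQL-style specifiers that denote a time part.
    private static final String TIME_SPECIFIERS = "HhIiklprSsTf";

    /** @param format the constant format string, or null if it is not constant */
    static String resolve(String format) {
        if (format == null) {
            return "DATETIME"; // non-constant format: fall back to DATETIME
        }
        for (int i = 0; i + 1 < format.length(); i++) {
            if (format.charAt(i) == '%') {
                char spec = format.charAt(i + 1);
                if (TIME_SPECIFIERS.indexOf(spec) >= 0) {
                    return "DATETIME"; // the format includes a time part
                }
                i++; // skip the specifier character (also covers "%%")
            }
        }
        return "DATE"; // only date specifiers were found
    }

    public static void main(String[] args) {
        System.out.println(resolve("%Y-%m-%d"));
        System.out.println(resolve("%Y-%m-%d %H:%i:%s"));
        System.out.println(resolve(null));
    }
}
```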
204c15119f [Bug] ConcurrentModificationException when finish transaction (#5003) 2020-12-03 09:33:04 +08:00
92db00bd86 [Bug] Fix concurrent access of _tablets_under_clone in TabletManager (#5000)
_tablets_under_clone in TabletManager is not sharded, but the lock
used to prevent concurrent access is sharded, so when the shard count
is not 1, it causes a coredump.
This patch fixes this bug, and also does some refactoring to make shard
locks more convenient to use.
2020-12-03 09:32:44 +08:00
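The hazard in the _tablets_under_clone commit above is a shared container guarded by per-shard locks: two threads holding different shard locks can still modify the same set at once. One way to keep data and lock consistent is to give every shard its own set, as in the sketch below. This is only an illustration in Java (the real code is C++), the names `ShardedTabletSet`, `register`, `unregister`, and `isUnderClone` are hypothetical, and the actual patch may have resolved the mismatch differently.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only; class and method names are hypothetical.
final class ShardedTabletSet {
    // Each shard owns both its lock and the data that lock protects.
    private static final class Shard {
        final ReentrantLock lock = new ReentrantLock();
        final Set<Long> tabletsUnderClone = new HashSet<>();
    }

    private final Shard[] shards;

    ShardedTabletSet(int shardCount) {
        shards = new Shard[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new Shard();
        }
    }

    private Shard shardFor(long tabletId) {
        return shards[(int) Math.floorMod(tabletId, (long) shards.length)];
    }

    /** Returns true if the tablet was not already registered. */
    boolean register(long tabletId) {
        Shard s = shardFor(tabletId);
        s.lock.lock();
        try {
            return s.tabletsUnderClone.add(tabletId);
        } finally {
            s.lock.unlock();
        }
    }

    void unregister(long tabletId) {
        Shard s = shardFor(tabletId);
        s.lock.lock();
        try {
            s.tabletsUnderClone.remove(tabletId);
        } finally {
            s.lock.unlock();
        }
    }

    boolean isUnderClone(long tabletId) {
        Shard s = shardFor(tabletId);
        s.lock.lock();
        try {
            return s.tabletsUnderClone.contains(tabletId);
        } finally {
            s.lock.unlock();
        }
    }
}
```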
4fa47bc3f5 [Docs]adding instructions for converting dynamic and manual partition tables to each other (#4994)
There is no clear instruction to manually modify partitions, when dynamic partition feature is enabled.
The user will be informed only after trying to modify the partition in the command line.
This PR adds instructions for converting dynamic and manual partition tables to each other
2020-12-03 09:32:30 +08:00
b4c1eabe3f [Bug] fix finished load jobs cost too much heap (#4993)
Since the plan is retained in the task, if the task is not cleaned up, memory usage grows too large, causing a memory leak or OOM.
When a load job finishes, there is no need to hold on to the tasks, which are the biggest memory consumers.
Fixed #4992
2020-12-02 17:11:27 +08:00
af06adb57f [Doris On ES][Bug-fix] fix boolean predicate pushdown manner (#4990)
Correctly handle `boolean` field predicates by setting the predicate value to `true`, `false`, or `empty set` for DOE
2020-12-02 10:13:13 +08:00
df1f06e60b Optimized the read performance of the table when have multi versions (#4958)
* Optimize the read performance of a table that has multiple versions.
Change the merge method of the unique table:
merge the cumulative version data first, and then merge it with the base version.
For data with only one base version, read directly without merging.
2020-12-01 12:25:11 +08:00
99404df8b2 [Bug][Compaction] Fix bug that output rowset is not deleted after compaction failure (#4964)
This CL fixes 2 bugs:

1. When the compaction fails, we must explicitly delete the output rowset,
otherwise the GC logic cannot process these rows.

2. Base compaction fails if the compaction process includes some delete versions in SegmentV2,
because the number of filtered rows is wrong.
2020-11-30 22:02:03 +08:00
ec7e1c6b1b [Refactor] Execute 'pick rowsets' before applying for permits for a compaction task (#4891)
The current compaction mechanism is that a producer thread keeps producing compaction tasks,
and the selected tablet must apply for `permits`.
Once a tablet holds its `permits`, the compaction task for this tablet is submitted to the thread pool.
We take the compaction score as `permits`, which is used for limiting memory consumption.
However, `pick_rowset_to_compaction()` is executed before the file merge in the compaction thread,
and the number of segment files that actually take part in the merge operation is smaller than the compaction score.
In addition, it is also possible that the compaction task exits directly because the tablet doesn't meet
the requirements of compaction.

This patch optimizes and refactors the code of compaction, so that we can execute 'pick rowsets'
before applying for permits for a compaction task, calculate the number of segment files that actually
participate in the merge operation, and take this number as `permits`.
2020-11-30 11:41:14 +08:00
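The ordering described in the compaction commit above (pick rowsets first, then request exactly as many permits as segment files will be merged) can be sketched with a counting semaphore. This is a hedged illustration, not the Doris code: `CompactionPermitsSketch`, `permitsNeeded`, and `tryCompact` are hypothetical names, the list of integers stands in for the segment counts of the picked rowsets, and `tryAcquire` is used for simplicity where a real scheduler would more likely wait for permits.

```java
import java.util.List;
import java.util.concurrent.Semaphore;

// Illustrative sketch only; names and structure are assumptions.
final class CompactionPermitsSketch {
    private final Semaphore permits;

    CompactionPermitsSketch(int totalPermits) {
        this.permits = new Semaphore(totalPermits);
    }

    // Stand-in for "pick rowsets": each entry is the number of segment files
    // in one picked rowset; the sum is the number of permits to request.
    static int permitsNeeded(List<Integer> segmentsPerPickedRowset) {
        int total = 0;
        for (int segments : segmentsPerPickedRowset) {
            total += segments;
        }
        return total;
    }

    /** Returns true if the merge ran, false if it was skipped. */
    boolean tryCompact(List<Integer> segmentsPerPickedRowset, Runnable merge) {
        int needed = permitsNeeded(segmentsPerPickedRowset);
        if (needed == 0) {
            return false; // nothing to merge, so no permits are ever requested
        }
        if (!permits.tryAcquire(needed)) {
            return false; // not enough permits available right now
        }
        try {
            merge.run();
            return true;
        } finally {
            permits.release(needed); // always hand the permits back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Because the permit count is computed after picking, a tablet that turns out to have nothing to compact never occupies permits that another task could have used.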
27ef5b4d2c [Bug] Use the right queryId to audit master only query in non master (#4978)
Add queryId in TMasterOpResult.
Audit it in non master FE.
2020-11-29 11:14:17 +08:00
bb36de52a6 [Bug] Fix locate bug when start_pos larger than str len (#4975)
```
select locate('', 'abc', 10); 
```
Returns 0, not 10.
2020-11-29 10:38:30 +08:00
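The locate fix above amounts to a bounds check on the start position before searching. A minimal sketch, assuming 1-based positions and treating any start position below 1 or beyond the string length as "not found"; the class `LocateSketch` is hypothetical, it works on Java `char` units rather than bytes, and the exact boundary behavior of the real function (for example for an empty string) is not specified by the commit.

```java
// Illustrative sketch only; not the Doris implementation.
final class LocateSketch {
    /**
     * Returns the 1-based position of substr in str at or after startPos,
     * or 0 if it is not found or startPos is out of range.
     */
    static int locate(String substr, String str, int startPos) {
        // Assumed rule: a start position outside [1, length] never matches.
        if (startPos < 1 || startPos > str.length()) {
            return 0;
        }
        int idx = str.indexOf(substr, startPos - 1);
        return idx < 0 ? 0 : idx + 1;
    }

    public static void main(String[] args) {
        System.out.println(locate("", "abc", 10));
        System.out.println(locate("bc", "abc", 1));
    }
}
```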
d7225d61ef [CodeFormat] Add clang-format script (#4934)
run build-support/check-format.sh to check cpp styles;
run build-support/clang-format.sh to fix cpp style issues;
2020-11-28 18:40:06 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
f944bf4d44 [Compile][Bug] Fix FE compilation bug (#4979)
[Bug] Fix the compile failure "cannot find symbol" for variable scanRangeLength, introduced by #4914 and #4912
2020-11-28 16:19:54 +08:00
4c63dc0027 [Metric] Add metrics for compaction permits and log for compaction merge (#4893)
1. Add metrics for `used permits` and `waitting permits` for compaction.
It is useful to monitor the `permits` held by all executing compaction tasks and by the waiting compaction task.

2. Add a log for merging rowsets, which can be enabled by config.
It helps to track the rowset merging process of a compaction task that lasts for a long time.
2020-11-28 10:00:08 +08:00
f1248cb10e [BUG] Fix colocate balance bug when there is decommissioned be (#4955)
We should ignore decommissioned BEs when selecting BEs to balance group bucketSeq.
2020-11-28 09:59:25 +08:00
2e9c8dda04 [Doris On ES][Bug-Fix] fix problem for selecting random be (#4972)
1.  `Random().nextInt()` may return a negative value, which would result in `java.lang.ArrayIndexOutOfBoundsException`;
passing a positive bound avoids this problem.

```
int seed = new Random().nextInt(Short.MAX_VALUE) % nodesInfo.size()
```

2.  `EsNodeInfo[] nodeInfos = (EsNodeInfo[]) nodesInfo.values().toArray()` may lead to `java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.doris.external.elasticsearch.EsNodeInfo` in some JDK versions; passing an array of the original class type resolves this.

```
EsNodeInfo[] nodeInfos = nodesInfo.values().toArray(new EsNodeInfo[0]);
```
2020-11-28 09:57:44 +08:00
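Both fixes from the random-BE commit above can be seen together in a small self-contained example. It is a sketch under assumptions: `EsNodeInfo` here is a stand-in class with a single `host` field, `RandomNodePick.pick` is a hypothetical helper, and it calls `nextInt(bound)` with the array length directly rather than the `nextInt(Short.MAX_VALUE) % size` form used in the commit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Illustrative sketch only; not the Doris code.
final class RandomNodePick {
    // Stand-in for the real EsNodeInfo class.
    static final class EsNodeInfo {
        final String host;

        EsNodeInfo(String host) {
            this.host = host;
        }
    }

    static EsNodeInfo pick(Map<String, EsNodeInfo> nodesInfo, Random random) {
        if (nodesInfo.isEmpty()) {
            throw new IllegalArgumentException("no nodes to pick from");
        }
        // toArray(new T[0]) returns an EsNodeInfo[]; the no-argument toArray()
        // returns Object[], which cannot be cast to EsNodeInfo[].
        EsNodeInfo[] nodeInfos = nodesInfo.values().toArray(new EsNodeInfo[0]);
        // nextInt(bound) is always in [0, bound), so the index is never negative.
        return nodeInfos[random.nextInt(nodeInfos.length)];
    }

    public static void main(String[] args) {
        Map<String, EsNodeInfo> nodes = new HashMap<>();
        nodes.put("node-1", new EsNodeInfo("10.0.0.1"));
        nodes.put("node-2", new EsNodeInfo("10.0.0.2"));
        System.out.println(pick(nodes, new Random()).host);
    }
}
```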
2331ce10f1 [Bug]Parquet map/list/struct structure recognize (#4968)
When a parquet file contains a `Map/List/Struct` structure, Doris cannot recognize the column correctly
and throws the exception 'Invalid column: xxxx', which means Doris cannot find the column.
The `Map` structure is recognized as two columns: `key and value`.
The following is the schema of a parquet file recognized by Doris. This patch tries to solve this problem.
2020-11-28 09:56:29 +08:00
cb749ce51d [Improvement] Add parquet file name to the error message (#4954)
When a user tries to load parquet files into Doris from a path like `hdfs://hadoop/user/data/date=20201024/*`,
but the path actually contains some non-parquet files, this error is thrown:
`Couldn't deserialize thrift: No more data to read.\\nDeserializing page header failed.`.
If the error message includes the file name, we can quickly locate the errors.
Therefore, this patch adds the file name to the error message.
2020-11-28 09:54:18 +08:00
c6bc30e375 [Bug] Fix httpv2 append extra useless information in get_small_file api (#4953) 2020-11-28 09:52:52 +08:00
55ce88da34 [Schema change] Support More column type in schema change (#4938)
1. Support modifying column type CHAR to TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DATE
and converting TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE to a wider range of numeric types (#4937)

2. Use templates to refactor the code of types.h and schema_change.cpp and delete redundant code.
2020-11-28 09:52:28 +08:00
3b56b601fb Show fe commit hash on proc (#4943)
Show FE's commit hash in the SHOW PROC "/frontends" result.
2020-11-28 09:50:48 +08:00
0493eb172f [Optimize] optimize host selection strategy (#4914)
When a tablet selects which replica's host executes the scan operation,
it uses a `round-robin` strategy to balance load. `minAssignedBytes` is the current load of one host.
If a backend is momentarily not alive, one of the other replicas is chosen at random,
but the dead backend's `minAssignedBytes` is not decreased and the new choice's `minAssignedBytes`
is not increased. That makes the recorded load of the backends incorrect.
2020-11-28 09:48:13 +08:00
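The accounting problem in the host-selection commit above can be illustrated with a small model: when the preferred host is not alive and another replica takes the scan, the bytes charged to the preferred host must be moved to the replacement. This sketch is an assumption-laden simplification, not the Doris code: `HostLoadSketch`, `assign`, and `load` are hypothetical names, the preferred host is simply the one with the fewest assigned bytes, and the fallback takes the first alive replica instead of a random one so the example stays deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names and selection rule are assumptions.
final class HostLoadSketch {
    private final Map<String, Long> assignedBytes = new HashMap<>();

    long load(String host) {
        return assignedBytes.getOrDefault(host, 0L);
    }

    /** Returns the host that will run the scan, or null if no replica is alive. */
    String assign(List<String> replicaHosts, Set<String> aliveHosts, long scanBytes) {
        if (replicaHosts.isEmpty()) {
            return null;
        }
        // Step 1: prefer the replica host with the fewest assigned bytes and charge it.
        String chosen = replicaHosts.get(0);
        for (String host : replicaHosts) {
            if (load(host) < load(chosen)) {
                chosen = host;
            }
        }
        assignedBytes.merge(chosen, scanBytes, Long::sum);
        if (aliveHosts.contains(chosen)) {
            return chosen;
        }
        // Step 2: the preferred host is down, so undo its charge first.
        assignedBytes.merge(chosen, -scanBytes, Long::sum);
        for (String host : replicaHosts) {
            if (aliveHosts.contains(host)) {
                // Charge the host that actually serves the scan.
                assignedBytes.merge(host, scanBytes, Long::sum);
                return host;
            }
        }
        return null;
    }
}
```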
68db176013 [Refactor]Modify code write error (#4950)
* fix typo in udf: replace function

Co-authored-by: wangxixu <wangxixu@xiaomi.com>
2020-11-27 12:16:45 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
2682712349 [Bug] Fix be ut compile failed and core in delta_writer_test when ulimit < 60000. (#4941) 2020-11-24 22:21:19 +08:00
b7b1d5eb38 [Refactor] Short circuit return to avoid meaningless loop (#4933) 2020-11-24 13:46:50 +08:00
37a6731244 [BUG] Fix Colocate table balance bug (#4936)
Fix bug that colocation group is always in unstable status.
2020-11-22 21:22:44 +08:00
584b33f95b [Bug] Fix the bug of NULL do not show in CTE statement. (#4932)
All columns created in an inlineView are set to `allowNull = false`, which causes `NULL` data in a CTE to be ignored during processing.
So we should set the columns in an inlineView to allowNull to ensure the query is correct.
2020-11-22 20:58:03 +08:00
8e9bbfb3ba [Script] Check and create if the log directory not existed before outputing message to the log file. (#4929)
This is a minor issue when starting FE after a fresh installation:
an error about the missing log directory occurs because the log directory does not exist
before some environment check messages are written to the log file.
The log directory creation code in bin/start_fe.sh is in the wrong place;
it only needs to be moved to the beginning.
2020-11-22 20:52:32 +08:00
c28769c512 [Bug] Avoid partition prune if predicate is not with SlotRef (#4833) (#4921) 2020-11-22 20:49:20 +08:00
4f7c6da1f5 [Refactor] Refactor function getScanRangeLength (#4912)
getScanRangeLength always returns 1, so there is no need to maintain a function like this.
2020-11-22 20:44:11 +08:00
fb7f4c8791 [Bug] fix bug that be thrift client cannot connect to fe thrift server when fe thrift server use TThreadedSelectorServer model (#4908)
Fix the bug that the BE thrift client cannot connect to the FE thrift server when the FE thrift server uses the TThreadedSelectorServer model
2020-11-22 20:40:33 +08:00
f1b57c4418 [Optimize] Avoid repeated sending of common components in Fragments (#4904)
This CL mainly changes:

1. Avoid repeated sending of common components in Fragments

    In the previous implementation, a query may generate multiple Fragments,
and these Fragments contain some common information, such as DescriptorTable.
Fragments are sent to BE in a certain order, so this common information is sent repeatedly
and generated repeatedly on the BE side.

    In some complex SQL, this common information may be very large,
thereby increasing the execution time of Fragment.

    So I improved this. For multiple Fragments sent to the same BE, only the first Fragment carries
this common information, which is cached on the BE side, and subsequent Fragments
no longer need to carry it.

    In the local test, the execution time of some complex SQL can be reduced from 3 seconds to 1 second.

2. Add the time-consuming part of FE logic in Profile

    Including SQL analysis, planning, Fragment scheduling and sending on the FE side, and the time to fetch data.
2020-11-22 20:38:05 +08:00
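The "send common components once per BE" idea from the Fragments commit above can be sketched as a sender that attaches the shared data only to the first fragment for each backend, plus a receiver that caches it by query. Everything here is hypothetical and simplified: the class names, the use of a plain `String` in place of the DescriptorTable, and the per-query `Sender` object are assumptions for illustration, not the actual FE/BE protocol.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not the Doris FE/BE protocol.
final class FragmentSendSketch {
    static final class FragmentParams {
        final String queryId;
        final int fragmentId;
        final String descriptorTable; // null when omitted from the request

        FragmentParams(String queryId, int fragmentId, String descriptorTable) {
            this.queryId = queryId;
            this.fragmentId = fragmentId;
            this.descriptorTable = descriptorTable;
        }
    }

    // Sender side, one instance per query.
    static final class Sender {
        private final Set<String> backendsWithCommonInfo = new HashSet<>();

        FragmentParams build(String queryId, int fragmentId, String backend, String descriptorTable) {
            // add() returns true only the first time this backend is seen.
            boolean first = backendsWithCommonInfo.add(backend);
            return new FragmentParams(queryId, fragmentId, first ? descriptorTable : null);
        }
    }

    // Receiver side: caches the common information per query.
    static final class Backend {
        private final Map<String, String> cache = new HashMap<>();

        String resolveDescriptorTable(FragmentParams params) {
            if (params.descriptorTable != null) {
                cache.put(params.queryId, params.descriptorTable);
            }
            return cache.get(params.queryId);
        }
    }
}
```

A real implementation would also need to evict the cached entry when the query finishes and to handle a later fragment arriving before the first one; the sketch leaves both out.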