* [Refactor] Refactor DeleteHandler and Cond module (#4925)
This patch mainly does the following refactors:
- Use int64_t instead of int32_t for 'version' in DeleteHandler
- Move some comments from .cpp files to .h files, add some new comments in .h files, and remove some meaningless comments
- Use switch...case... instead of multiple if...else... for DeleteConditionHandler::is_condition_value_valid (see the sketch after this list)
- Use range-based for loops to simplify code
- Reduce some comparison operations in Cond::del_eval
- Improve some branch predictions in Reader
- Fix and improve some unit tests
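A minimal sketch of the switch-based validation mentioned above, assuming illustrative field types and checks; this is not the actual DeleteConditionHandler code.
```cpp
#include <string>

// Illustrative field types; the real code validates a delete condition's
// value against the column's actual type.
enum class FieldType { BOOL, TINYINT, INT, BIGINT, DATE, VARCHAR };

bool is_condition_value_valid(FieldType type, const std::string& value) {
    switch (type) {
    case FieldType::BOOL:
        return value == "true" || value == "false" || value == "0" || value == "1";
    case FieldType::TINYINT:
    case FieldType::INT:
    case FieldType::BIGINT:
        // Integer-like types share one branch instead of repeated if...else.
        return !value.empty() &&
               value.find_first_not_of("-0123456789") == std::string::npos;
    case FieldType::DATE:
        return value.size() == 10 && value[4] == '-' && value[7] == '-';
    case FieldType::VARCHAR:
        return true;  // any string value is acceptable
    }
    return false;  // defensive: unreachable for well-formed enum values
}
```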
'_task_worker_type' is not yet initialized when it is used to initialize '_name',
so '_name' is always 'TaskWorkerPool.CREATE_TABLE'. This patch fixes this bug.
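A minimal sketch of the initialization-order pitfall behind this kind of bug, with illustrative class and member names rather than the actual TaskWorkerPool code.
```cpp
#include <string>

// C++ initializes data members in declaration order, regardless of the
// order in the constructor's initializer list. In the buggy layout below,
// _name is declared before _task_worker_type, so its initializer reads
// _task_worker_type before the constructor has set it.
class BuggyWorkerPool {
public:
    explicit BuggyWorkerPool(int type)
            : _task_worker_type(type),
              _name("TaskWorkerPool." + std::to_string(_task_worker_type)) {}

private:
    std::string _name;      // initialized FIRST, reads an unset member
    int _task_worker_type;  // initialized second
};

// Fixed: declare (and therefore initialize) the type before the name, or
// build _name in the constructor body after _task_worker_type is assigned.
class FixedWorkerPool {
public:
    explicit FixedWorkerPool(int type)
            : _task_worker_type(type),
              _name("TaskWorkerPool." + std::to_string(_task_worker_type)) {}

private:
    int _task_worker_type;  // initialized first
    std::string _name;      // safe: uses the already-initialized member
};
```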
1. Add metadata table 'statistics' to store index information;
2. In the header information returned via the MySQL protocol, the data type length is returned according to the actual type.
The return type of str_to_date depends on whether the time part is included in the format.
If included, it is DATETIME, otherwise it is DATE.
If the format parameter is not constant, the return type will be DATETIME.
This judgment has already been made in the FE query planning stage,
so here we directly set the value type to the return type chosen in the query plan.
For example:
A table with one varchar column k1, containing 2 rows:
"%Y-%m-%d"
"%Y-%m-%d %H:%i:%s"
Query:
SELECT str_to_date("2020-09-01", k1) from tbl;
Result will be:
2020-09-01 00:00:00
2020-09-01 00:00:00
Query:
SELECT str_to_date("2020-09-01", "%Y-%m-%d");
Return type is DATE
Query:
SELECT str_to_date("2020-09-01", "%Y-%m-%d %H:%i:%s");
Return type is DATETIME
_tablets_under_clone in TabletManager is not sharded, but the lock
used to prevent concurrent access to it is sharded, so when the shard size
is not 1 it will cause a coredump.
This patch fixes this bug, and also refactors the code to make shard
locks more convenient to use (see the sketch below).
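A minimal sketch of the failure mode and one possible fix, assuming a simplified sharded-lock layout; the names and shard count are illustrative, not the actual TabletManager code.
```cpp
#include <array>
#include <cstdint>
#include <mutex>
#include <set>

constexpr size_t kShardNum = 4;

// Buggy layout: the locks are sharded by tablet id, but the set itself is a
// single shared container. Two threads working on tablets that hash to
// different shards take different mutexes, yet still mutate the same
// std::set concurrently -> data race / coredump when the shard size != 1.
std::array<std::mutex, kShardNum> g_shard_locks;
std::set<int64_t> g_tablets_under_clone;  // NOT sharded

void add_tablet_under_clone_buggy(int64_t tablet_id) {
    std::lock_guard<std::mutex> guard(
            g_shard_locks[static_cast<size_t>(tablet_id) % kShardNum]);
    g_tablets_under_clone.insert(tablet_id);  // unsafe when kShardNum > 1
}

// One possible fix: shard the container together with its lock, so each
// mutex guards exactly the data it is paired with.
struct Shard {
    std::mutex lock;
    std::set<int64_t> tablets_under_clone;
};
std::array<Shard, kShardNum> g_shards;

void add_tablet_under_clone_fixed(int64_t tablet_id) {
    Shard& shard = g_shards[static_cast<size_t>(tablet_id) % kShardNum];
    std::lock_guard<std::mutex> guard(shard.lock);
    shard.tablets_under_clone.insert(tablet_id);
}
```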
There is no clear instruction on how to manually modify partitions when the dynamic partition feature is enabled;
the user is only informed after trying to modify a partition on the command line.
This PR adds instructions for converting dynamic and manual partition tables to each other.
Since the plan is retained in the task, if the task is not cleaned up the memory usage will be too large, causing a memory leak or OOM.
When a load job is finished, there is no need to keep holding its tasks, which are the biggest memory consumers.
Fixed #4992
* Optimized the read performance of tables with multiple versions by changing the merge method of the unique table:
the cumulative version data is merged first, and the result is then merged with the base version.
For data with only one base version, read directly without merging (a sketch of the idea follows).
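A minimal sketch of the two-phase merge idea on a unique-key table, using a deliberately simplified rowset model; this is not the actual Doris reader code.
```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative model: a "rowset" is a key -> value map; on a unique-key
// table, a newer version's value replaces an older one for the same key.
using Rowset = std::map<std::string, std::string>;

// Merge a list of rowsets ordered from oldest to newest.
Rowset merge_rowsets(const std::vector<Rowset>& rowsets) {
    Rowset merged;
    for (const Rowset& rs : rowsets) {
        for (const auto& kv : rs) {
            merged[kv.first] = kv.second;  // newer version wins
        }
    }
    return merged;
}

Rowset read_unique_table(const Rowset& base, const std::vector<Rowset>& cumulative) {
    // Only a base version: read it directly, no merge needed.
    if (cumulative.empty()) {
        return base;
    }
    // Phase 1: merge the (usually small) cumulative versions first.
    Rowset merged_cumulative = merge_rowsets(cumulative);
    // Phase 2: merge that single result with the (large) base version.
    return merge_rowsets({base, merged_cumulative});
}
```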
This CL fixes 2 bugs:
1. When compaction fails, we must explicitly delete the output rowset, otherwise the GC logic cannot process these rowsets.
2. Base compaction failed when the compaction process included a delete version in SegmentV2, because the number of filtered rows was wrong.
The current compaction mechanism is that a producer thread keeps producing compaction tasks,
and the selected tablet must apply for `permits`.
When a tablet can hold the `permits`, the compaction task for this tablet is submitted to the thread pool.
We take the compaction score as `permits`, which is used for limiting memory consumption.
However, `pick_rowset_to_compaction()` is executed before the file merge in the compaction thread,
and the number of segment files that actually perform the merge operation is smaller than the compaction score.
In addition, it is also possible that the compaction task exits directly because the tablet doesn't meet
the requirements of compaction.
This patch optimizes and refactors the compaction code so that we can execute 'pick rowsets'
before applying for permits for a compaction task, calculate the number of segment files that actually
participate in the merge operation, and take this number as `permits` (see the sketch below).
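A minimal sketch of the reworked flow, assuming simplified placeholder types for tablets and rowsets; this is not the actual Doris compaction API.
```cpp
#include <cstdint>
#include <vector>

// Illustrative placeholder types; not the actual Doris compaction API.
struct Rowset {
    int64_t num_segments = 0;
};

struct Tablet {
    // Rowsets that a hypothetical picking policy has decided to compact.
    std::vector<Rowset> candidate_rowsets;
};

// Reworked flow: pick the rowsets to merge *before* applying for permits,
// and use the number of segment files that will actually be merged as the
// permits, instead of the tablet's compaction score (which over-estimates
// the real work).
int64_t calc_compaction_permits(const Tablet& tablet, std::vector<Rowset>* picked) {
    *picked = tablet.candidate_rowsets;  // stand-in for "pick rowsets"
    if (picked->empty()) {
        // The tablet doesn't meet the compaction requirements: the task can
        // exit directly without ever holding any permits.
        return 0;
    }
    int64_t permits = 0;
    for (const Rowset& rs : *picked) {
        permits += rs.num_segments;  // real merge work, not the score
    }
    return permits;
}
```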
1. Add metrics for `used permits` and `waiting permits` of compaction.
It is useful to monitor the `permits` held by all executing compaction tasks and by waiting compaction tasks.
2. Add a log, which can be enabled by config, for merging rowsets.
It is helpful to track the progress of rowset merging for compaction tasks that last a long time.
1. Random().nextInt() may return a negative value, which would result in `java.lang.ArrayIndexOutOfBoundsException`;
passing a positive bound avoids this problem.
```
int seed = new Random().nextInt(Short.MAX_VALUE) % nodesInfo.size();
```
2. `EsNodeInfo[] nodeInfos = (EsNodeInfo[]) nodesInfo.values().toArray()` may lead to `java.lang.ClassCastException` in some JDK versions: `[Ljava.lang.Object; cannot be cast to [Lorg.apache.doris.external.elasticsearch.EsNodeInfo`. Passing the original class type resolves this.
```
EsNodeInfo[] nodeInfos = nodesInfo.values().toArray(new EsNodeInfo[0]);
```
When a parquet file contains a `Map/List/Struct` structure, Doris cannot recognize the column correctly
and throws the exception 'Invalid column: xxxx', which means Doris cannot find the column.
The `Map` structure will be recognized as two columns: `key` and `value`.
The following is the schema of a parquet file as recognized by Doris. This patch tries to solve this problem.
When a user tries to load parquet files into Doris from a path like `hdfs://hadoop/user/data/date=20201024/*`,
but the path actually contains some non-parquet files, the error
`Couldn't deserialize thrift: No more data to read.\\nDeserializing page header failed.` is thrown.
If the error message included the file name, we could quickly locate the errors.
Therefore, this patch tries to add the file name to the error message.
1. Support modifying column type from CHAR to TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DATE,
and converting TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE to a wider range of numeric types (#4937)
2. Use templates to refactor the code of types.h and schema_change.cpp and delete redundant code (a sketch of the idea follows).
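A minimal sketch of how a single template can replace per-type conversion functions; the names below are illustrative and not the actual types.h / schema_change.cpp code.
```cpp
#include <cstdint>
#include <type_traits>

// One generic converter covers every "numeric -> wider numeric" pair, so
// the schema change code no longer needs a hand-written function for each
// (from, to) combination.
template <typename From, typename To>
To convert_numeric(From value) {
    static_assert(std::is_arithmetic<From>::value && std::is_arithmetic<To>::value,
                  "numeric types only");
    static_assert(std::is_floating_point<To>::value || sizeof(To) >= sizeof(From),
                  "target type must be at least as wide as the source");
    return static_cast<To>(value);
}

void example() {
    int32_t a = convert_numeric<int8_t, int32_t>(42);   // TINYINT -> INT
    int64_t b = convert_numeric<int32_t, int64_t>(a);   // INT -> BIGINT
    double c = convert_numeric<int64_t, double>(b);     // BIGINT -> DOUBLE
    (void)c;
}
```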
When a tablet selects which replica's host will execute the scan operation,
it takes a `round-robin` strategy for load balancing. `minAssignedBytes` is the current load of one host.
If a backend is momentarily not alive, one of the other replicas is randomly taken as the choice,
but the dead backend's `minAssignedBytes` is not decreased and the new choice's `minAssignedBytes`
is not increased either. That makes the recorded load of the backends incorrect.
All columns created in an inline view are set with `allowNull = false`, which causes `NULL` data in a CTE to be ignored during processing.
So we should set the inline view's columns to allowNull to ensure the correctness of the query.
This is a minor issue when FE starts after a fresh installation:
an error about the missing log directory occurs, because the log directory does not exist yet
when some environment check messages are written to the log file.
The log directory creation code in bin/start_fe.sh is in the wrong place;
it only needs to be moved to the beginning of the script.
This CL mainly changes:
1. Avoid repeated sending of common components in Fragments
In the previous implementation, a query may generate multiple Fragments,
and these Fragments contain some common information, such as the DescriptorTable.
Fragments are sent to BE in a certain order, so this common information was sent repeatedly
and regenerated repeatedly on the BE side.
For some complex SQL, this common information may be very large,
thereby increasing the execution time of the Fragments.
So I improved this: for multiple Fragments sent to the same BE, only the first Fragment carries
this common information; it is cached on the BE side, and subsequent Fragments
no longer need to carry it (a sketch of the idea follows item 2).
In a local test, the execution time of some complex SQL can be reduced from 3 seconds to 1 second.
2. Add the time-consuming parts of the FE logic to the Profile,
including SQL analysis, planning, Fragment scheduling and sending on the FE side, and the time to fetch data.
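A minimal sketch of the BE-side caching idea from item 1, assuming simplified placeholder types; the query-id keyed map below is illustrative, not the actual FragmentMgr implementation.
```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Illustrative stand-ins for the components shared by a query's fragments.
struct DescriptorTable {};

struct SharedQueryState {
    DescriptorTable desc_tbl;
    // ... other components common to all fragments of one query
};

// Only the first fragment of a query carries the shared components; later
// fragments of the same query just look them up by query id.
class SharedStateCache {
public:
    std::shared_ptr<SharedQueryState> get_or_create(
            const std::string& query_id,
            const DescriptorTable* carried /* null if not carried */) {
        std::lock_guard<std::mutex> guard(_lock);
        auto it = _states.find(query_id);
        if (it != _states.end()) {
            return it->second;  // reuse what the first fragment brought
        }
        auto state = std::make_shared<SharedQueryState>();
        if (carried != nullptr) {
            state->desc_tbl = *carried;
        }
        _states[query_id] = state;
        return state;
    }

    void erase(const std::string& query_id) {
        std::lock_guard<std::mutex> guard(_lock);
        _states.erase(query_id);  // drop when the query finishes
    }

private:
    std::mutex _lock;
    std::unordered_map<std::string, std::shared_ptr<SharedQueryState>> _states;
};
```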