When we use Spark load from a Hive table, the function loadDataFromHiveTable
reads the whole Hive table and then filters the data in process().
If the Hive table has many partitions and a lot of historical data, the load costs too much time and resources.
So we can do the filtering in loadDataFromHiveTable when reading from the Hive table.
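A minimal sketch of the idea, assuming hypothetical names (`dbName`, `tableName`, `filterExpr`) rather than the actual Spark DPP code: apply the load's filter expression while reading the Hive table, so Spark can prune partitions instead of scanning everything and filtering later.
```
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveLoadFilterSketch {
    // Read only the rows (and partitions) that the load actually needs.
    public static Dataset<Row> loadDataFromHiveTable(SparkSession spark,
                                                     String dbName,
                                                     String tableName,
                                                     String filterExpr) {
        Dataset<Row> df = spark.table(dbName + "." + tableName);
        if (filterExpr != null && !filterExpr.isEmpty()) {
            // Pushing the predicate here lets Spark prune Hive partitions,
            // instead of reading the whole table and filtering in process().
            df = df.filter(filterExpr);
        }
        return df;
    }
}
```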
Co-authored-by: 杜安明 <anming.du@mihoyo.com>
To fix #4977, the master FE returns the queryId when the query finishes, so that a non-master FE can audit it (#4978).
But when the query fails (e.g. times out), the client may not receive the right queryId for auditing.
In this PR:
The non-master FE sends the queryId to the master when forwarding the query;
Add more logs.
issue: #5031
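A minimal sketch of the idea, using hypothetical names (`ForwardRequest`, `setQueryId`) rather than the actual FE classes: the non-master FE generates the queryId up front and attaches it to the request it forwards to the master, so the same id is available for auditing even if the query later fails or times out.
```
import java.util.UUID;

public class ForwardQueryIdSketch {
    // Hypothetical request object standing in for the real forward RPC payload.
    static class ForwardRequest {
        String sql;
        String queryId;
        void setQueryId(String id) { this.queryId = id; }
    }

    // On the non-master FE: create the queryId before forwarding, so the audit
    // log can always record it, even when no result ever comes back.
    static ForwardRequest buildForwardRequest(String sql) {
        ForwardRequest req = new ForwardRequest();
        req.sql = sql;
        req.setQueryId(UUID.randomUUID().toString());
        return req;
    }
}
```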
1. Support ODBC sink for inserting data into ODBC external tables.
2. Support transactions for the ODBC sink to make sure the inserted data is atomic.
3. The documentation about the ODBC sink has been updated.
The close method of OlapTabletSink may be called twice.
In the open_internal() method of plan_fragment_executor, close is called once.
If an error occurs in that call, close will be called again in fragment_mgr.
So here we use a flag to prevent repeated close operations.
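A conceptual sketch of the guard-flag pattern in Java (the actual sink is C++; `closed` and `doClose()` are placeholders): the first call performs the real close, every later call returns immediately.
```
import java.util.concurrent.atomic.AtomicBoolean;

public class CloseOnceSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    // Safe to call from both the executor path and the fragment manager:
    // only the first caller actually performs the close work.
    public void close() {
        if (!closed.compareAndSet(false, true)) {
            return; // already closed
        }
        doClose();
    }

    private void doClose() {
        // release channels, flush buffers, etc. (placeholder)
    }
}
```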
Co-authored-by: morningman <chenmingyu@baidu.com>
* [Refactor] Refactor DeleteHandler and Cond module (#4925)
This patch mainly does the following refactors:
- Use int64_t instead of int32_t for 'version' in DeleteHandler
- Move some comments from .cpp to .h files, add some new comments in .h files, and remove some meaningless comments
- Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid (see the sketch after this list)
- Use range loop to simplify code
- Reduce some compare operations in Cond::del_eval
- Improve some branch predictions in Reader
- Fix and improve some unit tests
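A minimal Java sketch of the if-else-to-switch refactor mentioned above (the real code is C++ and the type/value checks here are placeholders): dispatching on an enum with switch...case is clearer and easier to check for missing cases than a chain of equality comparisons.
```
public class ConditionValueCheckSketch {
    enum FieldType { TINYINT, INT, BIGINT, VARCHAR, DATE }

    // Before: if (type == FieldType.TINYINT) {...} else if (type == FieldType.INT) {...} ...
    // After: a single switch over the enum.
    static boolean isConditionValueValid(FieldType type, String value) {
        switch (type) {
            case TINYINT:
            case INT:
            case BIGINT:
                return value.matches("-?\\d+");   // numeric literal
            case DATE:
                return value.matches("\\d{4}-\\d{2}-\\d{2}");
            case VARCHAR:
                return true;                       // any string is acceptable
            default:
                return false;
        }
    }
}
```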
'_task_worker_type' is not yet initialized when it is used to initialize '_name',
so '_name' is always 'TaskWorkerPool.CREATE_TABLE'. This patch fixes
this bug.
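A conceptual Java sketch of the fix (the real pool is C++; the class and most enum values here are placeholders): assign the worker type before deriving the name from it, otherwise the name is built from a default value and every pool ends up with the same name.
```
public class TaskWorkerPoolSketch {
    enum TaskWorkerType { CREATE_TABLE, DROP_TABLE, CLONE, PUSH }

    private final TaskWorkerType taskWorkerType;
    private final String name;

    TaskWorkerPoolSketch(TaskWorkerType type) {
        // The type must be set before the name is derived from it;
        // deriving the name first is what produced the constant
        // "TaskWorkerPool.CREATE_TABLE" name for every pool.
        this.taskWorkerType = type;
        this.name = "TaskWorkerPool." + this.taskWorkerType.name();
    }

    String getName() { return name; }
}
```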
1. Add the metadata table 'statistics' to store index information;
2. In the header information returned over the MySQL protocol, the data type length is returned according to the actual type.
The return type of str_to_date depends on whether the time part is included in the format.
If included, it is DATETIME, otherwise it is DATE.
If the format parameter is not constant, the return type will be DATETIME.
This judgment has already been made in the FE query planning stage,
so here we directly set the value type to the return type set in the query plan.
For example:
A table tbl with one varchar column k1 contains 2 rows:
"%Y-%m-%d"
"%Y-%m-%d %H:%i:%s"
Query:
SELECT str_to_date("2020-09-01", k1) from tbl;
Result will be:
2020-09-01 00:00:00
2020-09-01 00:00:00
Query:
SELECT str_to_date("2020-09-01", "%Y-%m-%d");
Return type is DATE
Query:
SELECT str_to_date("2020-09-01", "%Y-%m-%d %H:%i:%s");
Return type is DATETIME
_tablets_under_clone in TabletManager is not sharded, but the lock
used to prevent concurrent access is sharded, so when the number of shards
is not 1, it can cause a coredump.
This patch fixes this bug, and also does some refactoring to make shard
locks more convenient to use.
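A conceptual Java sketch of the problem and the shard-lock pattern (the real manager is C++; names are placeholders): each tablet id hashes to one lock in an array, which only protects state that is sharded the same way; a single shared set must use one dedicated lock, not the per-shard locks.
```
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ShardLockSketch {
    private static final int SHARD_NUM = 32;
    private final ReentrantReadWriteLock[] shardLocks = new ReentrantReadWriteLock[SHARD_NUM];

    // Not sharded: guarded by its own lock, NOT by shardLocks, otherwise two
    // tablets in different shards could mutate it concurrently.
    private final Set<Long> tabletsUnderClone = new HashSet<>();
    private final Object cloneLock = new Object();

    public ShardLockSketch() {
        for (int i = 0; i < SHARD_NUM; i++) {
            shardLocks[i] = new ReentrantReadWriteLock();
        }
    }

    // Per-tablet state uses the lock of the tablet's shard.
    ReentrantReadWriteLock lockForTablet(long tabletId) {
        return shardLocks[(Long.hashCode(tabletId) & 0x7fffffff) % SHARD_NUM];
    }

    void markUnderClone(long tabletId) {
        synchronized (cloneLock) {
            tabletsUnderClone.add(tabletId);
        }
    }
}
```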
There is no clear instruction on how to manually modify partitions when the dynamic partition feature is enabled.
The user is informed only after trying to modify a partition on the command line.
This PR adds instructions for converting between dynamic and manual partition tables.
Since the plan is retained in the task, if the task is not cleaned up, the memory usage can grow too large and cause a memory leak or OOM.
When a load job is finished, there is no need to hold the tasks, which are the biggest memory consumers.
Fixed #4992
* Optimized the read performance of a table when it has multiple versions:
changed the merge method of the unique table to
merge the cumulative version data first, and then merge the result with the base version.
For data with only one base version, read it directly without merging.
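A conceptual Java sketch of the two-level merge described above (the real read path is C++; the row model and method names are placeholders): cumulative versions are merged among themselves first, only the merged result is then combined with the base version, and a lone base version is read directly.
```
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class TieredMergeSketch {
    // A "version" is sketched as rows of (key, value); later versions win on key conflicts,
    // which mirrors the unique table's replace semantics.
    record Row(String key, String value) {}

    static List<Row> merge(List<List<Row>> versions) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (List<Row> version : versions) {           // versions ordered old -> new
            for (Row row : version) {
                merged.put(row.key(), row.value());    // newer value replaces older
            }
        }
        List<Row> result = new ArrayList<>();
        merged.forEach((k, v) -> result.add(new Row(k, v)));
        return result;
    }

    static List<Row> read(List<Row> base, List<List<Row>> cumulatives) {
        if (cumulatives.isEmpty()) {
            return base;                               // lone base version: no merge at all
        }
        // Merge the (usually small) cumulative versions first,
        // then merge that single result with the (usually large) base version once.
        List<Row> mergedCumulative = merge(cumulatives);
        return merge(List.of(base, mergedCumulative));
    }
}
```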
This CL fixes 2 bugs:
1.
When a compaction fails, we must explicitly delete the output rowset,
otherwise the GC logic cannot process these rowsets.
2.
Base compaction fails if the compaction process includes some delete version in SegmentV2,
because the number of filtered rows is wrong.
The current compaction mechanism has a producer thread that keeps producing compaction tasks,
and the selected tablet must apply for `permits`.
When a tablet can hold the `permits`, the compaction task for this tablet is submitted to the thread pool.
We take the compaction score as `permits`, which is used to limit memory consumption.
However, `pick_rowset_to_compaction()` is only executed in the compaction thread right before the file merge,
and the number of segment files that actually take part in the merge operation is smaller than the compaction score.
In addition, it is also possible that a compaction task exits directly because the tablet doesn't meet
the requirements of compaction.
This patch optimizes and refactors the compaction code so that 'pick rowsets' is executed
before applying for permits for a compaction task, the number of segment files that actually
participate in the merge operation is calculated, and this number is taken as the `permits`.
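A conceptual Java sketch of the refactored flow (the real scheduler is C++; `pickRowsets`, `segmentCount`, and the semaphore size are placeholders): rowsets are picked first, the permits are set to the number of segment files that will actually be merged, and the task only runs once it has acquired that many permits.
```
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class CompactionPermitsSketch {
    // Global budget for concurrent compaction work, sized by config (placeholder value).
    private static final Semaphore PERMITS = new Semaphore(10000);
    private static final ExecutorService POOL = Executors.newFixedThreadPool(8);

    interface Rowset { int segmentCount(); }

    static void scheduleCompaction(List<Rowset> candidateRowsets) throws InterruptedException {
        // 1. Pick rowsets BEFORE asking for permits, so the cost is known precisely.
        List<Rowset> picked = pickRowsets(candidateRowsets);
        if (picked.isEmpty()) {
            return; // tablet does not meet compaction requirements: no permits consumed
        }
        // 2. Permits = number of segment files that will actually be merged,
        //    instead of the (larger) compaction score.
        int permits = picked.stream().mapToInt(Rowset::segmentCount).sum();
        PERMITS.acquire(permits);
        POOL.submit(() -> {
            try {
                mergeRowsets(picked);
            } finally {
                PERMITS.release(permits);
            }
        });
    }

    static List<Rowset> pickRowsets(List<Rowset> candidates) { return candidates; } // placeholder
    static void mergeRowsets(List<Rowset> rowsets) { /* placeholder for the actual merge */ }
}
```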
1. Add metrics for `used permits` and `waiting permits` of compaction.
It is useful to monitor the `permits` held by all executing compaction tasks and by waiting compaction tasks.
2. Add a config-controlled log for merging rowsets.
It is helpful for tracking the progress of rowset merging for compaction tasks that last a long time.
1. Random().nextInt() may return a negative value, which would result in `java.lang.ArrayIndexOutOfBoundsException`;
passing a positive bound avoids this problem.
```
// nextInt(bound) returns a value in [0, bound), so seed is never negative
int seed = new Random().nextInt(Short.MAX_VALUE) % nodesInfo.size();
```
2. `EsNodeInfo[] nodeInfos = (EsNodeInfo[]) nodesInfo.values().toArray()` may lead to a `java.lang.ClassCastException` in some JDK versions: `[Ljava.lang.Object; cannot be cast to [Lorg.apache.doris.external.elasticsearch.EsNodeInfo`; passing the target array type resolves this.
```
// toArray(T[]) returns an array of the requested component type, so no cast is needed
EsNodeInfo[] nodeInfos = nodesInfo.values().toArray(new EsNodeInfo[0]);
```
When a parquet file contains a `Map/List/Struct` structure, Doris cannot recognize the column correctly
and throws the exception 'Invalid column: xxxx', which means Doris cannot find the column.
The `Map` structure is recognized as two columns: `key` and `value`.
The following is the schema of a parquet file as recognized by Doris. This patch tries to solve this problem.
When a user tries to load parquet files into Doris from a path like `hdfs://hadoop/user/data/date=20201024/*`,
but the path actually contains some non-parquet files, the following error is thrown:
`Couldn't deserialize thrift: No more data to read.\\nDeserializing page header failed.`.
If the error message included the file name, we could quickly locate the errors.
Therefore, this patch tries to add the file name to the error message.
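A minimal Java sketch of the idea (the real reader is C++; `readParquetFile` and `ParquetReadException` are hypothetical): wrap the low-level deserialization error and prepend the file path, so a bad file in a wildcard path can be identified immediately.
```
public class ParquetErrorContextSketch {
    static class ParquetReadException extends RuntimeException {
        ParquetReadException(String message, Throwable cause) { super(message, cause); }
    }

    static void readParquetFile(String filePath) {
        try {
            deserializePageHeaders(filePath);
        } catch (RuntimeException e) {
            // Attach the file name so the user can tell which file in
            // hdfs://.../date=20201024/* is not actually a parquet file.
            throw new ParquetReadException("Failed to read parquet file: " + filePath
                    + ", reason: " + e.getMessage(), e);
        }
    }

    static void deserializePageHeaders(String filePath) {
        // placeholder for the real thrift page-header deserialization
    }
}
```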
1. Support modifying a column type from CHAR to TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DATE,
and converting TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE to wider numeric types (#4937)
2. Use templates to refactor the code of types.h and schema_change.cpp to remove redundant code.
When a tablet selects which replica's host will execute the scan operation,
it takes a `round-robin` strategy for load balancing. `minAssignedBytes` is the current load of one host.
If a backend is not alive at that moment, one of the other replicas is randomly chosen instead,
but the dead backend's `minAssignedBytes` is not decreased and the new choice's `minAssignedBytes`
is not increased. As a result, the recorded load of the backends no longer matches the real load.
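A conceptual Java sketch of the accounting idea (field and method names are placeholders, not the actual FE code): the scan bytes are charged to whichever backend is actually chosen, so a fallback from a dead backend does not leave the per-host load counters out of sync.
```
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReplicaLoadAccountingSketch {
    // Bytes currently assigned to each backend host (placeholder for the minAssignedBytes bookkeeping).
    private final Map<Long, Long> assignedBytesPerBackend = new ConcurrentHashMap<>();

    long selectBackend(long preferredBackendId, List<Long> aliveReplicaBackends, long scanBytes) {
        long chosen;
        if (aliveReplicaBackends.contains(preferredBackendId)) {
            chosen = preferredBackendId;
        } else {
            // Preferred backend is down: fall back to another alive replica...
            chosen = aliveReplicaBackends.get(0);
        }
        // ...and charge the bytes to the backend that will really do the scan,
        // so the round-robin load statistics stay correct.
        assignedBytesPerBackend.merge(chosen, scanBytes, Long::sum);
        return chosen;
    }
}
```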