doris

Author	SHA1	Message	Date
morrySnow	54c07f8782	[regression](Nereids) add back tpch regression test cases (#13826 ) 1. add back TPC-H regression test cases 2. fix decimal problem on aggregate function sum and agg introduced by #13764 3. fix memo merge group NPE introduced by #13900	2022-11-08 16:40:46 +08:00
Pxl	df89e46761	[fix](build) fix compile fail on Segment::open (#14058 )	2022-11-08 14:38:40 +08:00
zhangstar333	f7ecb6d79f	[Bug](Bitmap) fix sub_bitmap calculate wrong result to return null (#13978 ) fix sub_bitmap calculate wrong result to return null	2022-11-08 14:10:12 +08:00
Mingyu Chen	1c07a01038	[feature](multi-catalog) Support data on s3-compatible oss and support aliyun DLF (#13994 ) Support Aliyun DLF Support data on s3-compatible object storage, such as aliyun oss. Refactor some interface of catalog, to make it more tidy. Fix bug that the default text format field delimiter of hive should be \x01 Add a new class PooledHiveMetaStoreClient to wrap the IMetaStoreClient.	2022-11-08 14:02:41 +08:00
谢健	61d4974ba1	[fix](Nereids) Use simple cost to calculate benefit and avoid useless calculation (#14056 ) In GraphSimplifier, we can use a simple cost to calculate the benefit. We only need to update recursively when the best neighbor of the applied step is the edge being processed.	2022-11-08 13:11:38 +08:00
slothever	c2a01e84b4	[feature-wip](multi-catalog) fix page index filter bug (#14015 ) Fix page index filter not take effect when multiple columns Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-11-08 12:10:12 +08:00
HappenLee	63ea233ae2	[thirdpart](lib) Add lock free queue of concurrentqueue (#14045 )	2022-11-08 11:34:23 +08:00
morrySnow	e6b12ce8e8	[feature](Nereids) support query that group by use alias generated in aggregate output (#14030 ) support query having alias in group by list, such as: SELECT c1 AS a, SUM(c2) FROM t GROUP BY a;	2022-11-08 11:02:42 +08:00
Pxl	9d8b4bc176	[Enhancement](Dictionary-codec) update dict once on same segment (#13936 ) update dict once on same segment	2022-11-08 10:59:35 +08:00
Mingyu Chen	b09e5ced97	[fix](priv) fix meta replay bug when upgrading from 1.1.x to 1.2.x (#14046 )	2022-11-08 10:43:33 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) The mem tracker can be logically divided into 4 layers: 1) process 2) type 3) query/load/compaction task etc. 4) exec node etc. The type layer includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between layers, and the values of the process and of each type are aggregated periodically. Other fix: in "[fix](memtracker) Fix transmit_tracker null pointer because phmap is not thread safe" (#13528), I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. In actual tests the accuracy of this part of the memory could not be guaranteed, so it is put back into the orphan mem tracker.	2022-11-08 09:52:33 +08:00
Zhengguo Yang	e1654bc6ef	[Enhancement](function) add to_bitmap() function with int type (#13973 ) The to_bitmap function previously supported only string parameters. Add a to_bitmap() function with int type, which avoids converting an int to a string and then converting the string back to an int.	2022-11-08 09:15:26 +08:00
Kang	34f43ac781	[bug](like function)fix like '' (empty string) get wrong result with all rows #14035	2022-11-08 08:51:39 +08:00
luozenglin	6ed443c7e8	[enhancement](profile) add instanceNum, tableIds to profile. (#13985 )	2022-11-08 08:49:16 +08:00
starocean999	95591ce49a	[refactor](cv)wait on condition variable more gently (#12620 )	2022-11-08 08:40:31 +08:00
jakevin	17a4746a08	[enhancement](Nereids) support otherJoinConjuncts in cascades join reorder (#13681 )	2022-11-08 00:08:44 +08:00
zhannngchen	d1cbaa1de8	[fix](load) fix a bug that reduce memory work on hard limit might be triggered twice (#13967 ) When the load mem hard limit is reached, all load channels should wait on the lock of LoadChannelMgr until the current reduce-memory work finishes. In the current implementation, a bug may cause some threads to be woken up before the reduce-memory work finishes: 1. Thread A finds that the soft limit is reached, picks a load channel, and waits for its reduce-memory work to finish. 2. Memory keeps increasing. Thread B finds that the hard limit is reached (either the load mem hard limit or the process soft limit), picks a load channel to reduce memory, and sets the variable _should_wait_flush to true. 3. Thread C finds that _should_wait_flush is true and waits on _wait_flush_cond. 4. Thread A finishes its reduce-memory work, finds that _should_wait_flush is true, sets it to false, and notifies all threads. 5. Thread C is woken up and picks a load channel to do reduce-memory work while thread B's work is not yet finished. As a result, two threads do reduce-memory work at the same time when the hard limit is reached, which is confusing.	2022-11-08 00:07:52 +08:00
TaoZex	241801ca17	[typo](doc) fix get_start doc (#14001 )	2022-11-07 21:28:45 +08:00
Gabriel	1c2532b9dc	[Bug](udf) Make UDF's type always nullable (#14002 )	2022-11-07 20:51:31 +08:00
morrySnow	4ea1b39cb2	[enhancement](Nereids) remove unnecessary decimal cast (#13745 )	2022-11-07 19:24:10 +08:00
谢健	f2978fb6ff	[feat](Nereids) add graph simplifier (#14007 )	2022-11-07 18:45:45 +08:00
morrySnow	22b4c6af20	[feature](Nereids) support statement having aggregate function in order by list (#13976 ) 1. Add support for statements that have an aggregate function in the ORDER BY list, such as: SELECT COUNT(*) FROM t GROUP BY c1 ORDER BY COUNT(*) DESC; 2. Add ClickBench analyze unit tests.	2022-11-07 17:01:31 +08:00
zy-kkk	0031304015	[typo](docs)fix config doc #14010	2022-11-07 17:00:16 +08:00
starocean999	bb9182d602	[fix](repeat)remove unmaterialized expr from repeat node (#13953 )	2022-11-07 14:13:05 +08:00
Wanghuan	7254999f02	[typo](docs) fix docs, delete redundant words #13849	2022-11-07 13:51:10 +08:00
zhoumengyks	3c8524b9d8	[security](fe jar) upgrade commons-codec:commons-codec to 1.13 #13951	2022-11-07 13:50:07 +08:00
yiguolei	32fea672b0	[chore](gutil) remove some gutil macros and solve some macro conflict with brpc (#13954 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-07 13:39:52 +08:00
Yiliang Qiu	e8d2fb6778	[feature](function)add search functions: multi_search_all_positions & multi_match_any (#13763 ) Co-authored-by: yiliang qiu <yiliang.qiu@qq.com>	2022-11-07 11:50:55 +08:00
lihangyu	7ffe88b579	[feature-array](array-type) Add array function array_popback (#13641 ) Remove the last element from array. ``` mysql> select array_popback(['test', NULL, 'value']); +-----------------------------------------------------+ \| array_popback(ARRAY('test', NULL, 'value')) \| +-----------------------------------------------------+ \| [test, NULL] \| +-----------------------------------------------------+ ```	2022-11-07 10:48:16 +08:00
Xinyi Zou	c7b2b90504	[fix](memtracker) Fix DCHECK !std::count(_consumer_tracker_stack.begin(), _consumer_tracker_stack.end(), tracker)	2022-11-06 16:41:03 +08:00
Tiewei Fang	27549564a7	[feature](table-valued-function) Support S3 tvf (#13959 ) This PR does three things: 1. Modifies the framework of table-valued functions (tvf). 2. Makes BE support the `fetch_table_schema` rpc. 3. Implements the `S3(path, AK, SK, format)` table-valued function.	2022-11-06 11:04:26 +08:00
Mingyu Chen	fb5a3e118a	[feature-wip](dlf) prepare to support aliyun dlf (#13969 ) [What is DLF](https://www.alibabacloud.com/product/datalake-formation) This PR is a preparation for support DLF, with some changes of multi catalog 1. Add RuntimeException for most of hive meta store or es client visit operation. 2. Add DLF related dependencies. 3. Move the checks of es catalog properties to the analysis phase of creating es catalog TODO(in next PR): 1. Refactor the `getSplit` method to support not only hdfs, but s3-compatible object storage. 2. Finish the implementation of supporting DLF	2022-11-06 10:01:57 +08:00
zhengyu	f29e43fee9	[fix](storage) rm unnecessary check (#13986 ) (#13988 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-11-05 23:46:30 +08:00
Liqf	1724faf9a5	[test](java-udf)add java udf RegressionTest about the currently supported data types #13972	2022-11-05 19:25:58 +08:00
Mingyu Chen	d01f7c546a	[refactor](iceberg-hudi) disable iceberg and hudi table by default (#13932 )	2022-11-05 19:22:27 +08:00
wxy	620a137bd7	[enhancement](test) support tablet repair and balance process in ut (#13940 )	2022-11-05 19:20:23 +08:00
caoliang-web	380395a61f	[doc](routineload)Common mistakes in adding routine load #13975	2022-11-05 19:17:33 +08:00
lihaijian	087488db3b	[typo](doc) fixed spelling errors (#13974 )	2022-11-05 15:40:55 +08:00
Gabriel	2ee7ba79a8	[Improvement](javaudf) improve java loader usage (#13962 )	2022-11-05 13:20:04 +08:00
TengJianPing	04830af039	[fix](tablet sink) fallback to non-vectorized interface in tablet_sink if in progress of upgrading from 1.1-lts to 1.2-lts (#13966 )	2022-11-05 10:19:51 +08:00
Xinyi Zou	f87be09d69	[fix](load) Fix load channel mgr lock (#13960 ) hot fix load channel mgr lock	2022-11-05 00:48:30 +08:00
jiafeng.zhang	a19e6881c7	[chore](be web ui)upgrade jquery version to 3.6.0 (#13942 ) * upgrade jquery version to 3.6.0 * update license dist	2022-11-04 16:20:17 +08:00
924060929	06a1efdb01	[fix](Nerieds) fix tpch and support trace plan's change event (#13957 ) This pr fix some bugs for run tpc-h 1. fix the avg(decimal) crash the backend. The fix code in `Avg.getFinalType()` and every child class of `ComputeSinature` 2. fix the ReorderJoin dead loop. The fix code in `ReorderJoin.findInnerJoin()` 3. fix the TimestampArithmetic can not bind the functions in the child. The fix code in `BindFunction.FunctionBinder.visitTimestampArithmetic()` New feature: support trace the plan's change event, you can `set enable_nereids_trace=true` to open trace log and see some log like this: ``` 2022-11-03 21:07:38,391 INFO (mysql-nio-pool-0\|208) [Job.printTraceLog():128] ========== RewriteBottomUpJob ANALYZE_FILTER_SUBQUERY ========== before: LogicalProject ( projects=[S_ACCTBAL#17, S_NAME#13, N_NAME#4, P_PARTKEY#19, P_MFGR#21, S_ADDRESS#14, S_PHONE#16, S_COMMENT#18] ) +--LogicalFilter ( predicates=((((((((P_PARTKEY#19 = PS_PARTKEY#7) AND (S_SUPPKEY#12 = PS_SUPPKEY#8)) AND (P_SIZE#24 = 15)) AND (P_TYPE#23 like '%BRASS')) AND (S_NATIONKEY#15 = N_NATIONKEY#3)) AND (N_REGIONKEY#5 = R_REGIONKEY#0)) AND (R_NAME#1 = 'EUROPE')) AND (PS_SUPPLYCOST#10 = (SCALARSUBQUERY) (QueryPlan: LogicalAggregate ( phase=LOCAL, outputExpr=[min(PS_SUPPLYCOST#31) AS `min(PS_SUPPLYCOST)`#33], groupByExpr=[] )), (CorrelatedSlots: [P_PARTKEY#19, S_SUPPKEY#12, S_NATIONKEY#15, N_NATIONKEY#3, N_REGIONKEY#5, R_REGIONKEY#0, R_NAME#1]))) ) +--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] ) \|--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] ) \| \|--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] ) \| \| \|--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] ) \| \| \| \|--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.part, output=[P_PARTKEY#19, P_NAME#20, P_MFGR#21, P_BRAND#22, P_TYPE#23, P_SIZE#24, P_CONTAINER#25, P_RETAILPRICE#26, 
P_COMMENT#27], candidateIndexIds=[], selectedIndexId=11076, preAgg=ON ) \| \| \| +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.supplier, output=[S_SUPPKEY#12, S_NAME#13, S_ADDRESS#14, S_NATIONKEY#15, S_PHONE#16, S_ACCTBAL#17, S_COMMENT#18], candidateIndexIds=[], selectedIndexId=11124, preAgg=ON ) \| \| +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.partsupp, output=[PS_PARTKEY#7, PS_SUPPKEY#8, PS_AVAILQTY#9, PS_SUPPLYCOST#10, PS_COMMENT#11], candidateIndexIds=[], selectedIndexId=11092, preAgg=ON ) \| +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.nation, output=[N_NATIONKEY#3, N_NAME#4, N_REGIONKEY#5, N_COMMENT#6], candidateIndexIds=[], selectedIndexId=11044, preAgg=ON ) +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.region, output=[R_REGIONKEY#0, R_NAME#1, R_COMMENT#2], candidateIndexIds=[], selectedIndexId=11108, preAgg=ON ) after: LogicalProject ( projects=[S_ACCTBAL#17, S_NAME#13, N_NAME#4, P_PARTKEY#19, P_MFGR#21, S_ADDRESS#14, S_PHONE#16, S_COMMENT#18] ) +--LogicalFilter ( predicates=((((((((P_PARTKEY#19 = PS_PARTKEY#7) AND (S_SUPPKEY#12 = PS_SUPPKEY#8)) AND (P_SIZE#24 = 15)) AND (P_TYPE#23 like '%BRASS')) AND (S_NATIONKEY#15 = N_NATIONKEY#3)) AND (N_REGIONKEY#5 = R_REGIONKEY#0)) AND (R_NAME#1 = 'EUROPE')) AND (PS_SUPPLYCOST#10 = min(PS_SUPPLYCOST)#33)) ) +--LogicalProject ( projects=[P_PARTKEY#19, P_NAME#20, P_MFGR#21, P_BRAND#22, P_TYPE#23, P_SIZE#24, P_CONTAINER#25, P_RETAILPRICE#26, P_COMMENT#27, S_SUPPKEY#12, S_NAME#13, S_ADDRESS#14, S_NATIONKEY#15, S_PHONE#16, S_ACCTBAL#17, S_COMMENT#18, PS_PARTKEY#7, PS_SUPPKEY#8, PS_AVAILQTY#9, PS_SUPPLYCOST#10, PS_COMMENT#11, N_NATIONKEY#3, N_NAME#4, N_REGIONKEY#5, N_COMMENT#6, R_REGIONKEY#0, R_NAME#1, R_COMMENT#2, min(PS_SUPPLYCOST)#33] ) +--LogicalApply ( correlationSlot=[P_PARTKEY#19, S_SUPPKEY#12, S_NATIONKEY#15, N_NATIONKEY#3, N_REGIONKEY#5, R_REGIONKEY#0, 
R_NAME#1], correlationFilter=Optional.empty ) \|--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] ) \| \|--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] ) \| \| \|--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] ) \| \| \| \|--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] ) \| \| \| \| \|--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.part, output=[P_PARTKEY#19, P_NAME#20, P_MFGR#21, P_BRAND#22, P_TYPE#23, P_SIZE#24, P_CONTAINER#25, P_RETAILPRICE#26, P_COMMENT#27], candidateIndexIds=[], selectedIndexId=11076, preAgg=ON ) \| \| \| \| +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.supplier, output=[S_SUPPKEY#12, S_NAME#13, S_ADDRESS#14, S_NATIONKEY#15, S_PHONE#16, S_ACCTBAL#17, S_COMMENT#18], candidateIndexIds=[], selectedIndexId=11124, preAgg=ON ) \| \| \| +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.partsupp, output=[PS_PARTKEY#7, PS_SUPPKEY#8, PS_AVAILQTY#9, PS_SUPPLYCOST#10, PS_COMMENT#11], candidateIndexIds=[], selectedIndexId=11092, preAgg=ON ) \| \| +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.nation, output=[N_NATIONKEY#3, N_NAME#4, N_REGIONKEY#5, N_COMMENT#6], candidateIndexIds=[], selectedIndexId=11044, preAgg=ON ) \| +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.region, output=[R_REGIONKEY#0, R_NAME#1, R_COMMENT#2], candidateIndexIds=[], selectedIndexId=11108, preAgg=ON ) +--LogicalAggregate ( phase=LOCAL, outputExpr=[min(PS_SUPPLYCOST#31) AS `min(PS_SUPPLYCOST)`#33], groupByExpr=[] ) +--LogicalFilter ( predicates=(((((P_PARTKEY#19 = PS_PARTKEY#28) AND (S_SUPPKEY#12 = PS_SUPPKEY#29)) AND (S_NATIONKEY#15 = N_NATIONKEY#3)) AND (N_REGIONKEY#5 = R_REGIONKEY#0)) AND (CAST(R_NAME AS STRING) = CAST(EUROPE AS STRING))) ) +--LogicalOlapScan ( 
qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.partsupp, output=[PS_PARTKEY#28, PS_SUPPKEY#29, PS_AVAILQTY#30, PS_SUPPLYCOST#31, PS_COMMENT#32], candidateIndexIds=[], selectedIndexId=11092, preAgg=ON ) ```	2022-11-04 15:01:06 +08:00
zhengyu	554f566217	[enhancement](compaction) introduce segment compaction (#12609 ) (#12866 ) ## Design ### Trigger Every time a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job runs for a single rowset at a time, which avoids recursion and queuing problems. ### Target Selection We collect segments on every trigger. We skip big segments whose row num > M (e.g. 10000) because compacting them brings little benefit compared with the effort. Hence, we only pick the "Longest Consecutive Small" segment group to do the actual compaction. ### Compaction Process A new thread pool is introduced to do the job. We submit the above-mentioned "Longest Consecutive Small" segment group to the pool. Then the worker thread does the following: - build a MergeIterator from the target segments - create a new segment writer - for each block read from the MergeIterator, the writer appends it ### SegID handling SegIDs must remain consecutive after segment compaction. If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4: - we create a segment named "seg_0-3" to save the compacted data of seg_0, seg_1, seg_2 and seg_3 - delete seg_0, seg_1, seg_2 and seg_3 - rename seg_0-3 to seg_0 - rename seg_4 to seg_1 It is worth noting that we should wait for in-flight segment compaction tasks to finish before building the rowset meta and committing this txn.	2022-11-04 14:12:51 +08:00
Gabriel	948e080b31	[minor](error msg) Fix wrong error message (#13950 )	2022-11-04 13:49:46 +08:00
morrySnow	dc01fb4085	[enhancement](Nereids) remove unnecessary string cast (#13730 ) convert string like literal to the cast type instead of run cast in runtime	2022-11-04 11:18:22 +08:00
morrySnow	9bf20a7b5d	[enhancement](Nereids) remove unnecessary int cast (#13881 )	2022-11-04 11:07:59 +08:00
morrySnow	efb2596c7a	[enhancement](Nereids) enable push down filter through aggregation (#13938 )	2022-11-04 11:04:00 +08:00
yinzhijian	e09033276e	[fix](runtime-filter) build thread destruct first may cause probe thread coredump (#13911 )	2022-11-04 09:29:37 +08:00
Jibing-Li	f2d84d81e6	[feature-wip][refactor](multi-catalog) Persist external catalog related metadata. (#13746 ) Persist external catalog/db/table, including the columns of external tables. After this change, external objects can keep their own unique ID throughout their lifetime, which is required for collecting statistics.	2022-11-04 09:04:00 +08:00
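
The target selection and SegID handling described in commit 554f566217 (segment compaction) can be sketched roughly as follows. This is only an illustrative model and not the Doris implementation: segments are represented by their row counts alone, the function names `longest_consecutive_small` and `compact_and_renumber` are hypothetical, and the threshold `max_rows` stands in for the M mentioned in the commit message. List positions play the role of SegIDs, so replacing the chosen run with a single entry keeps the IDs consecutive.

```python
from typing import List, Tuple


def longest_consecutive_small(row_counts: List[int], max_rows: int) -> Tuple[int, int]:
    """Return the half-open range (start, end) of the longest run of
    consecutive segments whose row count is <= max_rows.
    On ties the earliest run wins; (0, 0) means no small segment exists."""
    best = (0, 0)
    start = None
    # A sentinel "big" segment at the end closes a run that reaches the tail.
    for i, rows in enumerate(row_counts + [max_rows + 1]):
        if rows <= max_rows:
            if start is None:
                start = i
        else:
            if start is not None and i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def compact_and_renumber(row_counts: List[int], max_rows: int) -> List[int]:
    """Merge the selected run into one segment. The index in the returned
    list is the new SegID, so IDs stay consecutive (hypothetical model)."""
    start, end = longest_consecutive_small(row_counts, max_rows)
    if end - start < 2:
        # Nothing worth merging: fewer than two small neighbours.
        return list(row_counts)
    merged = sum(row_counts[start:end])
    return row_counts[:start] + [merged] + row_counts[end:]


if __name__ == "__main__":
    # seg_0..seg_3 are small, seg_4 is big (assumed row counts).
    segments = [100, 200, 50, 80, 50000]
    print(longest_consecutive_small(segments, 10000))
    print(compact_and_renumber(segments, 10000))
```

With the assumed row counts above, the four small segments collapse into the new seg_0 and the old seg_4 becomes seg_1, mirroring the rename steps listed in the commit message.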

7054 Commits