doris

Author	SHA1	Message	Date
minghong	e623f3fb9e	[runtimeFilter](nereids) use runtime filter default size for debug purpose (#20065 ) use rf default size for debug	2023-05-30 14:34:14 +08:00
zy-kkk	a220b1c34a	[typo](docs) fix oceanbase jdbc catalog error (#20197 )	2023-05-30 14:18:16 +08:00
bingquanzhao	5351153b04	[doc](fix)Modified the description about trino #20174	2023-05-30 13:14:45 +08:00
bobhan1	bb12a1cb49	[Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724 )	2023-05-30 13:09:19 +08:00
Gabriel	c7b8c83a7f	[Improvement](runtimefilter) Build bloom filter according to the exact build size for IN_OR_BLOOM_FILTER (#20166 ) * [Improvement](runtimefilter) Build bloom filter according to the exact build size for IN_OR_BLOOM_FILTER	2023-05-30 12:55:30 +08:00
lihangyu	945cb56fb6	[Bug](segment iterator) remove DCHECK for block row count (#20199 ) CHECK rows count of block at segment iterator is not ready when `enable_common_expr_pushdown`	2023-05-30 11:34:25 +08:00
Mryange	94e1072d14	Revert "[fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926 )" (#20204 ) This reverts commit 8ca4f9306763b5a18ffda27a07ab03cc77351e35.	2023-05-30 10:35:33 +08:00
AKIRA	72cfe5865a	[feat](optimizer) Support CTE reuse (#19934 ) Before this PR, the new optimizer always inlined CTEs directly. However, in many scenarios a CTE is referenced many times, such as in the TPC-DS tests; in these cases, materializing the CTE's result set and reusing it significantly improves performance. In our tests on TPC-DS queries, performance improved by up to almost 4 times. We introduce the following plan nodes in the optimizer: 1. CTEConsumer: holds a reference to a CTEProducer 2. CTEProducer: the plan defined by the CTE statement 3. CTEAnchor: the parent node of a CTEProducer; a CTEProducer can only be referenced from the right child of its corresponding CTEAnchor. A CTEConsumer is converted to an inlined plan if the corresponding CTE is referenced no more than `inline_cte_referenced_threshold` times (a session variable, 1 by default). For SQL: ```sql EXPLAIN REWRITTEN PLAN WITH cte AS (SELECT col2 FROM t1) SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1)) UNION ALL SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1)); ``` Rewritten plan before this PR: ``` +------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Explain String \| +------------------------------------------------------------------------------------------------------------------------------------------------------+ \| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false ) \| \| \|--LogicalJoin[559] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] ) \| \| \| \|--LogicalProject[551] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true ) \| \| \| \| +--LogicalFilter[549] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) ) \| \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON 
) \| \| \| +--LogicalProject[555] ( distinct=false, projects=[col2#20 AS `col2`#8], excepts=[], canEliminate=true ) \| \| \| +--LogicalFilter[553] ( predicates=(__DORIS_DELETE_SIGN__#22 = 0) ) \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| +--LogicalProject[575] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false ) \| \| +--LogicalJoin[573] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) \| \| \|--LogicalProject[565] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true ) \| \| \| +--LogicalFilter[563] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) ) \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| +--LogicalProject[569] ( distinct=false, projects=[col2#24 AS `col2`#13], excepts=[], canEliminate=true ) \| \| +--LogicalFilter[567] ( predicates=(__DORIS_DELETE_SIGN__#26 = 0) ) \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| +------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` After this PR ``` +------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Explain String \| +------------------------------------------------------------------------------------------------------------------------------------------------------+ \| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false ) \| \| \|--LOGICAL_CTE_ANCHOR#-1164890733 \| \| \| \|--LOGICAL_CTE_PRODUCER#-1164890733 \| \| \| \| +--LogicalProject[427] ( distinct=false, projects=[col2#1], excepts=[], canEliminate=true ) \| \| \| \| 
+--LogicalFilter[425] ( predicates=(__DORIS_DELETE_SIGN__#3 = 0) ) \| \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| \| +--LogicalJoin[373] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] ) \| \| \| \|--LogicalProject[370] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true ) \| \| \| \| +--LogicalFilter[368] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) ) \| \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| \| +--LOGICAL_CTE_CONSUMER#-1164890733#1038782805 \| \| +--LogicalProject[384] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false ) \| \| +--LogicalJoin[382] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) \| \| \|--LogicalProject[379] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true ) \| \| \| +--LogicalFilter[377] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) ) \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| +--LOGICAL_CTE_CONSUMER#-1164890733#858618008 \| +------------------------------------------------------------------------------------------------------------------------------------------------------+ ```	2023-05-30 10:18:59 +08:00
Qi Chen	9b32d42ee4	[Fix](multi-catalog) fix the all-nested-types test failure introduced by #19518 (support insert-only transactional table). (#20194 ) Fix `qt_nested_types_orc` in `test_tvf_p2`, which was broken by #19518 (support insert-only transactional table). ### Test case error `qt_nested_types_orc` in `test_tvf_p2` ``` select count(array0), count(array1), count(array2), count(array3), count(struct0), count(struct1), count(map0) from hdfs( "uri" = "hdfs://172.21.16.47:4007/catalog/tvf/orc/all_nested_types.orc", "format" = "orc", "fs.defaultFS" = "hdfs://172.21.16.47:4007") ``` Error Message: errCode = 2, detailMessage = (172.21.0.101)[INTERNAL_ERROR]Wrong data type for colum 'struct1'	2023-05-30 09:55:40 +08:00
Qi Chen	2abbc9f921	[Fix](multi-catalog) Fix parquet bugs of #19758 "replace the single pointer with an array of 'conjuncts' in ExecNode". (#20191 ) Fix some parquet reader bugs introduced by #19758 "replace the single pointer with an array of 'conjuncts' in ExecNode".	2023-05-30 09:55:12 +08:00
Jibing-Li	6f31ee9492	[fix](p0 regression)Update hive docker test case result data (#20176 ) Doris updated array type output format, using double quote for Strings. Before, it was using single quote. So we need to update the case out file using double quote.	2023-05-30 00:17:30 +08:00
airborne12	90b4e127e3	[Feature](inverted index) add parser_mode properties for inverted index parser (#20116 ) We add a parser mode for the inverted index. Usage: ``` CREATE TABLE `inverted` ( `FIELD0` text NULL, `FIELD1` text NULL, `FIELD2` text NULL, `FIELD3` text NULL, INDEX idx_name1 (`FIELD0`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "fine_grained") COMMENT '', INDEX idx_name2 (`FIELD1`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "coarse_grained") COMMENT '' ) ENGINE=OLAP; ```	2023-05-29 23:21:52 +08:00
zy-kkk	9eccbdbef3	[typo](docs) fix fqdn doc error (#20171 )	2023-05-29 21:39:42 +08:00
Mryange	8ca4f93067	[fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926 ) before mysql [test]>select cast(1 as DECIMALV3(16, 2)) / cast(3 as DECIMALV3(16, 2)); +-----------------------------------------------------------+ \| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) \| +-----------------------------------------------------------+ \| 0.00 \| +-----------------------------------------------------------+ mysql [test]>select * from divtest; +------+------+ \| id \| val \| +------+------+ \| 3 \| 5.00 \| \| 2 \| 4.00 \| \| 1 \| 3.00 \| +------+------+ mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest; +-------------------------------------+ \| CAST(1 AS DECIMALV3(16, 2)) / `val` \| +-------------------------------------+ \| 0 \| \| 0 \| \| 0 \| +-------------------------------------+ after mysql [test]>select cast(1 as DECIMALV3(16, 2)) / cast(3 as DECIMALV3(16, 2)); +-----------------------------------------------------------+ \| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) \| +-----------------------------------------------------------+ \| 0.33 \| +-----------------------------------------------------------+ mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest; +-------------------------------------+ \| CAST(1 AS DECIMALV3(16, 2)) / `val` \| +-------------------------------------+ \| 0.250000 \| \| 0.200000 \| \| 0.333333 \| +-------------------------------------+ This is because in the previous code, the constant 1.000 would be transformed into 1. remove "ReduceType	2023-05-29 19:51:12 +08:00
Long Zhao	d76be1315f	[BUG]storage_min_left_capacity_bytes default value has integer overflow #19943	2023-05-29 19:50:31 +08:00
Pxl	d1d0d9e5e8	[Chore](build) adjust some compile diagnostic (#20162 )	2023-05-29 19:19:01 +08:00
mch_ucchi	5f37396514	[Enhancement](Nereids) add switch for developing Nereids DML (#20100 )	2023-05-29 19:06:55 +08:00
Pxl	5788214416	[Bug](function) fix equals implements not judge order by elements of function call expr (#20083 ) fix equals implements not judge order by elements of function call expr #19296	2023-05-29 19:03:05 +08:00
Xinyi Zou	f9478dbd9a	[fix](function) Fix VcompoundPred execute const column #20158 reproduce: ./run-regression-test.sh --run -suiteParallel 1 -actionParallel 1 -parallel 1 -d query_p0/sql_functions/window_functions select /*+ SET_VAR(query_timeout = 600) */ subq_0.`c1` as c0 from (select ref_1.`s_name` as c0, ref_1.`s_suppkey` as c1, ref_1.`s_address` as c2, ref_1.`s_address` as c3 from regression_test_query_p0_sql_functions_window_functions.tpch_tiny_supplier as ref_1 where (ref_1.`s_name` is NULL) or (ref_1.`s_acctbal` is not NULL)) as subq_0 where (subq_0.`c3` is NULL) or (subq_0.`c2` is not NULL) reason: The execute functions of FunctionIsNull and FunctionIsNotNull return a const column, but VectorizedFnCall::is_constant returns false for them, which breaks const handling in VCompoundPred::execute. This PR converts the const column to a full column in VCompoundPred::execute. In the future, there will be a more thorough solution to such problems.	2023-05-29 18:16:58 +08:00
Pxl	e9917612f0	[Chore](gensrc) remove gen_vector_functions.py #20150	2023-05-29 18:16:31 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchema & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchema & Schema 1. When the system is under high-concurrency load with wide-table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these resources for reuse. 2. Wrap some variables in std::unique_ptr Performance: \| Status \| QPS \| Average response time (avg) \| P99 response time \| \|------------------\|-----\|------------------\|-------------\| \| SchemaCache enabled \| 501 \| 20ms \| 34ms \| \| SchemaCache disabled \| 321 \| 31ms \| 61ms \| * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
FreeOnePlus	198433b131	[typo](config)Remove FE config max_conn_per_user (#20122 ) --------- Co-authored-by: Yijia Su <suyijia@selectdb.com>	2023-05-29 17:20:36 +08:00
zhannngchen	cc20c430f6	[fix](partial update) use correct tablet schema for rowset writer in publish task (#20117 )	2023-05-29 16:57:18 +08:00
amory	91dae8a5b6	[FIX](mysql_writer) fix mysql output binary object works (#20154 ) * fix struct_export out data * fix mysql writer output with binary true	2023-05-29 16:53:33 +08:00
AKIRA	cc47ee480c	[feat](stats) delete data size stat and make task timeout configurable (#20090 ) 1. Delete the data size stats, since collecting them takes too much time and they are not useful 2. Make the task timeout configurable, since it is common to analyze a very large table for which the default 10 min is not suitable	2023-05-29 16:40:59 +08:00
Gabriel	55ccddb62c	[Conf](decimalv3) enable decimalv3 by default	2023-05-29 15:38:31 +08:00
Bin	d7e0a52bde	[typo](doc)correct the misspelled word and the improper word (#20149 )	2023-05-29 15:07:30 +08:00
Jibing-Li	500995c442	[Fix](multi catalog)Fix Iceberg table missing column unique id bug (#20152 ) This PR fixes a bug introduced by PR #19909. The bug failed to set the column unique id for Iceberg tables, which caused query results for Iceberg tables to be all NULL. ``` mysql> select * from iceberg_partition_lower_case_parquet limit 1; +------+------+------+---------+ \| k1 \| k2 \| k3 \| city \| +------+------+------+---------+ \| NULL \| NULL \| NULL \| Beijing \| +------+------+------+---------+ 1 row in set (0.60 sec) ``` After fix: ``` mysql> select * from iceberg_partition_lower_case_parquet limit 1; +------+------+------+---------+ \| k1 \| k2 \| k3 \| city \| +------+------+------+---------+ \| 1 \| k2_1 \| k3_1 \| Beijing \| +------+------+------+---------+ 1 row in set (0.35 sec) ```	2023-05-29 15:04:12 +08:00
Pxl	8376e5eefb	[Chore](build) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array (#20118 ) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array	2023-05-29 14:42:47 +08:00
Pxl	bbb3af6ce6	[Feature](agg_state) support agg_state combinators (#19969 ) support agg_state combinators state/merge/union	2023-05-29 13:07:29 +08:00
caiconghui	f217e052d3	[fix](dynamic_partition) fix dynamic partition not work when drop and recover olap table (#19031 ) when olap table is dynamic partition enable, if drop and recover olap table, the table should be added to DynamicPartitionScheduler again --------- Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-05-29 13:02:10 +08:00
airborne12	8378ab5e41	[Fix](inverted index) fix memory leak when inverted index writer does not finish correctly (#20028 ) * [Fix](inverted index) fix memory leak when inverted index writer does not finish correctly * [Update](inverted index) use smart pointer to avoid memory leak * [Chore](format) code format --------- Co-authored-by: airborne12 <airborne12@gmail.com>	2023-05-29 12:18:14 +08:00
Mryange	a86134cb39	[fix](executor) Fixed an error with cast as time. #20144 before mysql [(none)]>select cast("10:10:10" as time); +-------------------------------+ \| CAST('10:10:10' AS TIMEV2(0)) \| +-------------------------------+ \| 00:00:00 \| +-------------------------------+ after mysql [(none)]>select cast("10:10:10" as time); +-------------------------------+ \| CAST('10:10:10' AS TIMEV2(0)) \| +-------------------------------+ \| 10:10:10 \| +-------------------------------+ In the past, we supported this syntax. mysql [(none)]>select cast("2023:05:01 13:14:15" as time); +------------------------------------------+ \| CAST('2023:05:01 13:14:15' AS TIMEV2(0)) \| +------------------------------------------+ \| 13:14:15 \| +------------------------------------------+ However, "10:10:10" is also a valid datetime. mysql [(none)]>select cast("10:10:10" as datetime); +-----------------------------------+ \| CAST('10:10:10' AS DATETIMEV2(0)) \| +-----------------------------------+ \| 2010-10-10 00:00:00 \| +-----------------------------------+ So here, the order of parsing has been adjusted.	2023-05-29 12:17:21 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
zhengshiJ	970efdc1cb	[Feature](Nereids) support advanced materialized view (#19650 ) Extend the functionality of advanced materialized views. This feature is already supported by the legacy planner with PR #19650; this PR implements it in Nereids. This PR implements the following features: 1. Support multiple columns in aggregate functions, e.g. select sum(c1 + c2) from t1; 2. Support complex expressions, e.g. select abs(c1), sum(abc(c1+1) + 1) from t1; TODO: 1. Support adding where in materialized view	2023-05-29 10:37:44 +08:00
Kang	859b03dfdf	[Improvement](topn) prevent memory usage of key topn increasing unlimited (#19978 )	2023-05-29 10:16:15 +08:00
Yongqiang YANG	e0d9f7f955	[enhancement](load) add some profile items for load (#20141 )	2023-05-29 09:54:03 +08:00
yujun	344ca112af	[fix] (clone) fix drop biggest version replica during rebalance step (#20107 ) * add check for rebalancer choosing a deleted replica * improve a comparison	2023-05-29 09:00:51 +08:00
yujun	42239d635a	[fix](tablet_manager_lock) fix create tablet timeout #20067 (#20069 )	2023-05-28 23:05:13 +08:00
ZhangJian He	a5d73d47b6	[security] Don't print password in BaseController (#18862 )	2023-05-28 22:49:18 +08:00
AlexYue	4573ee9a49	[enhance](PrefetchReader) abort load task when data size returned by S3 is smaller than requested (#19947 ) We encountered a confusing situation where the buffered reader was trapped in an endless loop when calling readat. We then found that this was because the returned data size was less than requested. As the following picture shows, the actual data size is about 2MB, and when we called readat it only retrieved about 1MB.	2023-05-28 21:48:17 +08:00
Changming Xiao	5f9c6e076f	[Fix](load)Make insert timeout accurate in `show load` statistics (#20068 )	2023-05-28 21:19:06 +08:00
yiguolei	13c80bdb10	[chore](toolchain) change doris default toolchain to clang (#20146 ) GCC is very slow during build and link. Change to clang as we discussed many times. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-05-28 21:05:23 +08:00
amory	9d44918036	[Improve](data-type) Clean datatype useless code (#20145 ) * fix struct_export out data * delete useless code with data type	2023-05-28 20:48:29 +08:00
bobhan1	c45da40ed7	[refactor-WIP](TaskWorkerPool) add specific classes for ALTER_TABLE, CLONE, STORAGE_MEDIUM_MIGRATE task (#20140 )	2023-05-28 19:27:08 +08:00
Bin	142f884753	[typo](docs)Best usage document correction. #20142	2023-05-28 18:56:17 +08:00
YueW	ae352997b4	[Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063 )	2023-05-28 11:23:07 +08:00
AlexYue	da17c45c0b	[enhance](FileWriter)enhance s3 file writer bvar to avoid adding abort bytes (#20138 ) * don't add each time upload or it would add aborted bytes * alloca memory	2023-05-28 10:52:37 +08:00
luozenglin	f21bf11cf5	[fix](ldap) fix ldap related errors (#19959 ) 1. fix ldap user show grants return null pointer exception; 2. fix ldap user show databases return no authority db; 3. ldap authentication supports catalog level;	2023-05-27 23:51:32 +08:00
bobhan1	0434c6a738	[refactor-WIP](TaskWorkerPool) add specific classes for PUSH, PUBLISH_VERSION, CLEAR_TRANSACTION tasks (#19822 )	2023-05-27 22:47:45 +08:00
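
The parsing-order adjustment described in a86134cb39 (`cast("10:10:10" as time)`) can be illustrated with a small sketch. This is not the Doris implementation: the function name `cast_as_time`, the `time_first` flag, and the format lists are assumptions made for illustration, and Python's `datetime.strptime` stands in for Doris's own parser. It only shows why trying the time-only interpretation before the datetime interpretation changes the result for an ambiguous string.

```python
from datetime import datetime, time
from typing import Optional, Sequence

# Hypothetical format lists; Doris accepts more forms than these.
TIME_FORMATS = ("%H:%M:%S",)
DATETIME_FORMATS = ("%Y:%m:%d %H:%M:%S", "%y:%m:%d")


def _try_formats(text: str, formats: Sequence[str]) -> Optional[datetime]:
    """Return the first successful parse of `text`, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def cast_as_time(text: str, time_first: bool = True) -> Optional[time]:
    """Sketch of CAST(text AS TIME) with a configurable parsing order.

    time_first=True  tries the time-only formats first (the adjusted order).
    time_first=False tries the datetime formats first (the earlier order).
    """
    if time_first:
        order = (TIME_FORMATS, DATETIME_FORMATS)
    else:
        order = (DATETIME_FORMATS, TIME_FORMATS)
    for formats in order:
        parsed = _try_formats(text, formats)
        if parsed is not None:
            # Keep only the time-of-day part of whatever was parsed.
            return parsed.time()
    return None


if __name__ == "__main__":
    print(cast_as_time("10:10:10"))                    # time-only reading wins
    print(cast_as_time("10:10:10", time_first=False))  # read as a date first
    print(cast_as_time("2023:05:01 13:14:15"))         # unambiguous datetime
```

With `time_first=False` the same input is read as the date 2010-10-10, so its time part is 00:00:00, which mirrors the "before" output quoted in the commit message.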

10838 Commits