doris

Author	SHA1	Message	Date
yangshijie	1aefc26ca0	[Bug](memtable) fix a bug that occurred when inserting data into a duplicate table without keys (#20233 )	2023-05-31 18:21:36 +08:00
Mingyu Chen	d963bf8d79	[deps](aws) upgrade to 1.9.272 to fix non-compliant RFC3986 encoding (#20252 )	2023-05-31 18:19:06 +08:00
YueW	6adb3fdf11	[fix](match_phrase) Fix the inconsistent query result for 'match_phrase' after creating index without support_phrase property (#20258 ) If an inverted index is created without the support_phrase property, keep the match_phrase condition so that rows are filtered by the match function.	2023-05-31 18:09:50 +08:00
minghong	5f591a6d12	[opt](nereids) generate in-bloom filter if target is local for pipeline mode (#20112 ) Update in-filter usage in pipeline mode: 1. If the target is local, use an in-bloom filter and let BE choose IN or bloom according to the actual number of distinct values. 2. Set the default runtime_filter_max_in_num to 1024.	2023-05-31 17:24:38 +08:00
Jerry Hu	c03a19ea23	[improvement](bitmap) Using set to store a small number of elements to improve performance (#19973 ) Test on SSB 100g: select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 4.388s create materialized view: create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey; select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 12.908s test with the patch, exec time: 5.790s	2023-05-31 16:13:42 +08:00
mch_ucchi	b53c42636e	[Fix](Nereids) fold constant result is wrong on functions relative to timezone (#19863 )	2023-05-31 15:52:40 +08:00
luozenglin	a1e3f49fb5	[enhancement](ldap) Support refresh ldap cache (#20183 ) Support refreshing ldap cache: refresh ldap all; refresh ldap; refresh ldap for user1; Support for caching non-existent ldap users. When logging in with a doris user that does not exist in the Ldap service after ldap is enabled, avoid accessing the ldap service every time in scenarios such as show databases; that require a lot of authentication.	2023-05-31 15:38:12 +08:00
Lijia Liu	f9dfcb923d	[Enhancement] Change Create Resource Group Grammar (#20249 )	2023-05-31 15:23:24 +08:00
mch_ucchi	c39943f699	[Fix](Planner)fix incorrect pattern when format pattern contains %x%v (#19994 )	2023-05-31 14:55:33 +08:00
AKIRA	d93ff5d1ab	[fix](pipeline) Enable pipeline explicitly in the plan shape check cases. (#20221 ) enable pipeline explicitly in tpcds plan shape check	2023-05-31 14:40:24 +08:00
Mingyu Chen	6eb99d1219	[chore](arm) support build with hadoop libhdfs on arm (#20256 ) hadoop-3.3.4.3-for-doris already support build on arm	2023-05-31 13:57:48 +08:00
Xiangyu Wang	6d75d56e7b	[Fix](dynamic-partition) Try to avoid setting a zero-bucket-size partition. (#20177 ) A fallback to avoid the BE crash when a partition's bucket size is 0; the underlying problem is not yet resolved.	2023-05-31 13:09:03 +08:00
starocean999	1f22aa6961	[fix](nereids) like function's nullable property should be PropagateNullable (#20237 )	2023-05-31 12:13:38 +08:00
Gabriel	6a8fdb45c6	[Bug](runtimefilter) Fix waiting for runtime filter (#20155 )	2023-05-31 10:25:18 +08:00
Xin Liao	ca88425bee	[Enhancement](merge-on-write) optimize bloom filter for primary key index (#20182 )	2023-05-31 09:49:15 +08:00
Euporia	54d1b16116	[docs](spark-doris-connector): modify the link of spark-doris-connector (#20159 )	2023-05-31 09:42:00 +08:00
Adonis Ling	f43282e612	[chore](third-party) Bump the version of hadoop_libs (#20250 ) Fix the issues with the workflow Build Third Party Libraries. See https://github.com/apache/doris-thirdparty/actions/runs/5109407220/jobs/9184234534	2023-05-31 09:21:43 +08:00
Jibing-Li	3f91127854	[fix](regression)Update external Brown test case out file. #20232 Update external Brown test case out file to match the new precision.	2023-05-31 09:21:04 +08:00
luozenglin	8a54be3318	[feature-wip](workload-group) Support setting user default workload group (#20180 ) SET PROPERTY 'default_workload_group' = 'group_name';	2023-05-31 09:18:25 +08:00
Jack Drogon	aae04d9680	[Chore](log) Remove some verbose log && Change log level (#20236 )	2023-05-31 09:15:01 +08:00
Gabriel	ff05217a1e	[regression](p0) fix test for `array_enumerate_uniq` (#20231 )	2023-05-30 22:14:19 +08:00
zy-kkk	56fa38de1d	[Enhancement](JDBC Catalog) refactor jdbc catalog insert logic (#19950 ) This PR refactors how data is written to JDBC External Table & JDBC Catalog. The main changes: 1. Continuing the work of @BePPPower 's PR #18594, replace the logic that splices INSERT SQL with logic that operates on off-heap memory and writes data through preparedStatement.set. 2. Add write support for the largeint type, mainly by adapting to java.math.BigInteger, which uses binary operations. 3. Delete the SQL-splicing logic from the write path of JDBC External Table & JDBC Catalog. ToDo: binary types, like bit, binary, blob... Finally, special thanks to @BePPPower and @AshinGau for their work. Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>	2023-05-30 22:03:39 +08:00
Chengpeng Yan	ccfc4978c1	[feature](nereids) support the rewrite rule for push-down filter through sort (#20161 ) Support the rewrite rule for push-down filter through sort. We can directly push-down the filter through sort without any conditions check. Before this PR: ``` mysql> explain select * from (select * from t1 order by a) t2 where t2.b > 2; +-------------------------------------------------------------+ \| Explain String \| +-------------------------------------------------------------+ \| PLAN FRAGMENT 0 \| \| OUTPUT EXPRS: \| \| a[#2] \| \| b[#3] \| \| PARTITION: UNPARTITIONED \| \| \| \| VRESULT SINK \| \| \| \| 3:VSELECT \| \| \| predicates: b[#3] > 2 \| \| \| \| \| 2:VMERGING-EXCHANGE \| \| offset: 0 \| \| \| \| PLAN FRAGMENT 1 \| \| \| \| PARTITION: HASH_PARTITIONED: a[#0] \| \| \| \| STREAM DATA SINK \| \| EXCHANGE ID: 02 \| \| UNPARTITIONED \| \| \| \| 1:VTOP-N \| \| \| order by: a[#2] ASC \| \| \| offset: 0 \| \| \| \| \| 0:VOlapScanNode \| \| TABLE: default_cluster:test.t1(t1), PREAGGREGATION: ON \| \| partitions=0/1, tablets=0/0, tabletList= \| \| cardinality=1, avgRowSize=0.0, numNodes=1 \| +-------------------------------------------------------------+ 30 rows in set (0.06 sec) ``` After this PR: ``` mysql> explain select * from (select * from t1 order by a) t2 where t2.b > 2; +-------------------------------------------------------------+ \| Explain String \| +-------------------------------------------------------------+ \| PLAN FRAGMENT 0 \| \| OUTPUT EXPRS: \| \| a[#2] \| \| b[#3] \| \| PARTITION: UNPARTITIONED \| \| \| \| VRESULT SINK \| \| \| \| 2:VMERGING-EXCHANGE \| \| offset: 0 \| \| \| \| PLAN FRAGMENT 1 \| \| \| \| PARTITION: HASH_PARTITIONED: a[#0] \| \| \| \| STREAM DATA SINK \| \| EXCHANGE ID: 02 \| \| UNPARTITIONED \| \| \| \| 1:VTOP-N \| \| \| order by: a[#2] ASC \| \| \| offset: 0 \| \| \| \| \| 0:VOlapScanNode \| \| TABLE: default_cluster:test.t1(t1), PREAGGREGATION: ON \| \| PREDICATES: b[#1] > 2 \| \| partitions=0/1, 
tablets=0/0, tabletList= \| \| cardinality=1, avgRowSize=0.0, numNodes=1 \| +-------------------------------------------------------------+ 28 rows in set (0.40 sec) ```	2023-05-30 21:38:16 +08:00
Jibing-Li	5c8e801761	[Fix](multi catalog, nereids)Fix text file required slot bug (#20214 ) required_slots in TFileScanRangeParams params for external hive table may be updated after FileQueryScanNode finalize. For text file, we need to use the origin required_slots in params so that the list could be updated later. Otherwise, query text file may get the following error: [INTERNAL_ERROR]Unknown source slot descriptor, slot_id=3	2023-05-30 21:29:33 +08:00
Ashin Gau	b7a69fbf4b	[test](regression) add regression test from materialized slot bug (#20207 ) The test query includes the conversion of string types to other types, and the processing of materialized columns for nested subqueries, which is the regression test for bug fix(#18783)	2023-05-30 21:23:05 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
Calvin Kirs	5e5f4ae9de	[Improve](CI)Check PR approve status (#20172 ) After discussion in the Doris community (@apache/doris-committers), a PR may be merged only after at least two people approve it. We can run this check for a while first and, if the feedback is good, make it mandatory. Since a merge must already be approved by at least one committer, the check only needs to count whether there are two approvals and does not need to consider who the approvers are. When a committer requests changes, that review must be dismissed by the committer before merging; GitHub enforces this, so the check does not need to handle it.	2023-05-30 20:45:16 +08:00
Pxl	7415135ad4	[Enhancement](execute) make assert_cast output the derived class name (#20212 ) before: F0530 11:02:41.989699 1154607 assert_cast.h:54] Bad cast from type:doris::vectorized::IDataType const* to doris::vectorized::DataTypeAggState const* after: F0530 11:24:28.390286 1292475 assert_cast.h:46] Bad cast from type:doris::vectorized::DataTypeNullable* to doris::vectorized::DataTypeAggState const*	2023-05-30 20:23:04 +08:00
wangbo	6f68ec9de0	support query queue (#20048 )	2023-05-30 19:52:27 +08:00
zzzxl	1919355c04	[Feature](Inverted index) add MATCH_PHRASE query (#20156 )	2023-05-30 19:28:57 +08:00
Chengpeng Yan	f505eed253	[opt](Nereids) refactor the PartitionTopN (#20102 ) Do some small refactoring for the `PartitionTopN` and also address the left comment in #18784	2023-05-30 17:34:47 +08:00
airborne12	3d8440a1b7	[Feature-WIP](inverted index) support phrase for inverted index writer (#20193 )	2023-05-30 17:07:45 +08:00
Chengpeng Yan	a855253543	[fix](Nereids) filter should not push through union to OneRowRelation (#20132 ) ## Problem summary When we want to push the filter through the union. We should check whether the union's children are `OneRowRelation` or not. If there are some `OneRowRelation`, we shouldn't push down the filter to that part Before this PR ``` mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1; +------+------+ \| a \| b \| +------+------+ \| 1 \| 2 \| \| 3 \| 3 \| +------+------+ 2 rows in set (0.01 sec) ``` After this PR ``` mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1; +------+------+ \| a \| b \| +------+------+ \| 1 \| 2 \| +------+------+ 1 row in set (0.38 sec) ```	2023-05-30 17:06:52 +08:00
Mingyu Chen	0c98355fff	[fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137 ) 1. Fix the replay bug for catalogs created with a resource. If a user creates a catalog using `create catalog hive with resource xxx`, the resource may be dropped when the edit log is replayed, causing an NPE so that FE fails to start. This PR adds a new FE config `disallow_create_catalog_with_resource`, which defaults to true, so `with resource` is no longer allowed and will be deprecated later. It also fixes the replay bug to avoid the NPE. 2. Fix an issue when creating 2 hive catalogs that connect with and without kerberos authentication. When a user creates 2 hive catalogs, one using simple auth and the other using kerberos auth, a query may fail with an error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.` This PR therefore adds a default property for hive catalogs: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`. The property is added automatically when a user creates a hive catalog, to avoid this problem. 3. Fix an issue when calling `hdfsExists()`. When `hdfsExists()` returns a non-zero code, check whether it encountered an error or the file was not found. 4. Some code refactoring: avoid importing `org.apache.parquet.Strings`.	2023-05-30 16:57:39 +08:00
Mingyu Chen	3735c21ef9	[fix](session-variable) fix set global var on non-master FE return error (#20179 )	2023-05-30 16:26:28 +08:00
Mingyu Chen	49ce4e6fda	[fix](test) fix p2 broker load (#20196 )	2023-05-30 16:26:00 +08:00
Gabriel	631494e05d	[regression](decimalv3) Fix output for P1 regression (#20213 )	2023-05-30 15:21:29 +08:00
Qi Chen	4475a69c57	[Fix](multi-catalog) Fix q03 in `text_external_brown` regression test by handling correctly when text converter parsing error. (#20190 ) Issue Number: close #20189 Fix `q03` in `text_external_brown` regression test by handling correctly when text converter parsing error.	2023-05-30 15:08:28 +08:00
YueW	de08c4a57b	[enhance](match) Support match query without inverted index (#19936 )	2023-05-30 15:02:57 +08:00
minghong	e623f3fb9e	[runtimeFilter](nereids) use runtime filter default size for debug purpose (#20065 ) use rf default size for debug	2023-05-30 14:34:14 +08:00
zy-kkk	a220b1c34a	[typo](docs) fix oceanbase jdbc catalog error (#20197 )	2023-05-30 14:18:16 +08:00
bingquanzhao	5351153b04	[doc](fix)Modified the description about trino #20174	2023-05-30 13:14:45 +08:00
bobhan1	bb12a1cb49	[Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724 )	2023-05-30 13:09:19 +08:00
Gabriel	c7b8c83a7f	[Improvement](runtimefilter) Build bloom filter according to the exact build size for IN_OR_BLOOM_FILTER (#20166 )	2023-05-30 12:55:30 +08:00
lihangyu	945cb56fb6	[Bug](segment iterator) remove DCHECK for block row count (#20199 ) CHECK rows count of block at segment iterator is not ready when `enable_common_expr_pushdown`	2023-05-30 11:34:25 +08:00
Mryange	94e1072d14	Revert "[fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926 )" (#20204 ) This reverts commit 8ca4f9306763b5a18ffda27a07ab03cc77351e35.	2023-05-30 10:35:33 +08:00
AKIRA	72cfe5865a	[feat](optimizer) Support CTE reuse (#19934 ) Before this PR, the new optimizer would inline CTEs directly. However, in many scenarios a CTE is referenced many times, such as in TPC-DS tests; for these cases, materializing the result set of the CTE and reusing it significantly improves performance. In our tests on TPC-DS related SQL, it improved performance by up to almost 4 times. We introduce the following plan nodes in the optimizer: 1. CTEConsumer: holds a reference to a CTEProducer 2. CTEProducer: the plan defined by the CTE stmt 3. CTEAnchor: the parent node of a CTEProducer; a CTEProducer can only be referenced from the corresponding CTEAnchor's right child. A CTEConsumer is converted to an inlined plan if the corresponding CTE is referenced less than or equal to `inline_cte_referenced_threshold` times (a session variable, 1 by default). For SQL: ```sql EXPLAIN REWRITTEN PLAN WITH cte AS (SELECT col2 FROM t1) SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1)) UNION ALL SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1)); ``` Rewritten plan before this PR: ``` +------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Explain String \| +------------------------------------------------------------------------------------------------------------------------------------------------------+ \| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false ) \| \| \|--LogicalJoin[559] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] ) \| \| \| \|--LogicalProject[551] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true ) \| \| \| \| +--LogicalFilter[549] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) ) \| \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON 
) \| \| \| +--LogicalProject[555] ( distinct=false, projects=[col2#20 AS `col2`#8], excepts=[], canEliminate=true ) \| \| \| +--LogicalFilter[553] ( predicates=(__DORIS_DELETE_SIGN__#22 = 0) ) \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| +--LogicalProject[575] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false ) \| \| +--LogicalJoin[573] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) \| \| \|--LogicalProject[565] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true ) \| \| \| +--LogicalFilter[563] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) ) \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| +--LogicalProject[569] ( distinct=false, projects=[col2#24 AS `col2`#13], excepts=[], canEliminate=true ) \| \| +--LogicalFilter[567] ( predicates=(__DORIS_DELETE_SIGN__#26 = 0) ) \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| +------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` After this PR ``` +------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Explain String \| +------------------------------------------------------------------------------------------------------------------------------------------------------+ \| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false ) \| \| \|--LOGICAL_CTE_ANCHOR#-1164890733 \| \| \| \|--LOGICAL_CTE_PRODUCER#-1164890733 \| \| \| \| +--LogicalProject[427] ( distinct=false, projects=[col2#1], excepts=[], canEliminate=true ) \| \| \| \| 
+--LogicalFilter[425] ( predicates=(__DORIS_DELETE_SIGN__#3 = 0) ) \| \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| \| +--LogicalJoin[373] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] ) \| \| \| \|--LogicalProject[370] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true ) \| \| \| \| +--LogicalFilter[368] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) ) \| \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| \| +--LOGICAL_CTE_CONSUMER#-1164890733#1038782805 \| \| +--LogicalProject[384] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false ) \| \| +--LogicalJoin[382] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) \| \| \|--LogicalProject[379] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true ) \| \| \| +--LogicalFilter[377] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) ) \| \| \| +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON ) \| \| +--LOGICAL_CTE_CONSUMER#-1164890733#858618008 \| +------------------------------------------------------------------------------------------------------------------------------------------------------+ ```	2023-05-30 10:18:59 +08:00
Qi Chen	9b32d42ee4	[Fix](multi-catalog) fix the all nested type test failure introduced by #19518 (support insert-only transactional table). (#20194 ) Fix `qt_nested_types_orc` in `test_tvf_p2`, which was broken by #19518 (support insert-only transactional table). ### Test case error `qt_nested_types_orc` in `test_tvf_p2` ``` select count(array0), count(array1), count(array2), count(array3), count(struct0), count(struct1), count(map0) from hdfs( "uri" = "hdfs://172.21.16.47:4007/catalog/tvf/orc/all_nested_types.orc", "format" = "orc", "fs.defaultFS" = "hdfs://172.21.16.47:4007") ``` Error Message: errCode = 2, detailMessage = (172.21.0.101)[INTERNAL_ERROR]Wrong data type for colum 'struct1'	2023-05-30 09:55:40 +08:00
Qi Chen	2abbc9f921	[Fix](multi-catalog) Fix parquet bugs of #19758 'replace the single pointer with an array of 'conjuncts' in ExecNode'. (#20191 ) Fix some parquet reader bugs introduced by #19758 'replace the single pointer with an array of 'conjuncts' in ExecNode'.	2023-05-30 09:55:12 +08:00
Jibing-Li	6f31ee9492	[fix](p0 regression)Update hive docker test case result data (#20176 ) Doris updated array type output format, using double quote for Strings. Before, it was using single quote. So we need to update the case out file using double quote.	2023-05-30 00:17:30 +08:00
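
The replica selection described in the single replica compaction commit (accaff1026) can be sketched roughly as follows. This is an illustrative Python sketch only, not the Doris implementation: the `Replica` type, the function names, and the use of SHA-256 as the hash are all assumptions, since the commit message only says that hash(replica_id) is computed and the replica with the smallest hash value executes compaction.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Replica:
    """Hypothetical peer info: the commit says FE returns host and replica_id."""
    replica_id: int
    host: str


def stable_hash(replica_id: int) -> int:
    # Assumption: the commit does not name the hash function. SHA-256 is a
    # stand-in that gives the same value on every backend.
    digest = hashlib.sha256(str(replica_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_compaction_replica(replicas: Sequence[Replica]) -> Replica:
    """Return the replica with the smallest hash(replica_id)."""
    if not replicas:
        raise ValueError("no replicas for this tablet")
    # replica_id is a tie-breaker so the choice is deterministic.
    return min(replicas, key=lambda r: (stable_hash(r.replica_id), r.replica_id))


def should_fetch_from_peer(local_replica_id: int, replicas: Sequence[Replica]) -> bool:
    """True if the local replica should fetch compaction results from a peer."""
    return pick_compaction_replica(replicas).replica_id != local_replica_id


if __name__ == "__main__":
    peers = [Replica(101, "be1"), Replica(102, "be2"), Replica(103, "be3")]
    for peer in peers:
        action = "fetch" if should_fetch_from_peer(peer.replica_id, peers) else "compact"
        print(peer.host, action)
```

Because every backend evaluates the same deterministic function over the same peer list, each one reaches the same decision without extra coordination: exactly one replica compacts and the others fetch.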

10877 Commits