Commit Graph

10897 Commits

Author SHA1 Message Date
24fcc2011f [Fix](Nereids) Fix function test case unstable by adding order by (#20295)
The Nereids function test cases did not have an ORDER BY clause, so their results were unstable; an ORDER BY is added to ensure stability.
2023-06-01 15:18:25 +08:00
a8b273ae31 [P2](test) Fix P2 output (#20311) 2023-06-01 15:11:12 +08:00
f0513a861d [Improve](Scan) add a session variable to make scan run serial (#20220)
Parallel scanning can cause read amplification. For example, select * from xx limit 1 actually requires only one row of data, but because multiple tablets are scanned in parallel, read amplification occurs, leading to performance bottlenecks in high-concurrency scenarios. This PR adds a session variable that enforces serial scanning to mitigate the issue.
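A minimal sketch of how such a switch would typically be used; the session-variable name enable_scan_run_serial is a guess, since the commit message does not spell it out:
```sql
-- Hypothetical: the commit does not name the variable here, so this name is an assumption.
SET enable_scan_run_serial = true;

-- The kind of point query that previously suffered read amplification from scanning many tablets in parallel.
SELECT * FROM xx LIMIT 1;
```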
2023-06-01 15:06:35 +08:00
0ff3073fc4 [improvement](Nereids): limit Memo groupExpression size. (#20272) 2023-06-01 13:30:19 +08:00
519f01133a [feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811) 2023-06-01 13:09:58 +08:00
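A hedged sketch of the two behaviors named in the title. The DECIMALV3 cast syntax and the div_precision_increment variable follow the usual Doris/MySQL-compatible naming; treat the exact names and the rounding result as assumptions:
```sql
-- Rounding half up when casting to decimalv3: 1.25 at scale 1 would become 1.3 (assumed behavior).
SELECT CAST(1.25 AS DECIMALV3(3, 1));

-- Division precision increment: extra scale digits added to a division result (MySQL-compatible variable, assumed here).
SET div_precision_increment = 4;
SELECT 1 / 3;
```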
04644c6dfa [fix](regression) regression test test_bitmap_filter_nereids could not run (#20293) 2023-06-01 12:56:32 +08:00
1b968c4ade [fix](multi catalog)Fix nereids planner text format include extra column index bug (#20260)
The Nereids planner included all column indexes in TFileScanRangeParams, which can make the column projection incorrect for text-format tables: the CSV reader uses the column index positions to split a line, so extra column indexes lead to wrong split results. This PR resets the column indexes after projection and removes the useless ones.
2023-06-01 12:17:47 +08:00
cc41cb0e7e [Fix](Nereids) fix some insert into select bugs (#20052)
fix 3 bugs:

1. failed to insert into a table with mv.
```sql
create table t (
    id int,
    c1 int,
    c2 int,
    c3 int
) duplicate key(id)
distributed by hash(id) buckets 4;

create materialized view k12s3m as select id, sum(c1), max(c3) from t group by id;

insert into t select -4, -4, -4, 'd';
```
The insert raised an exception because the mv column was not handled. Now we add a target column and a value as defineExpr.

2. failed to insert into a table when not all columns are specified.
```sql
insert into t(c1, c2) select c1, c2 from t
```
where t has columns (id ukey, c1, c2, c3). This inserted too many rows; we fix it by changing the output partitions.

3. failed to insert into a table with a complex select.
When the select statement contains a join or an aggregate, the bug is fixed in a way similar to the fix for the 2nd bug (hypothetical examples below).
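Hypothetical examples of the third case, reusing table t from above; the actual failing statements are not included in the commit message:
```sql
-- insert whose source contains an aggregate (hypothetical)
insert into t(id, c1) select id, sum(c1) from t group by id;

-- insert whose source contains a join (hypothetical)
insert into t select a.id, a.c1, b.c2, b.c3 from t a join t b on a.id = b.id;
```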
2023-06-01 12:15:19 +08:00
6befa53caa fix fe meta upgrade error (#20291)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-01 12:09:08 +08:00
4387f47fb5 [pipeline](load) support pipeline load (#20217) 2023-06-01 11:42:43 +08:00
e748b43d3d [bug](parse) fix can't create aggregate column with agg_state (#20235)
Fix the failure to create an aggregate column with the agg_state type.
2023-06-01 11:18:40 +08:00
68e593fbf1 [fix](nereids)(planner) case when should return NullLiteral when all case result is NullLiteral (#20280) 2023-06-01 11:11:41 +08:00
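A minimal query of the shape this fix targets, assuming that when every CASE branch yields NULL the whole expression should analyze as a NullLiteral:
```sql
-- Every branch is NULL, so the whole CASE expression should be treated as a NullLiteral.
SELECT CASE WHEN 1 = 1 THEN NULL ELSE NULL END;
```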
4a682a0a46 [fix][regression-test] set timeout of curl in regression test to avoid hanged when be crashed. (#20222)
Currently in regression-test, when a BE crashes, the suite thread gets stuck because curl does not set a timeout.
To solve this, the calls to BE are encapsulated in a function that sets the timeout uniformly, avoiding the hang.
2023-06-01 11:00:09 +08:00
492154ee55 [fix](regression-test) add jdbc timeout (#20228)
In some cases (or bugs), Doris may return a query response that JDBC cannot recognize, so the client hangs. To fix this, add a 30-minute timeout to the JDBC connection.
2023-06-01 10:50:17 +08:00
9e21318834 [refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594)
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
2023-06-01 10:25:04 +08:00
5b6b1b38a6 [Enhancement](merge-on-write) Performance optimization of calculations of delete bitmap between segments (#20153)
1. Use heap sort to find duplicated keys between segments and update the delete-bitmap. The old implementation traversed all keys in all segments, used each key to search for duplicates in earlier segments, and then marked them for deletion.

2. Trick: each time the heap top is popped as key1, the new heap top is key2, which allows jumping directly from key1 to key2 instead of advancing iteratively.

3. Effect: This technique works well when there are many segments within the same rowset and the imported data is relatively ordered.
2023-06-01 10:12:59 +08:00
90cd791789 [fix](tvf) s3 tvf specify region and s3.region params failed (#19921) 2023-06-01 10:00:49 +08:00
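A hedged sketch of the kind of s3() table-valued-function call involved; the URI, credentials, and every parameter besides s3.region (named in the title) are illustrative placeholders:
```sql
SELECT *
FROM s3(
    "uri" = "https://bucket.s3.us-east-1.amazonaws.com/path/file.csv",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "s3.region" = "us-east-1",  -- the parameter this fix concerns
    "format" = "csv"
);
```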
09e6b6580f [fix](checksum) delete predicates might be inconsistent with rowset readers in checksum task (#20251)
The BlockReader captures rowsets and initializes the delete_handler in different places. If a base compaction runs in between, the delete handlers obtained may be inconsistent with the rowset readers. Therefore, place the two operations under the same lock.
2023-06-01 09:06:51 +08:00
65a75abecb [Fix](Nereids) bitmap type should not be used in comparison predicate (#19807)
When using Nereids, if a comparison operator is applied to a bitmap-typed expression, an analysis exception needs to be thrown.

For example:
select id from (select BITMAP_EMPTY() as c0 from expr_test) as ref0 where c0 = 1 order by id

Here c0 in the subquery is a bitmap type, and this scenario is not supported right now.
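For reference, a hedged sketch of expressing such a check with bitmap functions instead of comparing the bitmap directly; whether bitmap_contains matches the original intent is an assumption:
```sql
-- Instead of c0 = 1 on a bitmap value, test the bitmap's contents explicitly.
select id
from (select id, BITMAP_EMPTY() as c0 from expr_test) as ref0
where bitmap_contains(c0, 1)
order by id;
```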
2023-05-31 23:09:36 +08:00
6ee99c4138 [fix](load_profile) fix rows stat and add close_wait in sink (#20181) 2023-05-31 18:23:30 +08:00
1aefc26ca0 [Bug](memtable) fix a bug occurred when we were inserting data into duplicate table without keys (#20233) 2023-05-31 18:21:36 +08:00
d963bf8d79 [deps](aws) upgrade to 1.9.272 to fix non-compliant RFC3986 encoding (#20252) 2023-05-31 18:19:06 +08:00
6adb3fdf11 [fix](match_phrase) Fix the inconsistent query result for 'match_phrase' after creating index without support_phrase property (#20258)
If the inverted index was created without the support_phrase property, keep the match_phrase condition and filter it with the match function instead.
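A hedged sketch of an inverted index created with the support_phrase property (the property name comes from this commit; the surrounding DDL and the parser choice are illustrative):
```sql
CREATE TABLE logs (
    id BIGINT,
    content STRING,
    INDEX idx_content (content) USING INVERTED PROPERTIES("parser" = "english", "support_phrase" = "true")
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 3
PROPERTIES("replication_num" = "1");

-- With support_phrase the phrase query can be served by the index;
-- without it, the fix keeps the match_phrase condition and filters by the match function.
SELECT * FROM logs WHERE content MATCH_PHRASE 'hello world';
```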
2023-05-31 18:09:50 +08:00
5f591a6d12 [opt](nereids) generate in-bloom filter if target is local for pipeline mode (#20112)
Update in-filter usage in pipeline mode:
1. if the target is local, use an in-bloom filter and let the BE choose between in and bloom according to the actual number of distinct values
2. set the default runtime_filter_max_in_num to 1024 (see the sketch below)
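The variable in item 2 can be inspected and adjusted per session; a minimal example using the name given in this commit:
```sql
-- Check the current cap, then raise it for the session if needed.
SHOW VARIABLES LIKE 'runtime_filter_max_in_num';
SET runtime_filter_max_in_num = 2048;
```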
2023-05-31 17:24:38 +08:00
c03a19ea23 [improvement](bitmap) Using set to store a small number of elements to improve performance (#19973)
Test on SSB 100g:

select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey;
exec time (without the materialized view): 4.388s

create materialized view:

create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey;
select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey;
exec time: 12.908s

with this patch applied, exec time: 5.790s
2023-05-31 16:13:42 +08:00
b53c42636e [Fix](Nereids) fold constant result is wrong on functions relative to timezone (#19863) 2023-05-31 15:52:40 +08:00
a1e3f49fb5 [enhancement](ldap) Support refresh ldap cache (#20183)
Support refreshing the LDAP cache:
refresh ldap all;
refresh ldap;
refresh ldap for user1;
Support caching non-existent LDAP users.
After LDAP is enabled, when logging in with a Doris user that does not exist in the LDAP service, this avoids hitting the LDAP service every time in scenarios that require a lot of authentication, such as show databases;.
2023-05-31 15:38:12 +08:00
f9dfcb923d [Enhancement] Change Create Resource Group Grammar (#20249) 2023-05-31 15:23:24 +08:00
c39943f699 [Fix](Planner)fix incorrect pattern when format pattern contains %x%v (#19994) 2023-05-31 14:55:33 +08:00
d93ff5d1ab [fix](pipeline) Enable pipeline explicitly in the plan shape check cases. (#20221)
enable pipeline explicitly in tpcds plan shape check
2023-05-31 14:40:24 +08:00
6eb99d1219 [chore](arm) support build with hadoop libhdfs on arm (#20256)
hadoop-3.3.4.3-for-doris already supports building on ARM
2023-05-31 13:57:48 +08:00
6d75d56e7b [Fix](dynamic-partition) Try to avoid setting a zero-bucket-size partition. (#20177)
A fallback to avoid the BE crash when a partition's bucket size is 0; the root cause is not yet resolved.
2023-05-31 13:09:03 +08:00
1f22aa6961 [fix](nereids) like function's nullable property should be PropagateNullable (#20237) 2023-05-31 12:13:38 +08:00
6a8fdb45c6 [Bug](runtimefilter) Fix waiting for runtime filter (#20155) 2023-05-31 10:25:18 +08:00
ca88425bee [Enhancement](merge-on-write) optimize bloom filter for primary key index (#20182) 2023-05-31 09:49:15 +08:00
54d1b16116 [docs](spark-doris-connector): modify the link of spark-doris-connector (#20159) 2023-05-31 09:42:00 +08:00
f43282e612 [chore](third-party) Bump the version of hadoop_libs (#20250)
Fix the issues with the workflow Build Third Party Libraries. See https://github.com/apache/doris-thirdparty/actions/runs/5109407220/jobs/9184234534
2023-05-31 09:21:43 +08:00
3f91127854 [fix](regression)Update external Brown test case out file. #20232
Update external Brown test case out file to match the new precision.
2023-05-31 09:21:04 +08:00
8a54be3318 [feature-wip](workload-group) Support setting user default workload group (#20180)
Issue Number: close #xxx

SET PROPERTY 'default_workload_group' = 'group_name';
2023-05-31 09:18:25 +08:00
aae04d9680 [Chore](log) Remove some verbose log && Change log level (#20236) 2023-05-31 09:15:01 +08:00
ff05217a1e [regression](p0) fix test for array_enumerate_uniq (#20231) 2023-05-30 22:14:19 +08:00
56fa38de1d [Enhancement](JDBC Catalog) refactor jdbc catalog insert logic (#19950)
This PR refactors the old way of writing data to JDBC External Table & JDBC Catalog, mainly covering the following tasks:
1. Continue the work of @BePPPower 's PR #18594: change the write path from splicing INSERT SQL strings to writing data through off-heap memory and preparedStatement.set
2. Add support for writing the largeint type, mainly by adapting to java.math.BigInteger, which uses binary operations
3. Delete the SQL-splicing logic from the JDBC External Table & JDBC Catalog write code

ToDo: binary types, like bit, binary, blob...

Finally, special thanks to @BePPPower and @AshinGau for their work

Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
2023-05-30 22:03:39 +08:00
ccfc4978c1 [feature](nereids) support the rewrite rule for push-down filter through sort (#20161)
Support the rewrite rule for pushing down a filter through sort.
We can push the filter down through sort directly, without any condition checks.

Before this PR:
```
mysql> explain select * from (select * from t1 order by a) t2 where t2.b > 2;
+-------------------------------------------------------------+
| Explain String                                              |
+-------------------------------------------------------------+
| PLAN FRAGMENT 0                                             |
|   OUTPUT EXPRS:                                             |
|     a[#2]                                                   |
|     b[#3]                                                   |
|   PARTITION: UNPARTITIONED                                  |
|                                                             |
|   VRESULT SINK                                              |
|                                                             |
|   3:VSELECT                                                 |
|   |  predicates: b[#3] > 2                                  |
|   |                                                         |
|   2:VMERGING-EXCHANGE                                       |
|      offset: 0                                              |
|                                                             |
| PLAN FRAGMENT 1                                             |
|                                                             |
|   PARTITION: HASH_PARTITIONED: a[#0]                        |
|                                                             |
|   STREAM DATA SINK                                          |
|     EXCHANGE ID: 02                                         |
|     UNPARTITIONED                                           |
|                                                             |
|   1:VTOP-N                                                  |
|   |  order by: a[#2] ASC                                    |
|   |  offset: 0                                              |
|   |                                                         |
|   0:VOlapScanNode                                           |
|      TABLE: default_cluster:test.t1(t1), PREAGGREGATION: ON |
|      partitions=0/1, tablets=0/0, tabletList=               |
|      cardinality=1, avgRowSize=0.0, numNodes=1              |
+-------------------------------------------------------------+
30 rows in set (0.06 sec)
```

After this PR:
```
mysql> explain select * from (select * from t1 order by a) t2 where t2.b > 2;
+-------------------------------------------------------------+
| Explain String                                              |
+-------------------------------------------------------------+
| PLAN FRAGMENT 0                                             |
|   OUTPUT EXPRS:                                             |
|     a[#2]                                                   |
|     b[#3]                                                   |
|   PARTITION: UNPARTITIONED                                  |
|                                                             |
|   VRESULT SINK                                              |
|                                                             |
|   2:VMERGING-EXCHANGE                                       |
|      offset: 0                                              |
|                                                             |
| PLAN FRAGMENT 1                                             |
|                                                             |
|   PARTITION: HASH_PARTITIONED: a[#0]                        |
|                                                             |
|   STREAM DATA SINK                                          |
|     EXCHANGE ID: 02                                         |
|     UNPARTITIONED                                           |
|                                                             |
|   1:VTOP-N                                                  |
|   |  order by: a[#2] ASC                                    |
|   |  offset: 0                                              |
|   |                                                         |
|   0:VOlapScanNode                                           |
|      TABLE: default_cluster:test.t1(t1), PREAGGREGATION: ON |
|      PREDICATES: b[#1] > 2                                  |
|      partitions=0/1, tablets=0/0, tabletList=               |
|      cardinality=1, avgRowSize=0.0, numNodes=1              |
+-------------------------------------------------------------+
28 rows in set (0.40 sec)
```
2023-05-30 21:38:16 +08:00
5c8e801761 [Fix](multi catalog, nereids)Fix text file required slot bug (#20214)
The required_slots in TFileScanRangeParams for an external Hive table may be updated after FileQueryScanNode finalize. For text files, we need to use the original required_slots in the params so that the list can be updated later. Otherwise, querying a text file may hit the following error:
[INTERNAL_ERROR]Unknown source slot descriptor, slot_id=3
2023-05-30 21:29:33 +08:00
b7a69fbf4b [test](regression) add regression test from materialized slot bug (#20207)
The test query covers conversions from string types to other types and the processing of materialized columns for nested subqueries; it is the regression test for the bug fix (#18783).
2023-05-30 21:23:05 +08:00
accaff1026 [Feature](compaction) wip: single replica compaction (#19237)
Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica.

The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica.
The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool.
When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.
2023-05-30 21:12:48 +08:00
5e5f4ae9de [Improve](CI)Check PR approve status (#20172)
After discussion in the doris community @apache/doris-committers , we limit PRs so that they can be merged only after at least two people approve.

We can try to run it for a while first, and if everyone gives good feedback, we can use this as a mandatory check.

Since the merge must be approved by at least one committer, we only need to check whether there are two approvals; we don't need to care about who the approvers are.
When a committer requests changes, GitHub already enforces that a committer must dismiss the review before merging, so we don't need to handle that case.
2023-05-30 20:45:16 +08:00
7415135ad4 [Enhancement](execute) make assert_cast can output derived class name (#20212)
before:
F0530 11:02:41.989699 1154607 assert_cast.h:54] Bad cast from type:doris::vectorized::IDataType const* to doris::vectorized::DataTypeAggState const*

after:
F0530 11:24:28.390286 1292475 assert_cast.h:46] Bad cast from type:doris::vectorized::DataTypeNullable* to doris::vectorized::DataTypeAggState const*
2023-05-30 20:23:04 +08:00
6f68ec9de0 support query queue (#20048)
support query queue (#20048)
2023-05-30 19:52:27 +08:00
1919355c04 [Feature](Inverted index) add MATCH_PHRASE query (#20156) 2023-05-30 19:28:57 +08:00