Commit Graph

13721 Commits

Author SHA1 Message Date
d68f3f3b3d [Feature](array-functions)improve array functions for array_last_index (#20294)
Currently only array_first_index is supported for lambda input; array_last_index is not.
2023-06-02 13:54:03 +08:00
8ff8705b3f [fix](olap) deletion statement with space conditions did not take effect (#20349)
Deletion statement like this:

delete from tb where k1 = '  ';
The rows whose k1's value is ' ' will not be deleted.
2023-06-02 13:52:57 +08:00
a869056567 [performance](load) support parallel memtable flush for unique key tables (#20308) 2023-06-02 13:49:53 +08:00
e32eba8fdf [refactor](stats) Persist status of analyze task to FE meta data (#20264)
1. Previously, the status of analyze jobs/tasks was persisted in a BE table named `analysis_jobs`. This had many flaws; for example, if a BE crashed, the analyze job/task would fail but its status could not be updated.
2. Support `DROP ANALYZE JOB [job_id]` to delete an analyze job.
3. Support `SHOW ANALYZE TASK STATUS [job_id]` to get the task status of a specific job.
4. Restrict when auto analyze can execute: it runs again only if the last auto analyze job finished a while ago.
5. Support analyzing a whole DB.
2023-06-02 12:33:31 +08:00
62c188d9a2 [typo](docs) fix release note 2.0 zh url (#20320) 2023-06-02 11:45:24 +08:00
dc43e65d06 [Bug](pipeline) Fix memory leak if query is canceled caused by memory limit (#20316) 2023-06-02 11:42:52 +08:00
576288cc89 [Profile](exec) Remove useless profile in pipeline exec engine (#20337) 2023-06-02 11:39:11 +08:00
c6b6dcdbc7 [Docs](inverted index) update docs for inverted index parser_mode and match_phrase support (#20266) 2023-06-02 11:38:04 +08:00
86d77084a4 [Fix](multi-catalog) fix oss access issue with aws s3 sdk (#20287) 2023-06-02 10:40:07 +08:00
9d8043e4c1 [Fix](Nereids) should not gather data when sink (#20330) 2023-06-02 10:33:11 +08:00
5a3b97bbf2 [enhancement](struct-type)support comment for struct field (#20200)
support comment for struct field
2023-06-02 10:29:56 +08:00
075635ee50 [typo](docs)Correct the getting started document (#20245) 2023-06-02 09:58:26 +08:00
937f04033f [Bug](runtime filter) fix NPE if runtime filter has no target (#20338) 2023-06-02 09:54:37 +08:00
8bec2b41db [pipeline](rpc) support closure reuse in pipeline exec engine (#20278) 2023-06-02 09:50:21 +08:00
a8a4da9b9e [fix](nereids)dphyper join reorder may cache wrong project list for project node (#20209)
* [fix](nereids)dphyper join reorder may cache wrong project list for project node
2023-06-02 09:35:28 +08:00
ecdc5124be [feature-wip](duplicate-no-keys) schema change support for duplicate no keys (#19326) 2023-06-02 09:22:41 +08:00
0df073699d [fix](planner)Fix missing kw for workload #20319
1. Add a usage document for the Workload Group query queue.
2. Fix missing KW for workload; this may cause creating a workload group to fail.
2023-06-02 09:04:22 +08:00
01770ba68a [fix](regression-test) variable's scope returned by curl (#20347) 2023-06-01 23:38:39 +08:00
9b936049b6 [feature-wip](duplicate_no_keys) Add some test cases of all the duplicate tables in test case tpcds_sf100_without_key_p2 and make them duplicate tables without keys (#20332) 2023-06-01 22:29:51 +08:00
363e78f08f [enhancement](publish) print detailed info for failed publish (#20309) 2023-06-01 22:24:16 +08:00
34c1cda14a [bug](udaf) fix java-udaf test case failed with decimal (#20315)
Some java-udaf test cases with decimal fail in P0 because the decimal scale is not set correctly.
2023-06-01 20:14:54 +08:00
05b7c65509 [fix](regression-test) fix multi-thread problem of regression-test #20322 2023-06-01 18:57:17 +08:00
608d2a3eca [Bug](exec) push down no group by agg min cause error result (#20289)
sql """
CREATE TABLE t1_int (
num int(11) NULL,
dgs_jkrq bigint(20) NULL
) ENGINE=OLAP
DUPLICATE KEY(num)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(num) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
"""
sql """insert into t1_int values(1,1),(1,2),(1,3),(1,4),(1,null);"""
qt_sql """
select min(dgs_jkrq) from t1_int;
"""

Before the fix, the query returns the wrong result: 4.

After the change, it returns the correct result: 1.
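
The expected behavior can be sketched in Python: SQL `min` skips NULL values, so the minimum over the inserted rows is 1. The helper name `sql_min` is hypothetical and only models the aggregate's semantics, not the Doris pushdown code that this commit changes.

```python
from typing import Iterable, Optional


def sql_min(values: Iterable[Optional[int]]) -> Optional[int]:
    """Model SQL MIN: ignore NULL (None); return NULL if no non-NULL value exists."""
    non_null = [v for v in values if v is not None]
    return min(non_null) if non_null else None


# dgs_jkrq values from the insert statement in the test case above
dgs_jkrq = [1, 2, 3, 4, None]
result = sql_min(dgs_jkrq)
```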
2023-06-01 17:29:46 +08:00
e416c4d95f [fix](docs)Correct the year and month format placeholder to lower case (#20210) 2023-06-01 16:14:00 +08:00
24fcc2011f [Fix](Nereids) Fix function test case unstable by adding order by (#20295)
The Nereids function cases have no order by clause, so their results are unstable. An order by is added to make them stable.
2023-06-01 15:18:25 +08:00
a8b273ae31 [P2](test) Fix P2 output (#20311) 2023-06-01 15:11:12 +08:00
f0513a861d [Improve](Scan) add a session variable to make scan run serial (#20220)
Parallel scanning can cause read amplification. For example, select * from xx where limit 1 needs only one row of data, but because multiple tablets are scanned in parallel, more data is read than necessary, which becomes a performance bottleneck in high-concurrency scenarios. This PR adds a session variable that enforces serial scanning to mitigate the issue.
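
A toy model of this read amplification, in Python. It assumes each parallel scanner fetches one full batch before the limit is applied; the function names, batch size, and tablet layout are hypothetical and do not reflect Doris's scanner implementation.

```python
from typing import List


def rows_read_parallel(tablets: List[List[int]], batch: int) -> int:
    """Assumed model: every tablet's scanner reads one batch up front."""
    return sum(min(batch, len(t)) for t in tablets)


def rows_read_serial(tablets: List[List[int]], limit: int, batch: int) -> int:
    """Read tablets one at a time, batch by batch, stopping once `limit` rows are available."""
    read = 0
    for tablet in tablets:
        for start in range(0, len(tablet), batch):
            read += len(tablet[start:start + batch])
            if read >= limit:
                return read
    return read


# Eight tablets of 1000 rows each, scanned for a query with LIMIT 1
tablets = [list(range(1000)) for _ in range(8)]
parallel_rows = rows_read_parallel(tablets, batch=100)
serial_rows = rows_read_serial(tablets, limit=1, batch=100)
```

Under these assumptions the parallel plan reads one batch per tablet, while the serial plan stops after the first batch of the first tablet.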
2023-06-01 15:06:35 +08:00
0ff3073fc4 [improvement](Nereids): limit Memo groupExpression size. (#20272) 2023-06-01 13:30:19 +08:00
519f01133a [feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811) 2023-06-01 13:09:58 +08:00
04644c6dfa [fix](regression) regression test test_bitmap_filter_nereids could not run (#20293) 2023-06-01 12:56:32 +08:00
1b968c4ade [fix](multi catalog)Fix nereids planner text format include extra column index bug (#20260)
The Nereids planner includes all column indexes in TFileScanRangeParams, which may make the column projection incorrect for text format tables. The csv reader uses the column index position to split a line, so extra column indexes produce a wrong split result. This PR resets the column indexes after projection and removes the useless ones.
2023-06-01 12:17:47 +08:00
cc41cb0e7e [Fix](Nereids) fix some insert into select bugs (#20052)
fix 3 bugs:

1. failed to insert into a table with mv.
```sql
create table t (
    id int,
    c1 int,
    c2 int,
    c3 int
) duplicate key(id)
distributed by hash(id) buckets 4;

create materialized view k12s3m as select id, sum(c1), max(c3) from t group by id;

insert into t select -4, -4, -4, 'd';
```
The insert raises an exception because the mv column is not handled. Now we add the mv column as a target column, with defineExpr as its value.

2. failed to insert into a table with not all the columns.
```sql
insert into t(c1, c2) select c1, c2 from t
```
where t is t(id ukey, c1, c2, c3). The insert writes too much data; we fix it by changing the output partitions.

3. failed to insert into a table with complex select.
The select statement has a join or an agg; the fix is similar to the one for the 2nd bug.
2023-06-01 12:15:19 +08:00
6befa53caa fix fe meta upgrade error (#20291)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-01 12:09:08 +08:00
4387f47fb5 [pipeline](load) support pipeline load (#20217) 2023-06-01 11:42:43 +08:00
e748b43d3d [bug](parse) fix can't create aggregate column with agg_state (#20235)
fix can't create aggregate column with agg_state
2023-06-01 11:18:40 +08:00
68e593fbf1 [fix](nereids)(planner) case when should return NullLiteral when all case result is NullLiteral (#20280) 2023-06-01 11:11:41 +08:00
4a682a0a46 [fix][regression-test] set timeout of curl in regression test to avoid hanged when be crashed. (#20222)
Currently in regression-test, when a BE crashes, the suite thread gets stuck because curl has no timeout set.
To solve this, calls to the BE are encapsulated in a function that sets the timeout uniformly, so the suite no longer hangs.
2023-06-01 11:00:09 +08:00
492154ee55 [fix](regression-test) add jdbc timeout (#20228)
In some cases (or bugs), Doris may return a query result to JDBC that JDBC cannot recognize,
so it hangs. To fix this, add a 30-minute timeout to the JDBC connection.
2023-06-01 10:50:17 +08:00
9e21318834 [refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594)
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
2023-06-01 10:25:04 +08:00
5b6b1b38a6 [Enhancement](merge-on-write) Performance optimization of calculations of delete bitmap between segments (#20153)
1. Use heap sort to find duplicated keys between segments and update the delete-bitmap. The old implementation traversed all keys in all segments, used each key to search for duplicates in earlier segments, and then marked them for deletion.

2. Trick: Each time the heap top is popped as a key1, the new heap top is key2, allowing for jumping directly from key1 to key2 instead of advancing iteratively.

3. Effect: This technique works well when there are many segments within the same rowset and the imported data is relatively ordered.
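
A minimal Python sketch of the heap-based approach described in this list, not the BE implementation. It assumes each segment is a sorted list of unique keys, that a higher segment index means newer data, and that the delete bitmap can be modeled as a mapping from segment index to deleted row positions; `compute_delete_bitmap` is a hypothetical name.

```python
import heapq
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Set


def compute_delete_bitmap(segments: List[List[int]]) -> Dict[int, Set[int]]:
    """Mark rows whose key also appears in a newer segment."""
    delete_bitmap: Dict[int, Set[int]] = defaultdict(set)
    # Heap entries are (key, -segment_id, row); for equal keys the newest segment pops first.
    heap = [(seg[0], -seg_id, 0) for seg_id, seg in enumerate(segments) if seg]
    heapq.heapify(heap)
    last_key = None
    while heap:
        key, neg_seg_id, row = heapq.heappop(heap)
        seg_id = -neg_seg_id
        if key == last_key:
            # A newer segment already holds this key, so this row is a duplicate.
            delete_bitmap[seg_id].add(row)
        else:
            last_key = key
        seg = segments[seg_id]
        if heap:
            # The trick: jump straight to the first key >= the new heap top (key2).
            next_row = bisect_left(seg, heap[0][0], row + 1)
        else:
            # No other segment remains, so no further duplicates are possible.
            next_row = len(seg)
        if next_row < len(seg):
            heapq.heappush(heap, (seg[next_row], neg_seg_id, next_row))
    return dict(delete_bitmap)


segments = [[1, 3, 5, 7], [2, 3, 6, 7], [3, 8]]
bitmap = compute_delete_bitmap(segments)
```

In this example key 3 is kept only in segment 2 and key 7 only in segment 1; key 5 in segment 0 is skipped by the jump without ever entering the heap.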
2023-06-01 10:12:59 +08:00
90cd791789 [fix](tvf) s3 tvf specify region and s3.region params failed (#19921) 2023-06-01 10:00:49 +08:00
09e6b6580f [fix](checksum) delete predicates might be inconsistent with rowset readers in checksum task (#20251)
The BlockReader captures rowsets and initializes the delete_handler in different places. If a base compaction runs in between, it may obtain inconsistent delete handlers. Therefore, these two operations are placed under the same lock.
2023-06-01 09:06:51 +08:00
65a75abecb [Fix](Nereids) bitmap type should not be used in comparison predicate (#19807)
When using Nereids, a comparison operator applied to a bitmap type must throw an analysis exception.

For example:
select id from (select BITMAP_EMPTY() as c0 from expr_test) as ref0 where c0 = 1 order by id

Here c0 in the subquery ref0 is a bitmap type; this scenario is not supported right now.
2023-05-31 23:09:36 +08:00
6ee99c4138 [fix](load_profile) fix rows stat and add close_wait in sink (#20181) 2023-05-31 18:23:30 +08:00
1aefc26ca0 [Bug](memtable) fix a bug occurred when we were inserting data into duplicate table without keys (#20233) 2023-05-31 18:21:36 +08:00
d963bf8d79 [deps](aws) upgrade to 1.9.272 to fix non-compliant RFC3986 encoding (#20252) 2023-05-31 18:19:06 +08:00
6adb3fdf11 [fix](match_phrase) Fix the inconsistent query result for 'match_phrase' after creating index without support_phrase property (#20258)
If an inverted index is created without the support_phrase property, keep the match_phrase condition so that it is filtered by the match function.
2023-05-31 18:09:50 +08:00
5f591a6d12 [opt](nereids) generate in-bloom filter if target is local for pipeline mode (#20112)
update in-filter usage in pipeline mode:
1. If the target is local, use an in-bloom filter and let the BE choose in or bloom according to the actual number of distinct values.
2. set default runtime_filter_max_in_num to 1024
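
The in-or-bloom choice can be sketched in Python as follows. This is an illustrative model only: the function name and the exact boundary (an IN filter while the distinct count is at most `max_in_num`) are assumptions, and only the default of 1024 comes from this commit.

```python
from typing import Hashable, Iterable


def choose_runtime_filter(build_keys: Iterable[Hashable], max_in_num: int = 1024) -> str:
    """Return "in" while the distinct key count stays within max_in_num, else "bloom"."""
    distinct = set()
    for key in build_keys:
        distinct.add(key)
        if len(distinct) > max_in_num:
            # Too many distinct values for an IN list; fall back to a bloom filter.
            return "bloom"
    return "in"


small_side = choose_runtime_filter(range(10))
large_side = choose_runtime_filter(range(2000))
```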
2023-05-31 17:24:38 +08:00
c03a19ea23 [improvement](bitmap) Using set to store a small number of elements to improve performance (#19973)
Test on SSB 100g:

select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey;
exec time: 4.388s

create materialized view:

create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey;
select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey;
exec time: 12.908s

test with the patch, exec time: 5.790s
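
The idea in the commit title, storing a small number of elements in a set before switching to a bitmap, can be sketched in Python. The class name, the promotion threshold of 32, and the use of a Python integer as the bitmap are all assumptions for illustration; they are not taken from the patch.

```python
class SmallSetBitmap:
    """Keep values in a plain set until it grows past set_limit, then promote to a bitmap."""

    def __init__(self, set_limit: int = 32) -> None:
        self._set_limit = set_limit
        self._small = set()
        self._bits = None  # Python int used as a bitmap once promoted

    def add(self, value: int) -> None:
        if self._bits is not None:
            self._bits |= 1 << value
            return
        self._small.add(value)
        if len(self._small) > self._set_limit:
            # Promote: move every stored value into the bitmap representation.
            self._bits = 0
            for v in self._small:
                self._bits |= 1 << v
            self._small = set()

    def is_promoted(self) -> bool:
        return self._bits is not None

    def cardinality(self) -> int:
        if self._bits is None:
            return len(self._small)
        return bin(self._bits).count("1")


bm = SmallSetBitmap(set_limit=4)
for v in (1, 2, 2, 3):
    bm.add(v)
small_count = bm.cardinality()
promoted_early = bm.is_promoted()
for v in (10, 20):
    bm.add(v)
```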
2023-05-31 16:13:42 +08:00
b53c42636e [Fix](Nereids) fold constant result is wrong on functions relative to timezone (#19863) 2023-05-31 15:52:40 +08:00