Commit Graph

2172 Commits

Author SHA1 Message Date
05adbfdb3d [feature](inverted index) match_phrase_prefix feature added (#27404)
select count() from test_index_match_phrase_prefix where request match_phrase_prefix 'xxx';
2023-12-05 20:15:13 +08:00
e79422addc [refactor](Nereids) compatible with all ability legacy planner (#27947)
refactor:
1. split InsertIntoTableCommand into three subcommands
- InsertIntoTableCommand
- InsertOverwriteTableCommand
- BatchInsertIntoTableCommand

feature:
1. support the DEFAULT keyword in values list
2. support empty values list
3. support temporary partition
4. support insert into values in txn model

fix:
1. should start the transaction before releasing the read lock on the target table
2023-12-05 19:10:55 +08:00
Pxl
8a761dff84 [Bug](materialized-view) fix create mv failed on unique table (#27971)
fix create mv failed on unique table
2023-12-05 14:53:09 +08:00
c98b80ae6a [Feature](functions) support ignore and nullable functions (#27848)
support ignore and nullable functions
2023-12-05 14:09:32 +08:00
79f6f85cf1 [FIX](serde)fix datetimev2 serde parse from string with scale (#27965) 2023-12-05 13:58:32 +08:00
17016b9797 [improvement](decimal) use new way for decimal arithmetic precision promotion (#27787)
* [DNM](decimal) use new way for decimal arithmetic precision promotion

* [improvement](decimal) [DNM](decimal) use new way for decimal arithmetic precision promotion
1. [DNM](decimal) use new way for decimal arithmetic precision promotion
2. throw exception if it overflows for decimal arithmetics
3. throw exception if it overflows when casting among number types

* fix compile error of gcc

* improvement

---------

Co-authored-by: morrySnow <morrysnow@126.com>
2023-12-05 12:54:40 +08:00
2f63999066 [fix](Nereids): Preserve "" in single quote strings and '' in double quote strings. (#27959) 2023-12-05 12:30:03 +08:00
358d73a0ae [FIX](complextype) fix empty quote with complex type (#27942) 2023-12-05 12:25:26 +08:00
50ad40a7a8 [test](Nereids): add infer-predicates regression test (#27850) 2023-12-05 10:16:01 +08:00
4934f7ed8d [enhancement](Nereids) add test for some push down filter rule (#27757) 2023-12-04 20:57:57 +08:00
e19af1b2ed [regression](Nereids) add rule test for push down limit + sort test (#26642) 2023-12-04 14:18:55 +08:00
e62d19d90d [improve](partition) support auto list partition with more columns (#27817)
Previously, the partition by clause could have only one column.
That limit is now removed, so it can have more columns.
2023-12-04 11:33:18 +08:00
f2cfc87aca [fix](nereids) temporary partition is selected only if user manually specified (#27893)
q1: "select * from ut_p temporary partitions(tp1) where val > 0"
in q1, temporary partition tp1 is scanned

q2: "select * from ut_p where val > 0"
in q2, temporary partition tp1 is not scanned.
2023-12-04 09:44:27 +08:00
97d36b4f38 [fix](csv_reader) fix trim_double_quotes behavior change (#27882) 2023-12-03 22:57:55 +08:00
3ddc8211d1 [FIX](array )fix array<null> literal in fe (#27750) 2023-12-03 13:19:22 +08:00
43f2966889 [case](regression) using load_parallelism when load csv and json from s3 (#27525)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-12-03 09:56:47 +08:00
80d2c7ab41 [feature](parquet)support read parquet lzo compress. (#27706) 2023-12-03 09:55:52 +08:00
2e1ce758f1 [feature](function) support ip function ipv6numtostring(alias inet6_ntoa) (#27342) 2023-12-02 11:48:19 +08:00
b74388c3b1 [case](regression) Add backup restore test with specified partition (#27694) 2023-12-01 22:31:59 +08:00
1706699e7e [fix](multi-catalog)support the max compute partition prune (#27154)
1. max compute partition prune:
previously, mc partitions could only be filtered by '=', which selects just one partition.
To support filtering multiple partitions and range operators ('>', '<', '>=', ...), partition pruning is needed.

2. add max compute row count cache and partitionValues cache

3. add max compute regression case
2023-12-01 22:28:26 +08:00
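The range-operator pruning described in 1706699e7e can be illustrated with a minimal sketch. This is not the Doris implementation: `prune_partitions`, the `dt` partition column, and the `(column, op, literal)` predicate tuples are hypothetical names chosen only for illustration.

```python
import operator

# Comparison operators a partition predicate may use (illustrative set).
OPS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def prune_partitions(partitions, predicates):
    """Keep only partitions whose values satisfy every (column, op, literal) predicate."""
    return [
        part
        for part in partitions
        if all(OPS[op](part[col], lit) for col, op, lit in predicates)
    ]


# Hypothetical partition values, one dict per partition.
partitions = [{"dt": "2023-11-29"}, {"dt": "2023-11-30"}, {"dt": "2023-12-01"}]

# '=' selects a single partition; a range operator can select several.
print(prune_partitions(partitions, [("dt", "=", "2023-11-30")]))
print(prune_partitions(partitions, [("dt", ">=", "2023-11-30")]))
```

With only '=' available, each query can keep one partition; evaluating range predicates against the cached partition values lets the planner skip every partition that cannot match.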
f4afcae452 [case](regression) Stream load 2pc exceptions (#27804)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-12-01 22:27:40 +08:00
8749e5208f [fix](jdbc catalog) fix insert into jdbc table column order (#27855) 2023-12-01 20:46:48 +08:00
007506ce42 [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode (#27842) 2023-12-01 17:32:46 +08:00
34c85c962f [opt](Nereids) improve semi/anti join estimation when column stats are unavailable #27793
This change improves the performance of tpch q20: on sf500, it drops from 6.3 sec to 1.1 sec.
This change has no impact on tpcds.

When column stats are unknown,
the basic algorithm estimates the output row count of a left semi join as the output row count of its left child.
q1: "A left semi join B on A.x=B.x"
The output row count is estimated as A.rowCount.

But the basic algorithm does not work well for the following pattern:
q2: "A left semi join filter(B) on A.x=B.x"
Because there is a filter on B, this left semi join usually also reduces the row count of A, so we estimate
the output of q2 as A.rowCount * Filter.rowCount/B.rowCount
2023-12-01 15:48:33 +08:00
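The estimate described in 34c85c962f can be written out as a small sketch. It only restates the formula from the commit message; the function name, its parameters, and the clamp of the ratio to at most 1.0 are assumptions made for illustration and are not taken from the Nereids code.

```python
def estimate_left_semi_join_rows(left_rows, filtered_right_rows=None, right_rows=None):
    """Estimate the output row count of a left semi join without column stats.

    left_rows           -- A.rowCount
    filtered_right_rows -- Filter.rowCount (rows of B that pass the filter), if any
    right_rows          -- B.rowCount (rows of B before the filter), if any
    """
    # q1 pattern: no filter on the right side, fall back to the left row count.
    if filtered_right_rows is None or not right_rows:
        return float(left_rows)
    # q2 pattern: scale A.rowCount by the fraction of B that survives the filter.
    # Clamping to 1.0 is an assumption so the estimate never exceeds A.rowCount.
    ratio = min(1.0, filtered_right_rows / right_rows)
    return left_rows * ratio


# q1: "A left semi join B on A.x=B.x"
print(estimate_left_semi_join_rows(1000))
# q2: "A left semi join filter(B) on A.x=B.x", where the filter keeps 50 of 200 rows
print(estimate_left_semi_join_rows(1000, filtered_right_rows=50, right_rows=200))
```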
Pxl
64fad89eb1 [Chore](case) add case of join with big hashtable (#27825)
add case of join with big hashtable
2023-12-01 15:32:23 +08:00
776f0205f3 [Fix](test) Fix an auto partition conflict and add many testcases (#27730)
Fix an auto partition conflict and add many testcases
2023-12-01 09:58:44 +08:00
2afbece0b8 [Fix](type) fix wrong type transform for unix_timestamp (#27728)
fix wrong type transform for unix_timestamp
2023-12-01 09:58:20 +08:00
60bc3be8a2 [Opt](Compression) Opt zstd block decompression by ZSTD_decompressDCtx(). (#27534)
Optimize zstd block decompression by using `ZSTD_decompressDCtx()` instead of streaming decompression.
This improves performance but consumes more memory.

Test result: 
- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 5.2 -> 4.6.
2023-12-01 09:10:32 +08:00
6a614c3e7b [regression](nereids) add regression case for transposeSemiJoinAgg/transposeSemiJoinAggProject rules (#27664)
add case for transposeSemiJoinAgg/transposeSemiJoinAggProject rules
2023-12-01 08:19:16 +08:00
2b2c2dd772 [fix](sequence column) insert into should require sequence column in all scenario (#27780) 2023-11-30 23:27:58 +08:00
6c4ec3cb82 [FIX](complextype)fix array/map/struct impl hashcode and equals (#27717) 2023-11-30 22:08:15 +08:00
97105e9a16 [regression](compaction) Add case to test single replica compaction (#27199) 2023-11-30 21:27:13 +08:00
f10b7bf7e7 [test](Planner): add regression-test for eager-aggregate (#27732) 2023-11-30 14:42:26 +08:00
e4149c6e4c [Fix](parquet-reader) Fix null map issue in parquet reader. (#27777)
Fix a null map issue in the parquet reader which causes incorrect results for functions such as `min()` and `max()`.

To avoid copying, the null map is shared between the parquet converted src column and the dst column. This is tricky: sharing calls the mutable function `doris_nullable_column->get_null_map_column_ptr()`, which sets `_need_update_has_null = true`, while some operations such as agg call `has_null()`, which sets `_need_update_has_null = false`.
2023-11-30 13:55:37 +08:00
5739167142 [feature](window_function) support to secondary argument to ignore null values in first_value/last_value (#27623) 2023-11-30 09:56:43 +08:00
1f9aa8ab16 [fix](group commit) Fix some group commit problems (#27769) 2023-11-29 23:43:21 +08:00
acc14d7e4c [feature](Planner): Push down LimitDistinct through Union (#27745) 2023-11-29 21:12:42 +08:00
83ed8d3cba [Feat](Nereids) join hint support stage one (#27378)
support view as an independent unit of leading hint
add random test check of leading hint query
add more test with data of leading hint query
add random test check of distribute hint
2023-11-29 21:08:08 +08:00
ce271ff382 [fix](parquet)fix can not read parquet lz4 compress. (#27383)
Fixed the problem of not being able to read parquet lz4 compressed format. By default, it is decompressed according to the Hadoop lz4 format. If it fails, it will fall back to the standard lz4 compression format.
2023-11-29 19:04:53 +08:00
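The try-then-fall-back behaviour described in ce271ff382 can be sketched as follows. This is a generic pattern, not the Doris reader: the two decoders are toy stand-ins with a made-up `b"H:"` marker (they do not implement Hadoop lz4 framing or real lz4), and every name here is hypothetical.

```python
def decompress_with_fallback(data, primary, fallback):
    """Try the primary decoder first; if it rejects the input, use the fallback."""
    try:
        return primary(data)
    except ValueError:
        return fallback(data)


def toy_primary_decoder(data):
    # Stand-in for the "Hadoop lz4" path: accepts only inputs with a made-up marker.
    if not data.startswith(b"H:"):
        raise ValueError("input is not in the primary format")
    return data[2:]


def toy_fallback_decoder(data):
    # Stand-in for the "standard lz4" path: returns the payload unchanged.
    return data


print(decompress_with_fallback(b"H:payload", toy_primary_decoder, toy_fallback_decoder))
print(decompress_with_fallback(b"payload", toy_primary_decoder, toy_fallback_decoder))
```

Trying the more common format first keeps the usual path cheap; the fallback cost is paid only for files written in the other format.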
573f0eaad9 [fix](regression)fix parquet data page v2 unstable case (#27753) 2023-11-29 18:58:37 +08:00
498d27c905 [improve](json_reader) add prompt when all fields is null (#27630) 2023-11-29 18:26:42 +08:00
7398c3daf1 [Feature-Variant](Variant Type) support variant type query and index (#27676) 2023-11-29 10:37:28 +08:00
d771f16b79 [fix](parquet)fix bug that can not read parquet data page v2 (#27655) 2023-11-28 22:43:46 +08:00
Pxl
d969047b50 [Refactor](join) refactor of hash join (#27557)
Improve performance on the tpch data set by restructuring the join-related code and its use of hash tables

Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
2023-11-28 19:46:00 +08:00
b93dd1d5f7 [enhancement](load) improve error msg for load when cancelled by mem gc (#26809)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-11-28 17:36:11 +08:00
7087250b4a [fix](insert) txn insert and group commit should write \N string corr… (#27637) 2023-11-28 17:32:50 +08:00
91f56cefc0 [feature](Nereids): Pushdown TopN-Distinct through Union (#27628)
```
  TopN-Distinct
  -> Union All
  -> child plan1
  -> child plan2
  -> child plan3
 
  rewritten to
 
  TopN-Distinct
  -> Union All
    -> TopN-Distinct
      -> child plan1
    -> TopN-Distinct
      -> child plan2
    -> TopN-Distinct
      -> child plan3
```
2023-11-28 15:23:46 +08:00
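The rewrite in 91f56cefc0 relies on the fact that applying TopN-Distinct to each child before the Union All does not change the final result. A minimal sketch of that equivalence, assuming rows are plain comparable values ordered ascending; all function names are hypothetical and none of this is the Nereids rule itself:

```python
def topn_distinct(rows, n):
    """Return the n smallest distinct values, in ascending order."""
    return sorted(set(rows))[:n]


def union_all(children):
    """Concatenate the rows of all children, keeping duplicates."""
    out = []
    for child in children:
        out.extend(child)
    return out


def original_plan(children, n):
    # TopN-Distinct -> Union All -> children
    return topn_distinct(union_all(children), n)


def rewritten_plan(children, n):
    # TopN-Distinct -> Union All -> (TopN-Distinct -> child) for each child
    return topn_distinct(union_all([topn_distinct(c, n) for c in children]), n)


children = [[5, 1, 1, 9], [2, 2, 7], [3, 8, 1]]
print(original_plan(children, 3))
print(rewritten_plan(children, 3))
```

Each child can contribute at most n distinct rows to the final top n, so the pushed-down TopN-Distinct discards only rows that could never appear in the output while shrinking the input to the union.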
2ea1e9db44 [fix](nereids) temp partition is always pruned (#27636) 2023-11-28 14:18:14 +08:00
f329b90696 [fix](show_variables) fix default value for special variables (#27651) 2023-11-28 11:35:46 +08:00
9903c30591 [opt](nereids)adjust distribution cost for better choice of broadcast join and shuffle join (#27113)
add boundary to distribution cost factor
2023-11-28 10:41:16 +08:00