Commit Graph

1813 Commits

Author SHA1 Message Date
d1f4d69032 [regression-test](merge-on-write) Add cases for partial update using insert statement with schema change (#24902) 2023-10-05 22:09:22 +08:00
4ce5213b1c [fix](insert) Fix test_group_commit_stream_load and add more regression in test_group_commit_http_stream (#24954) 2023-10-03 20:56:24 +08:00
10f0c63896 [FIX](complex-type) fix agg table with complex type with replace state (#24873)
fix agg table with complex type with replace state
2023-10-03 16:32:58 +08:00
2c25e0a681 [test](load) add more s3 load regression test cases (#24906) 2023-09-28 22:01:36 +08:00
4c94820ff9 [opt](nereids) adjust column stats in filter estimation (#24973)
TPCDS before
query4  9335    8113    8070    8070
query13 3104    1386    1385    1385
query18 1704    1216    1151    1151
query48 840     840     839     839
query61 435     379     383     379
query71 715     570     579     570
query85 2822    2627    2612    2612
query88 1897    1816    1793    1793
Total cold run time: 20852 ms
Total hot run time: 16799 ms

after:
query4  9610    8287    8249    8249
query13 1721    1013    1042    1013
query18 1585    1186    1155    1155
query48 789     777     778     777
query61 384     387     381     381
query71 713     610     584     584
query85 2020    1867    1843    1843
query88 1859    1812    1805    1805
Total cold run time: 18681 ms
Total hot run time: 15807 ms
2023-09-28 21:34:17 +08:00
b50c1448df [fix](Nereids) should not replace slot by Alias when do NormalizeSlot (#24928)
when we do NormalizeToSlot, we pushed complex expression and only remain
slot of it. When we do this, we collect alias and their child and
compute its child in bottom project, remain the result slot in current
node. for example

Window(max(...), c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

But, in some cases, we remove some SlotReference by mistake, for example

Window(max(...), c1, c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

we lost the SlotReference c1. This PR fix this problem. After this Pr,
we get

Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)
2023-09-28 14:51:08 +08:00
4ff1ab7a4d [fix](regression-test) regenerate test_http_stream_properties.out file (#24946) 2023-09-28 10:39:15 +08:00
671b5f0a0a [Bug](pipeline) Fix block reusing for union source operator (#24977)
[CANCELLED][INTERNAL_ERROR]Merge block not match, self:[String], input:[String, Nullable(String), Nullable(String), Nullable(String), Nullable(String), DateV2]
2023-09-27 19:41:56 +08:00
bb7f8d18a8 [fix](nereids) push down filter through partition topn (#24944)
support pushing down filter through partition topn if the filter can pass through window.
fix CreatePartitionTopNFromWindow bug which may generate two partition topn unexpectly.
case:
select * from (select c2, row_number() over (partition by c2) as rn from t1) T where rn<=1 and c2 = 1;
before this pr:
| PhysicalResultSink                       |
| --PhysicalDistribute                     |
| ----filter((rn <= 1))                    |
| ------PhysicalWindow                     |
| --------PhysicalQuickSort                |
| ----------PhysicalDistribute             |
| ------------PhysicalPartitionTopN        |
| --------------filter((T.c2 = 1))         |
| ----------------PhysicalPartitionTopN    |
| ------------------PhysicalProject        |
| --------------------PhysicalOlapScan[t1] |
+------------------------------------------+
after:

| PhysicalResultSink                     |
| --PhysicalDistribute                   |
| ----filter((rn <= 1))                  |
| ------PhysicalWindow                   |
| --------PhysicalQuickSort              |
| ----------PhysicalDistribute           |
| ------------PhysicalPartitionTopN      |
| --------------PhysicalProject          |
| ----------------filter((T.c2 = 1))     |
| ------------------PhysicalOlapScan[t1] |
+----------------------------------------+
2023-09-27 19:38:04 +08:00
00e8d1c3b4 [Fix](Planner) disable bitmap type in compare expression (#24792)
Problem:
be core because of bitmap calculation.

Reason:
when be check failed, it would core directly.

Example:
SELECT id_bitmap FROM test_bitmap WHERE id_bitmap IN (NULL) LIMIT 20;

Solved:
Forbidden this kind of expression in fe when analyze. And also forbid bitmap type comparing in other unsupported expressions.
2023-09-27 16:57:06 +08:00
9562e280af [enhancement](Nereids): remove stats derivation in CostAndEnforce job (#24945)
1. remove stats derivation in CostAndEnforce job
2. enforce valid for each stats after estimating
2023-09-27 16:31:03 +08:00
26818de9c8 [feature](jni) support complex types in jni framework (#24810)
Support complex types in jni framework, and successfully run end-to-end on hudi.
### How to Use
Other scanners only need to implement three interfaces in `ColumnValue`:
```
// Get array elements and append into values
void unpackArray(List<ColumnValue> values);

// Get map key array&value array, and append into keys&values
void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);

// Get the struct fields specified by `structFieldIndex`, and append into values
void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
```
Developers can take `HudiColumnValue` as an example.
2023-09-27 14:47:41 +08:00
452318a9fc [Enhancement](streamload) stream tvf support user specified label (#24219)
stream tvf support user specified label
example:

curl -v --location-trusted -u root: -H "sql: insert into test.t1 WITH LABEL label1 select c1,c2 from http_stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_http_stream
return:

{
    "TxnId": 2064,
    "Label": "label1",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 2,
    "NumberLoadedRows": 2,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 27,
    "LoadTimeMs": 152,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 83,
    "ReadDataTimeMs": 92,
    "WriteDataTimeMs": 41,
    "CommitAndPublishTimeMs": 24
}
2023-09-27 12:09:35 +08:00
Pxl
18b5f70a7c [Bug](materialized-view) enable rewrite on select materialized index with aggregate mode (#24691)
enable rewrite on select materialized index with aggregate mode
2023-09-27 11:30:36 +08:00
a8f312794e [feature](nereids)support stats estimation for is-null predicate (#24764)
1. condition order: filter/hashCondition/otherCondition,
2. update regression out
3. remove tpch_sf500 shape case(covered by tpch sf1000)
4. implement is-null stats estimation
5. update ssb shape
2023-09-27 10:04:35 +08:00
6d27a016b9 [Improvement](regression-test) add http_stream case (#24930) 2023-09-27 09:55:52 +08:00
9a78681d6f [fix](pipelineX) remove cases with conflicting table names (#24922) 2023-09-26 22:24:48 +08:00
90c5461ad2 [fix](Nereids) let dml work well (#24748)
Co-authored-by: sohardforaname <organic_chemistry@foxmail.com>

TODO:
1. support agg_state type
2. support implicit cast literal exception
3. use nereids execute dml for these regression cases:

- test_agg_state_nereids (for TODO 1)
- test_array_insert_overflow (for TODO 2)
- nereids_p0/json_p0/test_json_load_and_function (for TODO 2)
- nereids_p0/json_p0/test_json_unique_load_and_function (for TODO 2)
- nereids_p0/jsonb_p0/test_jsonb_load_and_function (for TODO 2)
- nereids_p0/jsonb_p0/test_jsonb_unique_load_and_function (for TODO 2)
- json_p0/test_json_load_and_function (for TODO 2)
- json_p0/test_json_unique_load_and_function (for TODO 2)
- jsonb_p0/test_jsonb_load_and_function (for TODO 2)
- jsonb_p0/test_jsonb_unique_load_and_function (for TODO 2)
- test_multi_partition_key (for TODO 2)
2023-09-26 21:08:24 +08:00
a6a0e78f32 [Enhancement](streamload) stream tvf support compress (#24303) 2023-09-26 20:58:20 +08:00
55d1090137 [feature](insert) Support group commit stream load (#24304) 2023-09-26 20:57:02 +08:00
c9cf9499b6 [impro](regression test) Add case for time cast #24895 2023-09-26 19:47:38 +08:00
04bf9bce54 [fix](planner)update explode slot's nullable info in analyze phase (#24879) 2023-09-26 18:14:04 +08:00
94082ae59c [Fix](inverted index) fix tokenize function coredump (#24896) 2023-09-26 17:31:10 +08:00
1abda1c446 [Fix](merge-on-write) Correct the alignment process when the existing rows with same key has marked delete sign (#24877) 2023-09-26 16:09:20 +08:00
bc747be511 [Improvement](regression-test) add stream load case (#24396) 2023-09-26 15:35:19 +08:00
733b71828c [fix](pipelineX) fix do not set per_fragment_instance_idx (#24890) 2023-09-26 13:10:30 +08:00
dae0dc1652 [test](load) add some S3 TVF load regression tests (#24719) 2023-09-26 12:21:42 +08:00
e4c0c98efa [fix](Nereids): round microsecond when specify scale of microsecond (#24854) 2023-09-26 10:11:53 +08:00
8d4fd76a16 [Feature](StreamLoad2PC) Support commit and abort streamload2PC by label (#24613) 2023-09-25 22:21:27 +08:00
12bee54cf7 [fix](Nereids) function implict cast for json is not work well (#24852)
for sql like:

SELECT JSONB_EXTRACT('{"k1":"v31","k2":300}','$.k1');

the result should be

+------------------------------------------------+
| jsonb_extract('{"k1":"v31","k2":300}', '$.k1') |
+------------------------------------------------+
| "v31"                                          |
+------------------------------------------------+

but curent result is

+------------------------------------------------+
| jsonb_extract('{"k1":"v31","k2":300}', '$.k1') |
+------------------------------------------------+
| <null>                                         |
+------------------------------------------------+
2023-09-25 21:10:04 +08:00
3e686f4306 [feature](javaudf)support no input java udf (#24457)
* support no input java udf

* add license
2023-09-25 18:56:44 +08:00
90791f0b19 [fix](nereids)fix bug of exists subquery with limit clause (#24630)
create table t1(c1 int, c2 int);
create table t2(c1 int, c2 int);
insert into t1 values (1,1);
insert into t2 values (1,1);

select * from t1 where exists (select * from t2 where t1.c1 = t2.c1 limit 0);

the result should be empty set.
2023-09-25 17:15:08 +08:00
xfz
1b95ce1d93 [feature](json-function) add json_insert, json_replace, json_set functions (#24384)
[feature](json-function) add three json funcitons
2023-09-25 12:52:29 +08:00
cdc8189697 [Impro](regression-test) More test cases for function round (#24791) 2023-09-25 10:39:45 +08:00
9cd9e195d8 [test](load) add some s3 load regression test (#24399) 2023-09-23 22:40:45 +08:00
ce79711b0d [FIX](serde) fix map/array deserialize string with quote pair (#24808) 2023-09-23 01:12:20 +08:00
ac55d45f79 [Fix](topn opt) fix heap use after free when shrink in fetch phase (#24774) 2023-09-22 19:48:05 +08:00
35c0697a60 [fix](planner)SelectStmt's constructor should initialize originSelectList member (#24755)
resetSelectList method will use originSelectList to recover the origin select list. If the originSelectList is lost in constructor, the resetSelectList will fail to recover and make the analyze process fail.
2023-09-22 19:31:29 +08:00
263506f8ab [refactor](pipelineX) add MultiCast operator (#24656) 2023-09-22 15:41:14 +08:00
74bba4bdaf [enhancement](regression-test) Add routine load case (#24536) 2023-09-22 14:55:01 +08:00
320fc1481a [fix](Nereids) some expression not cast in in predicate (#24680)
1. should use castIfNotSameType in InPredicate and CaseWhen
2. StringLikeLiteral should override equals to ignore type
2023-09-22 12:58:33 +08:00
22616d125d [function](bitmap) add function alias bitmap_andnot and bitmap_andnot_count (#24771) 2023-09-22 12:18:31 +08:00
090be20ca4 [cases](regresstests) add negative case for agg table and fix agg table support replace typ… #24715
add negative case for agg table
fix agg table support replace agg type for complex type , and Now We only support complex type with agg state for replace only
fix test output
2023-09-22 09:05:20 +08:00
58ab25ccaa Revert "[Feature](merge-on-write)Support ignore mode for merge-on-write unique table (#21773)" (#24731)
This reverts commit 3ee89aea35726197cb7e94bb4f2c36bc9d50da84.
2023-09-21 21:01:28 +08:00
a48b19ceb6 [feature](Outfile) select into outfile supports to export struct/map/array type data to orc file format (#24350)
We do not support nested complex type in this pr.
2023-09-21 20:15:18 +08:00
7630fe7b7b [bug](node)fix dense_rank function in partition sort node return wrong rows (#24727) 2023-09-21 19:13:30 +08:00
5b590bbfcf [feat](Nereids) add lambda func array_last and array_first (#24682) 2023-09-21 12:23:34 +08:00
Pxl
01b7cb8db7 [Bug](materialized-view) fix failed insert into mv meet default null (#24545)
fix failed insert into mv meet default null
2023-09-21 10:47:29 +08:00
0d20a61587 [fix](nereids)left outer join estimation (#24462)
A left join B on A.x=B.x and A.y=B.y
B.x and B.y make result tuple number scale out.
suppose A is scaled out by B.x N1 time, and scaled out by B.y N2 time, and N1 < N2.
we should choose N1 as the final scale out factor, not N2.

this pr impact on tpcds_sf100 59/17/29/25/47/40/54

before
query59 77295 75279 75230 75230
query17 22642 21566 21599 21566
query29 16508 16092 16006 16006
query25 20262 20571 21171 20571
query47 23571 23264 23107 23107
query40 3305 2849 3064 2849
query54 9052 8882 8715 8715
Total cold run time: 172635 ms
Total hot run time: 168044 ms

after
query59 56435 54717 53919 53919
query17 24167 22377 23237 22377
query29 16950 18325 16333 16333
query25 21478 22975 21358 21358
query47 24412 24611 23920 23920
query40 5491 4779 5176 4779
query54 8671 8664 8658 8658
Total cold run time: 157604 ms
Total hot run time: 151344 ms
2023-09-21 09:45:01 +08:00
4ca650f306 [improvement](jdbc catalog) Adjust function replacement order and add new function support (#24685) 2023-09-21 08:45:27 +08:00