Commit Graph

3441 Commits

Author SHA1 Message Date
59f34be41f [fix](having-clause) having clause do not works correct with same alias name (#15143) 2023-01-05 10:15:15 +08:00
5ff5b8fc98 [feature](mark join) Support mark join for hash join node (#15569)
2023-01-05 09:32:26 +08:00
61d538c713 [improvement](storage-policy) Add check validity when create storage policy. (#14405) 2023-01-04 22:24:49 +08:00
7ef3940809 [fix](storage-policy) fix some bug (#15585)
1. Fix the datetime millisecond-to-second conversion bug.
2. Fix the fields (datetime, ttl) missing from the notification sent to BE when altering a storage policy.
3. Support "h, hour, d, day" as units of the ttl field when altering a storage policy.
2023-01-04 16:49:51 +08:00
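The unit handling described in #15585 can be sketched as follows. This is a minimal illustration, not the Doris implementation: the function names, the unit table, and the choice to treat a bare number as seconds are all assumptions made for the example.

```python
import re

# Hypothetical unit table: the commit message only names "h, hour, d, day".
_UNIT_SECONDS = {"h": 3600, "hour": 3600, "d": 86400, "day": 86400}


def parse_ttl_seconds(text):
    """Convert a TTL such as '3d' or '12 hour' to seconds.

    Assumption: a bare number without a unit is already in seconds.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", text.lower())
    if match is None:
        raise ValueError(f"invalid ttl: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    if unit == "":
        return value
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unknown ttl unit: {unit!r}")
    return value * _UNIT_SECONDS[unit]


def millis_to_seconds(timestamp_ms):
    """Convert an epoch timestamp in milliseconds to whole seconds."""
    return timestamp_ms // 1000
```

For instance, `parse_ttl_seconds("3d")` yields the number of seconds in three days, and `millis_to_seconds` drops the millisecond part of an epoch timestamp.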
c42c61dcad [fix](bitmapfilter) fix bitmap filter not pushing down (#15532) 2023-01-04 14:33:53 +08:00
a4af1fbf90 [fix](inbitmap) forbid having clause to include in bitmap. (#15494) 2023-01-04 14:33:18 +08:00
wxy
e0c56bcd20 [Feature](export) Support cancel export statement (#15128)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-01-04 14:08:25 +08:00
7728794b4a [fix](Nereids) SimplifyArithmeticRule generate wrong expression after process (#15580)
For 'a / b' where a is a constant, applying SimplifyArithmeticRule mistakenly converted the expression to 'b * a'.
2023-01-04 11:10:15 +08:00
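The class of bug fixed in #15580 can be illustrated with a small sketch. It is an assumption about the general shape of such a rewrite, not the actual SimplifyArithmeticRule code: a rewrite that moves constants to one side may only swap operands of commutative operators, and the helper names below are hypothetical.

```python
# Operators whose operands may be swapped without changing the result.
COMMUTATIVE_OPS = {"+", "*"}


def is_constant(operand):
    """In this sketch, numbers are constants and strings are column names."""
    return isinstance(operand, (int, float))


def move_constant_right(op, left, right):
    """Return (op, left, right), swapping operands only when that is safe."""
    if op in COMMUTATIVE_OPS and is_constant(left) and not is_constant(right):
        return (op, right, left)
    # '/' and '-' are not commutative, so the operand order must be kept.
    return (op, left, right)
```

With this guard, `2 * b` may become `b * 2`, while `2 / b` is left unchanged.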
f2f06c1acc [feature](nereids) Support select temp partition (#15579)
Support the following grammar:
    select * from t_p temporary partition(tp1);
    select * from t_p temporary partitions(tp1);
    select * from t_p temporary partition tp1;
2023-01-04 11:04:36 +08:00
eef1f432dd [Bug](datetimev2/decimalv3) Fix wrong predicate infer rule (#15574) 2023-01-04 10:03:43 +08:00
a97f582b93 [fix](nereids) use DAYS as default unit for DATE_ADD and DATE_SUB function (#15559) 2023-01-04 01:55:15 +08:00
18bc354c06 [fix](Nereids) use correct column unique id when read data from non-base index (#15534)
When light schema change is enabled by default, a column in OLAP scan is retrieved by column unique id instead of the column name. Columns with the same name would use different unique IDs among materialized indexes.
This PR ensures that the column in the OLAP scan node could use the correct column unique id.
2023-01-04 01:41:25 +08:00
8d0c06c897 [fix](nereids) binding priority in agg-sort, having, group_by_key (#15240)
This PR defines order_key and having_key binding priority.

1. order key priority
```
                select
                        col1 * -1 as col1    # inner_col1 * -1 as alias_col1
                from
                        t
                order by col1;     # order by order_col1
```
to bind `order_col1`, `alias_col1` has higher priority than `inner_col1`

2. having key priority
```
       select (a-1) as a  # inner_a - 1 as alias_a
       from bind_priority_tbl 
       group by a 
       having a=1;
```
to bind having key, `inner_a` has higher priority than `alias_a`

3. group by key binding priority
```
SELECT date_format(b.k10,
         '%Y%m%d') AS k10
FROM test a
LEFT JOIN 
    (SELECT k10
    FROM baseall) b
    ON a.k10 = b.k10
GROUP BY  k10;
```
group_by_key (k10) binding priority:

- agg.child.output
- agg.output

If binding with agg.child.output fails (the slot is not found, or more than one candidate slot is found in agg.child.output), Nereids tries to bind group_by_key with agg.output.
In the example above, Nereids finds two candidate slots (a.k10, b.k10) in agg.child.output for group_by_key (k10), so binding with agg.child.output fails. Nereids then tries to bind group_by_key with agg.output, that is `date_format(b.k10, '%Y%m%d') AS k10`, and group_by_key is finally bound to the alias `k10`.
2023-01-03 22:09:28 +08:00
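The group-by binding priority described in #15240 can be sketched as below. This is a simplified model under stated assumptions, not Nereids code: slots are plain `(qualifier, column)` tuples, agg.output is a dictionary from alias to expression text, and `bind_group_by_key` is a hypothetical name.

```python
def bind_group_by_key(name, child_output, agg_output):
    """Bind a group-by key: try agg.child.output first, then agg.output.

    child_output: list of (qualifier, column_name) slots.
    agg_output:   dict mapping an output alias to its expression text.
    """
    candidates = [slot for slot in child_output if slot[1] == name]
    if len(candidates) == 1:
        # Exactly one slot found in the child output: bind to it.
        return ("agg.child.output", candidates[0])
    # Not found, or ambiguous: fall back to the aliases in agg.output.
    if name in agg_output:
        return ("agg.output", agg_output[name])
    raise ValueError(f"cannot bind group by key: {name}")


# The example from the commit message: k10 is ambiguous in the child output.
child = [("a", "k10"), ("b", "k10")]
output = {"k10": "date_format(b.k10, '%Y%m%d')"}
bound = bind_group_by_key("k10", child, output)
```

Here `bound` refers to the alias expression, because both `a.k10` and `b.k10` match in the child output.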
55dc541c90 [Fix](Nereids) aggregate function except COUNT should nullable without group by expr (#15547)
Co-authored-by: mch_ucchi
2023-01-03 21:28:07 +08:00
a365486a25 [fix](Nereids) get datatype for binary arithmetic (#15548)
This is just a temporary fix for binary arithmetic. Next, we will refactor the TypeCoercion rule so that its behavior exactly matches the legacy planner.
2023-01-03 19:09:48 +08:00
1dabcb0111 [Fix](Nereids) fix except and intersect error for statsCalculator (#15557)
When calculating statistics for except and intersect in statsCalculator, the slotId of the corresponding column was not replaced with the slotId of the output, resulting in an NPE.
2023-01-03 17:06:57 +08:00
8748f65a1b [fix](nereids)support nulls first/last in order by clause (#15530) 2023-01-03 14:56:00 +08:00
893f5f9345 [feature-wip](multi-catalog) support automatic sync hive metastore events (#15401)
Poll the metastore at a given frequency for create/alter/drop events on databases, tables, and partitions.
By observing such events, we can take the appropriate action (refresh/invalidate/add/remove)
so that the catalog represents the latest information available in the metastore.
We keep track of the last synced event id in each polling
iteration so that the next batch can be requested appropriately.
2023-01-03 13:59:14 +08:00
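The polling scheme described in #15401 can be sketched as follows. This is a minimal model, not the Doris or Hive client API: `fetch_events` and `apply_event` are hypothetical callbacks, and events are assumed to arrive as `(event_id, payload)` pairs in ascending id order.

```python
class MetastoreEventPoller:
    """Poll for new events and remember the last event id that was applied."""

    def __init__(self, fetch_events, apply_event, last_synced_id=0):
        # fetch_events(after_id) -> list of (event_id, payload), ascending ids.
        self.fetch_events = fetch_events
        # apply_event(payload) performs refresh/invalidate/add/remove.
        self.apply_event = apply_event
        self.last_synced_id = last_synced_id

    def poll_once(self):
        """Fetch and apply one batch; return the number of events handled."""
        events = self.fetch_events(self.last_synced_id)
        for event_id, payload in events:
            self.apply_event(payload)
            # Advance only after the event has been applied successfully.
            self.last_synced_id = event_id
        return len(events)


# Usage with an in-memory event log standing in for the metastore.
event_log = [(1, "CREATE_TABLE t1"), (2, "ALTER_TABLE t1"), (3, "DROP_TABLE t1")]
applied = []
poller = MetastoreEventPoller(
    fetch_events=lambda after_id: [e for e in event_log if e[0] > after_id],
    apply_event=applied.append,
)
handled_first = poller.poll_once()
handled_second = poller.poll_once()
```

Because the last synced id advances with every applied event, the second call requests only events newer than id 3 and finds none.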
ada72b055f [feature](Nereids): Support any_value/any function. (#15450) 2023-01-03 12:21:13 +08:00
02d035466b [refactor] remove partition pruner v1 (#15552)
Partition pruner v1 is no longer used.
Also remove the session variable partition_prune_algorithm_version.
2023-01-03 11:35:30 +08:00
31548cfe2a [fix](nereids) check failed that exchange node under agg must from PhysicalDistribute (#15473)
When Nereids translates a PhysicalHashAggregate node to the original plan and the input fragment root is an exchange node, Nereids assumes that this exchange node was generated from a PhysicalDistribute node.
But this assumption does not always hold. For example, a sort node can be translated to exchange (merge phase) + sort (local phase).
2023-01-03 11:19:25 +08:00
238ae54620 [fix](merge-on-write) unique key mow tables should require distribution columns be key column (#15535)
* [fix](merge-on-write) unique key mow tables should require distribution columns be key column

* fix code style
2023-01-01 15:53:21 +08:00
e89adc6e1d [fix](create-table) wrong judgement about partition column type (#15542)
The following statement should succeed, but returns the error `complex type cannt be partition column:ARRAY<VARCHAR(64)>`:

```
create table test_array( 
task_insert_time BIGINT NOT NULL DEFAULT "0" COMMENT "" , 
task_project ARRAY<VARCHAR(64)>  DEFAULT NULL COMMENT "" ,
route_key DATEV2 NOT NULL COMMENT "range partition key"
) 
DUPLICATE KEY(`task_insert_time`)  
 COMMENT ""
PARTITION BY RANGE(route_key) 
(PARTITION `p202209` VALUES LESS THAN ("2022-10-01"),
PARTITION `p202210` VALUES LESS THAN ("2022-11-01"),
PARTITION `p202211` VALUES LESS THAN ("2022-12-01")) 
DISTRIBUTED BY HASH(`task_insert_time` ) BUCKETS 32 
PROPERTIES
(
    "replication_num" = "1",    
    "light_schema_change" = "true"    
);
```

This PR fixes it.
2022-12-31 13:10:39 +08:00
c47bdf6606 [vectorized](jdbc) fix external table of oracle have keyworld column (#15487)
If a column name is an Oracle keyword, the query reports an error.
2022-12-31 12:48:26 +08:00
781fa17993 [fix](Nereids) round function return type should be double (#15502) 2022-12-30 23:36:15 +08:00
96518db263 [enhencement](Nereids) remove constant expr constraint on OneRowRelation (#15506) 2022-12-30 23:35:15 +08:00
100834df8b [fix](nereids) fix some arrgregate bugs in Nereids (#15326)
1. The agg function without the distinct keyword should be a "merge" function in threePhaseAggregateWithDistinct.
2. Use aggregateParam.aggMode.consumeAggregateBuffer instead of aggregateParam.aggPhase.isGlobal() to indicate whether an agg function is a "merge" function.
3. Add an AvgDistinctToSumDivCount rule to support avg(distinct xxx) in some cases.
4. AggregateExpression's nullable method should call the inner function's nullable method.
5. Add a bind slot rule to bind the pattern "logicalSort(logicalHaving(logicalProject()))".
6. Don't remove the project node in PhysicalPlanTranslator.
7. Add a cast-to-bigint expr for count(distinct datelike type).
8. Fall back to the old optimizer if the bitmap runtime filter is enabled.
9. Fix the exchange node memory leak.
2022-12-30 23:07:37 +08:00
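The identity behind the AvgDistinctToSumDivCount rule mentioned in #15326 is avg(distinct x) = sum(distinct x) / count(distinct x). The sketch below only checks that identity on plain Python values; it is not the rule's implementation, and skipping `None` mirrors the usual SQL treatment of NULL in aggregates.

```python
def avg_distinct(values):
    """Compute avg(distinct x) as sum(distinct x) / count(distinct x).

    None stands in for SQL NULL and is ignored; an empty input yields None.
    """
    distinct = {v for v in values if v is not None}
    if not distinct:
        return None
    return sum(distinct) / len(distinct)
```

For the input `[1, 1, 2, 3, None]` the distinct values are 1, 2, and 3, so the sum is 6 and the count is 3.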
9c3c9db49b [enhancement](fuzzy test) support fuzzy test of RewriteOrToInPredicateThreshold #15469
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-12-30 22:59:59 +08:00
edecc2e706 [feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211)
* [feature-wip](inverted index)inverted index api: reader

* [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL

* [feature-wip](inverted index) Adapt to index meta

* [enhance] add more metrics

* [enhance] add fulltext match query check for column type and index parser

* [feature-wip](inverted index) Support applying inverted index in compound predicates, except for leaf nodes of AND nodes
2022-12-30 21:48:14 +08:00
aeaa319203 [fix](fe)change session variable group_concat_max_len from int to long (#15515) 2022-12-30 20:45:44 +08:00
a7895ba169 [feature](Nereids): Support variance_samp function. (#15500) 2022-12-30 17:32:06 +08:00
93a25e1af5 [fix](nereids) the project node is lost when creating PhysicalStorageLayerAggregate node (#15467) 2022-12-30 16:33:24 +08:00
2704651fde [fix](nereids) hll and bitmap type can't be used as order by and group by exprs (#15471)
hll, bitmap, array and quantile state type can't be used in order by, group by and some agg exprs.
2022-12-30 14:26:21 +08:00
520b6d7910 [Improvement](decimalv3) Add a config to check overflow for DECIMALV3 (#15463) 2022-12-30 14:02:24 +08:00
5db8b52441 [Fix](SparkLoad): fix the timeout aborted loadtasks are not cleaned up. (#15480)
Co-authored-by: spaces-x <weixiang06@meituan.com>
2022-12-30 14:02:00 +08:00
5c5b7a5c6f [Broker](bos) suppoert baidu bos object storage for broker (#15448) 2022-12-30 12:39:10 +08:00
917b266799 [fix](planner) table valued function could not used in subquery (#15496) 2022-12-30 10:01:25 +08:00
2c8de30cce [optimize](multi-catalog) use dictionary encode&filter to process delete files (#15441)
**Optimize**
PR #14470 used `Expr` to filter the delete rows that match the current data file,
but the rows in the delete file are [sorted by file_path then position](https://iceberg.apache.org/spec/#position-delete-files)
to optimize row filtering while scanning, so this PR removes `Expr` and uses binary search to filter delete rows.

In addition, delete files are likely to be dictionary-encoded, and it is time-consuming to decode the `file_path`
column into `ColumnString`, so this PR uses `ColumnDictionary` to read the `file_path` column.

After testing, the performance of iceberg v2's MOR is improved by 30%+.

**Fix Bug**
Lazy-read-block may not have the filter column, if the whole group is filtered by `Expr`
and the batch_eof is generated from next batch.
2022-12-30 08:57:55 +08:00
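The binary-search idea in #15441 can be sketched as below. This is an illustration only, not the Doris C++ reader: delete rows are modeled as a Python list of `(file_path, position)` tuples already sorted by file_path and then position, and `deleted_positions` is a hypothetical helper name.

```python
from bisect import bisect_left, bisect_right


def deleted_positions(delete_rows, data_file):
    """Return the deleted row positions for one data file.

    delete_rows must be sorted by (file_path, position), as the Iceberg
    position-delete layout specifies, so the matching rows are contiguous.
    """
    # (data_file,) sorts before every (data_file, pos) tuple.
    start = bisect_left(delete_rows, (data_file,))
    # (data_file, inf) sorts after every (data_file, pos) tuple.
    end = bisect_right(delete_rows, (data_file, float("inf")))
    return [pos for _, pos in delete_rows[start:end]]


rows = [
    ("a.parquet", 0),
    ("a.parquet", 7),
    ("b.parquet", 3),
    ("c.parquet", 1),
    ("c.parquet", 2),
]
```

Two binary searches locate the contiguous run for a file, so no per-row expression has to be evaluated against the rest of the delete file.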
85c7c531f1 [vectorized](jdbc) support array type in jdbc external table (#15303) 2022-12-30 00:29:08 +08:00
9a517d6a8f [DataType](Deciamlv3) change the avg function scale of decimalv3 (#15445) 2022-12-30 00:27:51 +08:00
3ff01ca799 [feature-wip](multi-catalog) support Iceberg time travel in external table (#15418)
For example:
SELECT * FROM tbl FOR VERSION AS OF 10963874102873;
SELECT * FROM tbl FOR TIME AS OF '1986-10-26 01:21:00';
2022-12-30 00:25:21 +08:00
6c847daba0 [Feature](Nereids) Support grouping set for materialized index. (#15383)
This PR adds support for materialized index selection when the query has grouping sets.
2022-12-29 23:17:02 +08:00
dda505487c [fix](nereids) SimplifyArithmeticRuleTest ut failed (#15486)
This PR removes typeCoercion on the expected expr in ExpressionRewriteTestHelper, because the expected expr should not be rewritten at all; rewriting would change it unexpectedly.
2022-12-29 22:53:27 +08:00
c54c2f8035 [fix](statistics) fix npe when __internal_schema not created (#15464) 2022-12-29 21:24:33 +08:00
79113b0cd1 [Fix](storage) Fix bug that cooldown time is error (#15444)
The cooldown time is wrong for data on SSD, because the cooldown time for all tables/partitions
is calculated only once, when class `DataProperty` is loaded, and cannot be updated later.
This patch ensures that the cooldown time for each table/partition is calculated in real time
when the table/partition is created.
Co-authored-by: weizuo <weizuo@xiaomi.com>
2022-12-29 21:01:36 +08:00
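The difference between the two behaviors in #15444 can be sketched as follows. This is a simplified model with hypothetical function names and plain integer timestamps, not the `DataProperty` code: one version freezes the deadline at load time, the other derives it from each partition's creation time.

```python
def frozen_cooldown(loaded_at, period_seconds):
    """Buggy shape: the deadline is computed once and reused for everything."""
    deadline = loaded_at + period_seconds
    # created_at is ignored, so every partition gets the same cooldown time.
    return lambda created_at: deadline


def live_cooldown(created_at, period_seconds):
    """Fixed shape: the deadline is derived from the creation time."""
    return created_at + period_seconds


PERIOD = 30 * 24 * 3600  # hypothetical cooldown period of 30 days, in seconds
stale = frozen_cooldown(loaded_at=0, period_seconds=PERIOD)
```

A partition created long after load time gets an already outdated deadline from `stale`, while `live_cooldown` gives it a full period from its own creation time.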
e651a9bb11 [feature](nereids) add variance function for nereids (#15370)
Support the variance function. Currently, it does not support the decimalV3 type.
2022-12-29 18:33:52 +08:00
25b257e37c [enhancement](session var) varariable to control whether to rewrite OR to IN or not (#15437) 2022-12-29 14:50:32 +08:00
d95be84629 [enhancement](profile) add session variable parallel_fragment_exec_instance_num to profile (#15457) 2022-12-29 14:46:07 +08:00
657f3e6318 [fix](pipeline) disable sharing hashtable for broadcast join for pipeline engine (#15432) 2022-12-29 14:19:57 +08:00
7ab6ea684b [Improvement](meta) hide password of show catalog xxx stmt and for es catalog (#15410)
* [Improvement](meta) hide password of show catalog xxx

* hide es password in show create ctlg and show ctlg xx stmt
2022-12-29 14:16:32 +08:00