dd567fa774
[fix](function) support return JsonType for If function ( #35199 )
...
Add a FunctionSignature for If so that it can return JsonType.
2024-05-24 16:23:58 +08:00
98b2bda660
[opt](Nereids) remove restrict for count(*) in window ( #35220 )
...
Support count(*) as a window function:
CREATE TABLE `t1` (
`id` INT NULL,
`dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
select *, count(*) over() from t1;
2024-05-24 16:23:58 +08:00
473e14ca82
[chore](backup) log backup/restore job during replay ( #35234 )
2024-05-24 16:23:57 +08:00
edb276ad92
[fix](typo)fix show backend typo ( #35198 )
2024-05-24 16:23:57 +08:00
cf46ebe31d
[improve](jdbc catalog) Remove all property checks during create ( #35194 ) ( #35354 )
2024-05-24 16:12:02 +08:00
f062506b22
[fix](nereids)the preagg state for count(*) is wrong ( #35326 )
2024-05-24 15:23:04 +08:00
0b90e37227
[fix](Nereids) string literal coercion of in predicate ( #35337 )
...
pick from master #35200
Description:
The SQL executes much more slowly when the literals in an `in predicate` are in string format but the actual column data is an integral type.
```
mysql> set enable_nereids_planner = false;
Query OK, 0 rows affected (0.03 sec)
mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+---------------+
| id | sum(`clicks`) |
+------------+---------------+
| 787934713 | 2838 |
| 306960695 | 339 |
+------------+---------------+
2 rows in set (1.81 sec)
mysql> set enable_nereids_planner = true;
Query OK, 0 rows affected (0.02 sec)
mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+-------------+
| id | sum(clicks) |
+------------+-------------+
| 787934713 | 2838 |
| 306960695 | 339 |
+------------+-------------+
2 rows in set (28.14 sec)
```
Reason:
In the legacy planner, the string literals are converted to integral values, but the Nereids planner does not perform this conversion and instead does string matching in BE.
Solution:
Process numeric string literals in an `in predicate` the same way as in a `comparison predicate`.
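The coercion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Nereids implementation: the function name `coerce_in_literals` and its return shape are hypothetical, and the fallback to string matching when any literal is non-numeric is an assumption.

```python
from typing import List, Tuple, Union


def coerce_in_literals(
    column_is_integral: bool, literals: List[str]
) -> Tuple[str, List[Union[int, str]]]:
    """Decide how an IN predicate over string literals is compared.

    Returns ("integral", [...]) when every literal could be converted to an
    integer, so the column is compared without a cast. Returns ("string", [...])
    otherwise, meaning the column side would need a cast to string.
    """
    if not column_is_integral:
        return "string", list(literals)
    converted: List[Union[int, str]] = []
    for lit in literals:
        try:
            converted.append(int(lit))
        except ValueError:
            # Assumption: a single non-numeric literal falls back to string matching.
            return "string", list(literals)
    return "integral", converted


print(coerce_in_literals(True, ["10001", "20001"]))
print(coerce_in_literals(True, ["10001", "abc"]))
```

The "integral" result corresponds to the `k1#0 IN (10001, 20001)` form of the predicate, and the "string" result to the `cast(k1#0 as TEXT) IN (...)` form.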
test table:
```
create table a_table(
k1 BIGINT NOT NULL,
k2 VARCHAR(100) NOT NULL,
v1 INT SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(k1,k2)
distributed BY hash(k1) buckets 2
properties("replication_num" = "1");
insert into a_table values (10, 'name1', 10),(20, 'name2', 10);
explain plan select * from a_table where k1 in ('10001', '20001');
```
Before optimization:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 1ms) ========== |
| UnboundResultSink[4] ( ) |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] ) |
| +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') ) |
| +--LogicalCheckPolicy ( ) |
| +--UnboundRelation ( id=RelationId#0, nameParts=a_table ) |
| |
| ========== ANALYZED PLAN (time: 2ms) ========== |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] ) |
| +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET ) |
| |
| ========== REWRITTEN PLAN (time: 6ms) ========== |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
| |
| ========== OPTIMIZED PLAN (time: 6ms) ========== |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather ) |
| +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) |
| +--PhysicalOlapScan[a_table]@0 ( stats=1 ) |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
After optimization:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 15ms) ========== |
| UnboundResultSink[4] ( ) |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] ) |
| +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') ) |
| +--LogicalCheckPolicy ( ) |
| +--UnboundRelation ( id=RelationId#0, nameParts=a_table ) |
| |
| ========== ANALYZED PLAN (time: 11ms) ========== |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] ) |
| +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET ) |
| |
| ========== REWRITTEN PLAN (time: 12ms) ========== |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
| |
| ========== OPTIMIZED PLAN (time: 4ms) ========== |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather ) |
| +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) ) |
| +--PhysicalOlapScan[a_table]@0 ( stats=2 ) |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
2024-05-24 14:26:52 +08:00
bb3a0fd30e
[fix](nereids)should use nereids expr's nullable info when call Expr's toThrift method ( #35274 )
2024-05-24 02:24:40 +08:00
9277480f00
[fix](nereids)days_diff should match datetimev2 function signature in higher priority ( #35295 )
2024-05-24 02:21:55 +08:00
a52ee6e9b9
[opt](mtmv) generate bi-map between base table and materialized view partitions ( #35131 )
2024-05-23 19:11:33 +08:00
9ba995317a
[fix](routineload) fix data source properties do not persist in edit log ( #35137 )
2024-05-23 19:09:41 +08:00
bf37e5c905
[feature](Nereids) support select distinct with aggregate ( #35300 )
...
(cherry picked from commit adcbc8cce57aaec507174f39536a028db803a2e5)
2024-05-23 19:01:10 +08:00
4075408b84
[feature](mtmv)Support single table mv rewrite ( #34185 ) ( #35242 )
...
Support single-table query rewrite without group by.
This is useful for complex filters or expressions.
The mv definition and the query below are an example
of a query that can be rewritten.
mv def:
```
select *
from lineitem where l_comment like '%xx%'
```
query:
```
select l_linenumber, l_receiptdate
from lineitem where l_comment like '%xx%'
```
Co-authored-by: zfr9527 <qhu15zhang3294197@163.com >
2024-05-23 19:00:36 +08:00
82887cc2b3
[improvement](mtmv)Split expression get cherry pick21 ( #35240 )
...
* [improvement](mtmv) Split the expression mapping in LogicalCompatibilityContext for performance (#34646 )
The query-to-view expression mapping is needed when checking whether the logic of two hyper graphs is equal.
Getting all expression mappings at once may hurt performance, so split the expressions into three types,
JOIN_EDGE, NODE, and FILTER_EDGE, and get them step by step.
* fix code style
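The idea of fetching each expression type only when it is needed can be sketched as below. This is a hedged Python illustration, not the code in LogicalCompatibilityContext: the class `LazyExprMapping`, the `builders` argument, and the `build_count` counter are hypothetical names used only for this example.

```python
from enum import Enum
from typing import Callable, Dict


class ExprType(Enum):
    JOIN_EDGE = 1
    NODE = 2
    FILTER_EDGE = 3


class LazyExprMapping:
    """Build the query-to-view mapping per expression type, on first use only."""

    def __init__(self, builders: Dict[ExprType, Callable[[], Dict[str, str]]]):
        self._builders = builders
        self._cache: Dict[ExprType, Dict[str, str]] = {}
        self.build_count = 0  # how many per-type mappings were actually built

    def get(self, expr_type: ExprType) -> Dict[str, str]:
        if expr_type not in self._cache:
            self._cache[expr_type] = self._builders[expr_type]()
            self.build_count += 1
        return self._cache[expr_type]


mapping = LazyExprMapping({
    ExprType.JOIN_EDGE: lambda: {"q.a = q.b": "v.a = v.b"},
    ExprType.NODE: lambda: {"q.a": "v.a"},
    ExprType.FILTER_EDGE: lambda: {"q.a > 1": "v.a > 1"},
})
mapping.get(ExprType.JOIN_EDGE)
mapping.get(ExprType.JOIN_EDGE)
print(mapping.build_count)
```

If a comparison fails on the join edges, the other two mappings are never built, which is where the saving would come from under these assumptions.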
2024-05-23 18:59:56 +08:00
acf741fa80
[feature](binlog) Support gc binlogs by history nums and size ( #35250 )
...
* [chore](binlog) Add logs about binlog gc (#34359 )
* [feature](binlog) Support gc binlogs by history nums and size (#34888 )
2024-05-23 14:39:57 +08:00
0b440685d9
[fix](nereids): fix PlanPostProcessor use visitor ( #35244 )
...
(cherry picked from commit 46e004a358b9e13adb492d376f77e4317e558a6a)
2024-05-23 14:12:25 +08:00
adc364a6fd
[feature](Paimon) support deletion vector for Paimon native reader ( #34743 ) ( #35241 )
...
bp #34743
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com >
2024-05-23 00:01:30 +08:00
3a5fb6265a
[refactor](jdbc catalog) split trino jdbc executor ( #34932 ) ( #35176 )
...
pick #34932
2024-05-22 19:09:57 +08:00
05a390e050
[refactor](jdbc catalog) split oceanbase jdbc executor ( #34869 ) ( #35175 )
...
pick #34869
2024-05-22 19:09:35 +08:00
291cf57c54
[Configurations](multi-catalog) Add enable_parquet_filter_by_min_max and enable_orc_filter_by_min_max Session variables. ( #35012 ) ( #35164 )
...
backport #35012
2024-05-22 19:06:12 +08:00
05cedfca4e
[fix](hudi) catch exception when getting hudi partition ( #35027 ) ( #35159 )
...
bp #35027
2024-05-22 18:44:19 +08:00
9ed4a2023b
[fix](Nereids) DatetimeV2 round floor and round ceiling is wrong ( #35153 ) ( #35155 )
...
pick from master #35153
1. round floor was incorrectly implemented as round
2. round ceiling did not actually round up, because the division used the double type
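The difference between the two rounding modes can be shown with integer-only arithmetic. The sketch below is a Python illustration under stated assumptions, not the Doris code: it assumes the value being rounded is the microsecond field of a DatetimeV2 value and that `scale` is the target number of fractional digits (0 to 6). The function names are hypothetical.

```python
def round_floor_micros(micros: int, scale: int) -> int:
    """Round the microsecond field down to `scale` fractional digits."""
    factor = 10 ** (6 - scale)
    return micros // factor * factor


def round_ceiling_micros(micros: int, scale: int) -> int:
    """Round the microsecond field up to `scale` fractional digits.

    Uses integer ceiling division, so no floating-point division is involved.
    A result of 1_000_000 means the caller has to carry into the next second.
    """
    factor = 10 ** (6 - scale)
    return -(-micros // factor) * factor


print(round_floor_micros(123456, 3))
print(round_ceiling_micros(123456, 3))
```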
2024-05-22 16:23:20 +08:00
15f70c8183
[Feat](planner)create table stmt offer default distribution attribute :random distribution and auto bucket ( #35189 )
...
Co-authored-by: feiniaofeiafei <moailing@selectdb.com >
2024-05-22 15:18:29 +08:00
dbf7a76592
Revert "[Chore](rollup) check duplicate column name when create table with rollup ( #34827 )"
...
This reverts commit 4a8df535537e8eab8fa2ad54934a185e17d4e660.
2024-05-22 10:19:51 +08:00
af7b16f213
[optimize](desc) display the correct data type of aggStateType ( #34968 )
...
If a table column has the AGG_STATE type, the `desc tbl` statement does not show its clearly defined data type.
create table a_table(
k1 int null,
k2 agg_state<max_by(int not null,int)> generic,
k3 agg_state<group_concat(string)> generic
)
aggregate key (k1)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");
Before optimization:
mysql> desc a_table;
+-------+------------------------------------------------+------+-------+---------+---------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------------------------------------+------+-------+---------+---------+
| k1 | INT | Yes | true | NULL | |
| k2 | org.apache.doris.catalog.AggStateType@239f771c | No | false | NULL | GENERIC |
| k3 | org.apache.doris.catalog.AggStateType@2e535f50 | No | false | NULL | GENERIC |
+-------+------------------------------------------------+------+-------+---------+---------+
3 rows in set (0.00 sec)
After optimization:
mysql> desc a_table;
+-------+------------------------------------+------+-------+---------+---------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------------------------+------+-------+---------+---------+
| k1 | INT | Yes | true | NULL | |
| k2 | AGG_STATE<max_by(INT, INT NULL)> | No | false | NULL | GENERIC |
| k3 | AGG_STATE<group_concat(TEXT NULL)> | No | false | NULL | GENERIC |
+-------+------------------------------------+------+-------+---------+---------+
Co-authored-by: duanxujian <duanxujian@jd.com >
2024-05-22 10:03:31 +08:00
e8fb47bec1
[fix](broker load) Make Config.enable_pipeline_load work as expected for BrokerLoad ( #35105 )
...
* FIX LOAD PROFILE
* FIX
2024-05-22 10:02:02 +08:00
7ae83b60fd
[opt](Nereids) opt locality under multi-replica ( #34927 )
...
Make tablet locality fixed in multi-replica cases.
Session variable: set enable_ordered_scan_range_locations = true (default: false).
TPC-DS 100G with 3 replicas: 7% improvement.
2024-05-22 10:00:13 +08:00
37f1bf317c
[fix](statistics)Disable fetch min/max column stats through HMS, because the value may be inaccurate and misleading. ( #35124 ) ( #35145 )
...
backport #35124
2024-05-21 22:58:12 +08:00
009ab77c25
[feature](iceberg)Support write to iceberg for 2.1 ( #35103 ) #34257 #33629
...
bp: #34257 #33629
2024-05-21 22:46:37 +08:00
903ff32021
[opt](fe) exit FE when transfer to (non)master failed ( #34809 ) ( #35158 )
...
bp #34809
2024-05-21 22:31:47 +08:00
98f8eb5c43
[opt](split) get file splits in batch mode ( #34032 ) ( #35107 )
...
bp #34032
2024-05-21 22:27:07 +08:00
0599cb2efd
fix replica's remote data size set to data size ( #35098 )
...
fix replica's remote data size set to data size
2024-05-21 16:48:08 +08:00
706c9c473b
[fix](autobucket) calc bucket num exclude today's partition #34304 #35129
2024-05-21 15:49:16 +08:00
44bb2bb639
[opt](routine-load) do not schedule invalid task ( #34918 )
2024-05-21 13:02:42 +08:00
c0fd98abe5
[Fix](tvf) Fix that tvf reading empty files in compressed formats. ( #34926 )
...
1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
2024-05-21 12:59:31 +08:00
f3762322c8
[opt](nereids)new way to set pre-agg status ( #34738 )
2024-05-21 12:54:49 +08:00
518b143caa
[feat](Nereids)choose agg mv in cbo #35020
2024-05-21 12:54:10 +08:00
45c145fdf7
[fix](Nereids) LogicalPlanDeepCopier copy scan conjuncts in wrong way ( #35077 )
...
pick from master #35076
Introduced by PR #34933.
That PR attempted to address the issue of losing conjuncts
when performing a deep copy of the outer structure.
However, the timing of copying the conjuncts is incorrect,
resulting in the inability to map slots within the conjuncts
to the output of the outer structure.
2024-05-20 21:49:53 +08:00
42425808a1
[Cherry-Pick](branch-2.1) Pick "Fix multiple replica partial update auto inc data inconsistency problem #34788 " ( #35056 )
...
* [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788 )
* **Problem:** For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas.
**Cause:** Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE.
**Solution:** Before distributing blocks, determine if the update involves partial columns of a table with an auto-increment column. If so, add the auto-increment column to the last column of the block. After distributing to each BE, each BE will check if the data key for the partial column update exists. If it exists, the previous auto-increment column value is used; if not, the auto-increment column value from the last column of the block is used. This ensures that the auto-increment column values are consistent across different BEs.
* 2
* [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940 )
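The solution described in this commit can be sketched as follows. This is a simplified Python model, not the BE implementation: replicas are modeled as dictionaries from row key to stored auto-increment value, and the names `apply_partial_update`, `distribute`, and `next_id` are hypothetical.

```python
from typing import Dict, List, Tuple

# (row key, candidate auto-increment value appended as the last column of the block)
Row = Tuple[str, int]


def apply_partial_update(replica: Dict[str, int], block: List[Row]) -> None:
    """Apply a partial-column update on one replica."""
    for key, candidate in block:
        if key in replica:
            # The key already exists: keep the previously stored value.
            continue
        # New key: use the value generated once, before distribution.
        replica[key] = candidate


def distribute(block_keys: List[str], next_id: int, replicas: List[Dict[str, int]]) -> None:
    # Generate the candidate values a single time, before sending the block out.
    block = [(key, next_id + i) for i, key in enumerate(block_keys)]
    for replica in replicas:
        apply_partial_update(replica, block)


replicas = [{"a": 1}, {"a": 1}, {"a": 1}]
distribute(["a", "b"], next_id=100, replicas=replicas)
print(replicas)
```

Because every replica receives the same candidate values, no replica generates its own value, and all of them end up with the same auto-increment column.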
2024-05-20 15:43:46 +08:00
a43c6eca22
[chore](femetaversion) add a check in fe code to avoid fe meta version changed during pick PR ( #35039 )
...
* [chore](femetaversion) add a check in fe code to avoid fe meta version changed during pick PR
* f
* f
---------
Co-authored-by: yiguolei <yiguolei@gmail.com >
2024-05-20 13:29:17 +08:00
be50139eb1
[Fix](Nereids) fix leading with cte and same subqueryalias name ( #34838 ) ( #35047 )
...
fix leading with cte and same subqueryalias name
Example:
with tbl1 as (select t1.c1 from t1)
select tbl2.c2 from (select /*+ leading(t2 tbl1) */ tbl1.c1, t2.c2 from tbl1 join t2) as tbl2 join t3;
Reason:
In this case, before analysis, the preprocess step changes subquery tbl2 into a CTE plan. This CTE plan should be placed in the upper-level CTE plan, not in the logical result sink plan.
2024-05-20 10:44:22 +08:00
5ac4ea2cd9
[Fix](Nereids) fix leading hint with update of alias name ( #34434 ) ( #35046 )
...
Problem:
When using a leading hint such as leading(tbl1 tbl2) in
"select * from (select tbl1.c1 from t1 as tbl1 join t2 as tbl2) join t3 as tbl2 on tbl2.c3 != 101;",
tbl2.c3 refers to t3.c3, not t2.c3.
Cause and solution:
When finding columns in a condition, the leading hint looks up the RelationId of tbl2.c3. When collecting RelationId and aliasName pairs,
the entry should be updated if the aliasName is repeated.
2024-05-20 10:40:10 +08:00
7c29a964e5
[Fix](Nereids) fix leading with multi level of brace pairs ( #34169 ) ( #35043 )
...
Fix leading hints with multiple levels of brace pairs.
example:
leading(t1 {{t2 t3} {t4 t5}} t6) can be reduced to leading(t1 {t2 t3 {t4 t5}} t6)
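The reduction in this example can be reproduced with a small Python sketch. It is not the Nereids parser: it assumes that a brace pair sitting first inside its group is redundant (because a group is joined left to right), it assumes balanced braces, and the helper names `parse`, `reduce_braces`, and `render` are hypothetical.

```python
import re
from typing import List, Union

Node = Union[str, List["Node"]]


def parse(hint_body: str) -> List[Node]:
    """Parse 't1 {{t2 t3} {t4 t5}} t6' into nested lists (braces must balance)."""
    tokens = re.findall(r"[{}]|[^\s{}]+", hint_body)
    stack: List[List[Node]] = [[]]
    for tok in tokens:
        if tok == "{":
            stack.append([])
        elif tok == "}":
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    return stack[0]


def reduce_braces(group: List[Node]) -> List[Node]:
    """Unwrap a brace pair that sits first in its group (assumed redundant)."""
    items = [reduce_braces(n) if isinstance(n, list) else n for n in group]
    while items and isinstance(items[0], list):
        items = items[0] + items[1:]
    return items


def render(group: List[Node]) -> str:
    return " ".join("{" + render(n) + "}" if isinstance(n, list) else n for n in group)


print(render(reduce_braces(parse("t1 {{t2 t3} {t4 t5}} t6"))))
```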
Also update the cases that remove the project node from the explain shape plan.
2024-05-20 10:28:22 +08:00
a6a398d7a4
[Fix](function) remove datev2 signature of microsecond #35017
2024-05-19 19:58:02 +08:00
22f85be712
[fix](hive-ctas) support create hive table with full qualified name ( #34984 )
...
Previously, when executing `create table hive.db.table as select` to create a table in a hive catalog,
if the current catalog was not a hive catalog, the default engine name was filled with `olap`, which is wrong.
This PR fills the default engine name based on the specified catalog.
2024-05-18 18:42:43 +08:00
89d5f2e816
[fix](multi-catalog)remove http scheme in oss endpoint ( #34907 )
...
Remove the http scheme from the oss endpoint; otherwise the scheme may appear in the URL (http://bucket.http//.region.aliyuncs.com ) when the http client is used.
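A minimal Python sketch of the idea follows. It is an illustration only, not the code changed in this commit: the function names and the sample endpoint value are hypothetical, and also stripping `https://` is an assumption that goes beyond what the commit message states.

```python
def strip_http_scheme(endpoint: str) -> str:
    """Return the endpoint without a leading http:// or https:// scheme."""
    for scheme in ("http://", "https://"):
        if endpoint.startswith(scheme):
            return endpoint[len(scheme):]
    return endpoint


def bucket_url(bucket: str, endpoint: str) -> str:
    # Build a bucket-style URL from the bare host, so the scheme
    # cannot end up in the middle of the host name.
    return f"http://{bucket}.{strip_http_scheme(endpoint)}"


# Hypothetical endpoint value, used only for illustration.
print(bucket_url("bucket", "http://oss-region.aliyuncs.com"))
```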
2024-05-18 18:42:33 +08:00
a59f9c3fa1
[fix](planner) fix unrequired slot bug when join node introduced by #25204 ( #34923 )
...
Before the fix, the join node retained some slots that were neither materialized nor required.
The join node needs to remove these slots and must not make them output slots.
Signed-off-by: nextdreamblue <zxw520blue1@163.com >
2024-05-18 18:40:56 +08:00
435147d449
[enhance](mtmv) MTMV deal partition use name instead of id ( #34910 )
...
The partition id changes after an insert overwrite.
When a materialized view task runs while the base table is undergoing an insert overwrite, the task may report an error: partition not found by partitionId
Upgrade compatibility: Hive currently does not support automatic refresh, so there is no impact.
2024-05-18 18:40:29 +08:00
81bcb9d490
[opt](planner)(Nereids) support auto aggregation for random distributed table ( #33630 )
...
Support automatic aggregation when querying detail data of a randomly distributed table:
rows with the same key columns are returned as a single row.
2024-05-18 18:40:16 +08:00
bfd875eae3
[opt](nereids) lazy get expression map when comparing hypergraph ( #34753 )
2024-05-18 18:38:19 +08:00