18748 Commits

Author SHA1 Message Date
682d72bf4d [fix](noexcept) Remove incorrect noexcept #35230 2024-05-24 16:23:58 +08:00
98b2bda660 [opt](Nereids) remove restrict for count(*) in window (#35220)
Support count(*) used as a window function.

CREATE TABLE `t1` (
  `id` INT NULL,
  `dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

select *, count(*) over() from t1;
2024-05-24 16:23:58 +08:00
8c594c6959 [Fix](regression) fix show data regression case (#35218) 2024-05-24 16:23:58 +08:00
473e14ca82 [chore](backup) log backup/restore job during replay (#35234) 2024-05-24 16:23:57 +08:00
edb276ad92 [fix](typo)fix show backend typo (#35198) 2024-05-24 16:23:57 +08:00
cf46ebe31d [improve](jdbc catalog) Remove all property checks during create (#35194) (#35354) 2024-05-24 16:12:02 +08:00
f062506b22 [fix](nereids)the preagg state for count(*) is wrong (#35326) 2024-05-24 15:23:04 +08:00
0b90e37227 [fix](Nereids) string literal coercion of in predicate (#35337)
pick from master #35200

Description:
   The SQL executes much more slowly when the `in predicate` contains string-format literals while the actual column data is of an integral type.
```
mysql> set enable_nereids_planner = false;
Query OK, 0 rows affected (0.03 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+---------------+
| id | sum(`clicks`) |
+------------+---------------+
|  787934713 |          2838 |
|  306960695 |           339 |
+------------+---------------+
2 rows in set (1.81 sec)

mysql> set enable_nereids_planner = true;
Query OK, 0 rows affected (0.02 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+-------------+
| id | sum(clicks) |
+------------+-------------+
|  787934713 |        2838 |
|  306960695 |         339 |
+------------+-------------+
2 rows in set (28.14 sec)
```

Reason:
In the legacy planner, the string literals are converted to integral values. The Nereids planner does not perform this conversion and instead does string matching in BE.

Solved:
Process numeric string literals in an `in predicate` the same way as in a `comparison predicate`.
Test table:
```
create table a_table(
    k1 BIGINT NOT NULL,
    k2 VARCHAR(100) NOT NULL,
    v1 INT SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(k1,k2)
distributed BY hash(k1) buckets 2
properties("replication_num" = "1");
insert into a_table values (10, 'name1', 10),(20, 'name2', 10);
explain plan select * from a_table where k1 in ('10001', '20001');
```
Before the optimization:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 1ms) ==========                                                                                        |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 2ms) ==========                                                                                      |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                      |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 6ms) ==========                                                                                     |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                         |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 6ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather )                                                  |
|    +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                       |
|       +--PhysicalOlapScan[a_table]@0 ( stats=1 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
After the optimization:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 15ms) ==========                                                                                       |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 11ms) ==========                                                                                     |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) )                                                                        |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 12ms) ==========                                                                                    |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) )                                                                           |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 4ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather )                                                     |
|    +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) )                                                            |
|       +--PhysicalOlapScan[a_table]@0 ( stats=2 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
2024-05-24 14:26:52 +08:00
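The coercion described in commit 0b90e37227 can be sketched roughly as follows. This is a minimal illustration in Python, not the Nereids implementation: the function names `coerce_in_list` and `rewrite_in_predicate` are hypothetical, and the fallback to a string comparison when any literal is non-numeric is an assumption made for the sketch.

```python
import re
from typing import List, Optional

# Plain decimal integers only, with an optional sign (ASCII digits).
_INT_LITERAL = re.compile(r"[+-]?[0-9]+")


def coerce_in_list(literals: List[str]) -> Optional[List[int]]:
    """Return integer values if every string literal is a plain integer, else None."""
    if not all(_INT_LITERAL.fullmatch(s) for s in literals):
        return None  # assumption: keep the string comparison in this case
    return [int(s) for s in literals]


def rewrite_in_predicate(column: str, literals: List[str]) -> str:
    """Render the predicate text for an integral column and string literals."""
    ints = coerce_in_list(literals)
    if ints is None:
        # Not all literals are numeric: compare as strings (cast on the column).
        quoted = ", ".join(f"'{s}'" for s in literals)
        return f"cast({column} as TEXT) IN ({quoted})"
    # All literals are numeric: compare as integers, no cast on the column.
    return f"{column} IN ({', '.join(str(i) for i in ints)})"


print(rewrite_in_predicate("k1", ["10001", "20001"]))
```

Keeping the column uncast is what lets the filter compare integers directly instead of converting every row to a string.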
bb3a0fd30e [fix](nereids)should use nereids expr's nullable info when call Expr's toThrift method (#35274) 2024-05-24 02:24:40 +08:00
9277480f00 [fix](nereids)days_diff should match datetimev2 function signature in higher priority (#35295) 2024-05-24 02:21:55 +08:00
4b7608c2bf [fix](inverted index)Change index_id from int32 to int64 to avoid overflow (#35206)
Co-authored-by: Luennng <luennng@gmail.com>
2024-05-23 19:12:55 +08:00
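As a small illustration of the overflow that commit 4b7608c2bf avoids, the sketch below shows what happens when an identifier larger than the 32-bit signed maximum is stored in a 32-bit field. It uses Python's `ctypes` fixed-width integers only to mimic the wraparound; the helper names are hypothetical and no Doris code is shown.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(value: int) -> int:
    """Store the value in a signed 32-bit integer (wraps on overflow)."""
    return ctypes.c_int32(value).value


def as_int64(value: int) -> int:
    """Store the value in a signed 64-bit integer."""
    return ctypes.c_int64(value).value


index_id = INT32_MAX + 1
print(as_int32(index_id))  # wrapped to a negative number
print(as_int64(index_id))  # preserved
```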
a6f7747d29 [feature](datatype) add BE config to allow zero date (#34961)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-05-23 19:12:39 +08:00
a52ee6e9b9 [opt](mtmv) generate bi-map between base table and materialized view partitions (#35131) 2024-05-23 19:11:33 +08:00
9ba995317a [fix](routineload) fix data source properties do not persist in edit log (#35137) 2024-05-23 19:09:41 +08:00
4008dc03cf [Fix](regression) fix test_user_var.groovy by add set disable_nereids_rules=PRUNE_EMPTY_PARTITION (#35151) 2024-05-23 19:06:38 +08:00
eb49cd839b [refactor](datalake) return the error status instead of static_cast<void> (#34873)
Followup #34797
`static_cast<void>` silently ignores error statuses. Some of those errors should make the query finish with an error status, so replace `static_cast<void>` with `RETURN_IF_ERROR`.

The following three scenarios need to be handled separately and cannot be simply replaced:
1. The outer function returns void;
2. A status-returning function is called inside a constructor or destructor;
3. A status-returning function is called on a best-effort basis, and its error status should be ignored.
2024-05-23 19:06:21 +08:00
b3f6668464 fix case: test_create_table_without_distribution 2024-05-23 19:03:30 +08:00
bf37e5c905 [feature](Nereids) support select distinct with aggregate (#35300)
(cherry picked from commit adcbc8cce57aaec507174f39536a028db803a2e5)
2024-05-23 19:01:10 +08:00
4075408b84 [feature](mtmv)Support single table mv rewrite (#34185) (#35242)
Support single-table query rewrite without group by.
This is useful for complex filters or expressions.

The following mv definition and query show a case
that can be rewritten:

mv def:
```
          select *
            from lineitem where l_comment like '%xx%'
```

query:
```
            select l_linenumber, l_receiptdate
            from lineitem where l_comment like '%xx%'
```

Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>
2024-05-23 19:00:36 +08:00
82887cc2b3 [improvement](mtmv)Split expression get cherry pick21 (#35240)
* [improvement](mtmv) Split the expression mapping in LogicalCompatibilityContext for performance (#34646)

The query-to-view expression mapping is needed when checking whether the logic of two hyper graphs is equal.
Getting all expression mappings at once may affect performance, so split the expressions into three types
(JOIN_EDGE, NODE, FILTER_EDGE) and get them step by step.

* fix code style
2024-05-23 18:59:56 +08:00
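The idea in commit 82887cc2b3 of fetching the expression mapping per type instead of all at once can be sketched as below. This is only an illustration of lazy, per-type computation: the class `LazyExpressionMapping`, its attributes, and the placeholder `view_...` mapping are hypothetical and do not reflect the actual `LogicalCompatibilityContext` code.

```python
from functools import cached_property
from typing import Dict, List


class LazyExpressionMapping:
    """Builds the query-to-view mapping for each expression type on first use."""

    def __init__(self, join_edges: List[str], nodes: List[str], filter_edges: List[str]):
        self._sources = {
            "JOIN_EDGE": join_edges,
            "NODE": nodes,
            "FILTER_EDGE": filter_edges,
        }
        self.computed: List[str] = []  # records which types were actually built

    def _build(self, kind: str) -> Dict[str, str]:
        self.computed.append(kind)
        # Placeholder mapping: a real implementation would map query
        # expressions to the corresponding view expressions.
        return {expr: f"view_{expr}" for expr in self._sources[kind]}

    @cached_property
    def join_edge(self) -> Dict[str, str]:
        return self._build("JOIN_EDGE")

    @cached_property
    def node(self) -> Dict[str, str]:
        return self._build("NODE")

    @cached_property
    def filter_edge(self) -> Dict[str, str]:
        return self._build("FILTER_EDGE")


mapping = LazyExpressionMapping(["a = b"], ["c"], ["d > 1"])
mapping.join_edge          # only the JOIN_EDGE mapping is built here
print(mapping.computed)
```

If a comparison already fails on the join edges, the node and filter-edge mappings are never built, which is where the saving would come from.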
ed464ac24c [branch-2.1] remove dlf dependencies (#35292)
followup #35241
In #35241, we update the doris-shade version to 2.1.0, which already contains dlf dependencies.

pick part of #34749, to remove dlf dependencies in fe/pom.xml
2024-05-23 17:48:04 +08:00
0d2ab9d5c3 [fix](clean trash) Fix clean trash lost submit task (#35271) 2024-05-23 16:27:20 +08:00
acf741fa80 [feature](binlog) Support gc binlogs by history nums and size (#35250)
* [chore](binlog) Add logs about binlog gc (#34359)

* [feature](binlog) Support gc binlogs by history nums and size (#34888)
2024-05-23 14:39:57 +08:00
0b440685d9 [fix](nereids): fix PlanPostProcessor use visitor (#35244)
(cherry picked from commit 46e004a358b9e13adb492d376f77e4317e558a6a)
2024-05-23 14:12:25 +08:00
Pxl
e962a7309b [Chore](runtime-filter) adjust some check and error msg on runtime filter (#35018) (#35251)
adjust some check and error msg on runtime filter
2024-05-23 11:20:02 +08:00
adc364a6fd [feature](Paimon) support deletion vector for Paimon native reader (#34743) (#35241)
bp #34743
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-05-23 00:01:30 +08:00
50f50cf8cc Revert "[fix][docker] fix kafka test scritps (#33417)" (#35229)
This reverts commit c35b2becdd08ab9255b3a0c2a19d74970f621388.
2024-05-22 20:33:14 +08:00
bc70968019 [chore](regression) Modify character encoding to be consistent with Doris (#35228) 2024-05-22 20:04:50 +08:00
3a5fb6265a [refactor](jdbc catalog) split trino jdbc executor (#34932) (#35176)
pick #34932
2024-05-22 19:09:57 +08:00
05a390e050 [refactor](jdbc catalog) split oceanbase jdbc executor (#34869) (#35175)
pick #34869
2024-05-22 19:09:35 +08:00
24990383ff [refactor](jdbc catalog) split clickhouse jdbc executor (#34794) (#35174)
pick master #34794
2024-05-22 19:09:05 +08:00
291cf57c54 [Configurations](multi-catalog) Add enable_parquet_filter_by_min_max and enable_orc_filter_by_min_max Session variables. (#35012) (#35164)
backport #35012
2024-05-22 19:06:12 +08:00
05cedfca4e [fix](hudi) catch exception when getting hudi partition (#35027) (#35159)
bp #35027
2024-05-22 18:44:19 +08:00
d63c3ae2d4 [bugfix](hive)fix testcase for viewfs for 2.1 #35178 2024-05-22 18:13:09 +08:00
72f2d0d449 [fix](memory) Allow flush memtable failed when process exceed memlimit #35150 2024-05-22 18:11:59 +08:00
9ed4a2023b [fix](Nereids) DatetimeV2 round floor and round ceiling is wrong (#35153) (#35155)
pick from master #35153

1. round floor was incorrectly implemented as round
2. round ceiling did not round correctly because the division used the double type
2024-05-22 16:23:20 +08:00
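The two rounding modes named in commit 9ed4a2023b can be sketched with pure integer arithmetic, which sidesteps floating-point division entirely. This is a minimal illustration, assuming the fractional part is held as microseconds (0 to 999999) and the target scale is the number of fractional digits to keep (0 to 6); the function names are hypothetical and this is not the Doris implementation.

```python
def _step(scale: int) -> int:
    """Size of one unit at the given scale, in microseconds."""
    if not 0 <= scale <= 6:
        raise ValueError("scale must be between 0 and 6")
    return 10 ** (6 - scale)


def floor_micros(micros: int, scale: int) -> int:
    """Round the microsecond part down to the given scale."""
    step = _step(scale)
    return micros // step * step


def ceil_micros(micros: int, scale: int) -> int:
    """Round the microsecond part up to the given scale.

    The result may equal 1_000_000, in which case the caller has to carry
    one second into the seconds field.
    """
    step = _step(scale)
    return -(-micros // step) * step


print(floor_micros(123456, 3), ceil_micros(123456, 3))
```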
30a66a4f9d [regression-test](fix) fix case bug #35201 2024-05-22 15:58:37 +08:00
15f70c8183 [Feat](planner) create table stmt offers default distribution attributes: random distribution and auto bucket (#35189)
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-22 15:18:29 +08:00
c23384ff07 [fix](decimal) Fix long string casting to decimalv2 (#35121) 2024-05-22 14:32:29 +08:00
Pxl
84f7bfffe2 [Bug](bitmap-filter) fix empty bitmap when rf do merge (#34182)
fix empty bitmap when rf do merge
2024-05-22 14:29:50 +08:00
9d7c65b4d8 [fix](memory) Avoid frequently refresh cgroup memory info (#35083) (#35182)
pick #35083
2024-05-22 11:42:08 +08:00
f0b2f5ba36 [Fix](bug) agg limit containing null values may cause wrong results (#35180) 2024-05-22 10:57:57 +08:00
7ca7458f44 [branch-2.1](routine-load) fix routine load case fail (#35173)
* fix routine load case error
2024-05-22 10:38:55 +08:00
dbf7a76592 Revert "[Chore](rollup) check duplicate column name when create table with rollup (#34827)"
This reverts commit 4a8df535537e8eab8fa2ad54934a185e17d4e660.
2024-05-22 10:19:51 +08:00
af7b16f213 [optimize](desc) display the correct data type of aggStateType (#34968)
If a table column is of AGG_STATE type, the `desc tbl` statement does not show a clearly defined data type.

create table a_table(
    k1 int null,
    k2 agg_state<max_by(int not null,int)> generic,
    k3 agg_state<group_concat(string)> generic
)
aggregate key (k1)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");

Before the optimization:

mysql> desc a_table;
+-------+------------------------------------------------+------+-------+---------+---------+
| Field | Type                                           | Null | Key   | Default | Extra   |
+-------+------------------------------------------------+------+-------+---------+---------+
| k1    | INT                                            | Yes  | true  | NULL    |         |
| k2    | org.apache.doris.catalog.AggStateType@239f771c | No   | false | NULL    | GENERIC |
| k3    | org.apache.doris.catalog.AggStateType@2e535f50 | No   | false | NULL    | GENERIC |
+-------+------------------------------------------------+------+-------+---------+---------+
3 rows in set (0.00 sec)


After the optimization:

mysql> desc a_table;
+-------+------------------------------------+------+-------+---------+---------+
| Field | Type                               | Null | Key   | Default | Extra   |
+-------+------------------------------------+------+-------+---------+---------+
| k1    | INT                                | Yes  | true  | NULL    |         |
| k2    | AGG_STATE<max_by(INT, INT NULL)>   | No   | false | NULL    | GENERIC |
| k3    | AGG_STATE<group_concat(TEXT NULL)> | No   | false | NULL    | GENERIC |
+-------+------------------------------------+------+-------+---------+---------+


Co-authored-by: duanxujian <duanxujian@jd.com>
2024-05-22 10:03:31 +08:00
a8c24d7698 [Fix](function) fix overflow of date_add function (#35080)
fix overflow of date_add function
2024-05-22 10:02:59 +08:00
ced0093d74 [fix](mem_tracker) attach mem tracker in FragmentMgr::apply_filter (#35128) 2024-05-22 10:02:46 +08:00
e8fb47bec1 [fix](broker load) Make Config.enable_pipeline_load work as expected for BrokerLoad (#35105)
* FIX LOAD PROFILE

* FIX
2024-05-22 10:02:02 +08:00
b96148c9cd [Fix](function) fix days/weeks_diff result wrong on BE #35104
select days_diff('2024-01-01 00:00:00', '2023-12-31 23:59:59');
The result should be 0, but BE returned 1.
2024-05-22 10:00:26 +08:00
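The expected result in commit b96148c9cd (a one-second difference counts as 0 days) corresponds to counting only whole elapsed days. A minimal sketch in Python follows. The function below is a hypothetical stand-in, not the BE code, and truncating toward zero for negative differences is an assumption of the sketch.

```python
from datetime import datetime

SECONDS_PER_DAY = 86_400


def days_diff(end: datetime, start: datetime) -> int:
    """Whole days between start and end, truncated toward zero (assumption)."""
    delta = end - start
    seconds = delta.days * SECONDS_PER_DAY + delta.seconds
    whole_days = abs(seconds) // SECONDS_PER_DAY
    return whole_days if seconds >= 0 else -whole_days


print(days_diff(datetime(2024, 1, 1, 0, 0, 0), datetime(2023, 12, 31, 23, 59, 59)))
```

Subtracting only the calendar dates would report 1 for this pair, which matches the wrong result described in the commit message.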
7ae83b60fd [opt](Nereids) opt locality under multi-replica (#34927)
Make tablet locality fixed in multi-replica cases.
Session variable: `set enable_ordered_scan_range_locations = true` (default: false).
TPC-DS 100 GB with 3 replicas: 7% improvement.
2024-05-22 10:00:13 +08:00