Commit Graph

6973 Commits

Author SHA1 Message Date
1a52e4f7db [chore](mtmv)Optimize mtmv logs and exception information (#34957) (#35446)
pick from master #34957

1. Change some logs to debug.
2. Error prompt changed from MTMV to async materialized view
2024-05-27 16:35:13 +08:00
a32db25070 [enhance](mtmv) allow add index for MTMV (#34225) (#35443)
Previously, the limitation on whether operations can be performed on materialized views was to determine `opType`.

Now, a `allowOpMTMV()` method is implemented through various `clauses`.

Because some operations have the same `opType`, but some operations allow and some do not.

For example, the `opType` for both `add column` and `create index` is `SCHEMA-CHANGE`, but `add column` is not allowed and `create index` is allowed.
2024-05-27 16:22:16 +08:00
d71e9d34fe [Bugfix] Fix mv column type is not changed when do schema change (#34598) 2024-05-27 15:28:12 +08:00
c44affb43f Add downgrade scan thread num by column num (#35351) 2024-05-27 15:27:12 +08:00
Pxl
82ff29faea [Chore](materialized-view) forbid create mv on row store table (#35360)
forbid create mv on row store table
2024-05-27 15:25:16 +08:00
f98ed4e4c5 [bugfix](hive)Misspelling of class names (#34981) 2024-05-27 15:24:38 +08:00
a82c6e869e [fix](Nereids) LogicalEmptyRelation type is wrong (#35382) 2024-05-27 15:23:46 +08:00
2e20e38523 [improvement](jdbc catalog) remove useless jdbc catalog code (#34986) (#35418) 2024-05-27 14:25:26 +08:00
af986c370b [feat](Nereids): Put the Child with Least Row Count in the First Position of Intersect (#34290) (#35339)
In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator.

The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.
2024-05-27 11:52:35 +08:00
952875b437 [chore](restore) Add logs about the restore table state (#35363) 2024-05-25 17:47:38 +08:00
41c3a27bce [minor](nereids): remove useless code (#35325) 2024-05-25 17:44:39 +08:00
9c6a6893d9 [fix](mtmv) Fix npe when the id of base table in mv is lager than Integer.MAX_VALUE (#35294) (#35384)
This brought by #34768
2024-05-24 23:27:08 +08:00
9af493f3f9 [fix](mtmv) Fix table id overturn and optimize get table qualifier method (#34768) (#35381)
commitid: 806e241
pr: #34768

Table id may be the same but actually they are different tables. so we optimize the
org.apache.doris.nereids.rules.exploration.mv.mapping.RelationMapping#getTableQualifier with following code:

Objects.hash(table.getDatabase().getCatalog().getId(), table.getDatabase().getId(), table.getId())

table id is long, we identify the table used in mv rewrite is bitSet. the bitSet can only use int, so we mapping the long id to init id in every query when mv rewrite
2024-05-24 21:19:15 +08:00
62998719df [opt](mtmv) Add threshold for relation mapping num when query rewrite (#34694) (#35378)
if query and mv def is as following:

    def mv1_1 = """
        select  t1.L_LINENUMBER,t2.l_extendedprice, t2.L_ORDERKEY
        from lineitem t1
        inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
    """
    def query1_1 = """
        select  t1.L_LINENUMBER, t2.L_ORDERKEY
        from lineitem t1
        inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
    """

this will generate relation mapping  by Cartesian, if the num of self join is too much, this will cause the performance problem
so we add `materialized_view_relation_mapping_max_count` session varaible, default 8. if actual num is greater than the value, the excess relation mapping is discarded.
2024-05-24 20:36:29 +08:00
cf84998711 Revert "[fix](broker load) Make Config.enable_pipeline_load works as expected for BrokerLoad (#35105)"
This reverts commit e8fb47bec1a1cfc7b07a6ed4eb36283407a4a9fe.
2024-05-24 19:28:34 +08:00
88e2753e40 [fix](Nereids) fix ShowProcedureStatusCommand sendResultSet (#35355) 2024-05-24 17:22:07 +08:00
639c7ee7fb [fix](decimalv2) fix scale of decimalv2 to string (#35222) (#35359)
* [fix](decimalv2) fix scale of decimalv2 to string
2024-05-24 17:20:43 +08:00
ca86ee7b15 [fix](load) fix wrong assert and cancel load error (#35362) 2024-05-24 17:11:01 +08:00
1e07971a98 [Feat](nereids)when dealing insert into stmt with empty table source, fe returns directly (#35333)
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)

When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation.
When dealing insert into stmt with empty table source, fe returns directly.

* [Fix](nereids) fix when insert into select empty table

---------

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-24 16:25:00 +08:00
bfe293c725 [fix](nereids) AdjustNullable rule should handle union node with no children (#35074)
The output slot's nullable info is not correctly calculated in union node.
Because old code only get correct result if union node has children.
But the union node may have no children but only have constantExprList.
So in that case, we should calculate output's nullable info byboth children and constantExprList.
2024-05-24 16:23:58 +08:00
f6beeb1ddd [Enhencement](tvf) select tvf supports using resource (#35139)
Create an S3/HDFS resource that TVF can use it directly to access the data source.
2024-05-24 16:23:58 +08:00
d6e8fb7d77 [feature](mtmv) Support agg state roll up and optimize the roll up code (#35026)
agg_state is agg  intermediate state, detail see 
state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state

this support agg function roll up as following
 
+---------------------+---------------------------------------------+---------------------+
| query               | materialized view                           | roll up             |
| ------------------- | ------------------------------------------- | ------------------- |
| agg_funtion()       | agg_funtion_unoin()  or agg_funtion_state() | agg_funtion_merge() |
| agg_funtion_unoin() | agg_funtion_unoin() or agg_funtion_state()  | agg_funtion_union() |
| agg_funtion_merge() | agg_funtion_unoin() or agg_funtion_state()  | agg_funtion_merge() |
+---------------------+---------------------------------------------+---------------------+

for example which can be rewritten by mv sucessfully as following

MV defination is

```
            select
            o_orderstatus,
            l_partkey,
            l_suppkey,
            sum_union(sum_state(o_shippriority)),
            group_concat_union(group_concat_state(l_shipinstruct)),
            avg_union(avg_state(l_linenumber)),
            max_by_union(max_by_state(l_shipmode, l_suppkey)),
            count_union(count_state(l_orderkey)),
            multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
            from lineitem
            left join orders
            on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_partkey,
            l_suppkey;
```

Query is

```
            select
            o_orderstatus,
            l_suppkey,
            sum(o_shippriority),
            group_concat(l_shipinstruct),
            avg(l_linenumber),
            max_by(l_shipmode,l_suppkey),
            count(l_orderkey),
            multi_distinct_count(l_shipmode)
            from lineitem
            left join orders 
            on l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_suppkey;
```
2024-05-24 16:23:58 +08:00
bbf502dfcf [fix](create-table)The CREATE TABLE IF NOT EXISTS AS SELECT statement should refrain from performing any INSERT operations if the table already exists (#35210) 2024-05-24 16:23:58 +08:00
bd4dd94c24 [Fix](nereids) add checkBlockRules() check for create view and alter view (#34104) 2024-05-24 16:23:58 +08:00
78fab91d6b [fix](overflow) show backends overflow for backend ids (#35245) 2024-05-24 16:23:58 +08:00
dd567fa774 [fix](function) support return JsonType for If function (#35199)
add a FunctionSignature for If to support return Type is JsonType.
2024-05-24 16:23:58 +08:00
98b2bda660 [opt](Nereids) remove restrict for count(*) in window (#35220)
support count(*) used for window function

CREATE TABLE `t1` (
  `id` INT NULL,
  `dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

select *, count(*) over() from t1;
2024-05-24 16:23:58 +08:00
473e14ca82 [chore](backup) log backup/restore job during replay (#35234) 2024-05-24 16:23:57 +08:00
edb276ad92 [fix](typo)fix show backend typo (#35198) 2024-05-24 16:23:57 +08:00
cf46ebe31d [improve](jdbc catalog) Remove all property checks during create (#35194) (#35354) 2024-05-24 16:12:02 +08:00
f062506b22 [fix](nereids)the preagg state for count(*) is wrong (#35326) 2024-05-24 15:23:04 +08:00
0b90e37227 [fix](Nereids) string literal coercion of in predicate (#35337)
pick from master #35200

Description:
   The sql execute much slow when the literal value with string format in `in predicate`; and the real data is integral type。
```
mysql> set enable_nereids_planner = false;
Query OK, 0 rows affected (0.03 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+---------------+
| id | sum(`clicks`) |
+------------+---------------+
|  787934713 |          2838 |
|  306960695 |           339 |
+------------+---------------+
2 rows in set (1.81 sec)

mysql> set enable_nereids_planner = true;
Query OK, 0 rows affected (0.02 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+-------------+
| id | sum(clicks) |
+------------+-------------+
|  787934713 |        2838 |
|  306960695 |         339 |
+------------+-------------+
2 rows in set (28.14 sec)
```

Reason:
In legacy planner, the string literal with convert to integral value, but in the nereids planner do not do this convert and with do string matching in BE。

Solved:
do process string literal with numeric in `in predicate` like in `comparison predicate`;
test table:
```
create table a_table(
    k1 BIGINT NOT NULL,
    k2 VARCHAR(100) NOT NULL,
    v1 INT SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(k1,k2)
distributed BY hash(k1) buckets 2
properties("replication_num" = "1");
insert into a_table values (10, 'name1', 10),(20, 'name2', 10);
explain plan select * from a_table where k1 in ('10', '20001');
```
before optimize:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 1ms) ==========                                                                                        |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 2ms) ==========                                                                                      |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                      |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 6ms) ==========                                                                                     |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                         |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 6ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather )                                                  |
|    +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                       |
|       +--PhysicalOlapScan[a_table]@0 ( stats=1 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
after optimize:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 15ms) ==========                                                                                       |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 11ms) ==========                                                                                     |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) )                                                                        |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 12ms) ==========                                                                                    |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) )                                                                           |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 4ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather )                                                     |
|    +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) )                                                            |
|       +--PhysicalOlapScan[a_table]@0 ( stats=2 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
2024-05-24 14:26:52 +08:00
bb3a0fd30e [fix](nereids)should use nereids expr's nullable info when call Expr's toThrift method (#35274) 2024-05-24 02:24:40 +08:00
9277480f00 [fix](nereids)days_diff should match datetimev2 function sigature in higher priority (#35295) 2024-05-24 02:21:55 +08:00
a52ee6e9b9 [opt](mtmv) generate bi-map between base table and materialized view partitions (#35131) 2024-05-23 19:11:33 +08:00
9ba995317a [fix](routineload) fix data source properties do not persist in edit log (#35137) 2024-05-23 19:09:41 +08:00
bf37e5c905 [feature](Nereids) support select distinct with aggregate (#35300)
(cherry picked from commit adcbc8cce57aaec507174f39536a028db803a2e5)
2024-05-23 19:01:10 +08:00
4075408b84 [feature](mtmv)Support single table mv rewrite (#34185) (#35242)
Support Single table  query rewrite with out group by
this is useful for complex filter or expresission

the mv def and query is as following
which can be query rewritten

mv def:
```
          select *
            from lineitem where l_comment like '%xx%'
```

query:
```
            select l_linenumber, l_receiptdate
            from lineitem where l_comment like '%xx%'
```

Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>
2024-05-23 19:00:36 +08:00
82887cc2b3 [improvement](mtmv)Split expression get cherry pick21 (#35240)
* [improvement](mtmv) Split the expression mapping in LogicalCompatibilityContext for performance (#34646)

Need query to view expression mapping when check the logic of hyper graph is equals or not.
Getting all expression mapping one-time may affect performance. So split the expresson to three type
JOIN_EDGE, NODE, FILTER_EDGE and get them step by step.

* fix code style
2024-05-23 18:59:56 +08:00
acf741fa80 [feature](binlog) Support gc binlogs by history nums and size (#35250)
* [chore](binlog) Add logs about binlog gc (#34359)

* [feature](binlog) Support gc binlogs by history nums and size (#34888)
2024-05-23 14:39:57 +08:00
0b440685d9 [fix](nereids): fix PlanPostProcessor use visitor (#35244)
(cherry picked from commit 46e004a358b9e13adb492d376f77e4317e558a6a)
2024-05-23 14:12:25 +08:00
adc364a6fd [feature](Paimon) support deletion vector for Paimon naive reader (#34743) (#35241)
bp #34743
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-05-23 00:01:30 +08:00
3a5fb6265a [refactor](jdbc catalog) split trino jdbc executor (#34932) (#35176)
pick #34932
2024-05-22 19:09:57 +08:00
05a390e050 [refactor](jdbc catalog) split oceanbase jdbc executor (#34869) (#35175)
pick #34869
2024-05-22 19:09:35 +08:00
291cf57c54 [Configurations](multi-catalog) Add enable_parquet_filter_by_min_max and enable_orc_filter_by_min_max Session variables. (#35012) (#35164)
backport #35012
2024-05-22 19:06:12 +08:00
05cedfca4e [fix](hudi) catch exception when getting hudi partition (#35027) (#35159)
bp #35027
2024-05-22 18:44:19 +08:00
9ed4a2023b [fix](Nereids) DatetimeV2 round floor and round ceiling is wrong (#35153) (#35155)
pick from master #35153

1.  round floor was incorrectly implemented as round
2. round ceiling not really round because use double type when divide
2024-05-22 16:23:20 +08:00
15f70c8183 [Feat](planner)create table stmt offer default distribution attribute :random distribution and auto bucket (#35189)
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-22 15:18:29 +08:00
dbf7a76592 Revert "[Chore](rollup) check duplicate column name when create table with rollup (#34827)"
This reverts commit 4a8df535537e8eab8fa2ad54934a185e17d4e660.
2024-05-22 10:19:51 +08:00
af7b16f213 [optimize](desc) display the correct data type of aggStateType (#34968)
If a table column is AGG_STATE type, we can't get the clear defined data type if we use `desc tbl` statement.

create table a_table(
    k1 int null,
    k2 agg_state<max_by(int not null,int)> generic,
    k3 agg_state<group_concat(string)> generic
)
aggregate key (k1)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");

before optimize:

mysql> desc a_table;
+-------+------------------------------------------------+------+-------+---------+---------+
| Field | Type                                           | Null | Key   | Default | Extra   |
+-------+------------------------------------------------+------+-------+---------+---------+
| k1    | INT                                            | Yes  | true  | NULL    |         |
| k2    | org.apache.doris.catalog.AggStateType@239f771c | No   | false | NULL    | GENERIC |
| k3    | org.apache.doris.catalog.AggStateType@2e535f50 | No   | false | NULL    | GENERIC |
+-------+------------------------------------------------+------+-------+---------+---------+
3 rows in set (0.00 sec)


after optimize:

mysql> desc a_table;
+-------+------------------------------------+------+-------+---------+---------+
| Field | Type                               | Null | Key   | Default | Extra   |
+-------+------------------------------------+------+-------+---------+---------+
| k1    | INT                                | Yes  | true  | NULL    |         |
| k2    | AGG_STATE<max_by(INT, INT NULL)>   | No   | false | NULL    | GENERIC |
| k3    | AGG_STATE<group_concat(TEXT NULL)> | No   | false | NULL    | GENERIC |
+-------+------------------------------------+------+-------+---------+---------+


Co-authored-by: duanxujian <duanxujian@jd.com>
2024-05-22 10:03:31 +08:00