2848 Commits

Author SHA1 Message Date
7248420cfd [chore](session_variable) Add 'data_queue_max_blocks' to prevent the DataQueue from occupying too much memory. (#34017) (#34395) 2024-05-05 21:20:33 +08:00
8da260ee0d [fix](hdfs)read 'fs.defaultFS' from core-site.xml for hdfs load which has no default fs (#34217) (#34372)
bp #34217
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-05-01 00:31:49 +08:00
35f8563a75 [feature](iceberg) support iceberg equality delete (#34223) (#34327)
bp #34223

Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
2024-04-30 11:51:29 +08:00
7cb00a8e54 [Feature](hive-writer) Implements s3 file committer. (#34307)
Backport #33937.
2024-04-29 19:56:49 +08:00
1bfe0f0393 [feature](iceberg)support read iceberg complex type,iceberg.orc format and position delete. (#33935) (#34256)
master #33935
2024-04-29 14:40:12 +08:00
20bd0c2987 [FIX](cases )fix ipv6 value for regress case 2024-04-29 13:37:29 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00
11039ade7b [opt](paimon) support mapping Paimon column type "Row" to Doris type "Struct" (#34239)
backport: #33786
2024-04-28 19:38:50 +08:00
1fda68f738 [feature](planner) Support select constant from dual syntax sugar (#34200) (#34232)
In MySQL, it's common to use a simplified syntax like `SELECT constant FROM dual`
which is equivalent to just `SELECT constant`.
This syntax is often used by BI tools when utilizing MySQL connectors to verify connection validity.
To enhance compatibility and ensure seamless integration with such tools,
we have now implemented this feature in Doris.

### Key Changes:
- Doris now interprets `SELECT constant FROM dual` as `SELECT constant`, aligning with MySQL's behavior.
- This update ensures that BI tools can use standard MySQL connectors without modifications or errors when connecting to Doris.
2024-04-28 15:56:16 +08:00
45556686ea [fix](test) fix some external test cases (#34209)
Fix some test cases and enable `test_information_schema_external` suite
2024-04-27 23:25:33 +08:00
cd1c9edd71 [fix](pipeline-load) fix no error url when data quality error and total rows is negative (#34072) (#34204)
Co-authored-by: HHoflittlefish777 <77738092+HHoflittlefish777@users.noreply.github.com>
2024-04-27 18:19:08 +08:00
36e80af327 [fix](schema change) fix the defineName field is not the same when copying column (#34201)
* [fix](schema change) fix the defineName field is not the same when copying column

* fix
2024-04-27 11:59:07 +08:00
414fbd353e [fix](ES catalog)Make col != '' behavior consistent with SQL (#34151)
In SQL syntax, `col != ''` equals `col.length() > 0`.
It means that this column must exist in ES doc fields and its content is not empty.
In this PR, we make a special translation for this binary predicate to keep the behavior of both consistent.
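
As a rough illustration of the translation described above, the following Python sketch builds an Elasticsearch query body for `col != ''`. The helper name `not_empty_string_filter` is hypothetical, and the exact DSL Doris emits is not shown in this commit message. The sketch only assumes the standard `bool`, `exists`, and `term` queries.

```python
import json


def not_empty_string_filter(field: str) -> dict:
    """Build a query matching docs where `field` exists and is not ''.

    Hypothetical helper: the DSL Doris actually generates may differ.
    """
    return {
        "bool": {
            # the field must be present in the document
            "must": [{"exists": {"field": field}}],
            # and its value must not be the empty string
            "must_not": [{"term": {field: ""}}],
        }
    }


query = not_empty_string_filter("col")
print(json.dumps(query, indent=2))
```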

---------

Co-authored-by: Luennng <luennng@gmail.com>
2024-04-27 02:29:33 +08:00
c125148deb [opt](Nereids) bucket shuffle downgrade expansion (#34088)
Expand the bucket shuffle downgrade condition, which originally required a single partition after pruning, a basic table, and bucket number < para number. We now expect this option to be usable for disabling bucket shuffle more effectively, without the above restrictions.

Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
2024-04-27 02:29:33 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient. Page headers are no longer read if the chunk has page locations.
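
A minimal sketch of why page locations make skipping cheap, assuming entries shaped like the `PageLocation` struct of the Parquet offset index (`offset`, `compressed_page_size`, `first_row_index`). The Python names below are hypothetical and are not the API of Doris's ColumnChunkReader.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageLocation:
    offset: int                # file offset of the page, header included
    compressed_page_size: int  # page size in bytes, header included
    first_row_index: int       # index of the page's first row in the row group


def next_page_offset(locations: List[PageLocation], page_idx: int) -> int:
    """Return the file offset just past page `page_idx`.

    With an offset index, skipping a page is plain arithmetic:
    no page header has to be read or decoded.
    """
    loc = locations[page_idx]
    return loc.offset + loc.compressed_page_size


locations = [PageLocation(4, 100, 0), PageLocation(104, 80, 1000)]
skipped_to = next_page_offset(locations, 0)
```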
2024-04-26 15:06:16 +08:00
acc2b532e7 [Test](hive-writer) Adjust test_hive_write_partitions regression test to resolve special characters issue with git on windows. (#34026) 2024-04-26 15:05:47 +08:00
b24ff9953d [fix](Nereids) column pruning should prune map in cte consumer (#34079)
We save a bi-map in the CTE consumer to get the mapping between producer and consumer.
The consumer's output is decided by this map.
So the CTE consumer should be output-prunable, and should remove useless entries from the map during column pruning.
2024-04-26 15:05:19 +08:00
b41a5339d3 [Fix](nereids) fix rule merge_aggregate when has project (#33892) 2024-04-26 15:05:09 +08:00
50f9d47e96 [test](hive) run suite cases both in hive2 and hive3 (#33874) (#34156)
bp #33874

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:48:09 +08:00
55d5ed9ab6 [test](streamload) add load empty file regression test (#34110) 2024-04-26 07:42:09 +08:00
75644392f4 [fix](Nereids) support aggregate function only in having statement (#34086)
SQL like

> SELECT 1 AS c1 FROM t HAVING count(1) > 0 OR c1 IS NOT NULL
2024-04-26 07:41:45 +08:00
2b4f4ca796 [Fix](nereids) fix cases unstable of hint (#34101)
Fix unstable hint cases: remove unused cases and project nodes, and use string-contains checks to avoid instability.
2024-04-26 07:41:30 +08:00
9083bf7e14 revert "[Improvementation](join) empty_block shall be set true when build blo… (#33977)"
This reverts commit e3ed861e4b6a602ea874b6501998578952291f38.
2024-04-25 23:33:11 +08:00
Pxl
e3ed861e4b [Improvementation](join) empty_block shall be set true when build blo… (#33977)
empty_block shall be set true when build block only one row
2024-04-25 15:07:56 +08:00
987f755206 [Fix](nereids) fix rule SimplifyWindowExpression (#34099)
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-04-25 15:07:09 +08:00
47b54d4bd5 Fix remote scan pool (#33976) 2024-04-25 15:04:43 +08:00
450f443413 [fix](decommission) fix can't decommission mtmv (#33823) 2024-04-25 12:01:44 +08:00
a15a8e119f [fix](mtmv) Fix exception when create materialized view with cte (#33988)
Fix an exception when creating a materialized view with a CTE. After this fix, a materialized view can be created with the following:
```
        CREATE MATERIALIZED VIEW mv_with_cte
            BUILD IMMEDIATE REFRESH AUTO ON MANUAL
            DISTRIBUTED BY RANDOM BUCKETS 2
            PROPERTIES ('replication_num' = '1')
            AS
            with `test_with` AS (
            select l_partkey, l_suppkey
            from lineitem
            union
            select
              ps_partkey, ps_suppkey
            from
            partsupp)
            select * from test_with;
```

this is brought from https://github.com/apache/doris/pull/28144
2024-04-25 12:01:44 +08:00
0faae45537 [opt](nereids)project sub expression in other condition for nested loop join (#32697)
1. project sub expression in other condition for nested loop join
2. fix a bug in the UT framework which may generate duplicated ExprId
2024-04-25 12:01:44 +08:00
ef73533e27 [Feat](nereids) add transform rule SimplifyWindowExpression (#33647)
rewrite func(para) over (partition by unique_keys)
1. func() is count(non-null) or rank/dense_rank/row_number -> 1
2. func(para) is min/max/sum/avg/first_value/last_value -> para
 e.g
select max(c1) over(partition by pk) from t1;
-> select c1 from t1;
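
The two rewrite cases above can be sketched in Python as follows. This is only an illustration of the rule as stated in the commit message, relying on the fact that partitioning by unique keys leaves at most one row per partition. The function and set names are hypothetical, and the non-null check for `count` is assumed to be done by the caller.

```python
from typing import Optional

# Functions whose result over a single-row partition is always 1.
# count() additionally requires a non-null argument (assumed checked by the caller).
ALWAYS_ONE = {"count", "rank", "dense_rank", "row_number"}
# Functions whose result over a single-row partition is the argument itself.
RETURNS_ARG = {"min", "max", "sum", "avg", "first_value", "last_value"}


def simplify_window(func: str, arg: Optional[str],
                    partition_is_unique: bool) -> Optional[str]:
    """Return the simplified expression, or None if the rule does not apply."""
    if not partition_is_unique:
        return None
    name = func.lower()
    if name in ALWAYS_ONE:
        return "1"
    if name in RETURNS_ARG and arg is not None:
        return arg
    return None


# max(c1) over (partition by pk), with pk unique, becomes c1
print(simplify_window("max", "c1", True))
```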
2024-04-25 12:01:44 +08:00
800bb3d4ba [Feat](nereids) add expression rewrite rule LikeToEqualRewrite (#33803)
like expressions without fuzzy matching are rewritten into equivalent expressions
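
A small Python sketch of the idea, under stated assumptions: `%` and `_` are the LIKE wildcards, and a backslash escapes the following character. The escape handling and the function name `like_to_equal` are assumptions for illustration, since the commit message does not describe how the actual rule treats escapes.

```python
from typing import Optional


def like_to_equal(pattern: str, escape: str = "\\") -> Optional[str]:
    """Return the literal to compare with `=`, or None if `pattern` has wildcards.

    An escaped wildcard is treated as a literal character.
    """
    literal = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # escaped character: take the next character literally
            literal.append(pattern[i + 1])
            i += 2
            continue
        if ch in "%_":
            return None  # fuzzy matching is needed, keep LIKE
        literal.append(ch)
        i += 1
    return "".join(literal)


# `c LIKE 'abc'` can become `c = 'abc'`; `c LIKE 'a%'` must stay a LIKE
print(like_to_equal("abc"), like_to_equal("a%"))
```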
2024-04-25 12:01:44 +08:00
2f996a574f [Feat](nereids) nereids add alter view (#33970)
Nereids supports the ALTER VIEW statement.
e.g. ALTER VIEW example_db.example_view
(
c1 COMMENT "column 1",
c2 COMMENT "column 2",
c3 COMMENT "column 3"
)
AS SELECT k1, k2, SUM(v1) FROM example_table
GROUP BY k1, k2
2024-04-25 12:01:44 +08:00
080c07ad87 [bug](random distribution) fix data loss and incorrect in random distribution table #33962 2024-04-24 17:13:50 +08:00
8d98c71079 [FIX]fix cidr func with const param (#33968) 2024-04-24 17:13:50 +08:00
2f60dcf890 [test](hll) fix unstable case without order by clause (#33947) 2024-04-24 17:13:50 +08:00
6531e4c540 [improve](regression test)Add test for time series compact empty rowset (#29509) 2024-04-24 17:13:49 +08:00
Pxl
5a5063be20 [bug](fix) heap use after free when json parse failed (#33955) 2024-04-22 22:33:24 +08:00
299d069da9 Fix alter policy failed (#33910) 2024-04-22 22:33:24 +08:00
98e90dd47e [fix](auth)fix missing authentication (#33347) (#33956)
bp #33347

Co-authored-by: zhangdong <493738387@qq.com>
2024-04-22 13:52:36 +08:00
8096753367 [improvement](mtmv) Support union rewrite when the materialized view is not enough to provide all the data for the query (#33800)
When the materialized view cannot provide all the data for the query, and the materialized view is incrementally updated by partition, we can union the materialized view and the original query to answer the query.

this depends on https://github.com/apache/doris/pull/33362

For example, given the following materialized view definition:

>         CREATE MATERIALIZED VIEW mv_10086
>         BUILD IMMEDIATE REFRESH AUTO ON MANUAL
>         partition by(l_shipdate)
>         DISTRIBUTED BY RANDOM BUCKETS 2
>         PROPERTIES ('replication_num' = '1') 
>         AS 
>     select l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) as sum_total
>     from lineitem
>     left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
>     group by
>     l_shipdate,
>     o_orderdate,
>     l_partkey,
>     l_suppkey;

The materialized view data is as follows:
+------------+-------------+-----------+-----------+-----------+
| l_shipdate | o_orderdate | l_partkey | l_suppkey | sum_total |
+------------+-------------+-----------+-----------+-----------+
| 2023-10-18 | 2023-10-18  |         2 |         3 |    109.20 |
| 2023-10-17 | 2023-10-17  |         2 |         3 |     99.50 |
| 2023-10-19 | 2023-10-19  |         2 |         3 |     99.50 |
+------------+-------------+-----------+-----------+-----------+

When we insert data into partition `2023-10-17` and then run the following query:
```
    select l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) as sum_total
    from lineitem
    left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
    group by
    l_shipdate,
    o_orderdate,
    l_partkey,
    l_suppkey;
```
Query rewrite by the materialized view will fail with the message `Check partition query used validation fail`.
If we turn on the switch `SET enable_materialized_view_union_rewrite = true;` (default: true)
and run the query above again, it will succeed, using UNION ALL of the materialized view and the original query to answer the query correctly. The plan is as follows:


```
| Explain String(Nereids Planner)                                                                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                                                                                                                    |
|   OUTPUT EXPRS:                                                                                                                                                                                    |
|     l_shipdate[#52]                                                                                                                                                                                |
|     o_orderdate[#53]                                                                                                                                                                               |
|     l_partkey[#54]                                                                                                                                                                                 |
|     l_suppkey[#55]                                                                                                                                                                                 |
|     sum_total[#56]                                                                                                                                                                                 |
|   PARTITION: UNPARTITIONED                                                                                                                                                                         |
|                                                                                                                                                                                                    |
|   HAS_COLO_PLAN_NODE: false                                                                                                                                                                        |
|                                                                                                                                                                                                    |
|   VRESULT SINK                                                                                                                                                                                     |
|      MYSQL_PROTOCAL                                                                                                                                                                                |
|                                                                                                                                                                                                    |
|   11:VEXCHANGE                                                                                                                                                                                     |
|      offset: 0                                                                                                                                                                                     |
|      distribute expr lists:                                                                                                                                                                        |
|                                                                                                                                                                                                    |
| PLAN FRAGMENT 1                                                                                                                                                                                    |
|                                                                                                                                                                                                    |
|   PARTITION: HASH_PARTITIONED: l_shipdate[#42], o_orderdate[#43], l_partkey[#44], l_suppkey[#45]                                                                                                   |
|                                                                                                                                                                                                    |
|   HAS_COLO_PLAN_NODE: false                                                                                                                                                                        |
|                                                                                                                                                                                                    |
|   STREAM DATA SINK                                                                                                                                                                                 |
|     EXCHANGE ID: 11                                                                                                                                                                                |
|     UNPARTITIONED                                                                                                                                                                                  |
|                                                                                                                                                                                                    |
|   10:VUNION(756)                                                                                                                                                                                   |
|   |                                                                                                                                                                                                |
|   |----9:VAGGREGATE (merge finalize)(753)                                                                                                                                                          |
|   |    |  output: sum(partial_sum(o_totalprice)[#46])[#51]                                                                                                                                         |
|   |    |  group by: l_shipdate[#42], o_orderdate[#43], l_partkey[#44], l_suppkey[#45]                                                                                                              |
|   |    |  cardinality=2                                                                                                                                                                            |
|   |    |  distribute expr lists: l_shipdate[#42], o_orderdate[#43], l_partkey[#44], l_suppkey[#45]                                                                                                 |
|   |    |                                                                                                                                                                                           |
|   |    8:VEXCHANGE                                                                                                                                                                                 |
|   |       offset: 0                                                                                                                                                                                |
|   |       distribute expr lists: l_shipdate[#42]                                                                                                                                                   |
|   |                                                                                                                                                                                                |
|   1:VEXCHANGE                                                                                                                                                                                      |
|      offset: 0                                                                                                                                                                                     |
|      distribute expr lists:                                                                                                                                                                        |
|                                                                                                                                                                                                    |
| PLAN FRAGMENT 2                                                                                                                                                                                    |
|                                                                                                                                                                                                    |
|   PARTITION: HASH_PARTITIONED: o_orderkey[#21], o_orderdate[#25]                                                                                                                                   |
|                                                                                                                                                                                                    |
|   HAS_COLO_PLAN_NODE: false                                                                                                                                                                        |
|                                                                                                                                                                                                    |
|   STREAM DATA SINK                                                                                                                                                                                 |
|     EXCHANGE ID: 08                                                                                                                                                                                |
|     HASH_PARTITIONED: l_shipdate[#42], o_orderdate[#43], l_partkey[#44], l_suppkey[#45]                                                                                                            |
|                                                                                                                                                                                                    |
|   7:VAGGREGATE (update serialize)(747)                                                                                                                                                             |
|   |  STREAMING                                                                                                                                                                                     |
|   |  output: partial_sum(o_totalprice[#41])[#46]                                                                                                                                                   |
|   |  group by: l_shipdate[#37], o_orderdate[#38], l_partkey[#39], l_suppkey[#40]                                                                                                                   |
|   |  cardinality=2                                                                                                                                                                                 |
|   |  distribute expr lists: l_shipdate[#37]                                                                                                                                                        |
|   |                                                                                                                                                                                                |
|   6:VHASH JOIN(741)                                                                                                                                                                                |
|   |  join op: RIGHT OUTER JOIN(PARTITIONED)[]                                                                                                                                                      |
|   |  equal join conjunct: (o_orderkey[#21] = l_orderkey[#5])                                                                                                                                       |
|   |  equal join conjunct: (o_orderdate[#25] = l_shipdate[#15])                                                                                                                                     |
|   |  runtime filters: RF000[min_max] <- l_orderkey[#5](2/2/2048), RF001[bloom] <- l_orderkey[#5](2/2/2048), RF002[min_max] <- l_shipdate[#15](1/1/2048), RF003[bloom] <- l_shipdate[#15](1/1/2048) |
|   |  cardinality=2                                                                                                                                                                                 |
|   |  vec output tuple id: 4                                                                                                                                                                        |
|   |  output tuple id: 4                                                                                                                                                                            |
|   |  vIntermediate tuple ids: 3                                                                                                                                                                    |
|   |  hash output slot ids: 6 7 24 25 15                                                                                                                                                            |
|   |  final projections: l_shipdate[#36], o_orderdate[#32], l_partkey[#34], l_suppkey[#35], o_totalprice[#31]                                                                                       |
|   |  final project output tuple id: 4                                                                                                                                                              |
|   |  distribute expr lists: o_orderkey[#21], o_orderdate[#25]                                                                                                                                      |
|   |  distribute expr lists: l_orderkey[#5], l_shipdate[#15]                                                                                                                                        |
|   |                                                                                                                                                                                                |
|   |----3:VEXCHANGE                                                                                                                                                                                 |
|   |       offset: 0                                                                                                                                                                                |
|   |       distribute expr lists: l_orderkey[#5]                                                                                                                                                    |
|   |                                                                                                                                                                                                |
|   5:VEXCHANGE                                                                                                                                                                                      |
|      offset: 0                                                                                                                                                                                     |
|      distribute expr lists:                                                                                                                                                                        |
|                                                                                                                                                                                                    |
| PLAN FRAGMENT 3                                                                                                                                                                                    |
|                                                                                                                                                                                                    |
|   PARTITION: RANDOM                                                                                                                                                                                |
|                                                                                                                                                                                                    |
|   HAS_COLO_PLAN_NODE: false                                                                                                                                                                        |
|                                                                                                                                                                                                    |
|   STREAM DATA SINK                                                                                                                                                                                 |
|     EXCHANGE ID: 05                                                                                                                                                                                |
|     HASH_PARTITIONED: o_orderkey[#21], o_orderdate[#25]                                                                                                                                            |
|                                                                                                                                                                                                    |
|   4:VOlapScanNode(722)                                                                                                                                                                             |
|      TABLE: union_db.orders(orders), PREAGGREGATION: ON                                                                                                                                            |
|      runtime filters: RF000[min_max] -> o_orderkey[#21], RF001[bloom] -> o_orderkey[#21], RF002[min_max] -> o_orderdate[#25], RF003[bloom] -> o_orderdate[#25]                                     |
|      partitions=3/3 (p_20231017,p_20231018,p_20231019), tablets=9/9, tabletList=161188,161190,161192 ...                                                                                           |
|      cardinality=3, avgRowSize=0.0, numNodes=1                                                                                                                                                     |
|      pushAggOp=NONE                                                                                                                                                                                |
|                                                                                                                                                                                                    |
| PLAN FRAGMENT 4                                                                                                                                                                                    |
|                                                                                                                                                                                                    |
|   PARTITION: HASH_PARTITIONED: l_orderkey[#5]                                                                                                                                                      |
|                                                                                                                                                                                                    |
|   HAS_COLO_PLAN_NODE: false                                                                                                                                                                        |
|                                                                                                                                                                                                    |
|   STREAM DATA SINK                                                                                                                                                                                 |
|     EXCHANGE ID: 03                                                                                                                                                                                |
|     HASH_PARTITIONED: l_orderkey[#5], l_shipdate[#15]                                                                                                                                              |
|                                                                                                                                                                                                    |
|   2:VOlapScanNode(729)                                                                                                                                                                             |
|      TABLE: union_db.lineitem(lineitem), PREAGGREGATION: ON                                                                                                                                        |
|      PREDICATES: (l_shipdate[#15] >= '2023-10-17') AND (l_shipdate[#15] < '2023-10-18')                                                                                                            |
|      partitions=1/3 (p_20231017), tablets=3/3, tabletList=161223,161225,161227                                                                                                                     |
|      cardinality=2, avgRowSize=0.0, numNodes=1                                                                                                                                                     |
|      pushAggOp=NONE                                                                                                                                                                                |
|                                                                                                                                                                                                    |
| PLAN FRAGMENT 5                                                                                                                                                                                    |
|                                                                                                                                                                                                    |
|   PARTITION: RANDOM                                                                                                                                                                                |
|                                                                                                                                                                                                    |
|   HAS_COLO_PLAN_NODE: false                                                                                                                                                                        |
|                                                                                                                                                                                                    |
|   STREAM DATA SINK                                                                                                                                                                                 |
|     EXCHANGE ID: 01                                                                                                                                                                                |
|     RANDOM                                                                                                                                                                                         |
|                                                                                                                                                                                                    |
|   0:VOlapScanNode(718)                                                                                                                                                                             |
|      TABLE: union_db.mv_10086(mv_10086), PREAGGREGATION: ON                                                                                                                                        |
|      partitions=2/3 (p_20231018_20231019,p_20231019_20231020), tablets=4/4, tabletList=161251,161253,161265 ...                                                                                    |
|      cardinality=2, avgRowSize=0.0, numNodes=1                                                                                                                                                     |
|      pushAggOp=NONE                                                                                                                                                                                |
|                                                                                                                                                                                                    |
| MaterializedView                                                                                                                                                                                   |
| MaterializedViewRewriteSuccessAndChose:                                                                                                                                                            |
|   Names: mv_10086                                                                                                                                                                                  |
| MaterializedViewRewriteSuccessButNotChose:                                                                                                                                                         |
|                                                                                                                                                                                                    |
| MaterializedViewRewriteFail:                                                                                                                                                                       |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2024-04-21 13:22:26 +08:00
e6a6b82201 [nereids](mtmv) Support rewrite by mv nested materialized view (#33362)
Support query rewriting by nested materialized views.
For example, suppose `inner_mv` is defined as follows:

>             select
>             l_linenumber,
>             o_custkey,
>             o_orderkey,
>             o_orderstatus,
>             l_partkey,
>             l_suppkey,
>             l_orderkey
>             from lineitem
>             inner join orders on lineitem.l_orderkey = orders.o_orderkey;

The definition of `mv1_0`, which is built on `inner_mv`, is as follows:

>             select
>             l_linenumber,
>             o_custkey,
>             o_orderkey,
>             o_orderstatus,
>             l_partkey,
>             l_suppkey,
>             l_orderkey,
>             ps_availqty
>             from inner_mv
>             inner join partsupp on l_partkey = ps_partkey AND l_suppkey = ps_suppkey;


For the following query, rewriting by materialized view succeeds with both `inner_mv` and `mv1_0`, and the CBO finally chooses `mv1_0`.

>            select lineitem.l_linenumber
>             from lineitem
>             inner join orders on l_orderkey = o_orderkey
>             inner join partsupp on  l_partkey = ps_partkey AND l_suppkey = ps_suppkey
>             where o_orderstatus = 'o' AND l_linenumber in (1, 2, 3, 4, 5)
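
To see why both candidates are valid rewrite targets, here is a minimal sketch that replays the definitions above in SQLite via Python's `sqlite3` module. It is not Doris and does not exercise the Nereids rewriter or the CBO. The column subsets, the sample rows, and the helper names (`build_demo`, `run`) are assumptions made for illustration, and plain `CREATE TABLE ... AS SELECT` tables stand in for the materialized views.

```python
import sqlite3

# The query from the commit message, run against the base tables.
ORIGINAL = """
    SELECT lineitem.l_linenumber
    FROM lineitem
    INNER JOIN orders ON l_orderkey = o_orderkey
    INNER JOIN partsupp ON l_partkey = ps_partkey AND l_suppkey = ps_suppkey
    WHERE o_orderstatus = 'o' AND l_linenumber IN (1, 2, 3, 4, 5)
"""

# Hand-written equivalent that reads inner_mv and still joins partsupp.
VIA_INNER_MV = """
    SELECT l_linenumber
    FROM inner_mv
    INNER JOIN partsupp ON l_partkey = ps_partkey AND l_suppkey = ps_suppkey
    WHERE o_orderstatus = 'o' AND l_linenumber IN (1, 2, 3, 4, 5)
"""

# Hand-written equivalent that reads the nested view only.
VIA_MV1_0 = """
    SELECT l_linenumber
    FROM mv1_0
    WHERE o_orderstatus = 'o' AND l_linenumber IN (1, 2, 3, 4, 5)
"""


def build_demo():
    """Create hypothetical sample tables plus stand-ins for both views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lineitem (l_orderkey INT, l_partkey INT, l_suppkey INT, l_linenumber INT);
        CREATE TABLE orders (o_orderkey INT, o_custkey INT, o_orderstatus TEXT);
        CREATE TABLE partsupp (ps_partkey INT, ps_suppkey INT, ps_availqty INT);

        INSERT INTO lineitem VALUES (1, 10, 100, 1), (1, 11, 101, 2), (2, 10, 100, 7), (3, 12, 102, 3);
        INSERT INTO orders VALUES (1, 500, 'o'), (2, 501, 'o'), (3, 502, 'f');
        INSERT INTO partsupp VALUES (10, 100, 40), (11, 101, 41), (12, 102, 42);

        -- Stand-in for inner_mv: the stored result of lineitem JOIN orders.
        CREATE TABLE inner_mv AS
            SELECT l_linenumber, o_custkey, o_orderkey, o_orderstatus,
                   l_partkey, l_suppkey, l_orderkey
            FROM lineitem
            INNER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey;

        -- Stand-in for mv1_0: the stored result of inner_mv JOIN partsupp.
        CREATE TABLE mv1_0 AS
            SELECT l_linenumber, o_custkey, o_orderkey, o_orderstatus,
                   l_partkey, l_suppkey, l_orderkey, ps_availqty
            FROM inner_mv
            INNER JOIN partsupp ON l_partkey = ps_partkey AND l_suppkey = ps_suppkey;
    """)
    return conn


def run(conn, sql):
    """Return the single selected column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


if __name__ == "__main__":
    conn = build_demo()
    for name, sql in [("base tables", ORIGINAL),
                      ("inner_mv", VIA_INNER_MV),
                      ("mv1_0", VIA_MV1_0)]:
        print(name, run(conn, sql))
```

All three statements return the same rows on this sample data. The version that reads `mv1_0` needs no join at query time, which is consistent with the planner preferring it.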
2024-04-21 09:55:34 +08:00
60253c827c [fix](nereids) do not push RF into nested cte (#33769) 2024-04-20 20:08:00 +08:00
36a70ba1e7 [Fix](Csv-Reader)Fix the issue of BE core dump caused by improper configuration of column_seperator and line_delimiter. (#33693) 2024-04-20 20:06:48 +08:00
03c3419265 [Refactor](executor)Add workload schedule policy table (#33729) 2024-04-20 20:06:34 +08:00
0e3ad5cd9d [fix](parquet) fix time zone error(isAdjustedToUTC=true) in parquet reader (#33675) (#33924)
bp (#33675)

Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
2024-04-20 19:06:54 +08:00
80307354b2 [fix](Nereids): add whole tree rewriter when root is not CTEAnchor (#33591) (#33906) 2024-04-20 01:05:57 +08:00
ee687a43fd [fix](plsql) Fix regression test for routine select (#33860)
fix #33608, more comprehensive test
2024-04-19 23:41:46 +08:00
Pxl
175e85d616 [Bug](runtime-filter) fix coredump on no null string type rf (#33869)
fix coredump on no null string type rf
2024-04-19 15:03:06 +08:00
659900040f [Fix](inverted index) fix wrong need read data opt when encounters columnA > columnB predicate (#33855) 2024-04-19 15:03:06 +08:00
6776a3ad1b [Fix](planner) fix create view star except and modify cast to sql (#33726) 2024-04-19 15:02:49 +08:00