Commit Graph

18773 Commits

Author SHA1 Message Date
3eeb83ff11 [test](fix) Fix test check fail when test nested mv hit (#34293) (#35375)
pick from master commit id: d20b18f pr: #34293

If mv3 is defined as follows:
select c1, c2, c3 from t1;

and mv4 is defined as follows:
select c1, c2 from mv3;

then when the query is
select c1, c2 from t1;

both mv3 and mv4 can be used to rewrite it successfully.
2024-05-24 19:47:16 +08:00
cf84998711 Revert "[fix](broker load) Make Config.enable_pipeline_load works as expected for BrokerLoad (#35105)"
This reverts commit e8fb47bec1a1cfc7b07a6ed4eb36283407a4a9fe.
2024-05-24 19:28:34 +08:00
c4b2ddd688 [Fix](Variant) clear block after a flush complete (#35226) (#35372)
Otherwise it results in a crash:

```
*** SIGSEGV address not mapped to object (@0x0) received by PID 4149909 (TID 4152328 OR 0x7efefc60d700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F031AD0E090 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::Status doris::vectorized::MutableBlock::merge_impl<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:586
 5# doris::Status doris::vectorized::MutableBlock::merge<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:521
```
2024-05-24 19:10:07 +08:00
41f29cf4cd [fix](decompress)(review) context leaked in failure path (#33622) (#35364)
* [fix](decompress)(review) context leaked in failure path

* [fix](decompress)(review) context leaked in failure path review fix

Co-authored-by: Vallish Pai <vallishpai@gmail.com>
2024-05-24 17:40:13 +08:00
88e2753e40 [fix](Nereids) fix ShowProcedureStatusCommand sendResultSet (#35355) 2024-05-24 17:22:07 +08:00
639c7ee7fb [fix](decimalv2) fix scale of decimalv2 to string (#35222) (#35359)
* [fix](decimalv2) fix scale of decimalv2 to string
2024-05-24 17:20:43 +08:00
ca86ee7b15 [fix](load) fix wrong assert and cancel load error (#35362) 2024-05-24 17:11:01 +08:00
1e07971a98 [Feat](nereids)when dealing insert into stmt with empty table source, fe returns directly (#35333)
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)

When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation.
When handling an INSERT INTO statement whose source table is empty, FE returns directly.

* [Fix](nereids) fix when insert into select empty table

---------

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-24 16:25:00 +08:00
bfe293c725 [fix](nereids) AdjustNullable rule should handle union node with no children (#35074)
The nullable info of the output slots is not calculated correctly in the union node,
because the old code only produces the correct result if the union node has children.
However, a union node may have no children and only a constantExprList.
In that case, the output's nullable info should be calculated from both the children and the constantExprList.
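
As a rough illustration of the rule described above, the following Python sketch computes per-column nullability from both child outputs and constant rows. `UnionNode` and `output_nullable` are hypothetical names invented for this sketch; they are not the Nereids classes or the actual AdjustNullable implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UnionNode:
    """Hypothetical stand-in for a union plan node (not the Nereids class)."""
    # One list of per-column nullable flags for each child plan.
    children_nullable: List[List[bool]] = field(default_factory=list)
    # One list of per-column nullable flags for each constant row.
    constant_rows_nullable: List[List[bool]] = field(default_factory=list)


def output_nullable(union: UnionNode, num_columns: int) -> List[bool]:
    """An output column is nullable if any input for that column is nullable."""
    result = [False] * num_columns
    # Consider children AND constant rows, so a union without children still works.
    for row in union.children_nullable + union.constant_rows_nullable:
        for i, nullable in enumerate(row):
            result[i] = result[i] or nullable
    return result


# A union with no children, only constant rows such as (1, NULL) and (2, 'a').
only_constants = UnionNode(constant_rows_nullable=[[False, True], [False, False]])
print(output_nullable(only_constants, 2))
```

If only `children_nullable` were inspected, the second column here would wrongly be reported as non-nullable.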
2024-05-24 16:23:58 +08:00
f6beeb1ddd [Enhencement](tvf) select tvf supports using resource (#35139)
Create an S3/HDFS resource that a TVF can use directly to access the data source.
2024-05-24 16:23:58 +08:00
d6e8fb7d77 [feature](mtmv) Support agg state roll up and optimize the roll up code (#35026)
agg_state is the intermediate state of an aggregate function; for details, see the
state combinator documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state

This change supports rolling up aggregate functions as follows:

+----------------------+----------------------------------------------+----------------------+
| query                | materialized view                            | roll up              |
| -------------------- | -------------------------------------------- | -------------------- |
| agg_function()       | agg_function_union() or agg_function_state() | agg_function_merge() |
| agg_function_union() | agg_function_union() or agg_function_state() | agg_function_union() |
| agg_function_merge() | agg_function_union() or agg_function_state() | agg_function_merge() |
+----------------------+----------------------------------------------+----------------------+

For example, the following query can be rewritten by the MV successfully.

The MV definition is:

```
            select
            o_orderstatus,
            l_partkey,
            l_suppkey,
            sum_union(sum_state(o_shippriority)),
            group_concat_union(group_concat_state(l_shipinstruct)),
            avg_union(avg_state(l_linenumber)),
            max_by_union(max_by_state(l_shipmode, l_suppkey)),
            count_union(count_state(l_orderkey)),
            multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
            from lineitem
            left join orders
            on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_partkey,
            l_suppkey;
```

The query is:

```
            select
            o_orderstatus,
            l_suppkey,
            sum(o_shippriority),
            group_concat(l_shipinstruct),
            avg(l_linenumber),
            max_by(l_shipmode,l_suppkey),
            count(l_orderkey),
            multi_distinct_count(l_shipmode)
            from lineitem
            left join orders 
            on l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_suppkey;
```
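
To show why rolling up needs an intermediate state rather than a finished value, here is a small Python sketch using avg. The helpers `avg_state`, `avg_union`, and `avg_merge` are plain Python functions written for this illustration, loosely mirroring the combinator names; they are not the Doris functions, and the sample rows are made up.

```python
from collections import defaultdict
from typing import Iterable, Tuple

AvgState = Tuple[float, int]  # (running sum, row count)


def avg_state(values: Iterable[float]) -> AvgState:
    """Build the intermediate state for avg from raw values."""
    values = list(values)
    return (float(sum(values)), len(values))


def avg_union(states: Iterable[AvgState]) -> AvgState:
    """Combine several states into one state (still not a final value)."""
    total, count = 0.0, 0
    for s, c in states:
        total += s
        count += c
    return (total, count)


def avg_merge(states: Iterable[AvgState]) -> float:
    """Combine several states and produce the final average."""
    total, count = avg_union(states)
    return total / count


# Made-up rows: (o_orderstatus, l_partkey, l_linenumber).
rows = [("F", 1, 2), ("F", 1, 4), ("F", 2, 6), ("O", 1, 1)]

# The "MV" groups by (status, partkey) and stores a state per group.
groups = defaultdict(list)
for status, partkey, linenumber in rows:
    groups[(status, partkey)].append(linenumber)
mv = {key: avg_state(vals) for key, vals in groups.items()}

# The "query" groups by status only, so the MV states are rolled up with merge.
rolled = defaultdict(list)
for (status, _partkey), state in mv.items():
    rolled[status].append(state)
result = {status: avg_merge(states) for status, states in rolled.items()}
print(result)
```

Averaging the two per-group averages for status "F" (3.0 and 6.0) would give 4.5, while merging the states gives the correct 4.0, which is the reason the MV keeps states.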
2024-05-24 16:23:58 +08:00
4b91ad003f [opt](memory) avoid allocate memory in agg operator constructor (#35301)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-24 16:23:58 +08:00
c4776a48f2 [fix](regression-test) fix test_tvf_view_count_p2 regression test (#35216)
Caused by: #34642

The test must set verbose to true.
2024-05-24 16:23:58 +08:00
e6027ca9d7 [fix](p2-test) fix test_export_with_parallelism case (#35283) 2024-05-24 16:23:58 +08:00
bbf502dfcf [fix](create-table)The CREATE TABLE IF NOT EXISTS AS SELECT statement should refrain from performing any INSERT operations if the table already exists (#35210) 2024-05-24 16:23:58 +08:00
708b5b548c [fix](ui): fix data preview error (#34521) 2024-05-24 16:23:58 +08:00
bd4dd94c24 [Fix](nereids) add checkBlockRules() check for create view and alter view (#34104) 2024-05-24 16:23:58 +08:00
d85ea83b73 [test](case) Remove sensitive information in k8s deploy test (#35185)
Remove sensitive information from the k8s deployment test, otherwise the code base security check fails.
2024-05-24 16:23:58 +08:00
0e2b7480b7 [fix](regression-test) line_delimiter parse error in regression_test test_tvf_based_broker_load (#35001) 2024-05-24 16:23:58 +08:00
309503855e [Fix](bloom filter) Fix bloom filter memory leak (#34871)
* Issue: Doris occasionally encounters an issue where memory usage becomes exceptionally high and does not decrease. The leaked memory is occupied by Bloom filters stored in memory.

Reason: The segment cache stores segment objects read from files into memory. It functions as an LRU cache with an eviction strategy: when the number of segments exceeds the maximum number, or the total memory size of segment objects in the cache exceeds the maximum usage, it evicts the older segments. However, there is a piece of logic in the code that first reads the segment object into memory, assuming it occupies memory size A, then places the read segment object into the cache (at this point, the cache considers the segment object size to be A). It then reads the segment's Bloom filter from the file and assigns it to the segment's Bloom filter member variable, assuming the Bloom filter occupies memory size B. Thus, the total size of the segment object at this point is A+B. However, the cache does not update this size, leading to the actual size of the segment object stored in the cache (A+B) being larger than the size considered by the cache (A). When the number of segment objects in the cache increases to a certain extent, the used memory will surge dramatically. However, the cache does not perceive the size as reaching the eviction limit, so it does not evict the segment objects. In such cases, a memory leak issue arises.

Solution: Since each segment object only reads the Bloom filter once, the issue can be resolved by changing the logic from reading the segment, placing it into the cache, and then reading the Bloom filter to reading the segment, reading the Bloom filter, and then placing it into the cache.
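
The accounting problem described above can be reproduced with a toy cache. This Python sketch is only an illustration under simplified assumptions: `Segment`, `SegmentCache`, and the byte sizes are invented for the example and do not reflect the actual Doris segment cache code.

```python
from collections import OrderedDict


class Segment:
    """Toy segment whose memory grows when its Bloom filter is loaded."""

    def __init__(self, data_size: int) -> None:
        self.data_size = data_size
        self.bloom_filter_size = 0

    def load_bloom_filter(self, size: int) -> None:
        self.bloom_filter_size = size

    def memory_size(self) -> int:
        return self.data_size + self.bloom_filter_size


class SegmentCache:
    """LRU cache that charges each entry its size at insertion time only."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.charged_bytes = 0
        self.entries = OrderedDict()  # key -> (segment, charge)

    def insert(self, key, segment: Segment) -> None:
        charge = segment.memory_size()  # the size is sampled once, here
        self.entries[key] = (segment, charge)
        self.charged_bytes += charge
        # Evict the oldest entries while the charged total exceeds the limit.
        while self.charged_bytes > self.capacity_bytes and self.entries:
            _, (_, old_charge) = self.entries.popitem(last=False)
            self.charged_bytes -= old_charge

    def actual_bytes(self) -> int:
        return sum(seg.memory_size() for seg, _ in self.entries.values())


def fill(cache: SegmentCache, load_filter_first: bool) -> SegmentCache:
    for i in range(10):
        seg = Segment(data_size=10)
        if load_filter_first:
            seg.load_bloom_filter(90)   # fixed order: full size is charged
            cache.insert(i, seg)
        else:
            cache.insert(i, seg)        # buggy order: only 10 bytes charged
            seg.load_bloom_filter(90)
    return cache


buggy = fill(SegmentCache(200), load_filter_first=False)
fixed = fill(SegmentCache(200), load_filter_first=True)
print(buggy.charged_bytes, buggy.actual_bytes())
print(fixed.charged_bytes, fixed.actual_bytes())
```

With the buggy order the cache believes it holds 100 bytes while it really holds 1000, so it never evicts; with the fixed order the charged and actual sizes agree and stay within the limit.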
2024-05-24 16:23:58 +08:00
e02dcecb0a [optimize](regression)Add retry for curl request (#35260)
Co-authored-by: Luennng <luennng@gmail.com>
2024-05-24 16:23:58 +08:00
07cd18962a [test](inverted index) nonConcurrent is added to the test case (#35259) 2024-05-24 16:23:58 +08:00
78fab91d6b [fix](overflow) show backends overflow for backend ids (#35245) 2024-05-24 16:23:58 +08:00
dd567fa774 [fix](function) support return JsonType for If function (#35199)
Add a FunctionSignature for If so that it can return JsonType.
2024-05-24 16:23:58 +08:00
9427942245 [opt](thrift)update thrift to support pushing limit to local Agg (#35204)
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
2024-05-24 16:23:58 +08:00
682d72bf4d [fix](noexcept) Remove incorrect noexcept #35230 2024-05-24 16:23:58 +08:00
98b2bda660 [opt](Nereids) remove restrict for count(*) in window (#35220)
Support using count(*) as a window function, for example:

CREATE TABLE `t1` (
  `id` INT NULL,
  `dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

select *, count(*) over() from t1;
2024-05-24 16:23:58 +08:00
8c594c6959 [Fix](regression) fix show data regression case (#35218) 2024-05-24 16:23:58 +08:00
473e14ca82 [chore](backup) log backup/restore job during replay (#35234) 2024-05-24 16:23:57 +08:00
edb276ad92 [fix](typo)fix show backend typo (#35198) 2024-05-24 16:23:57 +08:00
cf46ebe31d [improve](jdbc catalog) Remove all property checks during create (#35194) (#35354) 2024-05-24 16:12:02 +08:00
f062506b22 [fix](nereids)the preagg state for count(*) is wrong (#35326) 2024-05-24 15:23:04 +08:00
0b90e37227 [fix](Nereids) string literal coercion of in predicate (#35337)
pick from master #35200

Description:
   The SQL executes much more slowly when the literals in an `in predicate` are strings but the underlying column is of an integral type.
```
mysql> set enable_nereids_planner = false;
Query OK, 0 rows affected (0.03 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+---------------+
| id         | sum(`clicks`) |
+------------+---------------+
|  787934713 |          2838 |
|  306960695 |           339 |
+------------+---------------+
2 rows in set (1.81 sec)

mysql> set enable_nereids_planner = true;
Query OK, 0 rows affected (0.02 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+-------------+
| id         | sum(clicks) |
+------------+-------------+
|  787934713 |        2838 |
|  306960695 |         339 |
+------------+-------------+
2 rows in set (28.14 sec)
```

Reason:
In the legacy planner, the string literals are converted to integral values, but the Nereids planner does not do this conversion, so string matching is performed in BE.

Solution:
Process numeric string literals in an `in predicate` the same way as in a `comparison predicate`.
Test table:
```
create table a_table(
    k1 BIGINT NOT NULL,
    k2 VARCHAR(100) NOT NULL,
    v1 INT SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(k1,k2)
distributed BY hash(k1) buckets 2
properties("replication_num" = "1");
insert into a_table values (10, 'name1', 10),(20, 'name2', 10);
explain plan select * from a_table where k1 in ('10', '20001');
```
Before the optimization:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 1ms) ==========                                                                                        |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 2ms) ==========                                                                                      |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                      |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 6ms) ==========                                                                                     |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                         |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 6ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather )                                                  |
|    +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                       |
|       +--PhysicalOlapScan[a_table]@0 ( stats=1 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
After the optimization:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 15ms) ==========                                                                                       |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 11ms) ==========                                                                                     |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) )                                                                        |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 12ms) ==========                                                                                    |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) )                                                                           |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 4ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather )                                                     |
|    +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) )                                                            |
|       +--PhysicalOlapScan[a_table]@0 ( stats=2 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
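
The idea behind the fix can be sketched in a few lines of Python: if every string literal in the IN list parses as an integer, compare integers; otherwise fall back to comparing text. `coerce_in_list` and `matches` are hypothetical helpers for illustration, and parsing with Python's `int()` is an assumption of this sketch; the coercion rules Nereids actually applies may differ.

```python
from typing import List, Optional


def coerce_in_list(literals: List[str]) -> Optional[List[int]]:
    """Return the literals as integers, or None if any of them is not an integer."""
    coerced = []
    for text in literals:
        try:
            coerced.append(int(text))
        except ValueError:
            return None
    return coerced


def matches(column_value: int, literals: List[str]) -> bool:
    """Evaluate `column_value IN (literals)` for an integral column."""
    ints = coerce_in_list(literals)
    if ints is not None:
        # Fast path: literals were converted once, the column is compared as-is.
        return column_value in set(ints)
    # Fallback: the column value is cast to text for every row.
    return str(column_value) in literals


print(matches(10, ["10", "20001"]))
print(matches(10001, ["10", "20001"]))
```

The fast path corresponds to the `k1#0 IN (10001, 20001)` predicate in the plan after the change, and the fallback corresponds to `cast(k1#0 as TEXT) IN (...)` in the plan before it.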
2024-05-24 14:26:52 +08:00
bb3a0fd30e [fix](nereids)should use nereids expr's nullable info when call Expr's toThrift method (#35274) 2024-05-24 02:24:40 +08:00
9277480f00 [fix](nereids)days_diff should match datetimev2 function sigature in higher priority (#35295) 2024-05-24 02:21:55 +08:00
4b7608c2bf [fix](inverted index)Change index_id from int32 to int64 to avoid overflow (#35206)
Co-authored-by: Luennng <luennng@gmail.com>
2024-05-23 19:12:55 +08:00
a6f7747d29 [feature](datatype) add BE config to allow zero date (#34961)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-05-23 19:12:39 +08:00
a52ee6e9b9 [opt](mtmv) generate bi-map between base table and materialized view partitions (#35131) 2024-05-23 19:11:33 +08:00
9ba995317a [fix](routineload) fix data source properties do not persist in edit log (#35137) 2024-05-23 19:09:41 +08:00
4008dc03cf [Fix](regression) fix test_user_var.groovy by add set disable_nereids_rules=PRUNE_EMPTY_PARTITION (#35151) 2024-05-23 19:06:38 +08:00
eb49cd839b [refactor](datalake) return the error status instead of static_cast<void> (#34873)
Followup #34797
`static_cast<void>` ignores the error status, but some of these calls should make the query finish with an error status, so replace `static_cast<void>` with `RETURN_IF_ERROR`.

The following three scenarios need to be handled separately and cannot be simply replaced:
1. The outer function returns void;
2. A status-returning function is called inside a constructor or destructor;
3. A status-returning function is called on a best-effort basis, and the error status should be ignored.
2024-05-23 19:06:21 +08:00
b3f6668464 fix case: test_create_table_without_distribution 2024-05-23 19:03:30 +08:00
bf37e5c905 [feature](Nereids) support select distinct with aggregate (#35300)
(cherry picked from commit adcbc8cce57aaec507174f39536a028db803a2e5)
2024-05-23 19:01:10 +08:00
4075408b84 [feature](mtmv)Support single table mv rewrite (#34185) (#35242)
Support single-table query rewrite without GROUP BY.
This is useful for complex filters or expressions.

The MV definition and query below show a case
that can be rewritten:

mv def:
```
          select *
            from lineitem where l_comment like '%xx%'
```

query:
```
            select l_linenumber, l_receiptdate
            from lineitem where l_comment like '%xx%'
```

Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>
2024-05-23 19:00:36 +08:00
82887cc2b3 [improvement](mtmv)Split expression get cherry pick21 (#35240)
* [improvement](mtmv) Split the expression mapping in LogicalCompatibilityContext for performance (#34646)

The query-to-view expression mapping is needed when checking whether the hyper graph logic is equal.
Getting the whole expression mapping at once may hurt performance, so split the expressions into three types,
JOIN_EDGE, NODE, and FILTER_EDGE, and get them step by step.

* fix code style
2024-05-23 18:59:56 +08:00
ed464ac24c [branch-2.1] remove dlf dependencies (#35292)
followup #35241
In #35241, we update the doris-shade version to 2.1.0, which already contains dlf dependencies.

pick part of #34749, to remove dlf dependencies in fe/pom.xml
2024-05-23 17:48:04 +08:00
0d2ab9d5c3 [fix](clean trash) Fix clean trash lost submit task (#35271) 2024-05-23 16:27:20 +08:00
acf741fa80 [feature](binlog) Support gc binlogs by history nums and size (#35250)
* [chore](binlog) Add logs about binlog gc (#34359)

* [feature](binlog) Support gc binlogs by history nums and size (#34888)
2024-05-23 14:39:57 +08:00
0b440685d9 [fix](nereids): fix PlanPostProcessor use visitor (#35244)
(cherry picked from commit 46e004a358b9e13adb492d376f77e4317e558a6a)
2024-05-23 14:12:25 +08:00
Pxl
e962a7309b [Chore](runtime-filter) adjust some check and error msg on runtime filter (#35018) (#35251)
adjust some check and error msg on runtime filter
2024-05-23 11:20:02 +08:00