Commit Graph

3009 Commits

Author SHA1 Message Date
fcc26cc671 [test](migrate) move some cases from p2 to p0 (#36750)(#36787) (#36922)
bp #36750 and #36787
2024-06-27 20:59:50 +08:00
5c1eef5f06 [feature](tvf) support max_filter_ratio (#35431) (#36911)
bp #35431

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-06-27 20:58:53 +08:00
a05d5cc75e [refactor](variant) refactor sub path push down on variant type (#36478) (#36923)
pick from master #36478

intro a new rule VARIANT_SUB_PATH_PRUNING to prune variant sub path.

for example, variant slot v in table t has two sub path: 'c1' and 'c2',
after this rule, select v['c1'] from t will only scan one sub path 'c1'
of v to reduce scan time.

This rule accomplishes all the work using two components. The Collector
traverses from the top down, collecting all the element_at functions on
the variant types, and recording the required path from the original
variant slot to the current element_at. The Replacer traverses from the
bottom up, generating the slots for the required sub path on scan,
union, and cte consumer. Then, it replaces the element_at with the
corresponding slot.
2024-06-27 17:48:43 +08:00
22cb7b8fcb [improvement](compaction) be do not compact invisible version to avoid query error -230 #28082 (#36222)
cherry pick from #28082
2024-06-27 13:45:21 +08:00
a8e9c89dc6 [Fix](nereids) fix NormalizeAgg, change the upper project projections rewrite logic (#36161) (#36622)
cherry-pick #36161 to branch-2.1

NormalizeAggregate rewrite logic has a bug, for sql like this:

SELECT
	CASE
		1 WHEN CAST( NULL AS SIGNED ) THEN NULL
		WHEN COUNT( DISTINCT CAST( NULL AS SIGNED ) ) THEN NULL
		ELSE null
	END ;

This is the plan after NormalizeAggregate, the LogicalAggregate only
output `count(DISTINCT cast(NULL as SIGNED))`#3, do not output cast(NULL
as SIGNED)#2, but the upper project use cast(NULL as SIGNED)#2, so Doris
report error "cast(NULL as SIGNED) not in aggregate's output".

LogicalResultSink[29] ( outputExprs=[__case_when_0#1] ) +--LogicalProject[26] ( distinct=false, projects=[CASE WHEN (1 = cast(NULL as SIGNED)#2) THEN NULL WHEN (1 = count(DISTINCT cast(NULL as SIGNED))#3) THEN NULL ELSE NULL END AS `CASE WHEN (1 = cast(NULL as SIGNED)) THEN NULL WHEN (1 = count(DISTINCT cast(NULL as SIGNED))) THEN NULL ELSE NULL END`#1], excepts=[] )
   +--LogicalAggregate[25] ( groupByExpr=[], outputExpr=[count(DISTINCT cast(NULL as SIGNED)#2) AS `count(DISTINCT cast(NULL as SIGNED))`#3], hasRepeat=false )
      +--LogicalProject[24] ( distinct=false, projects=[cast(NULL as SIGNED) AS `cast(NULL as SIGNED)`#2], excepts=[] )
         +--LogicalOneRowRelation ( projects=[0 AS `0`#0] )

The problem is that the cast(NULL as SIGNED)#2 should not outputted by
LogicalAggregate, cast(NULL as SIGNED) should be computed in
LogicalProject.
This pr change the upper project projections rewrite logic:
aggregateOutputs is rewritten and become the upper-level LogicalProject
projections. During the rewriting process, the expressions inside the
agg function can be rewritten with expressions in aggregate function
arguments and group by expressions, but the ones outside the agg
function can only be rewritten with group by expressions.

---------

Co-authored-by: moailing <moailing@selectdb.com>
2024-06-27 12:17:18 +08:00
a6a84b8ecc [improvement](stream load)(cherry-pick) support hll_from_base64 for stream load column mapping (#36819)
picked from https://github.com/apache/doris/pull/35923
2024-06-26 20:12:40 +08:00
25fb30c723 [fix](intersect) fix coredump caused by intersect of nullable and not nullable children #36401 (#36441)
## Proposed changes

Pick #36765
2024-06-26 17:45:21 +08:00
6ec9a731e8 [branch-2.1](cherry-pick) partial update should not read old fileds from rows with delete sign (#36210) (#36755)
cherry-pick #36210
2024-06-24 21:13:24 +08:00
90a4dd09f3 [Fix](func) CoreDump and Result Error in percentile function (#36647)
cherry pick #36643
2024-06-21 23:42:45 +08:00
c939781411 [Pick 2.1](inverted index) fix wrong no need read data when need_remaining_after_evaluate (#36684)
When using an equal predicate on a column that applies an inverted index
with a parser, it requires remaining_after_evaluate. In this situation,
we cannot optimize the column without reading the data.

## Proposed changes

From (#36637)
2024-06-21 22:01:39 +08:00
0cff539810 [feature](function) support new function replace_empty (#36283) (#36656)
#36283
2024-06-21 16:46:22 +08:00
8105dc7de8 [Pick 2.1](inverted index) fix wrong opt for pk no need read data (#36634)
## Proposed changes
 
Pick from #36618
2024-06-21 00:57:23 +08:00
ac0f6e75d2 [bugfix](iceberg)Read error when timestamp does not have time zone for 2.1 (#36435)
bp: #36141
2024-06-20 18:32:31 +08:00
fbcf63e1f5 [cherry-pick] (branch-2.1)fix variant index (#36577)
pick from master #36163
2024-06-20 17:57:26 +08:00
bd47d5a681 [branch-2.1](auto-partition) Fix auto partition load failure in multi replica (#36586)
this pr
1. picked #35630, which was reverted #36098 before.
2. picked #36344 from master

these two pr fixed existing bug about auto partition load.

---------

Co-authored-by: Kaijie Chen <ckj@apache.org>
2024-06-20 17:51:18 +08:00
cbaff8a700 [fix](nereids)change the decimal's precision and scale for cast(xx as decimal) (#36540)
pick from master #36316

expression cast( xx as decimal )'s datatype maybe decimalv3 or decimalv2
depending on enable_decimal_conversion value in fe conf file. if
enable_decimal_conversion is true, the datatype is decimalv3(9, 0), but
the datatype was decimalv3(38, 9) in 2.0 releases. So this pr change the
datatype same as 2.0 releases to keep the behavior consistent.
2024-06-20 17:46:11 +08:00
c5bb0e3a21 [bug](prepared statement) fix prepared statement throw exception when inserting null value (#36484)
## Proposed changes

bp #36426

<!--Describe your changes.-->
2024-06-20 11:31:59 +08:00
dabd27edd2 [opt](inverted index) performance optimization for need_read_data in compound #35346 #36292 (#36404)
pick from master
https://github.com/apache/doris/pull/35346
https://github.com/apache/doris/pull/36292
2024-06-20 08:43:16 +08:00
5b7d93df5e [Pick](Variant) pick 2 PRs to correct tmp column name to go fast execute #36277 #36313 (#36527) 2024-06-19 19:07:47 +08:00
349b943e12 [opt](Nereids) Optimize Join Penalty Calculation Based on Build Side Data Volume (#36107)
pick from master #35773

This PR introduces an optimization that adjusts the penalty applied
during join operations based on the volume of data on the build side.
Specifically, when the number of rows and width of the tables being
joined are equal, the materialization costs are now considered more
accurately. The update ensures that joins with a larger dataset on the
build side incur a higher penalty, improving overall query performance
and resource allocation.
2024-06-19 14:49:09 +08:00
1e54a5a66e [Fix](Nereids) fix leading with brace can not generate correct plan (#36328)
cherry-pick #36193

Problem:
when using leading like:
leading(t1 {t2 t3} {t4 t5} t6)
it would not generate correct plan because levellist can not express
enough message of braces
Solved:
remove levellist express of leading levels and use reverse polish
expression
Algorithm:
leading(t1 {t2 t3} {t4 t5} t6)
==>
stack top to down(t1 t2 t3 join join t4 t5 join t6 join) when generate
leading join, we can pop items in stack, when it's a table, make
logicalscan when it's a join
operator, make logical join and push back to stack
2024-06-19 14:47:55 +08:00
38d750a7e0 [Fix](Row Store) all filter should match key columns condition (#36400) (#36443)
Queries like `select * from tbl` will pass
`LogicalResultSinkToShortCircuitPointQuery` rule in the previous.
Introduced by #35823
2024-06-19 14:06:53 +08:00
bdba954e1f [Fix](nereids)make agg output unchanged after normalized repeat (#36367)
cherry-pick #36207 to branch-2.1

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-06-19 12:23:56 +08:00
da0138a412 [Pick 2.1](segment iterator) fix shrink non-char column coredump #36275 (#36468) 2024-06-18 21:59:15 +08:00
e2350403a6 [fix](plan) fix wrong result for random distributed agg table with all keys not null (#36271) 2024-06-18 11:25:31 +08:00
4a117800ca [Bug](Function) fix json contains with empty value (#36320) (#36418) 2024-06-18 10:20:45 +08:00
e68834158c [fix](inverted index)Support Chinese column name with inverted index #36321 (#36374)
1. `std::string` to `std::wstring` conversion only supports ASCII
characters. For non-ASCII characters, we need to use
`StringUtil::string_to_wstring`
2. Fix index_tool check_terms_stats_v2 and add field info to print

pick from master #36321
2024-06-17 19:41:09 +08:00
4008a04da7 [bugfix](paimon)Fix field case issues for 2.1 (#36288)
bp:  #36239
2024-06-17 18:38:00 +08:00
845dcce7f0 Revert "[opt](inverted index) performance optimization for need_read_data in …" (#36260)
Reverts apache/doris#36192
2024-06-13 21:31:20 +08:00
d8eac07178 [branch-2.1](test) fix external p0 unstable test (#36262)
Fix some unstable external p0 tests
2024-06-13 20:55:41 +08:00
226775f059 [Feature](Point Query) fully support in nereids #35823 (#36205) 2024-06-13 08:37:31 +08:00
f1e83f5656 [opt](inverted index) performance optimization for need_read_data in compound #35346 (#36192) 2024-06-12 20:02:00 +08:00
9708ca8fcb [Feature](Prepared Statment) Implement in nereids planner (#35318) (#36172) 2024-06-12 19:54:17 +08:00
acbfcf7ad9 [fix](Nereids) fix four phase aggregation compute wrong result (#36131)
cherry pick from #36128
2024-06-11 20:40:18 +08:00
3b23eee37c Revert "[fix](auto-partition) fix auto partition load lost data in multi sender (#35287)" (#36098)
Reverts apache/doris#35630 because it brought some more damaging bugs.
we will fix it and merge in next version
2024-06-11 17:11:42 +08:00
0dccc4e6e4 [cherry-pick](branch-2.1)fix http error when downloading varaint inverted index file #35668 (#36061)
pick from master[#35668](https://github.com/apache/doris/pull/35668)
2024-06-11 14:09:05 +08:00
4a277affdc [fix](scan) In-predicate should not be pushed down for non-key column(#35913) (#35968)
pick #35913
2024-06-11 11:13:34 +08:00
1916891725 [fix](regression): fix nereids_hint_tpcds_p0 query64 shape (#35906)
only for 2.1
2024-06-09 14:20:34 +08:00
9e972cb0b9 [bugfix](iceberg)Fix the datafile path error issue for 2.1 (#36066)
bp: #35957
2024-06-08 21:51:46 +08:00
075481faf1 [opt](Nereids) use date signature for date arithmetic as far as possible (#36060)
pick from master #35863
2024-06-08 09:05:34 +08:00
16fcdcd4b7 [fix](Nereids) not do distinct when aggregate with distinct project (#36057)
pick from master #35899
2024-06-08 09:04:56 +08:00
bd6b913e00 [bugfix](paimon)paimon's field length judgment error for 2.1 (#36049)
bp #35981
2024-06-07 21:13:08 +08:00
67f4d88988 [enhancement](Nereids) support 4 phases distinct aggregate with full distribution (#36016)
cherry pick from #35871
2024-06-07 21:08:33 +08:00
9f3fe3e57c [fix](DDL) not set table type as default comment when create table (#36025)
pick from master #35855
2024-06-07 15:29:10 +08:00
c794ea18c8 [fix](multi-catalog)put java udf to custom lib (#35984)
bp #34990
2024-06-06 22:54:24 +08:00
9efc7b63ec [fix](mtmv)Mtmv support row column (#35860) (#35956)
pick from master: #35860
2024-06-06 22:53:08 +08:00
5966354165 [FIX](cases)fix cases for test_ip_in_inverted_index (#35971)
bp #35881
2024-06-06 21:52:53 +08:00
b5a35b9cef [FIX] Pick array inverted index bugfix (#35837)
here with some array with inverted index bugfix:
see also: 
https://github.com/apache/doris/pull/34766
https://github.com/apache/doris/pull/35086
https://github.com/apache/doris/pull/34683
https://github.com/apache/doris/pull/34076
2024-06-06 09:54:14 +08:00
4b5163c905 [Feat](nereids) add transform rule MergePercentileToArray (#35809)
cherry-pick #34313 to branch-2.1

MergePercentileToArray is to perform a transformation in this case:
select ss_item_sk, percentile(ss_quantity,0.9), percentile(ss_quantity,0.6), percentile(ss_quantity,0.3)
from store_sales group by ss_item_sk;
==>
select ss_item_sk, percentile_array(ss_quantity,[0.3,0.6,0.9]) from store_sales group by ss_item_sk;
2024-06-04 17:50:36 +08:00
c23ab25474 [fix](nereids)keep equal predicate as join conjunct even if it can be fold to null literal (#35842)
pick from master https://github.com/apache/doris/pull/35811

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-06-04 14:46:58 +08:00