Commit Graph

4679 Commits

Author SHA1 Message Date
f6540d52cb [regression-test](fix) fix schema_change_p2/test_schema_change.groovy case (#35470) 2024-05-28 13:14:27 +08:00
2f7280be7d [regression-test](fix) fix sql_block_rule_p0/test_sql_block_rule.groovy case bug (#35471) 2024-05-28 13:14:27 +08:00
dfcabf8d47 [fix](nereids) set mark join reference for bitmap-in-apply (#35435)
The bitmap filter was implemented before mark join. When mark join support was added, the bitmap-filter branch was not updated.
When converting a bitmap-apply-in to a join, we should set markJoinReference on the join if there are markJoinReferences.
2024-05-28 13:13:41 +08:00
ac49576229 [Fix](nereids) fix merge aggregate setting top projection bug (#35348)
introduced by #31811

For example, with SQL like this:

    select col1, col2 from  (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2 ;

Transformation Description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern:
Before Transformation:

LogicalAggregate
+-- LogicalProject
    +-- LogicalAggregate

After Transformation:

LogicalProject
+-- LogicalAggregate

Before the transformation, the projection in the LogicalProject was a AS col1, a AS col2, and the outer aggregate group by keys were col1, col2. After the transformation, the aggregate group by keys became a, a, and the projection remained a AS col1, a AS col2.

Problem:
When building the project projections, the group by keys a, a needed to be transformed into a AS col1, a AS col2. The old code had a bug: it used the slot as the map key and the alias in the projections as the map value. This did not account for multiple aliases wrapping the same slot; one alias overwrote the other in the map, so the correct aliases could not be recovered.

Solution:
The new code fixes this issue by using the original outer aggregate group by expression's exprId. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
2024-05-28 13:13:31 +08:00
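The slot-keyed map problem described in ac49576229 can be reproduced with a small Java sketch. The `Alias` class below is a hypothetical stand-in that only records an exprId, an output name, and the wrapped slot; it is not the Nereids `Alias`, and the two methods only mimic the old and new lookup strategies, not the real rule code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; not the real Nereids classes.
final class MergeAggProjectSketch {
    static final class Alias {
        final int exprId;       // identity of the alias expression itself
        final String name;      // output name, e.g. col1
        final String childSlot; // slot being aliased, e.g. a

        Alias(int exprId, String name, String childSlot) {
            this.exprId = exprId;
            this.name = name;
            this.childSlot = childSlot;
        }

        @Override
        public String toString() {
            return childSlot + " AS " + name;
        }
    }

    // Old strategy: key the map by the wrapped slot. Two aliases over the same
    // slot collide, so the later alias overwrites the earlier one.
    static List<String> buggyProjections(List<Alias> project, List<String> newGroupByKeys) {
        Map<String, Alias> slotToAlias = new HashMap<>();
        for (Alias alias : project) {
            slotToAlias.put(alias.childSlot, alias);
        }
        List<String> result = new ArrayList<>();
        for (String key : newGroupByKeys) {
            result.add(slotToAlias.get(key).toString());
        }
        return result;
    }

    // New strategy: for each original outer group-by expression, find the
    // project expression with the same exprId.
    static List<String> fixedProjections(List<Alias> project, List<Integer> outerGroupByExprIds) {
        List<String> result = new ArrayList<>();
        for (int exprId : outerGroupByExprIds) {
            for (Alias alias : project) {
                if (alias.exprId == exprId) {
                    result.add(alias.toString());
                    break;
                }
            }
        }
        return result;
    }
}
```

For a project of `a AS col1, a AS col2` (exprIds 1 and 2), `buggyProjections` with keys `a, a` returns `[a AS col2, a AS col2]`, while `fixedProjections` with exprIds `1, 2` returns `[a AS col1, a AS col2]`.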
7c808fcecf [bugfix] Fix the case is unstable because Table[tbl_scalar_types_dup]'s state(ROLLUP) is not NORMAL (#35460) 2024-05-28 13:12:27 +08:00
3aab6b1d61 [chore](regression) add debug log for flaky case of test_stream_load_cast (#35441) 2024-05-28 13:12:15 +08:00
d8eefd0be8 [fix] fix wrong result of spill agg with limit (#35403) 2024-05-28 13:12:03 +08:00
238e218312 [fix](httpapi) restore compaction/run_status api can show be's overall compaction status and refactor code (#35409) 2024-05-28 09:43:43 +08:00
8ff95a00f3 [Fix](test) fix test case output for inverted_index_p0.test_tokenize (#35464) 2024-05-27 19:19:24 +08:00
a32db25070 [enhance](mtmv) allow add index for MTMV (#34225) (#35443)
Previously, whether an operation could be performed on a materialized view was decided by its `opType`.

Now, each clause implements an `allowOpMTMV()` method that makes this decision.

This is needed because some operations share the same `opType`, yet only some of them are allowed.

For example, both `add column` and `create index` have the `opType` `SCHEMA-CHANGE`, but `add column` is not allowed while `create index` is.
2024-05-27 16:22:16 +08:00
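A minimal sketch of the per-clause check described in a32db25070. Only the `allowOpMTMV()` name comes from the commit; `AlterClause`, `AddColumnClause`, `CreateIndexClause`, `OpType`, and the rule in `canAlterMtmv` that every clause must allow the operation are hypothetical simplifications.

```java
// Hypothetical, simplified model; not the actual Doris FE classes.
final class MtmvClauseSketch {
    enum OpType { SCHEMA_CHANGE }

    abstract static class AlterClause {
        abstract OpType opType();

        // Each clause decides for itself whether it may run against an MTMV.
        abstract boolean allowOpMTMV();
    }

    static final class AddColumnClause extends AlterClause {
        OpType opType() { return OpType.SCHEMA_CHANGE; }
        boolean allowOpMTMV() { return false; } // same opType, but forbidden
    }

    static final class CreateIndexClause extends AlterClause {
        OpType opType() { return OpType.SCHEMA_CHANGE; }
        boolean allowOpMTMV() { return true; }  // same opType, but allowed
    }

    // Assumption: an ALTER on an MTMV is accepted only if every clause allows it.
    static boolean canAlterMtmv(java.util.List<AlterClause> clauses) {
        for (AlterClause clause : clauses) {
            if (!clause.allowOpMTMV()) {
                return false;
            }
        }
        return true;
    }
}
```

The point of the design is that the decision lives on the clause, so two clauses with identical `opType()` values can still give different answers.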
d71e9d34fe [Bugfix] Fix mv column type is not changed when do schema change (#34598) 2024-05-27 15:28:12 +08:00
6d362c1061 [fix](hint) fix hint tests with different be instances (#35188)
Problem:
When testing distribute hints with multiple BEs, the results were unstable.
Solved:
Add an ordered hint to every distribute hint, and change some leading hint cases to check that the hint information is contained in the output.
2024-05-27 15:27:05 +08:00
68eda58a8c [Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335)
With the following SQL, when a null-related function is applied to the dictionary-encoded column, the results are incorrect:
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
2024-05-27 15:25:29 +08:00
Pxl
82ff29faea [Chore](materialized-view) forbid create mv on row store table (#35360)
forbid create mv on row store table
2024-05-27 15:25:16 +08:00
f98ed4e4c5 [bugfix](hive)Misspelling of class names (#34981) 2024-05-27 15:24:38 +08:00
b1795d44ec [bugfix](hive)fix testcase for test_hive_write_different_path (#35209)
Hive's test environment runs in Docker, so when 127.0.0.1 is used,
the BE writes the file into the Docker container on its own machine.
If the FE and BE are not on the same machine,
the FE cannot read this file, because it can only reach the Docker container on its own machine.
Therefore, the address 127.0.0.1 cannot be used in the test environment.
2024-05-27 15:24:30 +08:00
2422439e45 [Update](regression) add case for inverted index (#35305)
Co-authored-by: Kang <kxiao.tiger@gmail.com>
2024-05-27 15:24:09 +08:00
2e20e38523 [improvement](jdbc catalog) remove useless jdbc catalog code (#34986) (#35418) 2024-05-27 14:25:26 +08:00
e3b4d4e630 Reset workload_group_max_num for regression test (#35430) 2024-05-27 14:10:25 +08:00
af986c370b [feat](Nereids): Put the Child with Least Row Count in the First Position of Intersect (#34290) (#35339)
In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator.

The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.
2024-05-27 11:52:35 +08:00
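The reordering in af986c370b can be sketched as follows, assuming each child is modeled as an in-memory list of rows whose size stands in for the estimated row count (the real optimizer works on statistics, not materialized rows). `IntersectOrderSketch`, `smallestFirst`, and `intersect` are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model: each Intersect child is represented by its rows.
final class IntersectOrderSketch {
    // Move the child with the fewest rows to position 0, keeping the relative
    // order of the remaining children unchanged.
    static <T> List<List<T>> smallestFirst(List<List<T>> children) {
        int minIndex = 0;
        for (int i = 1; i < children.size(); i++) {
            if (children.get(i).size() < children.get(minIndex).size()) {
                minIndex = i;
            }
        }
        List<List<T>> reordered = new ArrayList<>(children);
        reordered.add(0, reordered.remove(minIndex));
        return reordered;
    }

    // Set-semantics INTERSECT: start from the first child and keep only rows
    // found in every other child, so a small first child keeps the candidate
    // set small from the start.
    static <T> Set<T> intersect(List<List<T>> children) {
        Set<T> result = new HashSet<>(children.get(0));
        for (int i = 1; i < children.size(); i++) {
            result.retainAll(new HashSet<>(children.get(i)));
        }
        return result;
    }
}
```

Because set intersection is commutative, moving the smallest child first does not change the result set; it only shrinks the initial candidate set.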
83cbb4e255 fix cloud mode 2024-05-27 09:56:26 +08:00
6e17dc1e87 (cherry-pick)[branch-2.1] add calc tablet file crc and fix single compaction test #33076 #34915 (#35215)
* [fix](compaction test) show single replica compaction status and fix test (#33076)
* [improve](http action) add http interface to calculate the crc of all files in tablet (#34915)
2024-05-26 17:15:09 +08:00
a79b436b12 remove iscloud mode 2024-05-25 19:29:47 +08:00
fff6ab933c [fix](clean trash) Add clean trash regression case (#35330) 2024-05-25 17:47:51 +08:00
806b7d68e4 [regression-test](fix) runtime_filter.groovy case bug (#35368) 2024-05-25 17:47:29 +08:00
80ba873d84 [regression-test](fix) test_date_diff case bug (#35356) 2024-05-25 17:46:57 +08:00
62998719df [opt](mtmv) Add threshold for relation mapping num when query rewrite (#34694) (#35378)
If the query and MV definition are as follows:

    def mv1_1 = """
        select  t1.L_LINENUMBER,t2.l_extendedprice, t2.L_ORDERKEY
        from lineitem t1
        inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
    """
    def query1_1 = """
        select  t1.L_LINENUMBER, t2.L_ORDERKEY
        from lineitem t1
        inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
    """

Relation mappings are generated as a Cartesian product, so when a table is self-joined many times, the number of mappings grows quickly and causes performance problems.
We therefore add the `materialized_view_relation_mapping_max_count` session variable (default 8). If the actual number of mappings exceeds this value, the excess relation mappings are discarded.
2024-05-24 20:36:29 +08:00
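One way to read the threshold in 62998719df: when a table is self-joined n times, mapping each query relation to a distinct MV relation of the same table yields n! candidate mappings. The sketch below generates them by backtracking and stops at `maxCount`, the role played by `materialized_view_relation_mapping_max_count`. The permutation model and the `RelationMappingSketch` class are assumptions, not the Doris implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed model: query and MV both reference one table n times (a self join).
// mapping[i] is the index of the MV relation assigned to query relation i.
final class RelationMappingSketch {
    static List<int[]> generate(int n, int maxCount) {
        List<int[]> out = new ArrayList<>();
        backtrack(new int[n], new boolean[n], 0, maxCount, out);
        return out;
    }

    private static void backtrack(int[] mapping, boolean[] used, int pos,
                                  int maxCount, List<int[]> out) {
        if (out.size() >= maxCount) {
            return; // threshold reached: the remaining mappings are discarded
        }
        if (pos == mapping.length) {
            out.add(mapping.clone());
            return;
        }
        for (int mv = 0; mv < mapping.length; mv++) {
            if (!used[mv]) {
                used[mv] = true;
                mapping[pos] = mv;
                backtrack(mapping, used, pos + 1, maxCount, out);
                used[mv] = false;
            }
        }
    }
}
```

With two self-joined `lineitem` relations, as in `mv1_1` and `query1_1`, there are 2 mappings; with four there would be 24, of which only the first 8 are kept.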
3eeb83ff11 [test](fix) Fix test check fail when test nested mv hit (#34293) (#35375)
pick from master commit id: d20b18f pr: #34293

If mv3 is defined as follows:
select c1, c2, c3 from t1;

and mv4 is defined as follows:
select c1, c2 from mv3;

then for the query
select c1, c2 from t1;

both mv3 and mv4 can be used to rewrite the query successfully.
2024-05-24 19:47:16 +08:00
639c7ee7fb [fix](decimalv2) fix scale of decimalv2 to string (#35222) (#35359)
* [fix](decimalv2) fix scale of decimalv2 to string
2024-05-24 17:20:43 +08:00
1e07971a98 [Feat](nereids)when dealing insert into stmt with empty table source, fe returns directly (#35333)
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)

When a LogicalOlapScan has no partitions, transform it into a LogicalEmptyRelation.
When handling an INSERT INTO statement whose source is an empty table, the FE returns directly.

* [Fix](nereids) fix when insert into select empty table

---------

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-24 16:25:00 +08:00
bfe293c725 [fix](nereids) AdjustNullable rule should handle union node with no children (#35074)
The nullability of the union node's output slots was not calculated correctly.
The old code only produced the correct result when the union node had children,
but a union node may have no children and only a constantExprList.
The output nullability should therefore be calculated from both the children and the constantExprList.
2024-05-24 16:23:58 +08:00
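The nullability rule described in bfe293c725 amounts to OR-ing the per-column nullable flags of every child and every constant row. In this sketch, `List<Boolean>` flags stand in for child output slots and for `constantExprList` entries; `UnionNullableSketch` and `outputNullable` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: one nullable flag per output column, per child and per constant row.
final class UnionNullableSketch {
    static List<Boolean> outputNullable(int columnCount,
                                        List<List<Boolean>> childNullable,
                                        List<List<Boolean>> constantRowNullable) {
        List<Boolean> result = new ArrayList<>();
        for (int col = 0; col < columnCount; col++) {
            boolean nullable = false;
            // Children, if the union node has any...
            for (List<Boolean> child : childNullable) {
                nullable |= child.get(col);
            }
            // ...and constant rows, which may be the only input.
            for (List<Boolean> row : constantRowNullable) {
                nullable |= row.get(col);
            }
            result.add(nullable);
        }
        return result;
    }
}
```

With no children and a single constant row whose second column is nullable, the result is `[false, true]`; ignoring the constant rows, as in the old behavior, would report both columns as not nullable.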
f6beeb1ddd [Enhencement](tvf) select tvf supports using resource (#35139)
Create an S3/HDFS resource that a TVF can use directly to access the data source.
2024-05-24 16:23:58 +08:00
d6e8fb7d77 [feature](mtmv) Support agg state roll up and optimize the roll up code (#35026)
agg_state is the intermediate state of an aggregate function. For details, see
state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state

This commit supports rolling up aggregate functions as follows:

+----------------------+----------------------------------------------+----------------------+
| query                | materialized view                            | roll up              |
+----------------------+----------------------------------------------+----------------------+
| agg_function()       | agg_function_union() or agg_function_state() | agg_function_merge() |
| agg_function_union() | agg_function_union() or agg_function_state() | agg_function_union() |
| agg_function_merge() | agg_function_union() or agg_function_state() | agg_function_merge() |
+----------------------+----------------------------------------------+----------------------+

For example, the following query can be rewritten successfully using the MV.

MV definition is

```
            select
            o_orderstatus,
            l_partkey,
            l_suppkey,
            sum_union(sum_state(o_shippriority)),
            group_concat_union(group_concat_state(l_shipinstruct)),
            avg_union(avg_state(l_linenumber)),
            max_by_union(max_by_state(l_shipmode, l_suppkey)),
            count_union(count_state(l_orderkey)),
            multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
            from lineitem
            left join orders
            on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_partkey,
            l_suppkey;
```

Query is

```
            select
            o_orderstatus,
            l_suppkey,
            sum(o_shippriority),
            group_concat(l_shipinstruct),
            avg(l_linenumber),
            max_by(l_shipmode,l_suppkey),
            count(l_orderkey),
            multi_distinct_count(l_shipmode)
            from lineitem
            left join orders 
            on l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_suppkey;
```
2024-05-24 16:23:58 +08:00
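The roll-up table in d6e8fb7d77 can be read as a lookup from the form of the aggregate in the query to the function applied to the MV's agg_state column. This sketch assumes the MV column was built with `<fn>_union()` or `<fn>_state()`; `AggStateRollUpSketch`, `QueryForm`, and the string-building helpers are illustrative only, not the actual rewrite code.

```java
// Hypothetical lookup mirroring the roll-up table; not the Doris rewrite code.
final class AggStateRollUpSketch {
    // How the aggregate appears in the query.
    enum QueryForm { PLAIN, UNION, MERGE }

    // agg_function() and agg_function_merge() roll up with agg_function_merge();
    // agg_function_union() rolls up with agg_function_union().
    static String rollUpFunction(String baseName, QueryForm form) {
        return form == QueryForm.UNION ? baseName + "_union" : baseName + "_merge";
    }

    // Wrap the MV's agg_state column in the chosen roll-up function.
    static String rollUpExpression(String baseName, QueryForm form, String mvColumn) {
        return rollUpFunction(baseName, form) + "(" + mvColumn + ")";
    }
}
```

For the example above, `sum(o_shippriority)` in the query corresponds to applying `sum_merge` over the MV's `sum_union(sum_state(o_shippriority))` column.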
c4776a48f2 [fix](regression-test) fix test_tvf_view_count_p2 regression test (#35216)
caused by: #34642

The test must set verbose to true.
2024-05-24 16:23:58 +08:00
e6027ca9d7 [fix](p2-test) fix test_export_with_parallelism case (#35283) 2024-05-24 16:23:58 +08:00
bbf502dfcf [fix](create-table)The CREATE TABLE IF NOT EXISTS AS SELECT statement should refrain from performing any INSERT operations if the table already exists (#35210) 2024-05-24 16:23:58 +08:00
bd4dd94c24 [Fix](nereids) add checkBlockRules() check for create view and alter view (#34104) 2024-05-24 16:23:58 +08:00
0e2b7480b7 [fix](regression-test) line_delimiter parse error in regression_test test_tvf_based_broker_load (#35001) 2024-05-24 16:23:58 +08:00
e02dcecb0a [optimize](regression)Add retry for curl request (#35260)
Co-authored-by: Luennng <luennng@gmail.com>
2024-05-24 16:23:58 +08:00
07cd18962a [test](inverted index) nonConcurrent is added to the test case (#35259) 2024-05-24 16:23:58 +08:00
dd567fa774 [fix](function) support return JsonType for If function (#35199)
Add a FunctionSignature for If so that it can return JsonType.
2024-05-24 16:23:58 +08:00
98b2bda660 [opt](Nereids) remove restrict for count(*) in window (#35220)
Support using count(*) as a window function.

CREATE TABLE `t1` (
  `id` INT NULL,
  `dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

select *, count(*) over() from t1;
2024-05-24 16:23:58 +08:00
8c594c6959 [Fix](regression) fix show data regression case (#35218) 2024-05-24 16:23:58 +08:00
473e14ca82 [chore](backup) log backup/restore job during replay (#35234) 2024-05-24 16:23:57 +08:00
4008dc03cf [Fix](regression) fix test_user_var.groovy by add set disable_nereids_rules=PRUNE_EMPTY_PARTITION (#35151) 2024-05-23 19:06:38 +08:00
b3f6668464 fix case: test_create_table_without_distribution 2024-05-23 19:03:30 +08:00
bf37e5c905 [feature](Nereids) support select distinct with aggregate (#35300)
(cherry picked from commit adcbc8cce57aaec507174f39536a028db803a2e5)
2024-05-23 19:01:10 +08:00
4075408b84 [feature](mtmv)Support single table mv rewrite (#34185) (#35242)
Support single-table query rewrite without GROUP BY.
This is useful for complex filters or expressions.

For example, with the following MV definition,
the query below can be rewritten:

mv def:
```
          select *
            from lineitem where l_comment like '%xx%'
```

query:
```
            select l_linenumber, l_receiptdate
            from lineitem where l_comment like '%xx%'
```

Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>
2024-05-23 19:00:36 +08:00
adc364a6fd [feature](Paimon) support deletion vector for Paimon naive reader (#34743) (#35241)
bp #34743
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-05-23 00:01:30 +08:00
24990383ff [refactor](jdbc catalog) split clickhouse jdbc executor (#34794) (#35174)
pick master #34794
2024-05-22 19:09:05 +08:00