Commit Graph

3252 Commits

Author SHA1 Message Date
424ad2384a [opt](nereids) refine left semi/anti cost under short-cut opt (#39636)
## Proposed changes

pick from https://github.com/apache/doris/pull/37951

---------

Co-authored-by: xiongzhongjian <xiongzhongjian@selectdb.com>
2024-08-23 17:26:56 +08:00
c40246efa9 [bugfix](iceberg)Fixed random core with writing iceberg partitioned table for 2.1 (#39808)(#39569) (#39832)
## Proposed changes

bp: #39808 #39569
2024-08-23 17:19:48 +08:00
8f15efdbb8 [cherry-pick](branch-2.1) fix delete random distributed tbl (#39830)
## Proposed changes

cherry-pick #37985

2024-08-23 17:17:05 +08:00
67a8099991 [fix](multi-catalog)fix max compute array and map type read offset (#39822)
bp #39680
2024-08-23 16:53:52 +08:00
1f16daa5f6 Revert "[bugfix](iceberg)clear block for partition values for 2.1 (#39569)" (#39815)
Reverts apache/doris#39729
2024-08-23 11:58:42 +08:00
6c10c47f79 [fix](fe) LIST partition table support modify default bucket num (#39688)
## Proposed changes
bp #39696

Issue Number: close #39684

```sql
CREATE TABLE `test1` (
    `id1` VARCHAR(255) NULL COMMENT 'id1',
    `id2` VARCHAR(255) NULL COMMENT 'id2',
    `event_time` VARCHAR(255) NULL COMMENT 'event time',
    `event_date` VARCHAR(255) NULL COMMENT 'event date',
    `event_ts` VARCHAR(256) NULL COMMENT 'event timestamp (milliseconds)',
    `dt` VARCHAR(255) NOT NULL COMMENT 'date partition',
    `hr` VARCHAR(255) NOT NULL COMMENT 'hour partition'
  ) ENGINE = OLAP DUPLICATE KEY(`id1`) COMMENT 'xxx' PARTITION BY LIST(`dt`, `hr`) (
    PARTITION p2024082021 VALUES IN (("2024-08-20", "21"))
  ) DISTRIBUTED BY HASH(`dt`, `hr`) BUCKETS 2 PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "min_load_replica_num" = "-1",
    "is_being_synced" = "false",
    "storage_medium" = "hdd",
    "storage_format" = "V2",
    "inverted_index_storage_format" = "V1",
    "light_schema_change" = "true",
    "disable_auto_compaction" = "false",
    "enable_single_replica_compaction" = "false",
    "group_commit_interval_ms" = "10000",
    "group_commit_data_bytes" = "134217728"
  );
```

1. Before the change, the table's existing partition has a bucket num of 2

![image](https://github.com/user-attachments/assets/77efdd0c-f845-41a4-9a31-e454808ffe67)

2. Modify the bucket num of the LIST partitioned table (from 2 to 4)

![image](https://github.com/user-attachments/assets/53b19918-2879-4cb3-b2bd-84ba35a7fc59)

3. After the change, a newly added partition has a bucket num of 4

![image](https://github.com/user-attachments/assets/1f41f73f-d70f-433e-a7b6-8346b7dfcc4e)

Co-authored-by: tongyang.han <tongyang.han@jiduauto.com>
2024-08-23 11:52:16 +08:00
40a58b9e42 [branch-2.1][regression test](jdbc catalog) Enable CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS for clickhouse docker (#39667)
pick (#39425) #39693
2024-08-23 09:59:03 +08:00
0f8bd33077 [fix](scan) fix predicate contains cast that results in null, the pr… (#39809)
…edicate will be miss. (#39550)
https://github.com/apache/doris/pull/39550
```
drop table datetest;

create table datetest (
  id int,
  dt date
)
DUPLICATE key (id)
distributed by hash(id) buckets 1
properties(
  "replication_num" = "1"
);
insert into datetest values (1, '2024-01-01');

mysql [test10]>select dt from datetest  WHERE dt = 1 ;
+------------+
| dt         |
+------------+
| 2024-01-01 |
+------------+
```

After this fix, the same query returns no rows:

```
mysql [test10]>select dt from datetest  WHERE dt = 1 ;
Empty set (0.16 sec)
```

2024-08-23 01:46:22 +08:00
dc732fe33f [bugfix](iceberg)clear block for partition values for 2.1 (#39569) (#39729)
## Proposed changes

bp: #39569

Clear the block; otherwise we will get wrong partition values.
2024-08-22 22:43:02 +08:00
f553645a71 [fix](mtmv) transfer col in mysql varchar to text when create MTMV (#37668) (#39727)
pick from master #37668
2024-08-22 15:20:59 +08:00
10f3e88f7a [fix](nereids) fix distribution expr list (#39435)
pick from #39148
2024-08-22 15:19:51 +08:00
fd13962015 [chore](nereids) Added compatibility with mysql alias conflict (#38104) (#38440)
Throw a table name/alias conflict exception to match MySQL's behavior.

For example:
```sql
select * from test.a b, test.b
```

Error:
```
Not unique table/alias: 'b'
```
2024-08-22 14:37:49 +08:00
50f440e653 [chore](nereids) Added compatibility with mysql alias filter (#39738)
qt_filter_select4 """
    select * from filter_alias_test.test b where filter_alias_test.b.id = 1;
"""

qt_filter_select5 """
    select * from internal.filter_alias_test.test b where internal.filter_alias_test.b.id = 1;
"""
2024-08-22 14:36:14 +08:00
021982fc71 [fix](mtmv) Fix some pr to 21, prs are (#39041)(#38958)(#39541) (#39678)
## Proposed changes

pr: https://github.com/apache/doris/pull/39041
commitId: 22562985

pr: https://github.com/apache/doris/pull/38958
commitId: c365cb64

pr: https://github.com/apache/doris/pull/39541
commitId: 89bb669c
2024-08-22 10:27:55 +08:00
a55e109e97 [pick][Improment]Add schema table workload_group_privileges (#38436) (#39708)
pick #38436
2024-08-22 00:44:43 +08:00
Pxl
63d45f5d89 [Bug](predicate) fix wrong result of AcceptNullPredicate (#39497) (#39672)
pick from #39497
2024-08-22 00:24:57 +08:00
e51dd68b93 [fix](local shuffle) Fix correctness for bucket hash shuffle exchange… (#39691)
…r (#39568)

For the query plan


![image](https://github.com/user-attachments/assets/334cc4c4-49ae-4330-83ff-03b9bae00e3c)

we plan local exchangers and get a new plan


![image](https://github.com/user-attachments/assets/2b8ece64-3aa0-423c-9db0-fd02024957db)

in which the hash join operator receives probe and build data that are
distributed differently (one side by HASH shuffle, the other by bucket
hash shuffle). This PR fixes it.
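
To illustrate why mixing the two distributions gives wrong join results, here is a minimal Python sketch. It is not Doris's exchanger code: `hash_route`, `bucket_route`, the bucket-to-instance map, and the key range are all hypothetical stand-ins chosen only to show the effect.

```python
INSTANCES = 2
BUCKETS = 4
BUCKET_TO_INSTANCE = [0, 0, 1, 1]  # hypothetical bucket placement


def hash_route(key):
    """Stand-in for a plain hash shuffle: hash straight to an instance."""
    return key % INSTANCES


def bucket_route(key):
    """Stand-in for a bucket hash shuffle: hash to a bucket, then look up its instance."""
    return BUCKET_TO_INSTANCE[key % BUCKETS]


def partitioned_join(keys, build_route, probe_route):
    """Join keys with themselves, each instance seeing only its own partition."""
    build_side = [set() for _ in range(INSTANCES)]
    for key in keys:
        build_side[build_route(key)].add(key)
    # A probe row only finds its match if it lands on the same instance.
    return [key for key in keys if key in build_side[probe_route(key)]]


keys = list(range(8))
consistent = partitioned_join(keys, bucket_route, bucket_route)
mixed = partitioned_join(keys, hash_route, bucket_route)
```

When both sides use the same routing, every key finds its match. When the build side is hash shuffled and the probe side is bucket shuffled, only the keys that happen to land on the same instance under both schemes are joined.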

Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-08-22 00:23:39 +08:00
ebbebdf590 [regression](kerberos)add hive with kerberos write back case (#39682)
bp #38647
2024-08-21 18:29:42 +08:00
1460878bdf [fix](cluster key) forbid cluster key and remove case (#39679)
branch-2.1 does not support cluster keys on merge-on-write (MoW) tables.
2024-08-21 14:31:54 +08:00
0bfcee1251 [opt](file-cache) support system table file_cache_statistics (#39552)
1. Add a new system table: `file_cache_statistics`

	This table exposes metrics related to the file cache on the BE side.

	```
	mysql> select * from information_schema.file_cache_statistics limit 10;
	+-------+---------------+----------------------------+--------------------------------+--------------------+
	| BE_ID | BE_IP         | CACHE_PATH                 | METRIC_NAME                    | METRIC_VALUE       |
	+-------+---------------+----------------------------+--------------------------------+--------------------+
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_curr_elements | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_curr_size     | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_max_elements  | 102400             |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_max_size      | 21474836480        |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | hits_ratio                     | 0.8539634687001242 |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | hits_ratio_1h                  | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | hits_ratio_5m                  | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | index_queue_curr_elements      | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | index_queue_curr_size          | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | index_queue_max_elements       | 102400             |
	+-------+---------------+----------------------------+--------------------------------+--------------------+
	```

	It shows the metrics of the file caches on each BE.

2. Add new metrics `hits_ratio_1h` and `hits_ratio_5m` for the file cache

These two metrics show the hit ratio of the file cache over the most
recent 1 hour and 5 minutes, so that we can see the recent hit ratio
instead of only the global historical hit ratio.
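
The difference between a recent and a global hit ratio can be sketched in Python as follows. This is an illustration only, not the BE implementation: the `HitRatioWindow` class, its event queue, and the timestamps are hypothetical.

```python
from collections import deque


class HitRatioWindow:
    """Track cache lookups over a sliding time window (illustrative only)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, was_hit) pairs, oldest first
        self.total_hits = 0     # lifetime counters for the global ratio
        self.total_lookups = 0

    def record(self, timestamp, was_hit):
        self.events.append((timestamp, was_hit))
        self.total_lookups += 1
        self.total_hits += int(was_hit)

    def global_ratio(self):
        return self.total_hits / self.total_lookups if self.total_lookups else 0.0

    def recent_ratio(self, now):
        # Drop events that fell out of the window before computing the ratio.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()
        if not self.events:
            return 0.0
        return sum(1 for _, hit in self.events if hit) / len(self.events)


window = HitRatioWindow(window_seconds=300)  # a 5-minute window
for t in range(0, 100):
    window.record(t, True)    # old traffic: all hits
for t in range(1000, 1010):
    window.record(t, False)   # recent traffic: all misses
```

Here the global ratio stays high because of the old hits, while the 5-minute ratio drops to zero, which is the kind of recent degradation a lifetime counter hides.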
2024-08-21 10:03:39 +08:00
8a562aeb77 [opt](nereids) recover adoptive bucket shuffle (#39598)
## Proposed changes

pick from https://github.com/apache/doris/pull/36784

Co-authored-by: xiongzhongjian <xiongzhongjian@selectdb.com>
2024-08-21 09:26:53 +08:00
6df6f1dc97 [improvement](iceberg)]support doris's char/varchar to iceberg's string for 2.1 #38807 (#39645)
bp: #38807
2024-08-21 09:19:10 +08:00
27ba2542e2 [case](iceberg)append iceberg schema change case. (#38766) (#39630)
bp #38766
2024-08-21 09:17:12 +08:00
bb687bd69c [cherry-pick](branch-2.1) add function regexp_extract_or_null (#39561)
# Proposed changes

pick https://github.com/apache/doris/pull/38296
2024-08-21 09:14:58 +08:00
8e9bc7449b [test](inverted index) add test for need read data opt (#38261) (#39534)
## Proposed changes


pick from master #38261
2024-08-21 09:01:12 +08:00
273a62584c [opt](inverted index) unified optimization judgment to prevent omissions (#39473)
https://github.com/apache/doris/pull/38027
2024-08-17 16:57:19 +08:00
20936fe054 [branch-2.1][improvement](jdbc catalog) Compatible with ojdbc6 by adding version check (#39408)
pick (#39341)

In previous versions, we used a method based on JDBC 4.2 to read data,
so it was equivalent to abandoning support for ojdbc6. However, we
recently found that a large number of users still use Oracle version
11g, which will have some unexpected compatibility issues when using
ojdbc8 to connect. Therefore, I use version verification to make it
compatible with both ojdbc6 and ojdbc8, so that good compatibility can
be obtained through ojdbc6, and better reading efficiency can be
obtained through ojdbc8.
2024-08-17 16:43:01 +08:00
7687f2c53a [fix](ip-funcs) fix ip inet6_aton funcs #39415 (#39513)
2024-08-17 10:56:06 +08:00
ae8073f155 [opt](mtmv) partition rollup support week and quarter (#39286) (#39477)
pick from master #39286
2024-08-16 20:01:06 +08:00
2948b5ea2b [branch-2.1][fix](jdbc scan) Remove the conjuncts.remove call in JdbcScan (#39407)
pick (#39180)

In #37565, the calling order of finalize changed, so the final generated
plan no longer shows the PREDICATES that have been pushed down to Jdbc.
Although this behavior is correct, until the push down of every kind of
PREDICATE is handled perfectly, we need to keep all conjuncts to ensure
that we can still filter data correctly when the data returned by Jdbc
is a superset.
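
A minimal Python sketch of why the conjuncts are kept: if the remote source honors only part of the pushed-down filter, re-evaluating the conjuncts locally still yields the right rows. The `remote_scan` and `local_filter` helpers and the sample rows are hypothetical and do not mirror the JdbcScan code.

```python
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]

# Conjuncts of the WHERE clause: id > 1 AND name = 'b'.
conjuncts = [lambda r: r["id"] > 1, lambda r: r["name"] == "b"]


def remote_scan(table, pushed):
    """Hypothetical remote source that only applies the first pushed-down conjunct."""
    applied = pushed[:1]
    return [r for r in table if all(p(r) for p in applied)]


def local_filter(batch, kept):
    """Re-evaluate the conjuncts that the scan node still holds."""
    return [r for r in batch if all(p(r) for p in kept)]


superset = remote_scan(rows, conjuncts)              # rows with id 2 and 3
with_conjuncts = local_filter(superset, conjuncts)   # conjuncts kept on the scan node
without_conjuncts = local_filter(superset, [])       # conjuncts removed after push-down
```

Keeping the conjuncts costs a redundant check when the source filters fully, but it protects correctness whenever the source returns more rows than requested.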
2024-08-16 19:01:40 +08:00
4458302a77 [Fix](Planner) fix delete from using does not attach partition information (#39020) (#39418)
cherry-pick from master #39020
Problem:
when a delete from using clause specifies partition information, it
would also delete data from other partitions
Solution:
add the partition information when transforming the delete clause into
an insert into select clause
2024-08-16 17:16:08 +08:00
824f035b98 [pick](Row store) fix row store with invalid json string in variant ty… (#39456)
#39394
2024-08-16 14:43:11 +08:00
d56000e924 [opt](Nereids) polish aggregate function signature matching (#39352) (#39460)
pick from master #39352

use double to match string
- corr
- covar
- covar_samp
- stddev
- stddev_samp

use largeint to match string
- group_bit_and
- group_bit_or
- group_bit_xor

use double to match decimalv3
- topn_weighted

optimize error message
- multi_distinct_sum
- multi_distinct_sum0
2024-08-16 13:57:11 +08:00
021678c7c3 [fix](window_funnel) fix wrong result of window_funnel #38954 (#39270)
## Proposed changes

BP #38954
2024-08-16 09:59:31 +08:00
4380f3cb51 [fix](variable) support all type functions (#39144) (#39438)
pick from master #39144
2024-08-16 09:51:02 +08:00
3aaee8f7d5 [fix](Nereids) polish function signature search algorithm (#38497) (#39436)
pick from master #38497 and #39342

use array<double> for array<string>
- array_avg
- array_cum_sum
- array_difference
- array_product

use array<bigint> for array<string>
- bitmap_from_array

use double first
- fmod
- pmod

let high order function throw friendly exception
- array_filter
- array_first
- array_last
- array_reverse_split
- array_sort_by
- array_split

let return type same as parameter's type
- array_push_back
- array_push_front
- array_with_constant
- if

let greatest / least work same as mysql's greatest
2024-08-16 08:24:25 +08:00
6257e706fa [improve](ip)update ip for bloom_filter (#39414)
## Proposed changes
backport: https://github.com/apache/doris/pull/39253
2024-08-16 08:20:19 +08:00
aebc70d75a revert [improvement](mv) Support to use cast when create sync materialized view #38008 (#39378)
## Proposed changes

This was introduced by https://github.com/apache/doris/pull/38008.
If a sync materialized view uses
cast(FLOOR(MINUTE(time) / 15) as decimal(9, 0)) in its group by clause,
downgrading from 2.1.6 to 2.1.5 or upgrading from 2.1.6 to 3.0.0 may
leave the FE unable to run, so the feature is reverted.
2024-08-15 14:16:57 +08:00
4acd69590d [Fix](function) fix wrong nullable signature of function corr (#39380)
## Proposed changes

Before this change, `corr(nullable_x, nullable_y)` would core dump; now fixed.
There is no need to patch master because the refactor in
https://github.com/apache/doris/pull/37330 already changed the
implementation.
2024-08-15 14:10:09 +08:00
1accde9fb3 [fix](nestedtype) support nested type for schema change reorder (#39392)
## Proposed changes
backport: https://github.com/apache/doris/pull/39210
2024-08-15 14:03:03 +08:00
c12137a8d6 [branch-2.1][fix](expr) Enhance SQL Expression Handling by Introducing printSqlInParens to CompoundPredicate (#39082)
pick #39064
2024-08-14 21:14:58 +08:00
226e01889c [fix](array_apply) pick array apply fix (#39328)
## Proposed changes
backport: https://github.com/apache/doris/pull/39105
2024-08-14 18:52:29 +08:00
78d6e318fb [fix](ip)pick ip rowstore (#39345)
## Proposed changes
backport: https://github.com/apache/doris/pull/39258
2024-08-14 18:51:58 +08:00
b26af32934 [fix](function) fix error return type in corr(float32,float32) (#39251) (#39350)
https://github.com/apache/doris/pull/39251
```
mysql [test11]>select corr(cast(x as float),cast(y as float)) from test_corr;
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]column_type not match data_types in agg node, column_type=Nullable(Float64), data_types=Nullable(Float32),column name=

```

2024-08-14 18:47:14 +08:00
a9692a305e [fix](function)timediff with now function causes a error signature (… (#39349)
…#39322)
https://github.com/apache/doris/pull/39322
## Proposed changes

```
mysql [(none)]>select round(timediff(now(),'2024-08-15')/60/60,2);
ERROR 1105 (HY000): errCode = 2, detailMessage = argument 1 requires datetimev2 type, however 'now()' is of datetime type
```
The reason is that the function parameter types were modified in
expectedInputTypes, which led to no match being found. The code here is
from a long time ago. Because the precision of datetimev2 could not be
deduced in the past, a separate implementation was made here. This code
can be safely deleted.


2024-08-14 18:36:14 +08:00
d2709f8600 [fix](test) fix test_numbers case (#39303)
bp: #38687

Use `order_qt` rather than `qt` so that the check does not depend on the order of the results.
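
The idea can be sketched in Python, assuming that `order_qt` compares results after sorting them while `qt` compares them as returned. The helper names and sample rows below are hypothetical and are not part of the regression framework.

```python
def matches_exact(actual, expected):
    """Order-sensitive comparison: row order must match exactly."""
    return actual == expected


def matches_sorted(actual, expected):
    """Order-insensitive comparison: sort both sides before comparing."""
    return sorted(actual) == sorted(expected)


expected = [(0,), (1,), (2,)]
actual = [(2,), (0,), (1,)]  # same rows, different arrival order
```

With rows arriving in a different order, the exact comparison fails even though the result set is correct, while the sorted comparison passes.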
2024-08-13 22:26:39 +08:00
677435cef8 [Pick](Branch-2.1) pick json reader fix and support specify $. as column (#39271)
#39206
#38213
2024-08-13 17:44:45 +08:00
Pxl
33220109f7 [Bug](materialized-view) fix analyze where clause failed on mv (#39061) (#39209)
## Proposed changes
pick from #39061
Fix the failure to analyze the where clause on a materialized view:
do not analyze the slot after replaceSlot, to avoid duplicate columns in desc.
2024-08-13 16:08:20 +08:00
228f78b80d [fix] (nereids) fix Match Expreesion in filter estimation (#39050) (#39215)
## Proposed changes

pick from master #39050
2024-08-13 10:57:53 +08:00
a6155a517d [fix] (topn) fix uncleared block in topn_next() (#39119) (#39224)
## Proposed changes

pick from master #39119
2024-08-13 10:34:17 +08:00