Commit Graph

18654 Commits

Author SHA1 Message Date
22f85be712 [fix](hive-ctas) support creating hive table with fully qualified name (#34984)
Previously, when executing `create table hive.db.table as select` to create a table in a Hive catalog while the current catalog was not a Hive catalog,
the default engine name was filled with `olap`, which is wrong.

This PR fills in the default engine name based on the specified catalog.
2024-05-18 18:42:43 +08:00
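A minimal Python sketch of the rule this PR describes, for illustration only: the default engine follows the catalog named in the table identifier rather than the session's current catalog. The helper names, the `catalog_types` map, and the `hms` type tag are assumptions, not Doris FE code.

```python
from typing import Dict


def resolve_catalog(table_name: str, current_catalog: str) -> str:
    """Return the catalog a `catalog.db.table`, `db.table` or `table` name refers to."""
    parts = table_name.split(".")
    # Only a three-part (fully qualified) name carries an explicit catalog.
    return parts[0] if len(parts) == 3 else current_catalog


def default_engine(table_name: str, current_catalog: str, catalog_types: Dict[str, str]) -> str:
    """Hypothetical mapping: Hive ('hms') catalogs default to 'hive', others to 'olap'."""
    catalog = resolve_catalog(table_name, current_catalog)
    return "hive" if catalog_types.get(catalog) == "hms" else "olap"


catalogs = {"internal": "internal", "hive": "hms"}
# Session is in the internal catalog, but the target table lives in the hive catalog.
print(default_engine("hive.db.table", "internal", catalogs))  # hive
print(default_engine("db.table", "internal", catalogs))       # olap
```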
89d5f2e816 [fix](multi-catalog) remove http scheme in oss endpoint (#34907)
Remove the http scheme from the OSS endpoint; otherwise, when an HTTP client is used, the scheme may appear inside the URL (http://bucket.http//.region.aliyuncs.com).
2024-05-18 18:42:33 +08:00
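A minimal sketch of why the scheme has to be stripped, assuming virtual-hosted style URLs of the form `<scheme>://<bucket>.<endpoint>`. `strip_scheme`, `virtual_host_url`, and the example endpoint are hypothetical, not the actual Doris code.

```python
def strip_scheme(endpoint: str) -> str:
    """Drop a leading http:// or https:// so only the host part remains."""
    for prefix in ("http://", "https://"):
        if endpoint.lower().startswith(prefix):
            return endpoint[len(prefix):]
    return endpoint


def virtual_host_url(bucket: str, endpoint: str, scheme: str = "http") -> str:
    # Virtual-hosted style: the bucket name is prepended to the endpoint host.
    return f"{scheme}://{bucket}.{strip_scheme(endpoint)}"


endpoint = "http://oss-cn-hangzhou.aliyuncs.com"  # example endpoint that carries a scheme
# Without stripping, the scheme ends up inside the host part of the URL.
print(f"http://bucket.{endpoint}")             # http://bucket.http://oss-cn-hangzhou.aliyuncs.com
print(virtual_host_url("bucket", endpoint))    # http://bucket.oss-cn-hangzhou.aliyuncs.com
```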
a59f9c3fa1 [fix](planner) fix unrequired slot bug in join node introduced by #25204 (#34923)
Before the fix, the join node retained some slots that were neither materialized nor required.
The join node needs to remove these slots and must not make them output slots.

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2024-05-18 18:40:56 +08:00
435147d449 [enhance](mtmv) MTMV handles partitions by name instead of id (#34910)
Partition IDs change on insert overwrite.

When the materialized view runs a task while the base table is undergoing insert overwrite, the task may report the error: partition not found by partitionId

Upgrade compatibility: Hive currently does not support automatic refresh, so this change has no impact.
2024-05-18 18:40:29 +08:00
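A toy Python model of the failure mode described above, assuming that insert overwrite replaces a partition with a new one that keeps its name but gets a fresh id. `BaseTable`, the id generator, and the partition names are all hypothetical.

```python
import itertools
from typing import Dict, Optional

_next_id = itertools.count(10001)  # hypothetical id generator


class BaseTable:
    """Toy model: insert overwrite swaps in a partition with the same name but a new id."""

    def __init__(self, names):
        self.partitions: Dict[str, int] = {n: next(_next_id) for n in names}

    def insert_overwrite(self, name: str) -> None:
        self.partitions[name] = next(_next_id)

    def find_by_id(self, pid: int) -> Optional[str]:
        # Returns None when no partition has this id any more.
        return next((n for n, p in self.partitions.items() if p == pid), None)

    def find_by_name(self, name: str) -> Optional[int]:
        return self.partitions.get(name)


t = BaseTable(["p20240501", "p20240502"])
remembered_id = t.find_by_name("p20240501")  # what an id-based MTMV task would store
t.insert_overwrite("p20240501")
print(t.find_by_id(remembered_id))    # None, i.e. "partition not found by partitionId"
print(t.find_by_name("p20240501"))    # still resolvable by name (with the new id)
```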
e3e5f18f26 [Fix](Json type) correct cast result for json type (#34764) 2024-05-18 18:40:17 +08:00
81bcb9d490 [opt](planner)(Nereids) support auto aggregation for randomly distributed table (#33630)
Support automatic aggregation when querying detail data of a randomly distributed table:
rows with the same key columns are returned as a single row.
2024-05-18 18:40:16 +08:00
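A rough Python sketch of what "auto aggregation" means for the query result, assuming an aggregate-key table whose value columns are declared with functions such as SUM and MAX, and assuming rows with the same key may end up in different tablets under random bucketing. `auto_aggregate` is a hypothetical helper, not the Nereids rewrite itself.

```python
import operator
from typing import Callable, Dict, List, Sequence


def auto_aggregate(rows: List[dict], key_cols: Sequence[str],
                   value_aggs: Dict[str, Callable]) -> List[dict]:
    """Merge detail rows that share the same key columns into one row, combining
    each value column with its aggregate function (first-seen key order kept)."""
    merged: Dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in merged:
            merged[key] = dict(row)
        else:
            acc = merged[key]
            for col, combine in value_aggs.items():
                acc[col] = combine(acc[col], row[col])
    return list(merged.values())


# Two rows for k=1, as if read from different tablets.
detail = [{"k": 1, "cnt": 10, "mx": 3}, {"k": 2, "cnt": 5, "mx": 9}, {"k": 1, "cnt": 7, "mx": 8}]
print(auto_aggregate(detail, ["k"], {"cnt": operator.add, "mx": max}))
# [{'k': 1, 'cnt': 17, 'mx': 8}, {'k': 2, 'cnt': 5, 'mx': 9}]
```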
bfd875eae3 [opt](nereids) lazy get expression map when comparing hypergraph (#34753) 2024-05-18 18:38:19 +08:00
e66dd58860 [Improve](inverted index) improve performance by introducing bulk contains for bitmap in _build_index_result_column (#34831) 2024-05-18 18:38:04 +08:00
9b5028785d [fix](prepare) fix datetimev2 returning error with binary_row_format (#34662)
Fix datetimev2 returning an error with binary_row_format. Before this PR, the Backend always returned datetimev2 via to_string.
Also fix the scale being lost from the returned datetimev2 metadata.
2024-05-18 18:37:41 +08:00
437c1a1ba4 [enhancement](regression-test) modify a key type tests (#34717)
Co-authored-by: cjj2010 <2449402815@qq.com>
2024-05-18 18:37:41 +08:00
274c96b12d [enhancement](regression-test) modify a key type tests (#34600)
Co-authored-by: cjj2010 <2449402815@qq.com>
2024-05-18 18:37:41 +08:00
05605d99a9 [opt](routine-load) optimize routine load task allocation algorithm (#34778) 2024-05-18 18:37:41 +08:00
cc11e50200 [fix](mtmv) Fix wrong slot desc in query rewrite by materialized view when query is complex (#34904) 2024-05-18 18:37:10 +08:00
5b72dd1217 [chore](test) remove useless drop table in test_list_partition_datatype (#34930) 2024-05-18 18:36:48 +08:00
73419c2431 [enhance](mtmv) MTMV supports determining whether Hive table data is in sync (#34845)
This was already supported; this PR only turns on the switch.
2024-05-18 18:35:42 +08:00
eb7eaee386 [fix](function) money format (#34680) 2024-05-18 18:35:29 +08:00
6f5abfd23f [regression-test](fix) fix case bug caused by using test_insert_dft_tbl in multiple test cases #34983 2024-05-18 18:35:16 +08:00
db273d578f [Fix](tablet id) use int64_t instead of int32_t or uint32_t for tablet_id (#34962) 2024-05-18 18:34:05 +08:00
b51a4212d6 [fix](txn insert) Fix txn insert values error when connected to follower fe (#34950) 2024-05-18 18:33:55 +08:00
5d1f5968eb [fix](case) fix failed PolicyTest testMergeFilterNereidsPlanner (#34637)
The order of some items in the explain output is not fixed.
2024-05-18 18:33:41 +08:00
0febfc10e4 [Fix](inverted index) fix wrong fs in inverted_index_file_writer (#34903) 2024-05-18 18:30:10 +08:00
dff6171546 [fix](auto inc) db_id and table_id should be int64_t instead of int32_t (#34912) 2024-05-18 18:29:59 +08:00
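The int64_t fixes in this log (tablet_id in #34962, db_id and table_id in #34912) guard against truncation once an id no longer fits in 32 bits. A small Python illustration using `ctypes` fixed-width integers, which do no overflow checking; the id values are made up.

```python
import ctypes

BIG_ID = 3_000_000_000  # hypothetical id larger than INT32_MAX (2**31 - 1)

# ctypes integer types wrap silently, similar to a narrowing conversion
# on a two's-complement platform.
as_int32 = ctypes.c_int32(BIG_ID).value
as_uint32 = ctypes.c_uint32(5_000_000_000).value  # exceeds UINT32_MAX as well
as_int64 = ctypes.c_int64(BIG_ID).value

print(as_int32)   # -1294967296
print(as_uint32)  # 705032704
print(as_int64)   # 3000000000
```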
4b96f9834f [fix](move-memtable) change brpc connection type to single (#34883) 2024-05-18 18:29:20 +08:00
849eeb39e9 [fix](load) skip sending cancel rpc if VNodeChannel is not inited (#34897) 2024-05-18 18:29:10 +08:00
a07876e807 [fix](planner) correlated predicate should include isnull predicate (#34833) 2024-05-18 18:28:54 +08:00
8264078a9a [fix](nereids) 4-phase agg may lose parameter in some cases (#34816) 2024-05-18 18:28:41 +08:00
5719f6ff0c [fix](planner) fix date_xxx functions without complete function signature (#34761)
Problem:
When current_date is used as input to functions like date_sub,
constant folding fails because the function signature is missing in the Planner.

Solved:
Add complete function signatures for functions like date_sub
2024-05-18 18:26:38 +08:00
71caf88ec1 [opt](mtmv) Optimize the logic of slot mapping generation for performance (#34597)
Slot mapping is used for materialized view rewriting;
given the same relation mapping, the slot mapping is the same.

Optimize the slot mapping generation logic:
cache the slot mapping in the materialization context, keyed by relation mapping.
2024-05-18 18:25:43 +08:00
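A minimal Python sketch of the caching idea, assuming a relation mapping can be represented as a hashable set of (query table, view table) pairs. `MaterializationContextSketch`, `build`, and the table names are hypothetical; the real code lives in the Java FE.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical shapes: a relation mapping pairs query tables with view tables,
# a slot mapping pairs query columns with view columns.
RelationMapping = FrozenSet[Tuple[str, str]]
SlotMapping = Dict[str, str]


class MaterializationContextSketch:
    def __init__(self, build_slot_mapping: Callable[[RelationMapping], SlotMapping]):
        self._build = build_slot_mapping
        self._cache: Dict[RelationMapping, SlotMapping] = {}
        self.builds = 0  # how many times the expensive generation actually ran

    def slot_mapping(self, relation_mapping: RelationMapping) -> SlotMapping:
        # The same relation mapping always yields the same slot mapping,
        # so generate it once and reuse it for later rewrite attempts.
        if relation_mapping not in self._cache:
            self.builds += 1
            self._cache[relation_mapping] = self._build(relation_mapping)
        return self._cache[relation_mapping]


def build(rm: RelationMapping) -> SlotMapping:
    return {f"{q}.id": f"{v}.id" for q, v in sorted(rm)}


ctx = MaterializationContextSketch(build)
rm = frozenset({("orders", "mv_orders")})
ctx.slot_mapping(rm)
ctx.slot_mapping(rm)
print(ctx.builds)  # 1
```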
Pxl
4a8df53553 [Chore](rollup) check duplicate column names when creating table with rollup (#34827)
Check for duplicate column names when creating a table with rollup.
2024-05-18 18:23:44 +08:00
1e53a2a81d [Improve](inverted index) improve query performance by not using output index result column (#34281) 2024-05-18 18:18:12 +08:00
6b1c441258 [fix](group_commit) Wal reader should check block length to avoid reading empty block (#34792) 2024-05-18 18:17:56 +08:00
38bac76b37 [opt](mtmv) Cache materialization check result for performance (#34301)
When rewriting by materialized view, the materialization SQL pattern must be checked in several different abstract rules,
such as subclasses of AbstractMaterializedViewJoinRule, MaterializedViewScanRule, and AbstractMaterializedViewAggregateRule.
Once checked, the result can be cached to avoid unnecessary repeated checks.
2024-05-18 18:14:59 +08:00
30a036e7a4 [feature](mtmv) create mtmv support partitions rollup (#31812)
If an MTMV is created with ``date_trunc(`xxx`,'month')``
and the related table uses `range` partitioning with 3 partitions:
```
20200101-20200102
20200102-20200103
20200201-20200202
```
then the MTMV will have 2 partitions:
```
20200101-20200201
20200201-20200301
```

When the related table uses `list` partitioning with 3 partitions:
```
(20200101,20200102)
(20200103)
(20200201)
```
then the MTMV will have 2 partitions:
```
(20200101,20200102,20200103)
(20200201)
```
2024-05-18 18:14:48 +08:00
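The month rollup in the two examples above can be sketched in Python as follows. This is an illustration, not the MTMV implementation: it assumes dates are `YYYYMMDD` strings, that each range partition lies within a single month (so only its lower bound is inspected), and that list values are grouped by the month they fall in. All function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def parse(s: str) -> date:
    return date(int(s[:4]), int(s[4:6]), int(s[6:8]))


def fmt(d: date) -> str:
    return d.strftime("%Y%m%d")


def next_month(d: date) -> date:
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def rollup_range(parts: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Assumes each [lower, upper) partition lies inside one month; keys on the lower bound."""
    months = sorted({parse(lower).replace(day=1) for lower, _ in parts})
    return [(fmt(m), fmt(next_month(m))) for m in months]


def rollup_list(parts: List[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Groups every list value by its month, like date_trunc(..., 'month')."""
    by_month = {}
    for values in parts:
        for v in values:
            by_month.setdefault(parse(v).replace(day=1), []).append(v)
    return [tuple(sorted(vs)) for _, vs in sorted(by_month.items())]


print(rollup_range([("20200101", "20200102"), ("20200102", "20200103"), ("20200201", "20200202")]))
# [('20200101', '20200201'), ('20200201', '20200301')]
print(rollup_list([("20200101", "20200102"), ("20200103",), ("20200201",)]))
# [('20200101', '20200102', '20200103'), ('20200201',)]
```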
f7801948ad fix backup and restore failing between clusters with and without force_replication_allocation set (#34608) 2024-05-18 18:14:18 +08:00
6c515e0c76 [fix](group commit) Make compatibility issues in serializing and deserializing wal files clearer (#34793) 2024-05-18 18:12:43 +08:00
80dd027ce2 [opt](join) For left semi/anti join without mark join conjunct and without other conjuncts, stop probing after matching one row (#34703) 2024-05-18 18:08:50 +08:00
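A simplified Python sketch of the early-stop idea: with only an equality condition and no other conjuncts, the first matching build row already decides whether a probe row is emitted, so the bucket scan can stop there. `hash_semi_join`, the bucket layout, and the comparison counter are hypothetical, and NULL handling for anti joins is ignored.

```python
from typing import Callable, Hashable, List, Tuple


def hash_semi_join(probe: List, build: List, key: Callable[[object], Hashable],
                   anti: bool = False, num_buckets: int = 8) -> Tuple[List, int]:
    """Left semi (or anti) join on key equality only. Returns the output rows and
    the number of key comparisons made while probing."""
    buckets = [[] for _ in range(num_buckets)]
    for row in build:
        buckets[hash(key(row)) % num_buckets].append(row)

    out, comparisons = [], 0
    for row in probe:
        k = key(row)
        matched = False
        for candidate in buckets[hash(k) % num_buckets]:
            comparisons += 1
            if key(candidate) == k:
                matched = True
                break  # no other conjuncts: one match already decides this row
        if matched != anti:  # semi keeps matched rows, anti keeps unmatched ones
            out.append(row)
    return out, comparisons


def identity(x):
    return x


print(hash_semi_join([1, 2, 3], [1, 1, 1, 2], identity))             # ([1, 2], 2)
print(hash_semi_join([1, 2, 3], [1, 1, 1, 2], identity, anti=True))  # ([3], 2)
```

Without the `break`, probing key 1 would compare against all three duplicate build rows.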
b6409f5584 [improvement](inverted index) Disable the use of skipping write index on load (#34719)
When `skip_write_index_on_load` is turned on, users get an error when querying the latest (not yet compacted) data, which is a bad experience. Since `inverted_index_ram_dir_enable = true` and `inverted_index_storage_format=V2` can be used instead to reduce IO and CPU consumption, we disable it now.

1. Disable setting `skip_write_index_on_load` to `true` in create table stmt.
2. Disable setting `skip_write_index_on_load` to `true` in alter table properties stmt. You can still alter `skip_write_index_on_load` to `false`.

Co-authored-by: Luennng <luennng@gmail.com>
2024-05-18 18:07:51 +08:00
1545d96617 [WIP](test) remove enable_nereids_planner in regression cases (part 4) (#34642)
Previous parts:
#34417
#34490
#34558
2024-05-18 18:07:39 +08:00
46bf43130f [test](case) error format case in test_query_json_object (#34722)
error format case in test_query_json_object
2024-05-18 18:07:23 +08:00
c71d0b6b22 [fix](Nereids) cast from json should always be nullable (#34707) 2024-05-18 18:06:23 +08:00
5012ddd87a [fix](Nereids) fix sql cache returning old value after truncating partition (#34698)
1. fix sql cache returning an old value after a partition is truncated
2. use expire_sql_cache_in_fe_second to control the expiration time of the sql cache in NereidsSqlCacheManager
2024-05-18 18:05:31 +08:00
b3b848f862 [feature](Nereids): eliminate useless project (#34611) 2024-05-18 18:05:00 +08:00
e2614d453a [case](regression) Add hdfs backup restore case (#34716) 2024-05-18 18:03:05 +08:00
6f91e9cc4d [fix](test) fix s3 load test failed (#34671) 2024-05-18 18:02:31 +08:00
876248aa4e [fix](function) json_object can not input null value (#34591) 2024-05-18 18:00:48 +08:00
7e967e53b8 Fix failed p2 hive statistics case. (#34663) 2024-05-18 17:59:44 +08:00
691f3c5ee7 [Performance](Variant) Improve load performance for variant type (#33890)
1. remove phmap for padding rows
2. add SimpleFieldVisitorToScarlarType for short-circuit type deduction
3. correct type coercion for conflicting types between integers
4. improve nullable column performance
5. remove the shared_ptr dependency for DataType; use TypeIndex instead
6. optimize by caching the order of fields (which is almost always the same)
and quickly checking whether the next field matches the expected one, instead of searching the hash table.

benchmark:
On ClickBench data, load performance:
12m36.799s -> 7m10.934s, about 43% latency reduction

In variant_p2/performance.groovy:
3min44s20 -> 1min15s80, about 66% latency reduction
2024-05-18 17:58:33 +08:00
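A rough Python sketch of the field-order caching in item 6, assuming rows arrive as JSON-like objects whose keys are given in order. `FieldOrderCache` and its counters are hypothetical; the actual change is in the Doris C++ backend.

```python
from typing import Dict, List


class FieldOrderCache:
    """Remembers the order in which fields were first seen and, for each key,
    first tries the position right after the previous key before doing a
    hash-table lookup."""

    def __init__(self) -> None:
        self.order: List[str] = []       # field names by column position
        self.index: Dict[str, int] = {}  # fallback hash table: name -> position
        self.fast_hits = 0
        self.slow_lookups = 0

    def positions(self, keys: List[str]) -> List[int]:
        result, expected = [], 0
        for k in keys:
            if expected < len(self.order) and self.order[expected] == k:
                pos = expected               # quick check succeeded
                self.fast_hits += 1
            else:
                self.slow_lookups += 1       # fall back to the hash table
                pos = self.index.get(k)
                if pos is None:              # new field: append a column
                    pos = len(self.order)
                    self.index[k] = pos
                    self.order.append(k)
            result.append(pos)
            expected = pos + 1
        return result


cache = FieldOrderCache()
for row_keys in (["id", "name", "tags"], ["id", "name", "tags"], ["id", "tags"]):
    print(cache.positions(row_keys))  # [0, 1, 2], then [0, 1, 2], then [0, 2]
print(cache.fast_hits, cache.slow_lookups)  # 4 4
```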
b76cfcd007 [refactor](mtmv) Materialization context and mtmv decoupling (#34093) (#34916)
Decouple the MTMV from the materialization context.
Make MaterializationContext an abstract class that describes the materialization.
It currently has the AsyncMaterializationContext subclass and can also have other types of MaterializationContext, such as
SyncMaterializationContext.
2024-05-17 22:54:21 +08:00
385739564d [test](executor) Add workload group upgrade test #35007 2024-05-17 17:34:08 +08:00
2dc65ce356 2.1.3-rc09 2024-05-16 06:37:36 +08:00