Commit Graph

5593 Commits

Author SHA1 Message Date
6ffc26858a [Improvement](meta) add default_value column & is changed column for result of show_variables stmt (#23017)
* [Improvement](meta) add default_value column for result of show_variables stmt

* add Changed column to show whether value is modified

* fix code style issue
2023-08-20 20:48:45 +08:00
97fa840324 [feature](multi-catalog)support iceberg hadoop catalog external table query (#22949)
support iceberg hadoop catalog external table query
2023-08-20 19:29:25 +08:00
5ba505ebf4 [fix](multi-catalog)fix avro and jdbc scanner dependency (#23015)
Add a preload-extensions module and move all conflicting dependencies into the pom.xml in preload-extensions.
2023-08-20 19:28:17 +08:00
6bf65253d0 [fix](Nereids): unstable test when run single UT. (#23189) 2023-08-18 23:14:56 +08:00
10abbd2b62 [Feature](Export) support parallel export job using Job Schedule (#22854) 2023-08-18 22:24:42 +08:00
6847592137 [Fix](RoutineLoad)Fix when Unique (MoW) RoutineLoad imports unspecified Sequence column (#23167)
[Fix](RoutineLoad)Fix when Unique (MoW) routineload imports unspecified Sequence column
2023-08-18 21:49:09 +08:00
b6dd56fee0 [fix](multi-catalog)fix compatibility issue for s3 endpoint (#23175) 2023-08-18 18:37:21 +08:00
345eaab00b [refactor](Nereids): remove useless equals()/hashcode() about Id (#23162) 2023-08-18 18:31:31 +08:00
7c4870c371 [fix](catalog) fix hive partition prune bug on nereids (#23026) 2023-08-18 18:31:01 +08:00
9cee0ecccc [fix](show-table-status) fix priv error on show table status stmt (#22918) 2023-08-18 18:30:09 +08:00
f71b78c415 [enhancement](Nereids): remove override child(int index) (#23124)
An overridden `child(int index)` that only calls `super.child(index)` causes the pointer to jump twice.
2023-08-18 17:34:49 +08:00
609d20de8c [refactor](nereids)remove ColumnStatistics.selectivity (#23039) 2023-08-18 16:45:54 +08:00
1c3cc77a54 [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty (#21236)
* [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty

* add ut

* fix nereids

* fix regression-test
2023-08-18 14:37:49 +08:00
a7771ea507 [fix](planner) fix current_timestamp param type mismatch when doing stream load (#23092)
FileLoadScanNode did not analyze the default value expr, so the target param type int32 became int8, the original IntLiteral type.
2023-08-18 14:28:45 +08:00
635349a015 [fix](log4j) fix audit_log_roll_num not work for fe audit log file (#23157)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-08-18 14:13:45 +08:00
441032c3d8 [fix](Nereids): LogicalSink equals() shouldn't invoke super.equals() (#23145) 2023-08-18 14:05:48 +08:00
2d96d19030 [FIX](array-func) fix array() with decimal type (#23117)
If we write SQL such as: select array(1.0,2.0,null, null,2.0)
the null arguments are passed to BE with type uint8, which does not match the decimal signature of array() and makes BE core dump. So the cast should be done in BE, with the null-tagged arguments cast to the decimal type.
2023-08-18 12:12:50 +08:00
Pxl
59c6139aa5 [Chore](parser) fix create view failed when view contained cast as varchar (#23043)
fix create view failed when view contained cast as varchar
2023-08-18 11:50:18 +08:00
df8e7f7f09 [enhancement](msg) add disk root path in message (#23000) 2023-08-18 11:21:59 +08:00
d018ac8fb7 fix show grants throw NullPointerException (#22943) 2023-08-18 10:48:56 +08:00
a5ca6cadd6 [Improvement] Optimize count operation for iceberg (#22923)
Iceberg has its own metadata, which includes count statistics for table data. If the table does not contain equality deletes, we can get the row count of the current table directly from the count statistics.
2023-08-18 09:57:51 +08:00
03d59ba81e [Fix](Nereids) fix sql-cache for nereids. (#22808)
1. The cache key should not be ((LogicalPlanAdapter)parsedStmt).getStatementContext().getOriginStatement().originStmt.toLowerCase() (do not invoke toLowerCase()). For example, select * from tbl1 where k1 = 'a' is different from select * from tbl1 where k1 = 'A', so the second query should miss the cache.
2. According to issue 6735, the cache key should contain the DDL SQL of all views (including nested views).
2023-08-18 09:36:07 +08:00
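The two points above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `build_cache_key` and the `|` separator are hypothetical, and the view DDL strings are made up for the example.

```python
from typing import Sequence


def build_cache_key(origin_stmt: str, view_ddls: Sequence[str]) -> str:
    """Hypothetical helper: build a cache key from the statement and view DDLs."""
    # Keep the statement text as-is. Lowercasing would make the literals
    # 'a' and 'A' collide even though the queries return different rows.
    # Appending every view's DDL (nested views included) makes the key
    # change whenever a view definition changes.
    return "|".join([origin_stmt, *view_ddls])


key_lower = build_cache_key("select * from tbl1 where k1 = 'a'", [])
key_upper = build_cache_key("select * from tbl1 where k1 = 'A'", [])
key_view_v1 = build_cache_key("select * from v1", ["create view v1 as select k1 from tbl1"])
key_view_v2 = build_cache_key("select * from v1", ["create view v1 as select k2 from tbl1"])
```

With this key, the two literal-case variants map to different cache entries, and redefining a view invalidates entries for queries that read it.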
hzq
38c182100a [refactor](mysql compatibility) An abstract class for all databases created for mysql compatibility (#23087)
Better code structure for mysql compatibility databases.
2023-08-18 09:16:23 +08:00
1f19d0db3e [improvement](tablet clone) improve tablet balance, scaling speed etc (#22317) 2023-08-17 22:30:49 +08:00
b91bb9f503 [fix](alter table property) fix alter property if rpc failed (#22845)
* fix alter property

* add regression case

* do not repeat
2023-08-17 18:02:34 +08:00
11d76d0ebe [fix](Nereids) non-inner join should not merge dist info (#22979)
1. left join should use left dist info.
2. right join should use right dist info.
3. full outer join should return ANY dist info.
2023-08-17 17:48:50 +08:00
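A rough sketch of the three rules above, assuming distribution info is represented as an opaque value. The function name, the join-type strings, and the `("MERGED", ...)` placeholder for the inner-join case are all hypothetical and do not come from the Nereids code.

```python
def join_output_dist(join_type: str, left_dist: object, right_dist: object) -> object:
    """Hypothetical helper: pick the output distribution info of a join."""
    if join_type == "LEFT_OUTER":
        # Left join keeps the left child's distribution info.
        return left_dist
    if join_type == "RIGHT_OUTER":
        # Right join keeps the right child's distribution info.
        return right_dist
    if join_type == "FULL_OUTER":
        # Full outer join cannot rely on either side.
        return "ANY"
    # Placeholder for the inner-join case, where both sides may be merged.
    return ("MERGED", left_dist, right_dist)
```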
d7a6b64a65 [Fix](Planner) fix case function with null cast to array null (#22947) 2023-08-17 16:37:07 +08:00
a248cb720c [fix](jdbc catalog) fix DefaultValueExpr in Jdbc table column when CTAS (#22978) 2023-08-17 15:52:20 +08:00
3fe419eafa [Fix](statistics)Fix update cached column stats bug (#23049)
`show column cached stats` sometimes shows wrong min/max values:
```
mysql> show column cached stats hive.tpch100.region;
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
| column_name | count | ndv  | num_null | data_size | avg_size_byte | min  | max  | updated_time |
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
| r_regionkey | 5.0   | 5.0  | 0.0      | 24.0      | 4.0           | N/A  | N/A  | null         |
| r_comment   | 5.0   | 5.0  | 0.0      | 396.0     | 66.0          | N/A  | N/A  | null         |
| r_name      | 5.0   | 5.0  | 0.0      | 40.8      | 6.8           | N/A  | N/A  | null         |
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
```
This PR fixes the bug. The cause is that when a ColumnStatistic object is converted to JSON, the result does not contain the minExpr and maxExpr attributes.
2023-08-17 15:20:02 +08:00
bf2b92f5e8 [fix](Nereids): PushdownDistinctThroughJoin don't push distinct for relation (#23066)
* [fix](Nereids): PushdownDistinctThroughJoin don't push distinct for relation.

* fix test
2023-08-17 14:50:34 +08:00
f5da9f4ccc [fix](multi-catalog)convert to s3 path when use aws endpoint (#22784)
Convert to an s3 path when an AWS endpoint is used.
For compatibility, the s3 client can also be used to access other clouds by setting the s3 endpoint properties.
2023-08-17 14:28:00 +08:00
92c8f842f7 [fix](nereids) dphyper join reorder use wrong method to get hash and other conjuncts (#22966)
Use getHashJoinConjuncts() and getOtherJoinConjuncts() to get the hash and other conjuncts of a hash join node, instead of categorizing them by checking whether each one is an 'EqualTo' expression.
2023-08-17 11:03:45 +08:00
343a6dc29d [improvement](hash join) Return result early if probe side has no data (#23044) 2023-08-17 09:17:09 +08:00
814acbf331 [pipeline](exec) disable pipeline load in master code (#23061)
disable pipeline load in master code
2023-08-16 21:53:58 +08:00
0594acfcf1 [fix](Nereids) scan should output all invisible column (#23003) 2023-08-16 18:07:59 +08:00
f1880d32d9 [fix](nereids)bind slot failed because of "default_cluster" #23008
Slot binding failed for the following queries:
select tpch.lineitem.* from lineitem
select tpch.lineitem.l_partkey from lineitem

The unbound slot is tpch.lineitem.l_partkey, but the bound slot is default_cluster:tpch.lineitem.l_partkey, so they do not match.
We need to ignore default_cluster: when comparing dbName.
2023-08-16 17:22:44 +08:00
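A minimal sketch of comparing database names while ignoring the `default_cluster:` prefix. The helper names `strip_cluster` and `same_db` are hypothetical; the actual fix lives in the Nereids slot binder and may differ.

```python
CLUSTER_PREFIX = "default_cluster:"


def strip_cluster(db_name: str) -> str:
    """Hypothetical helper: drop a leading 'default_cluster:' if present."""
    if db_name.startswith(CLUSTER_PREFIX):
        return db_name[len(CLUSTER_PREFIX):]
    return db_name


def same_db(unbound_db: str, bound_db: str) -> bool:
    """Compare two database names, ignoring the cluster prefix on either side."""
    return strip_cluster(unbound_db) == strip_cluster(bound_db)
```

Under this comparison, `tpch` from the query and `default_cluster:tpch` from the bound slot are treated as the same database.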
92f443b3b8 [enhancement](Nereids): count(1) to count(*) #22999
add a rule to transform count(1) to count(*)
2023-08-16 17:19:23 +08:00
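As an illustration only, such a rewrite could look like this on a toy expression type. The `Count` class and `count_one_to_count_star` function are hypothetical and are not the Nereids rule itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Count:
    """Toy aggregate expression; arg == "*" stands for count(*)."""
    arg: object


def count_one_to_count_star(expr: Count) -> Count:
    """Rewrite count(1) to count(*); leave every other count() untouched."""
    if type(expr.arg) is int and expr.arg == 1:
        return Count("*")
    return expr
```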
2dbca7a688 [Fix](Planner) fix multi phase analysis failed in multi instance environment substitution (#22840)
Problem:
When executing group_concat with an inner order by in a view, the column cannot be found during analysis.

Example:
create view if not exists test_view as select group_concat(c1,',' order by c1 asc) from table_group_concat;
select * from test_view;
it will return an error like: "can not find c1 in table_list"

Reason:
When we execute this SQL in a multi-instance environment, the planner tries to create the plan with multi-phase
aggregation. Because test_view is analyzed independently of the tables outside the view, we cannot get
the information of the tables inside the view.

Solution:
Substitute order by expression of merge aggregation expressions.
2023-08-16 16:46:26 +08:00
7adb2be360 [Fix](Nereids) fix insert into return npe from follower node. (#22734)
When an insert into table command runs on a follower node, it is forwarded to the master node. The parsed statement is then set on executor::parsedStmt rather than in the cascades context, so we use the former to get the user info.
2023-08-16 16:37:17 +08:00
5148bc6fa7 [fix](partial update)allow delete sign column in partial update in planForPipeline (#23034) 2023-08-16 14:20:39 +08:00
4510e16845 [improvement](delete) support delete predicate on value column for merge-on-write unique table (#21933)
Previously, delete statements with conditions on value columns were only supported on duplicate tables. After the delete sign mechanism was introduced for batch delete, a delete statement with conditions on value columns on a unique table is transformed into the corresponding insert into ..., __DELETE_SIGN__ select ... statement. However, for unique tables with merge-on-write enabled, the overhead of inserting this data can be eliminated. So this PR adds the ability to use delete predicates on value columns for merge-on-write unique tables.
2023-08-16 12:18:05 +08:00
3efa06e63e [Fix](View)varchar type conversion error (#22987) 2023-08-16 11:49:04 +08:00
221e7bdd17 [test](jdbc external) fix mysql and pg external regression test (#22998) 2023-08-16 10:44:47 +08:00
423002b20a [fix](nereids) partitionTopN & Window estimation (#22953)
* partitionTopN & winExpr estimation

* tpcds 44/47/57
2023-08-15 20:19:03 +08:00
80566f7fed [stats](nereids)support partition stats (#22606) 2023-08-15 17:52:25 +08:00
9b2323b7fd [Pipeline](exec) support async writer in pipeline query engine (#22901) 2023-08-15 17:32:53 +08:00
d7a5c37672 [improvement](tablet clone) update the capacity coefficient for calculating backend load score (#22857)
Update the capacity coefficient for calculating the backend load score:
1. Add the FE config entry `backend_load_capacity_coeficient` to allow setting the capacity coefficient manually;
2. Adjust the calculation of the capacity coefficient as below.

We emphasize disk usage when calculating the load score.
If a BE has a high used capacity percent, we should increase its load score.
So we increase the capacity coefficient with a BE's used capacity percent.

But this is not enough, for example when the tablets differ greatly in data size.
Then for the two BEs below, the load scores may be the same:
BE A:  disk usage = 60%,  replica number = 2000  (it contains the big tablets)
BE B:  disk usage = 30%,  replica number = 4000  (it contains the small tablets)

But what we want is: first move some big tablets from A to B; after their disk usages are close,
move some small tablets from B to A; finally both their disk usages and replica numbers
are close.

To achieve this, when the max difference between all BEs' disk usages is >= 30%, we set the capacity coefficient to 1.0 and avoid the effect of the replica number. After the disk usage difference decreases, we decrease the capacity coefficient so that the replica number takes effect.
2023-08-15 17:27:31 +08:00
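The threshold behavior described above can be sketched as below. Only the 30% threshold and the 1.0 value come from the commit message; the base value of 0.5 and the linear ramp below the threshold are assumptions made for illustration, and `capacity_coefficient` is a hypothetical name.

```python
from typing import Sequence

MAX_DIFF_PERCENT = 30    # from the commit message: threshold on disk usage spread
BASE_COEFFICIENT = 0.5   # assumption: coefficient when all disk usages are equal


def capacity_coefficient(disk_usage_percents: Sequence[int]) -> float:
    """Hypothetical sketch: weight of disk usage in the backend load score."""
    diff = max(disk_usage_percents) - min(disk_usage_percents)
    if diff >= MAX_DIFF_PERCENT:
        # Disk usages are far apart: ignore the replica number entirely.
        return 1.0
    # Assumed linear ramp: the closer the disk usages, the more the
    # replica number is allowed to influence the load score.
    return BASE_COEFFICIENT + (1.0 - BASE_COEFFICIENT) * diff / MAX_DIFF_PERCENT


coef_a_b = capacity_coefficient([60, 30])  # BE A and BE B from the example
```

For the BE A / BE B example, the spread is exactly 30%, so the coefficient is 1.0 and balancing is driven by disk usage until the two BEs converge.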
7de362f646 [fix](Nereids): expand other join which has or condition (#22809) 2023-08-15 16:49:19 +08:00
dd09e42ca9 [enhancement](Nereids): expression unify constructor by using List (#22985) 2023-08-15 16:47:58 +08:00
140ab60a74 [Enhancement](multi-catalog) add a BE selection strategy for hdfs short-circuit-read. (#22697)
Sometimes BEs are deployed on the same nodes as DataNodes, so a more suitable BE selection policy can use hdfs short-circuit-read as much as possible.
2023-08-15 15:34:39 +08:00