Commit Graph

5755 Commits

Author SHA1 Message Date
d4a6ef3f8c [fix](Nereids) fix test framework of hypergraph (#22434) 2023-08-01 16:20:07 +08:00
26737dddff [feature](Nereids): pushdown MIN/MAX/SUM through join (#22264)
* [minor](Nereids): add more comment to explain code

* [feature](Nereids): pushdown MIN/MAX/SUM through join
2023-08-01 13:23:55 +08:00
a6e7e134a3 Revert "[fix](show-stmt) fix show create table missing storage_medium info (#21757)" (#22443)
This reverts commit ec72383d3372b519e7957f237fad456130230804.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-08-01 12:00:34 +08:00
450e0b1078 [fix](nereids) recompute logical properties in plan post process (#22356)
The join commute rule swaps the left and right children, which changes the logical properties. The logical properties therefore need to be recomputed in plan post-processing to get the correct result.
2023-07-31 21:04:39 +08:00
bb67225143 [bugfix](profile summary) move detail info from summary to execution summary (#22425)

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-07-31 20:37:01 +08:00
ec72383d33 [fix](show-stmt) fix show create table missing storage_medium info (#21757) 2023-07-31 19:26:21 +08:00
2a320ade82 [feature](property) Add table property "is_being_synced" (#22314) 2023-07-31 18:14:13 +08:00
e72a012ada [enhancement](stats) Retry when loading stats (#21849) 2023-07-31 17:33:20 +08:00
afb6a57aa8 [enhancement](nereids) Improve stats preload performance (#21970) 2023-07-31 17:32:01 +08:00
3a1d678ca9 [Fix](Planner) fix parse error of view with group_concat order by (#22196)
Problem:
    Creating a view whose projection contains group_concat(xxx, xxx order by orderkey) fails during the second parse of the inline view.

For example:
    This query works:
    "SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"
    but creating a view from it does not:
    "create view test_view as SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"

Reason:
    When creating a view, we parse view.toSql() again to check for syntax errors. toSql() for group_concat with ORDER BY adds a ', ' separator between the second parameter and ORDER BY, so the second parse fails because the generated statement no longer has the same semantics as the original.
    group_concat(`name`, "," ORDER BY id)  ==> group_concat(`name`, "," , ORDER BY id)

Solved:
    Change toSql of group_concat, and analyze the ORDER BY clause in analyze() of group_concat in the Planner, so that an ORDER BY taken from a view statement is analyzed and has its slot references bound.
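
As an illustration of the toSql change described above, here is a minimal sketch that renders ORDER BY after a space instead of after the argument separator. The class GroupConcatSql and its method signature are hypothetical and are not the code merged in #22196.

```java
import java.util.List;

public final class GroupConcatSql {
    private GroupConcatSql() {}

    /**
     * Renders group_concat(arg1, arg2 ORDER BY ...).
     * Arguments are joined with ", ", but ORDER BY is appended after a single
     * space so that the output can be parsed again as the same statement.
     * (Hypothetical helper; names are assumptions for illustration.)
     */
    public static String toSql(List<String> args, String orderBy) {
        StringBuilder sb = new StringBuilder("group_concat(");
        sb.append(String.join(", ", args));
        if (orderBy != null && !orderBy.isEmpty()) {
            // No comma here: a comma would turn ORDER BY into a bogus argument.
            sb.append(" ORDER BY ").append(orderBy);
        }
        return sb.append(")").toString();
    }
}
```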
2023-07-31 17:20:23 +08:00
4c6458aa77 [enhancement](nereids) Execute sync analyze task with multi-thread (#22211)
Previously the tasks were executed sequentially, which could take a long time.
2023-07-31 15:05:07 +08:00
8ccd8b4337 [fix](Nereids) fix ends calculation when there are constant project (#22265) 2023-07-31 14:10:44 +08:00
f2919567df [feature](datetime) Support timezone when insert datetime value (#21898) 2023-07-31 13:08:28 +08:00
93a9cec406 [Improvement] Add iceberg metadata cache and support manifest file content cache (#22336)
Cache the iceberg table. When accessing the same table, the metadata will only be loaded once.
Cache the snapshot of the table to optimize the performance of the iceberg table function.
Add cache support for iceberg's manifest file content.
In a simple test, query time dropped from 2.0s to 0.8s.

before
mysql> refresh table tb3;
Query OK, 0 rows affected (0.03 sec)

mysql> select * from tb3;
+------+------+------+
| id   | par  | data |
+------+------+------+
|    1 | a    | a    |
|    2 | a    | b    |
|    3 | a    | c    |
....
|   68 | a    | a    |
|   69 | a    | b    |
|   70 | a    | c    |
+------+------+------+
70 rows in set (2.10 sec)

mysql> select * from tb3;
+------+------+------+
| id   | par  | data |
+------+------+------+
|    1 | a    | a    |
|    2 | a    | b    |
|    3 | a    | c    |
...
|   68 | a    | a    |
|   69 | a    | b    |
|   70 | a    | c    |
+------+------+------+
70 rows in set (2.00 sec)

after
mysql> refresh table tb3;
Query OK, 0 rows affected (0.03 sec)

mysql> select * from tb3;
+------+------+------+
| id   | par  | data |
+------+------+------+
|    1 | a    | a    |
|    2 | a    | b    |
...
|   68 | a    | a    |
|   69 | a    | b    |
|   70 | a    | c    |
+------+------+------+
70 rows in set (2.05 sec)

mysql> select * from tb3;
+------+------+------+
| id   | par  | data |
+------+------+------+
|    1 | a    | a    |
|    2 | a    | b    |
|    3 | a    | c    |
...
|   68 | a    | a    |
|   69 | a    | b    |
|   70 | a    | c    |
+------+------+------+
70 rows in set (0.80 sec)
2023-07-31 10:12:09 +08:00
ec0be8a037 [bug](decimal) change result type for decimalv2 computation (#22366) 2023-07-31 10:00:34 +08:00
0e7f63f5f6 [fix](ipv6)Remove restrictions from IPv4 when add backend (#22323)
When adding a BE, the host:port string was required to contain exactly one colon; otherwise an error was reported. However, an IPv6 address contains many colons, so it was rejected by this check:

```
String[] pair = hostPort.split(":");
if (pair.length != 2) {
    throw new AnalysisException("Invalid host port: " + hostPort);
}
```
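
One way to accept IPv6 addresses is to split on the last colon instead of requiring exactly one. The sketch below assumes the port is always present and that a bracketed host such as `[::1]:9050` is also allowed; HostPortParser and HostPort are hypothetical names, not the code merged in #22323.

```java
public final class HostPortParser {
    private HostPortParser() {}

    /** Simple holder for the parsed result. */
    public static final class HostPort {
        public final String host;
        public final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /**
     * Splits "host:port" on the LAST colon, so IPv6 hosts keep their colons.
     * Assumes the port is mandatory: "fe80::1" alone would be misread as
     * host "fe80:" with port 1, so callers must always supply a port.
     */
    public static HostPort parse(String hostPort) {
        String s = hostPort.trim();
        int idx = s.lastIndexOf(':');
        if (idx <= 0 || idx == s.length() - 1) {
            throw new IllegalArgumentException("Invalid host port: " + hostPort);
        }
        String host = s.substring(0, idx);
        // Strip optional brackets around an IPv6 literal.
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1);
        }
        if (host.isEmpty()) {
            throw new IllegalArgumentException("Invalid host port: " + hostPort);
        }
        int port;
        try {
            port = Integer.parseInt(s.substring(idx + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid host port: " + hostPort, e);
        }
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("Invalid host port: " + hostPort);
        }
        return new HostPort(host, port);
    }
}
```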
2023-07-30 22:47:24 +08:00
f87f29e1ab [fix](multi-catalog)compatible with hdfs HA empty prefix (#22342)
Be compatible with an empty HDFS HA prefix.
For example, 'hdfs:///' will be replaced with 'hdfs://ha-nameservice/'.
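
The replacement can be sketched as a simple prefix rewrite. HdfsPrefixFixer and fillNameservice are hypothetical names, and the nameservice is assumed to be supplied by the caller; the actual change in #22342 may obtain it differently.

```java
public final class HdfsPrefixFixer {
    private static final String EMPTY_AUTHORITY_PREFIX = "hdfs:///";

    private HdfsPrefixFixer() {}

    /**
     * Rewrites a location with an empty authority ("hdfs:///path") so that it
     * names the HA nameservice explicitly ("hdfs://nameservice/path").
     * Locations that already carry an authority are returned unchanged.
     */
    public static String fillNameservice(String location, String nameservice) {
        if (location.startsWith(EMPTY_AUTHORITY_PREFIX)) {
            String rest = location.substring(EMPTY_AUTHORITY_PREFIX.length());
            return "hdfs://" + nameservice + "/" + rest;
        }
        return location;
    }
}
```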
2023-07-30 22:21:14 +08:00
06e4061b94 [enhance](ColdHeatSeparation) carry use path style info along with cold heat separation to support using minio (#22249) 2023-07-30 21:03:33 +08:00
03761c37cd [Improvement](multi catalog) Support Iceberg, Paimon and MaxCompute table in nereids. (#22338) 2023-07-29 21:43:35 +08:00
47c2cc5c74 [vectorized](udf) java udf support with return map type (#22300) 2023-07-29 12:52:27 +08:00
ebd114b384 [enhancement](binlog) CreateTable inherit db binlog && Add some checks (#22293) 2023-07-29 08:27:27 +08:00
ae8a26335c [opt](hive)opt select count(*) stmt push down agg on parquet in hive . (#22115)
Optimize the "select count(*) from table" statement by pushing the "count" type down to BE.
Supported file types: parquet and orc in hive.

1. 4k files, 60kw lines
    before:  1 min 37.70 sec
    after:   50.18 sec

2. 50 files, 60kw lines
    before: 1.12 sec
    after: 0.82 sec
2023-07-29 00:31:01 +08:00
f7c106c709 [opt](nereids) enhance broadcast join cost calculation (#22092)
Enhance broadcast join cost calculation by considering both the build-side effort of building a bigger hash table and the additional probe-side effort from the bigger cost of ProbeWhenBuildSideOutput and ProbeWhenSearchHashTable, when parallel_fragment_exec_instance_num is greater than 1.

The current solution applies a penalty factor to rightRowCount; the factor is the total instance number to the power of 2.
A penalty on outputRows is not applied yet and will be refined in the next-generation cost model.

Also brings some updates for shape checking:

* In the shape files, change the control variable from parallel_fragment_exec_instance_num to parallel_pipeline_task_num when pipeline is enabled.
* Fix a be_number variable inactive issue.
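
The penalty on rightRowCount can be illustrated with a small sketch. It assumes the factor is applied by multiplication and only when the total instance number is greater than 1; the class and method names are hypothetical and this is not the cost model code from #22092.

```java
public final class BroadcastPenalty {
    private BroadcastPenalty() {}

    /**
     * Returns the build-side row count after applying the penalty factor.
     * Assumption for illustration: factor = totalInstanceNumber^2, applied
     * multiplicatively, and skipped when there is a single instance.
     */
    public static double penalizedRightRowCount(double rightRowCount, int totalInstanceNumber) {
        if (totalInstanceNumber <= 1) {
            return rightRowCount;
        }
        double factor = Math.pow(totalInstanceNumber, 2);
        return rightRowCount * factor;
    }
}
```

Under these assumptions, a build side of 1000 rows with 4 instances is costed as 16000 rows, which makes broadcast less attractive as parallelism grows.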
2023-07-28 23:06:02 +08:00
05abfbc5ef [improvement](regression-test) add compression algorithm regression test (#22303) 2023-07-28 17:28:52 +08:00
25f26198f4 [fix](executor) only mysql connect to set GlobalPipelineTask (#22205) 2023-07-28 16:19:34 +08:00
5a0ad09856 [fix](nereids) SubqueryToApply may lost conjunct (#22262)
consider sql:
```
SELECT *
        FROM sub_query_correlated_subquery1 t1
        WHERE coalesce(bitand( 
        cast(
            (SELECT sum(k1)
            FROM sub_query_correlated_subquery3 ) AS int), 
            cast(t1.k1 AS int)), 
            coalesce(t1.k1, t1.k2)) is NULL
        ORDER BY  t1.k1, t1.k2;
```
The IS NULL conjunct is lost in the SubqueryToApply rule. This PR fixes it.
2023-07-28 15:08:56 +08:00
80673406b1 [fix](Nereids) project hidden columns when show_hidden_columns is true (#22285) 2023-07-28 15:08:18 +08:00
0c734a861e [Enhancement](delete) eliminate reading the old values of non-key columns for delete stmt (#22270) 2023-07-28 14:37:33 +08:00
9f565cf835 [fix](ut) fix ut of stats test #22325
After auto retry was merged, it is hard to determine at compile time how many times the doExecute method is executed. If the expected execution count in the expectation block is not met, an unexpected invocation exception is thrown, so this change simply removes the expected execution count.
2023-07-28 14:23:35 +08:00
c2155678ca [fix](functions) fix now(null) crash (#22321)
before: BE crash
now:

mysql [test]>select now(null);
+-----------+
| now(NULL) |
+-----------+
| NULL      |
+-----------+
1 row in set (0.06 sec)
2023-07-28 14:07:56 +08:00
1c6246f7ee [improve](agg) support distinct agg node (#22169)
select c_name from customer union select c_name from customer
This SQL uses an agg node to get the distinct rows of c_name,
so there is no need to wait until all data has been inserted into the hash map;
rows can be output as soon as they are successfully inserted into the hash map.
2023-07-28 13:54:10 +08:00
ad080c691f [chore](log)Move non-user-friendly error message to be.WARNING (#22315)
Move non-user-friendly error message to be.WARNING
2023-07-28 13:15:25 +08:00
7be349a10b [opt](inverted index) add session variable enable_inverted_index_query to control whether query with inverted index (#22255) 2023-07-28 12:43:26 +08:00
5da5fac37a [refactor](Nereids) add result sink node (#22254)
Use ResultSink as the query root node so that the plan of a query statement
has the same pattern as that of an insert statement.
2023-07-28 11:31:09 +08:00
e87174dd6b [feature](planner) modify multi partition prefix value (#22098)
modify multi partition prefix value: 'p_'
2023-07-28 10:21:32 +08:00
bfa7f8df6d [fix](Nereids) parse logical binary stack overflow (#22308)
1. Do not parse recursively, to avoid stack overflow
2. Create a balanced tree instead of a left-deep tree
TODO: add expr_depth_limit to Nereids' parser
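
The two points above can be sketched as follows: operands of one associative operator (AND or OR) are combined pairwise in rounds with a loop, which yields a balanced tree of logarithmic depth. BalancedExprBuilder and Node are hypothetical stand-ins, not the Nereids parser classes.

```java
import java.util.ArrayList;
import java.util.List;

public final class BalancedExprBuilder {
    private BalancedExprBuilder() {}

    /** Minimal expression node: a leaf has no children. */
    public static final class Node {
        final String label;
        final Node left;
        final Node right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    public static Node leaf(String name) {
        return new Node(name, null, null);
    }

    /**
     * Combines operands of the same associative operator into a balanced tree.
     * Each round pairs adjacent nodes, so operand order is preserved and the
     * number of rounds is ceil(log2(n)). No recursion is used while building.
     */
    public static Node build(List<Node> operands, String op) {
        if (operands.isEmpty()) {
            throw new IllegalArgumentException("no operands");
        }
        List<Node> level = new ArrayList<>(operands);
        while (level.size() > 1) {
            List<Node> next = new ArrayList<>((level.size() + 1) / 2);
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) {
                    next.add(new Node(op, level.get(i), level.get(i + 1)));
                } else {
                    next.add(level.get(i)); // odd one out moves up unchanged
                }
            }
            level = next;
        }
        return level.get(0);
    }

    /** Number of nodes on the longest root-to-leaf path. */
    public static int depth(Node root) {
        if (root == null) {
            return 0;
        }
        // Safe to recurse here: the tree built above is only O(log n) deep.
        return 1 + Math.max(depth(root.left), depth(root.right));
    }
}
```

A left-deep tree over n operands has depth n, so any recursive visitor over it can overflow the stack on very long predicates; the balanced form keeps later traversals shallow.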
2023-07-28 09:48:17 +08:00
00863f25e9 [improvement](profile) add table name for file scan node (#22299)
```
VFILE_SCAN_NODE(region)  (id=0):(Active:  3.537us,  %  non-child:  0.00%)
                                -  RuntimeFilters:  :  
                              -  UseSpecificThreadToken:  False
                              -  AcquireRuntimeFilterTime:  501ns
                              -  AllocateResourceTime:  105.598us
```
2023-07-27 23:54:31 +08:00
442ae632e3 [fix](fs-cache) add 'scheme://authority' to fs cache key (#22263)
The file system cache key should contain `scheme://authority`, e.g. `hdfs://nameservices1`.
Otherwise the following error occurs:

```
Wrong FS: hdfs//abc/xxxx, expected: hdfs://def
```
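
A minimal sketch of deriving such a key with the standard java.net.URI class is shown below. FsCacheKey is a hypothetical name; the real cache key in #22263 may contain additional fields.

```java
import java.net.URI;

public final class FsCacheKey {
    private FsCacheKey() {}

    /**
     * Builds "scheme://authority" from a path so that file systems for
     * different clusters (different authorities) never share a cache entry.
     * A missing scheme or authority is represented by an empty string.
     */
    public static String of(String path) {
        URI uri = URI.create(path);
        String scheme = uri.getScheme() == null ? "" : uri.getScheme();
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
        return scheme + "://" + authority;
    }
}
```

With this key, paths under `hdfs://abc` and `hdfs://def` map to different cached file systems, while two paths under the same authority share one.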
2023-07-27 23:53:54 +08:00
f7d5453be8 [fix](nereids) fix cte bucket shuffle path (#22311) 2023-07-27 22:44:51 +08:00
461c4dfaae [fix](tablet clone) fix single replica load failed during migration (#22077) 2023-07-27 20:38:03 +08:00
e39d234db9 [opt](inverted index) add more check for create inverted index (#22297) 2023-07-27 20:33:24 +08:00
716d58f5ff [fix](Nereids) decimal divide should not return null if numerator is zero (#22309) 2023-07-27 20:23:04 +08:00
816fd50d1d [Enhancement](binlog) Add binlog enable disable check in BinlogManager (#22173)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-07-27 20:16:21 +08:00
a87d34b19b [Fix](multi catalog statistics)Improve external table statistics collection (#22224)
Improve external table statistics collection, including log, observability and fix some bugs.
1. Add Running state for statistics job.
2. Add progress for show analyze job. (n/m tasks finished, n/m task failed and so on)
3. Add analyze time cost for show analyze task.
4. Make task failure messages clearer.
5. Synchronize the job status updating code in updateTaskStatus.
6. Fix NPE in HMSAnalyzeTask. (Avoid refreshing statistics cache if the collection sql failed)
7. Return an error message when a sync collection times out.
8. Log level improvement
9. Fix misuse of logCreateAnalysisJob for tasks.
2023-07-27 20:01:14 +08:00
2c849c619d [fix](nereids) only allow inner join in dphyper join reorder (#22307)
The current dphyper join reorder does not consider join conjuncts that reference only one side of the children. This is a common case in outer join conjuncts, so outer join reorder is disabled in dphyper until this problem is addressed.
2023-07-27 19:46:37 +08:00
ae5e39ad26 [opt](Nereids) add double signature back for round like function (#22284)
add double signature back for round like function
2023-07-27 19:10:43 +08:00
Pxl
87b9425772 [Bug](materialized-view) fix where clause not analyzed after fe restart (#22268)
fix where clause not analyzed after fe restart
2023-07-27 18:34:44 +08:00
b51fcbd9c7 [opt](stats) Scale replica of stats table to 3 when it's possible (#22227)
So that we could improve the availability of stats.
2023-07-27 17:36:54 +08:00
6f1c03c766 [fix](jdbc_catalog) fix int and bigint in mysql view when use doris catalog (#22251) 2023-07-27 16:50:42 +08:00
4f6a3c5bf0 [feature](catalog) support clob type in oracle jdbc catalog (#21532) 2023-07-27 15:49:15 +08:00