Commit Graph

8289 Commits

Author SHA1 Message Date
3d758de7a2 [improvement](binlog) gc be binlog metas when tablet is dropped. (#22447) 2023-08-04 14:38:13 +08:00
34164f69ba [Enhancement](binlog) Add Barrier log into BinlogManager (#22559)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-04 14:37:12 +08:00
34b7f381b1 [fix](multi catalog)Filter .hive-staging dir under hive file path. #22574
Hive file path may contain temporary directory like this:

drwxrwxrwx   - root supergroup          0 2023-03-22 21:03 /usr/hive/warehouse/datalake_performance.db/clickbench_parquet_hits/.hive-staging_hive_2023-03-22_21-03-12_047_8461238469577574033-1
drwxrwxrwx   - root supergroup          0 2023-05-18 15:03 /usr/hive/warehouse/datalake_performance.db/clickbench_parquet_hits/.hive-staging_hive_2023-05-18_15-03-52_780_3065787006787646235-1
This causes an error when BE tries to read these files, so they need to be filtered out during FE planning.
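
A minimal sketch of such a filter, assuming staging directories can be recognized by the `.hive-staging` name prefix seen in the listing above. The class and method names are hypothetical and are not taken from the Doris FE code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not the actual Doris FE implementation.
class HiveStagingFilter {
    // Assumed prefix, taken from the directory names in the commit message.
    static final String STAGING_PREFIX = ".hive-staging";

    // True if any component of the path is a Hive staging directory.
    static boolean isStagingPath(String path) {
        for (String part : path.split("/")) {
            if (part.startsWith(STAGING_PREFIX)) {
                return true;
            }
        }
        return false;
    }

    // Keep only the paths that are not located under a staging directory.
    static List<String> keepDataFiles(List<String> paths) {
        return paths.stream()
                .filter(p -> !isStagingPath(p))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Both file names are made up for illustration.
        List<String> listed = Arrays.asList(
                "/warehouse/t/part-00000.parquet",
                "/warehouse/t/.hive-staging_hive_2023-03-22_21-03-12_047_8461238469577574033-1/tmp_file");
        System.out.println(keepDataFiles(listed));
    }
}
```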
2023-08-04 14:14:53 +08:00
3d5b90befe [fix](tablet clone) fix not add colocate replica and print some logs #22378 2023-08-04 14:09:02 +08:00
658d75c816 [feature](Nereids): normalize join condition after expanding or condition NLJ (#22555) 2023-08-04 13:37:37 +08:00
d5a21de796 [Enhancement](planner)support fold constant for date_trunc() (#22122) 2023-08-04 13:32:48 +08:00
62b1a7bcf3 [tpcds](nereids) add rule to eliminate empty relation #22203
1. Eliminate empty relations.
2. Fold constants after filter pushdown.
2023-08-04 12:49:53 +08:00
0e9fad4fe9 [stats](nereids) improve Anti join stats estimation #22444
No impact on TPC-H.
TPC-DS queries 16/69/94 improved.
2023-08-04 12:48:39 +08:00
d3cab017ec [chore](topn-opt) temporary disable two phase read for TableQueryPlanActionQ (#22543) 2023-08-04 11:53:48 +08:00
479e62de0f [Fix](multi catalog)Fix hive partition contains special character bug (#22541)
A Hive partition path may contain special characters, so it needs to be encoded before a URI object is created from the file path.
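
A minimal sketch of the idea, not the actual Doris change: the multi-argument `java.net.URI` constructor quotes illegal characters in the path, while the single-string constructor rejects them. The class name and the sample partition path are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not the actual Doris FE implementation.
class PartitionPathUri {
    // Build a URI from a raw path, letting the constructor quote illegal characters.
    static URI toUri(String path) {
        try {
            return new URI(null, null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot encode path: " + path, e);
        }
    }

    // True if the path can be parsed as a URI without any encoding.
    static boolean parsesUnencoded(String path) {
        try {
            new URI(path);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up partition path whose value contains a space.
        String path = "/warehouse/t/dt=2023-08-03 10:00:00";
        System.out.println(parsesUnencoded(path));
        System.out.println(toUri(path).getRawPath());
    }
}
```

`getPath()` on the resulting URI returns the decoded path again, so the original partition value is not lost.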
2023-08-03 23:53:25 +08:00
3447a70b25 [Fix](planner)fix delete stmt contains where but delete all data. (#22563) 2023-08-03 23:44:05 +08:00
a6f6b351fe [feature](profile) add DORIS_BUILD_SHORT_HASH in profile #22516 2023-08-03 21:25:26 +08:00
151120c907 [Improvement](statistics)Improve show analyze performance. #22484 2023-08-03 21:22:37 +08:00
469886eb4e [FIX](array)fix if function for array() #22553
2023-08-03 19:40:45 +08:00
60ca5b0bad [Improvement](statistics)Return meaningful error message when show column stats column name doesn't exist (#22458)
The error message was unhelpful when the column did not exist in a show column stats statement:
```
MySQL [hive.tpch100]> show column stats `lineitem` (l_extendedpric);
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: null
```

This PR shows a meaningful message:
```
mysql> show column stats `lineitem` (l_extendedpric);
ERROR 1105 (HY000): errCode = 2, detailMessage = Column: l_extendedpric not exists
```
2023-08-03 16:35:14 +08:00
27f6e4649e [improvement](stats) Catch exception properly #22503
Catch the exception instead of throwing it directly to the caller, to avoid unexpectedly interrupting the upper-level logic.
2023-08-03 15:16:55 +08:00
3961b8df76 [refactor](Nereids) mv top-n two phase read rule from post processor to rewriter (#22487)
Use three new plan nodes to represent deferred materialization of TopN.
Example:

```
-- SQL
select * from t1 order by c1 limit 10;

-- PLAN
+------------------------------------------+
| Explain String                           |
+------------------------------------------+
| PhysicalDeferMaterializeResultSink       |
| --PhysicalDeferMaterializeTopN           |
| ----PhysicalDistribute                   |
| ------PhysicalDeferMaterializeTopN       |
| --------PhysicalDeferMaterializeOlapScan |
+------------------------------------------+
```
2023-08-03 14:28:13 +08:00
4f9969ce1e [feature](show-frontends-disk) Add Show frontend disks (#22040)
Co-authored-by: yuxianbing <yuxianbing@yy.com>
Co-authored-by: yuxianbing <iloveqaz123>
2023-08-03 14:04:48 +08:00
4322fdc96d [feature](Nereids): add or expansion in CBO(#22465) 2023-08-03 13:29:33 +08:00
85a95e206e [bugfix](profile) not output some variables correctly (#22537) 2023-08-03 13:17:02 +08:00
e670d84b72 [feature](executor) using max_instance_num to limit automatically instance (#22521) 2023-08-03 13:12:32 +08:00
596fd4d86d [improvement](file-scan) reduce the min size of file split (#22412)
Reduce from 128MB to 8MB.
This lets users set `file_split_size` more flexibly.
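
A rough sketch of how a minimum split size could interact with `file_split_size`, assuming the minimum acts as a floor on the user-provided value. The class, methods, and the clamping rule are assumptions for illustration, not the actual Doris logic.

```java
// Hypothetical illustration, not the actual Doris file scan code.
class FileSplitSize {
    static final long MB = 1024L * 1024L;
    // Minimum after this change, per the commit message (previously 128MB).
    static final long MIN_SPLIT_SIZE = 8 * MB;

    // Assumed rule: the user value is raised to the minimum if it is smaller.
    static long effectiveSplitSize(long fileSplitSize) {
        return Math.max(fileSplitSize, MIN_SPLIT_SIZE);
    }

    // Number of splits for a file of the given length (ceiling division).
    static long splitCount(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long splitSize = effectiveSplitSize(16 * MB);
        System.out.println(splitCount(100 * MB, splitSize));
    }
}
```

Under this assumption, a 16MB setting would have been raised to 128MB before the change and is honored as-is after it.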
2023-08-03 11:42:00 +08:00
fb644ad691 [improvement](stats) Add more logs and config options (#22436)
1. Add more logs and make error messages clearer
2. Sleep for a while between analyze retries
3. Make the concurrency of sync analyze configurable
4. Ignore internal columns such as the delete sign to save resources
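
Item 2 can be pictured with a generic retry loop that sleeps between attempts. This is only a sketch: the class name, the parameters, and the choice to retry on any `RuntimeException` are assumptions, not the Doris implementation.

```java
import java.util.function.Supplier;

// Hypothetical retry helper, not the actual Doris analyze code.
class RetryWithSleep {
    // Run the task up to maxAttempts times, sleeping sleepMillis between failures.
    static <T> T retry(Supplier<T> task, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```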
2023-08-03 09:55:29 +08:00
e5028314bc [Feature](Job)Support scheduler job (#21916) 2023-08-02 21:34:43 +08:00
8cac8df40c [Fix](Planner) fix create view tosql not include partition (#22482)
Problem:
When creating a view with a join on table partitions, an error such as "Unknown column" is raised.

Example:
CREATE VIEW my_view AS SELECT t1.* FROM t1 PARTITION(p1) JOIN t2 PARTITION(p2) ON t1.k1 = t2.k1;
select * from my_view ==> errCode = 2, detailMessage = Unknown column 'k1' in 't2'

Reason:
When creating a view, we call toSql first in order to persist the view SQL. When generating toSql for a table reference, the PARTITION
keyword was removed to keep the SQL string neat. But once the keyword is removed, the partition clause is regarded as an alias.
So the "PARTITION" keyword cannot be removed.

Solution:
Add the "PARTITION" keyword back to the toSql string.
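
A minimal sketch of rendering a table reference with the keyword kept. The class and method are hypothetical and only illustrate the output shape, not the Doris `toSql` code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical renderer, not the actual Doris table reference code.
class TableRefSql {
    // Render "table PARTITION(p1, p2)" so the partition list is not read as an alias.
    static String toSql(String table, List<String> partitions) {
        if (partitions.isEmpty()) {
            return table;
        }
        return table + " PARTITION(" + String.join(", ", partitions) + ")";
    }

    public static void main(String[] args) {
        System.out.println(toSql("t2", Arrays.asList("p2")));
        System.out.println(toSql("t1", Collections.emptyList()));
    }
}
```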
2023-08-02 20:04:59 +08:00
527782f3d3 [fix](nereids)move RecomputeLogicalPropertiesProcessor rule before topn optimization (#22488)
The topn optimization changes MutableState, so the RecomputeLogicalPropertiesProcessor rule needs to be moved before it.
2023-08-02 17:36:56 +08:00
ddd90855a9 [vectorized](udaf) java udaf support with map type (#22397)
* test
* remove some unused
* update
* add case
2023-08-02 15:03:44 +08:00
16461fdc1c [feature](Nereids): pushdown COUNT through join (#22455) 2023-08-02 14:55:25 +08:00
41f984bb39 [fix](fe) Fix stmt forward #22469
The call to String.format() contains an orphan %s, which causes an error.
Introduced from #21205
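
To illustrate the failure mode: a format string with more `%s` specifiers than arguments makes `String.format()` throw `java.util.MissingFormatArgumentException`. The sketch below uses made-up message templates, not the ones from the Doris code.

```java
import java.util.MissingFormatArgumentException;

// Illustration only; the templates below are hypothetical.
class OrphanFormatDemo {
    // Format the template, reporting a missing argument instead of propagating it.
    static String tryFormat(String template, Object... args) {
        try {
            return String.format(template, args);
        } catch (MissingFormatArgumentException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Two specifiers, one argument: the second %s is an orphan.
        System.out.println(tryFormat("forward to %s failed: %s", "fe1"));
        // Specifiers and arguments match.
        System.out.println(tryFormat("forward to %s failed", "fe1"));
    }
}
```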
2023-08-02 10:34:04 +08:00
19d1f49fbe [improvement](compaction) compaction policy and options in the properties of a table (#22461) 2023-08-01 22:02:23 +08:00
809f67e478 [fix](nereids)fix bug of cast expr to decimalv3 without any check (#22466) 2023-08-01 21:59:47 +08:00
94dee833cd [fix](multi-catalog)fix compatible with hdfs HA empty prefix (#22424) 2023-08-01 21:48:16 +08:00
b8399148ef [fix](DOE) es catalog not working with pipeline,datetimev2, array and esquery (#22046) 2023-08-01 21:45:16 +08:00
d5d82b7c31 [stats](nereids) fix bug for avg-size (#22421) 2023-08-01 17:13:00 +08:00
d4a6ef3f8c [fix](Nereids) fix test framework of hypergraph (#22434) 2023-08-01 16:20:07 +08:00
26737dddff [feature](Nereids): pushdown MIN/MAX/SUM through join (#22264)
* [minor](Nereids): add more comment to explain code
2023-08-01 13:23:55 +08:00
a6e7e134a3 Revert "[fix](show-stmt) fix show create table missing storage_medium info (#21757)" (#22443)
This reverts commit ec72383d3372b519e7957f237fad456130230804.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-08-01 12:00:34 +08:00
450e0b1078 [fix](nereids) recompute logical properties in plan post process (#22356)
The join commute rule swaps the left and right children, which changes the logical properties. So the logical properties need to be recomputed in plan post-processing to get the correct result.
2023-07-31 21:04:39 +08:00
bb67225143 [bugfix](profile summary) move detail info from summary to execution summary (#22425)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-07-31 20:37:01 +08:00
ec72383d33 [fix](show-stmt) fix show create table missing storage_medium info (#21757) 2023-07-31 19:26:21 +08:00
2a320ade82 [feature](property) Add table property "is_being_synced" (#22314) 2023-07-31 18:14:13 +08:00
e72a012ada [enhancement](stats) Retry when loading stats (#21849) 2023-07-31 17:33:20 +08:00
afb6a57aa8 [enhancement](nereids) Improve stats preload performance (#21970) 2023-07-31 17:32:01 +08:00
3a1d678ca9 [Fix](Planner) fix parse error of view with group_concat order by (#22196)
Problem:
    When creating a view whose projection contains group_concat(xxx, xxx order by orderkey), the second parse of the inline view fails.

For example:
    The following statement works:
    "SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"
    but creating a view from it does not work:
    "create view test_view as SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"

Reason:
    When creating a view, we parse view.toSql() again to check whether it has any syntax errors. When generating toSql() for group_concat with ORDER BY, a separator ', ' was added between the second parameter and ORDER BY. So the second parse
    failed, because the generated statement no longer had the same semantics as the original one.
    group_concat(`name`, "," ORDER BY id)  ==> group_concat(`name`, "," , ORDER BY id)

Solution:
    Change toSql of group_concat, and analyze the ORDER BY clause of group_concat in the planner, because it would not work if we took the ORDER BY from the view statement without analyzing it and binding slot references to it.
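
A minimal sketch of the string-building part of such a fix: arguments are joined with ", " and the ORDER BY clause is appended without a comma. The class and method are hypothetical, not the Doris `toSql` code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical renderer, not the actual Doris function call expression code.
class GroupConcatSql {
    // Render group_concat(args... ORDER BY key); orderBy may be null.
    static String toSql(List<String> args, String orderBy) {
        StringBuilder sb = new StringBuilder("group_concat(");
        sb.append(String.join(", ", args));
        if (orderBy != null) {
            // No comma here: ORDER BY is a clause, not another argument.
            sb.append(" ORDER BY ").append(orderBy);
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(toSql(Arrays.asList("`name`", "\",\""), "id"));
    }
}
```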
2023-07-31 17:20:23 +08:00
4c6458aa77 [enhancement](nereids) Execute sync analyze task with multi-thread (#22211)
Previously the tasks were executed sequentially, which could take a lot of time.
2023-07-31 15:05:07 +08:00
8ccd8b4337 [fix](Nereids) fix ends calculation when there are constant project (#22265) 2023-07-31 14:10:44 +08:00
f2919567df [feature](datetime) Support timezone when insert datetime value (#21898) 2023-07-31 13:08:28 +08:00
93a9cec406 [Improvement] Add iceberg metadata cache and support manifest file content cache (#22336)
Cache the Iceberg table, so that the metadata is loaded only once when the same table is accessed repeatedly.
Cache the snapshot of the table to improve the performance of the Iceberg table function.
Add cache support for Iceberg's manifest file content.
In a simple test, query time dropped from 2.0s to 0.8s.

before
mysql> refresh table tb3;
Query OK, 0 rows affected (0.03 sec)

mysql> select * from tb3;
+------+------+------+
| id   | par  | data |
+------+------+------+
|    1 | a    | a    |
|    2 | a    | b    |
|    3 | a    | c    |
....
|   68 | a    | a    |
|   69 | a    | b    |
|   70 | a    | c    |
+------+------+------+
70 rows in set (2.10 sec)

mysql> select * from tb3;
+------+------+------+
| id   | par  | data |
+------+------+------+
|    1 | a    | a    |
|    2 | a    | b    |
|    3 | a    | c    |
...
|   68 | a    | a    |
|   69 | a    | b    |
|   70 | a    | c    |
+------+------+------+
70 rows in set (2.00 sec)

after
mysql> refresh table tb3;
Query OK, 0 rows affected (0.03 sec)

mysql> select * from tb3;
+------+------+------+
| id   | par  | data |
+------+------+------+
|    1 | a    | a    |
|    2 | a    | b    |
...
|   68 | a    | a    |
|   69 | a    | b    |
|   70 | a    | c    |
+------+------+------+
70 rows in set (2.05 sec)

mysql> select * from tb3;
+------+------+------+
| id   | par  | data |
+------+------+------+
|    1 | a    | a    |
|    2 | a    | b    |
|    3 | a    | c    |
...
|   68 | a    | a    |
|   69 | a    | b    |
|   70 | a    | c    |
+------+------+------+
70 rows in set (0.80 sec)
2023-07-31 10:12:09 +08:00
ec0be8a037 [bug](decimal) change result type for decimalv2 computation (#22366) 2023-07-31 10:00:34 +08:00
0e7f63f5f6 [fix](ipv6)Remove restrictions from IPv4 when add backend (#22323)
Adding a BE required the host:port string to contain exactly one colon; otherwise an error was reported. However, IPv6 addresses contain multiple colons.

```
String[] pair = hostPort.split(":");
if (pair.length != 2) {
    throw new AnalysisException("Invalid host port: " + hostPort);
}
```
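
One possible way to accept IPv6 addresses is to split on the last colon and allow a bracketed host. This is a sketch under that assumption, not the actual fix; the class name is hypothetical and `IllegalArgumentException` stands in for `AnalysisException`.

```java
// Hypothetical parser, not the actual Doris implementation.
class HostPortParser {
    // Return {host, port}; accepts "1.2.3.4:9050" and "[fe80::1]:9050".
    static String[] parse(String hostPort) {
        int idx = hostPort.lastIndexOf(':');
        if (idx <= 0 || idx == hostPort.length() - 1) {
            throw new IllegalArgumentException("Invalid host port: " + hostPort);
        }
        String host = hostPort.substring(0, idx);
        String port = hostPort.substring(idx + 1);
        // Strip the brackets commonly used around IPv6 literals.
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1);
        }
        try {
            Integer.parseInt(port);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid host port: " + hostPort, e);
        }
        return new String[] {host, port};
    }

    public static void main(String[] args) {
        String[] v6 = parse("[fe80::1]:9050");
        System.out.println(v6[0] + " " + v6[1]);
    }
}
```

An unbracketed IPv6 literal followed by a port is ambiguous, so bracketed input is the safer form to require.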
2023-07-30 22:47:24 +08:00