Commit Graph

5755 Commits

Author SHA1 Message Date
cf66280e60 [opt](stats) Sampling when aggregate column stats (#21020)
In the previous implementation, when partition statistics were aggregated into column statistics, the number of distinct values (NDV) for the entire column was computed without sampling, which made sampled analysis less efficient than it should be.

Before this PR, analyzing the table below, which has 1,000,000 rows, took 5.75 sec; after this PR, it takes 3.39 sec.


```sql
CREATE TABLE IF NOT EXISTS `duplicate_all` (
    `k3` int(11) null comment "",
    `k0` boolean null comment "",
    `k1` tinyint(4) null comment "",
    `k2` smallint(6) null comment "",
    `k4` bigint(20) null comment "",
    `k5` decimalv3(9, 3) null comment "",
    `k6` char(36) null comment "",
    `k10` date null comment "",
    `k11` datetime null comment "",
    `k7` varchar(64) null comment "",
    `k8` double null comment "",
    `k9` float null comment "",
    `k12` string  null comment "",
    `k13` largeint(40)  null comment ""
) engine=olap
DUPLICATE KEY(`k3`)
DISTRIBUTED BY HASH(`k3`) BUCKETS 5 properties("replication_num" = "3")
```
2023-06-25 15:52:01 +08:00
dd99468b8f [fix](stats) Fix jdbc timeout with multiple FE when execute analyze table (#21115)
When a client is connected to a follower node, the SQL may be forwarded to the master for execution; in that case the result should be set to `StmtExecutor#proxyResultSet`.

Before this PR, submitting an analyze statement through the MySQL client or JDBC in the above scenario would fail with a "malformed packet" / "Communication failed" error.
2023-06-25 15:49:36 +08:00
76bdcf1d26 [improvement](pipeline) task group scan entity (#19924) 2023-06-25 14:43:35 +08:00
80d54368e0 [minor](Nereids) replace some nullable field to Optional (#20967) 2023-06-25 12:02:25 +08:00
207bc53b06 [functionpushdown](performance) move function pushdown as default false since its performance is not good (#21111)
Set enable function pushdown to false by default.
Enable it in fuzzy mode so that the feature is still tested.
We should remove function pushdown in the future, since we already have common expr pushdown.
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-25 10:36:20 +08:00
20b92b0812 [Feature](log)friendly hint for creating table failed (#20617) 2023-06-25 10:02:26 +08:00
5aa16e84bf [fix](catalog) do not call makeSureInitialized when create table from hms meta event (#21104)
In this PR, I remove the `makeSureInitialized()` call in the `createTable()` method, because it is wrong and useless.
Also rename the method to make it clearer.
2023-06-24 21:50:36 +08:00
Pxl
fa3bb2eabe [Bug](materialized-veiw) fix error happens when parsing create materialized view stmt #21095 2023-06-22 15:58:32 +08:00
eb6202e8be [minor](fe) remove several unnecessary codes (#21046)
1. The class 'ExternalDatabase' already implements the 'GsonPostProcessable' interface, so the corresponding code in some subclasses of 'ExternalDatabase' is redundant.
2. A LOG object is not used in this file.
2023-06-22 15:29:25 +08:00
9f0aa8a9de [fix](fuzzy)nereids and pipeline config changed by fuzzy in non-pipeline env. (#21092)
* fix: nereids and pipeline config  changed by fuzzy in non-pipeline env.

* fix: format

* fix: format
2023-06-22 08:36:19 +08:00
b192082b62 [Improve](load)Solve the problem of RoutineLoadTaskScheduler idling when there is no data (#20986)
Because the polling interval is 0, the scheduler polls continuously and keeps the CPU busy when there is no data.

In a before-and-after comparison test, CPU usage time was reduced by a factor of 2000.
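
A minimal sketch of the idea, not the actual RoutineLoadTaskScheduler code: when a poll finds nothing, wait for a non-zero interval instead of retrying immediately. The function name `poll_with_idle_wait`, the parameter `idle_wait_s`, and the queue-based task source are assumptions made for illustration.

```python
import queue


def poll_with_idle_wait(tasks, idle_wait_s, max_polls):
    """Drain `tasks`, blocking up to `idle_wait_s` per poll instead of spinning.

    Returns (processed_items, empty_polls). All names here are hypothetical.
    """
    processed = []
    empty_polls = 0
    for _ in range(max_polls):
        try:
            # A timeout of 0 would return immediately and turn this loop
            # into a busy loop whenever the queue is empty.
            item = tasks.get(timeout=idle_wait_s)
        except queue.Empty:
            empty_polls += 1
            continue
        processed.append(item)
    return processed, empty_polls
```

With a positive `idle_wait_s`, each empty poll blocks for that long, so an idle scheduler wakes up only a bounded number of times per second.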
2023-06-22 00:41:45 +08:00
fff308352f [fix](nereids)the microseconds value is wrong when create datatimev2 literal from LocalDateTime (#21089)
* [fix](nereids)the microseconds value is wrong when create datatimev2 literal from LocalDateTime

* fix code style
2023-06-22 00:40:53 +08:00
e060ffab96 [Fix](cooldown) Fix incorrect judgement of isDropTableOrPartition (#21084) 2023-06-21 23:00:58 +08:00
8b561cfb03 [fix](nereids)create datev2 and datetimev2 literal if enable_date_conversion is true (#21065) 2023-06-21 20:29:36 +08:00
6ac0bfeceb [Feature](inverted index) add unicode parser for inverted index (#21035) 2023-06-21 20:14:06 +08:00
cc53391c9a Revert "[feature](merge-on-write) enable merge on write by default (#… (#21041) 2023-06-21 18:36:46 +08:00
2beed11256 [Bug](streamload) fix inconsistent load result of be and fe (#20950) 2023-06-21 18:12:51 +08:00
62fb0e642e [chore](dynamic schema) deprecated create dynamic schema table (#21058) 2023-06-21 14:44:57 +08:00
6f20cac1da [bugfix](cooldown) Fix potential deadlock while calling handleCooldownConf (#20975) 2023-06-21 14:44:01 +08:00
Pxl
5f0bb49d46 [Feature](materialized-view) support create mv contain aggstate column (#20812)
support create mv contain aggstate column
2023-06-21 13:06:52 +08:00
fcd778fb4f [Fix](mysql proto) avoid send duplicated OK packet (#21032)
1. The Mysql Go driver has a logic that terminates when it reads an EOF (end-of-file) and expects no data in the buffer. However, the front-end (FE) mistakenly returns an additional OK packet, which causes an exception to be thrown when reading the buffer.

2. Refactor some logic to support placeholders anywhere in a prepared statement, not just in the where clause, like
```
select ?, ? from tbl
```
2023-06-21 12:00:22 +08:00
18beb822a3 [FIX](array-type) fix array string output with fe const expr (#21042)
The FE foldconstRule evaluates an array() function expr with const literals on the FE and does not pass the array literal to the BE, so the FE array string output format should be the same as the BE array string output.
2023-06-21 11:52:02 +08:00
ef17289925 [feature](jni) add jni metrics and attach to BE profile automatically (#21004)
Add JNI metrics, for example:
```
-  HudiJniScanner:  0ns
  -  FillBlockTime:  31.29ms
  -  GetRecordReaderTime:  1m5s
  -  JavaScanTime:  35s991ms
  -  OpenScannerTime:  1m6s
```
Add three common performance metrics for JNI scanner:
1. `OpenScannerTime`: Time to init and open JNI scanner
2. `JavaScanTime`: Time to scan data and insert into vector table in java side
3. `FillBlockTime`: Time to convert java vector table to c++ block

User-defined metrics on the Java side are also supported. For example, if `OpenScannerTime` shows that the open process takes a long time and we want to determine which sub-process is responsible, we can add `GetRecordReaderTime` on the Java side.
User-defined metrics on the Java side are attached to the BE profile automatically.
2023-06-21 11:19:02 +08:00
f10258577b [Fix](Planner) Fix group concat with multi distinct and segs (#20912)
Problem:
Using select group_concat(distinct a, 'seg1'), group_concat(distinct b, 'seg2') ... raises an error.
Reason:
The group_concat function also treats 'seg' as an argument, so a multi distinct column error is raised.
Solution:
Let the multi distinct group_concat function take only the first argument as the real argument.
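
The rule above can be sketched as follows. This is an illustration only and not the planner code; the `(function_name, argument_list)` representation and the helper name `distinct_columns` are hypothetical.

```python
def distinct_columns(agg_calls):
    """Collect the columns that DISTINCT actually applies to.

    `agg_calls` is a list of (function_name, argument_list) pairs; this
    representation is hypothetical and only serves the illustration.
    """
    columns = []
    for name, args in agg_calls:
        if name.lower() == "group_concat":
            # Only the first argument is the aggregated column; the second
            # one is the separator and must not count as a distinct column.
            real_args = args[:1]
        else:
            real_args = args
        for arg in real_args:
            if arg not in columns:
                columns.append(arg)
    return columns
```

For the query in the problem statement, this yields the two columns `a` and `b` and ignores the separators `'seg1'` and `'seg2'`.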
2023-06-20 21:00:18 +08:00
ca8f51602b [Improvement](multi catalog, statistics)Support two level external statistics cache loader (#20906)
The current column statistics cache loader loads data from the column_statistics olap table.
This PR changes the cache loader logic to first load from the column_statistics olap table and, if no data was loaded, then load from table metadata. This is mainly to support fetching statistics data for external catalogs using the HMS or Iceberg API.
This is the first PR; the next PR will implement the fetch logic for different external catalogs.
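
The two-level lookup can be sketched like this. It is a simplified illustration under assumed names: both loader callables are placeholders and do not correspond to real Doris classes.

```python
def load_column_stats(key, load_from_stats_table, load_from_table_metadata):
    """Two-level lookup: internal statistics table first, then table metadata.

    Both loader callables are placeholders; each returns a stats object or None.
    """
    stats = load_from_stats_table(key)
    if stats is not None:
        return stats
    # Nothing was collected for this column; fall back to whatever the
    # external catalog (for example HMS or Iceberg) exposes as metadata.
    return load_from_table_metadata(key)
```

If neither level has data, the function returns `None`, and the caller decides how to treat unknown statistics.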
2023-06-20 16:43:18 +08:00
cb89af49e7 [improvement](replica) donot care last failed version in publish (#21001)
We just care 2 things:
1. If the replica acks right
2. If the replica catches up
2023-06-20 15:57:54 +08:00
0b1bbe4045 [Bugfix](CCR) BinlogTombstone tableId is null when db disable binlog (#20995) 2023-06-20 15:48:47 +08:00
0d80456869 [enhancement](backup) teach fe to acquire a consistent backup between be and fe (#21014) 2023-06-20 15:37:41 +08:00
f4d3f4ae19 [Fix](Nereids) failed to fold date_format() to constant (#20976) 2023-06-20 15:11:25 +08:00
ec34f72204 [enhancement](nereids) log for exception stack of sync analyze (#21013) 2023-06-20 15:11:03 +08:00
6b4a9edbbd [fix](nereids) Fix explain graph with CTE #20997
Add support of MultiCastDataSink
2023-06-20 14:55:21 +08:00
7da3fde89c [Fix](Nereids)cast to datev2 default for Nereids if enable_date_conversion (#20973) 2023-06-20 14:53:20 +08:00
53b2fe5db6 [improvement](jdbc) Set the JDBC connection timeout to be conf (#21000) 2023-06-20 14:23:48 +08:00
74a09fc6e5 [Dependency](fe)Use the release version of hive-catalog (#20921)
Used hive-catalog-shade 1.0.1
2023-06-20 11:53:59 +08:00
1eb4e5bd06 [Fix](Routineload)routine load does not support lowercase data source names (#21005) 2023-06-20 11:44:02 +08:00
923f7edad0 [opt](hudi) using native reader to read the base file with no log file (#20988)
Two optimizations:
1. Insert string bytes directly to remove decoding&encoding process.
2. Use native reader to read the hudi base file if it has no log file. Use `explain` to show how many splits are read natively.
2023-06-20 11:20:21 +08:00
7e01f074e2 [improvement](jdbc mysql) support auto calculate the precision of timestamp/datetime (#20788) 2023-06-20 10:39:34 +08:00
87258a13c4 [enhancement](nereids) Remove useless config option #20905
1. Remove useless config option
2. Fix timeout cancellation: before this PR, an OlapAnalysisTask would continue running even after it had already timed out.
2023-06-20 10:37:46 +08:00
824bc02603 [Function] Support date function: microsecond() (#20044) 2023-06-20 10:32:54 +08:00
0287cc15f2 [fix](meta) 'clean label from db' does not work (#20625)
When we use a label to load data, the label cannot be used a second time. After executing the SQL 'CLEAN LABEL [label] FROM db;', however, the same label should be usable again.
The SQL above did not work; this PR fixes the problem.
2023-06-20 10:25:31 +08:00
d02ecef406 [fix](Nereids): revert push down alias into union (#20991)
Revert #20543 to temporarily avoid the problem.
2023-06-20 09:32:26 +08:00
e7b070c9ec [fix](Nereids) subquery not return correct data type (#20985)
If type coercion is applied to a subquery, the subquery should return the data type after type coercion.

Error info:
```
Both side of binary arithmetic is not numeric. left type is DECIMALV3(2, 1) and right type is DECIMAL(27, 9)')
```
2023-06-19 23:44:58 +08:00
5a28b6f9fc [fix](datetime) Fix the error in date calculation that includes constants (#20863)
before

```
mysql> select hours_add('2023-03-30 22:23:45.23452',8);
+-------------------------------------+
| hours_add('2023-03-30 22:23:45', 8) |
+-------------------------------------+
| 2023-03-31 06:23:45                 |
+-------------------------------------+

mysql> select date_add('2023-03-30 22:23:45.23452',8);
+------------------------------------+
| date_add('2023-03-30 22:23:45', 8) |
+------------------------------------+
| 2023-04-07 22:23:45                |
+------------------------------------+

mysql [test]>select hours_add('2023-03-30 22:23:45.23452',8);
+-------------------------------------------+
| hours_add('2023-03-30 22:23:45.23452', 8) |
+-------------------------------------------+
| 2023-03-31 06:23:45.000234                |
+-------------------------------------------+
```

after

```
mysql [test]>select hours_add('2023-03-30 22:23:45.23452',8);
+-------------------------------------------+
| hours_add('2023-03-30 22:23:45.23452', 8) |
+-------------------------------------------+
| 2023-03-31 06:23:45.23452                 |
+-------------------------------------------+
1 row in set (0.01 sec)

mysql [test]>select date_add('2023-03-30 22:23:45.23452',8);
+------------------------------------------+
| date_add('2023-03-30 22:23:45.23452', 8) |
+------------------------------------------+
| 2023-04-07 22:23:45.23452                |
+------------------------------------------+
1 row in set (0.00 sec)

mysql [test]>set enable_nereids_planner=true;
Query OK, 0 rows affected (0.00 sec)

mysql [test]>set enable_fallback_to_original_planner=false;
Query OK, 0 rows affected (0.00 sec)

mysql [test]>select hours_add('2023-03-30 22:23:45.23452',8);
+-------------------------------------------+
| hours_add('2023-03-30 22:23:45.23452', 8) |
+-------------------------------------------+
| 2023-03-31 06:23:45.23452                 |
+-------------------------------------------+
1 row in set (0.03 sec)

mysql [test]>select date_add('2023-03-30 22:23:45.23452',8);
+------------------------------------------+
| days_add('2023-03-30 22:23:45.23452', 8) |
+------------------------------------------+
| 2023-04-07 22:23:45.23452                |
+------------------------------------------+
1 row in set (0.00 sec)
```
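
As an independent cross-check of the expected values in the "after" output, the same arithmetic can be reproduced with Python's standard library. This only verifies the date arithmetic; it says nothing about how Doris implements it, and the helper name `add_to_literal` is made up for this example.

```python
from datetime import datetime, timedelta

FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def add_to_literal(literal, **delta):
    """Parse a datetime literal with a fractional part and add an interval."""
    # %f accepts one to six digits and pads on the right, so ".23452"
    # is read as 234520 microseconds rather than 23452.
    return datetime.strptime(literal, FORMAT) + timedelta(**delta)
```

Calling `add_to_literal("2023-03-30 22:23:45.23452", hours=8)` gives 2023-03-31 06:23:45.234520, and `days=8` gives 2023-04-07 22:23:45.234520. Both match the "after" results, apart from Python printing the trailing zero of the microsecond field.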
2023-06-19 23:44:30 +08:00
e6f50c04f1 [fix](nereids)SubqueryToApply rule lost is null condition (#20971)
* [fix](nereids)SubqueryToApply rule lost is null condition
2023-06-19 23:43:40 +08:00
be8fb68712 [fix](nereids)distribute node missing rows and cost #20943
In the dumped memo, the distribute node was missing its estimated rows and cost.
2023-06-19 23:42:01 +08:00
f20ef165fe [opt](Nereids) update join stats derive (#20895)
In a hash join condition, some equalities are trustable and some are not.
An equality is trustable if one side is almost unique, like a primary key. For such an equality condition we can estimate more accurately.
The problem is that in the rewritten q20 there are 2 equality conditions: one is trustable, the other is not. But we treated both of them as trustable.

Test result:
on tpch100, from 2.2 sec to 0.44 sec
no impact on other tpch queries
no performance impact on tpcds queries
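
A rough sketch of the "almost unique" test described in this commit message, assuming the number of distinct values (NDV) and the row count are known for each side of the equality. The function name and the `unique_ratio` threshold are illustrative assumptions and are not taken from the Nereids code.

```python
def is_trustable_equal(left_ndv, left_rows, right_ndv, right_rows, unique_ratio=0.9):
    """An equality is treated as trustable when one side is almost unique.

    `unique_ratio` is an illustrative threshold, not a value taken from Doris.
    """
    left_unique = left_rows > 0 and left_ndv / left_rows >= unique_ratio
    right_unique = right_rows > 0 and right_ndv / right_rows >= unique_ratio
    return left_unique or right_unique
```

Under this sketch, an equality on a primary-key-like column (NDV close to row count) is trustable, while an equality between two low-cardinality columns is not and would be estimated more conservatively.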
2023-06-19 23:40:44 +08:00
010861b7ec [enhancement](Nereids) Don't write to table_statistics when create sync analyze job anymore #20956
1. Don't write to table_statistics anymore when creating a sync analyze job, since it's meaningless

2. Capture exceptions when creating each system analyze job, so that a single job creation failure does not cause the creation of all automatic collection jobs to fail.

3. Mark the job type of auto-triggered periodic jobs as system
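
Point 2 follows a common pattern: catch the exception inside the loop so that one failure does not abort the rest. A minimal sketch, with `tables` and `create_job` as hypothetical stand-ins rather than real Doris APIs:

```python
import logging

logger = logging.getLogger(__name__)


def create_jobs(tables, create_job):
    """Create one analyze job per table, isolating failures per job.

    `tables` and `create_job` are hypothetical stand-ins for illustration.
    Returns (created_jobs, failed_tables).
    """
    created, failed = [], []
    for table in tables:
        try:
            created.append(create_job(table))
        except Exception:
            # Log and move on so that one bad table does not prevent
            # jobs for the remaining tables from being created.
            logger.exception("failed to create analyze job for %s", table)
            failed.append(table)
    return created, failed
```

Returning the failed tables alongside the created jobs keeps the failures visible to the caller instead of silently dropping them.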
2023-06-19 20:00:41 +08:00
08ac55291f [opt](Nereids) change log level to debug to avoid log explode (#20954) 2023-06-19 18:50:06 +08:00
415f1053a4 [minor](progress) do not update progress if job id is not set (#20949) 2023-06-19 18:13:43 +08:00
63b9684696 [Feature](broker-load) Support priority for Broker Load job. (#20628)
Support priority for Broker Load job.
2023-06-19 14:16:48 +08:00