Commit Graph

13036 Commits

Author SHA1 Message Date
295ea482a1 [improvement](log) optimize template function log for performance (#23746)
change log level to debug and use format in template function log for performance.
2023-09-01 19:02:33 +08:00
0b94eee4c7 [fix](rest)query_info returns empty rows #23595 2023-09-01 18:50:49 +08:00
0d50c11d5c [Doc](AuditLoader) improvement auditLoader doc (#23758) 2023-09-01 18:48:39 +08:00
a8de805a7a [fix](Nereids) fix stats inject in or_expansion.groovy (#23748)
make stats injection run first
2023-09-01 18:31:58 +08:00
797d9de192 [fix](Nereids) When col stats is Unknown, not expression should return the stats with selectivity of 1 2023-09-01 17:36:31 +08:00
e3bbba82cf [Fix](planner) fix to_date failed in create table as select (#23613)
Problem:
When creating a table as select using the to_date function, the statement fails.

Example:
create table test_to_date properties('replication_num' = '1') as select to_date('20230816') as datev2;

Reason:
After release version 2.0, datev1 is disabled, but the to_date function signature was not upgraded, so the statement failed when checking the return type of to_date.

Solution:
When getting the function, forbid to_date with return type date_v1; datetime v1 is also changed to datetime v2, and decimal v2 to decimal v3.
2023-09-01 17:28:40 +08:00
b5232ce0d7 [fix](nereids) NormalizeAggregate may push redundant expr to child project node (#23700)
NormalizeAggregate may push exprs to the child project node. We need to make sure there is no redundant expr in the pushed-down expr list. This PR uses a 'Set' to ensure that.
2023-09-01 17:16:10 +08:00
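To illustrate the idea behind b5232ce0d7, here is a minimal sketch of de-duplicating a pushed-down expression list with set semantics while keeping the first-seen order. It is not the Nereids implementation: `dedup_pushed_exprs` and the string expressions are hypothetical stand-ins for the real expression objects.

```python
def dedup_pushed_exprs(exprs):
    """Return exprs without duplicates, keeping first-seen order.

    Hypothetical helper: expressions are modelled as plain strings.
    """
    # dict keys are unique and preserve insertion order, so this
    # behaves like an ordered set.
    return list(dict.fromkeys(exprs))


# Expressions that would be pushed to the child project node.
pushed = ["a + 1", "b", "a + 1", "c", "b"]
unique_pushed = dedup_pushed_exprs(pushed)
```

Keeping the order stable is a design choice of this sketch; a plain set would also remove the redundancy but would not give a deterministic project list.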
fe5feae480 [chore](ci) add script for create_issue_comment (#23723)
Co-authored-by: stephen <hello-stephen@qq.com>
2023-09-01 15:46:25 +08:00
e3886bcf2a [fix](tablet scheduler) change sched period back to 1s (#23573)
This reverts commit 285bf978442fdff65fda5264ff40bd8291954ef2.

* change tablet sched period back to 1s
2023-09-01 15:29:59 +08:00
9d2fc78bd5 [fix](cooldown) Fix potential data loss when clone task's dst tablet is cooldown replica (#17644)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: Kang <kxiao.tiger@gmail.com>
2023-09-01 15:27:52 +08:00
91c5640cae [fix](tablet clone) fix clone backend chose wrong disk (#23729) 2023-09-01 15:12:35 +08:00
b843b79ddc [fix](tablet clone) fix tablet sched ctx toString cause null exception (#23731) 2023-09-01 15:05:28 +08:00
Pxl
32853a529c [Bug](cte) fix multi cast data stream source not open expr (#23740)
fix multi cast data stream source not open expr
2023-09-01 14:57:12 +08:00
Pxl
0e9dd348fb [Improvement](materialized-view) add short circuit for selectBestMV #23743 2023-09-01 14:46:54 +08:00
eaf2a6a80e [fix](date) return right date value even if out of the range of date dictionary(#23664)
PR(https://github.com/apache/doris/pull/22360) and PR(https://github.com/apache/doris/pull/22384) optimized the performance of the date type. However, Hive supports dates outside 1970~2038, which led to wrong date values in the TPC-DS benchmark.
How to fix:
1. Increase the dictionary range: 1900 ~ 2038
2. Dates outside 1900 ~ 2038 are regenerated.
2023-09-01 14:40:20 +08:00
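The dictionary-plus-fallback approach described in eaf2a6a80e can be sketched as follows. This is an illustration only: the exact dictionary bounds (`DICT_START`, `DICT_END`), the days-since-epoch encoding, and the name `days_to_date` are assumptions, not taken from the Doris source.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
DICT_START = date(1900, 1, 1)   # assumed lower bound of the dictionary
DICT_END = date(2038, 12, 31)   # assumed upper bound (inclusive)

_first = (DICT_START - EPOCH).days
_last = (DICT_END - EPOCH).days

# Precomputed dictionary: index 0 corresponds to DICT_START.
DATE_DICT = [EPOCH + timedelta(days=d) for d in range(_first, _last + 1)]


def days_to_date(days_since_epoch):
    """Convert a days-since-epoch value to a date."""
    if _first <= days_since_epoch <= _last:
        # Fast path: plain list lookup inside the dictionary range.
        return DATE_DICT[days_since_epoch - _first]
    # Slow path: regenerate the value instead of returning a wrong one.
    return EPOCH + timedelta(days=days_since_epoch)
```

Values inside the range are served from the precomputed list, and anything outside it is computed on demand, so the result is correct either way.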
5b2360e836 [opt](planner) speed up computeColumnsFilter on ScanNode (#23742)
computeColumnsFilter computes filters on all columns of the table's
base schema. However, if the table is very wide, such as 5000 columns,
this takes a long time. This PR compares the conjuncts size with the
columns size. If the conjuncts size is smaller than the columns size,
it collects slots from the conjuncts to avoid traversing all columns.
2023-09-01 14:22:17 +08:00
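A rough sketch of the size comparison described in 5b2360e836, assuming each conjunct is represented by the set of column names it references. `columns_to_check` is a hypothetical helper, not the planner's API.

```python
def columns_to_check(all_columns, conjunct_slots):
    """Pick the columns worth building filters for.

    all_columns: list of column names in the base schema.
    conjunct_slots: list of sets, one per conjunct, holding the
    column names that conjunct references (assumed representation).
    """
    if len(conjunct_slots) < len(all_columns):
        # Few conjuncts, wide table: only visit referenced columns.
        referenced = set()
        for slots in conjunct_slots:
            referenced.update(slots)
        return sorted(referenced)
    # Otherwise fall back to traversing the whole schema.
    return list(all_columns)


wide_schema = [f"c{i}" for i in range(5000)]
conjuncts = [{"c7"}, {"c42", "c7"}]
selected = columns_to_check(wide_schema, conjuncts)
```

With two conjuncts over a 5000-column table, only the two referenced columns are visited rather than all 5000.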
e88c218390 [Improve](Job)Job internal interface provides immediate scheduling (#23735)
Delete meaningless job status
System scheduling is executed in the time wheel
Optimize window calculation code
2023-09-01 12:50:08 +08:00
c31cb5fd11 [enhance] use correct default value for show config action (#19284) 2023-09-01 11:28:26 +08:00
d96bc2de1a [enhance](policy) Support to change table's storage policy if the two policies have the same resource (#23665) 2023-09-01 11:25:27 +08:00
d6450a3f1c [Fix](statistics)Fix external table auto analyze bugs (#23574)
1. Fix a bug where auto analyze of an external table recursively loads the schema cache.
2. Move some functions from the StatisticsAutoAnalyzer class to TableIf, so that external tables and internal tables can implement the logic separately.
3. Disable auto analyze for external catalogs by default; it can be enabled by adding the catalog property "enable.auto.analyze"="true".
2023-09-01 10:58:14 +08:00
9a7e8b298a [Improvement](statistics)Show column stats even when error occurred (#23703)
Before, show column stats ignored columns with errors.
In this PR, when the min or max value fails to deserialize, show column stats uses N/A as the min or max value and still shows the rest of the stats (count, null_count, ndv and so on).
2023-09-01 10:57:37 +08:00
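The fallback behaviour described in 9a7e8b298a could look roughly like this. `render_column_stats`, the dictionary layout, and the use of `ValueError` to signal a deserialization failure are all assumptions made for the sketch.

```python
def render_column_stats(raw, deserialize):
    """Build one output row for show column stats.

    raw: dict with count, null_count, ndv, min and max (assumed layout).
    deserialize: callable that parses a stored min/max value and
    raises ValueError on failure (assumed error type).
    """
    row = {
        "count": raw["count"],
        "null_count": raw["null_count"],
        "ndv": raw["ndv"],
    }
    for key in ("min", "max"):
        try:
            row[key] = deserialize(raw[key])
        except ValueError:
            # Keep the row; only the broken field degrades to N/A.
            row[key] = "N/A"
    return row


raw_stats = {"count": 10, "null_count": 1, "ndv": 9,
             "min": "3", "max": "not-a-number"}
row = render_column_stats(raw_stats, int)
```

The point of the design is that one undecodable field no longer hides the remaining statistics for that column.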
b93a1a83a5 [opt](Nereids) let keywords list same with legacy planner (#23632) 2023-09-01 10:24:30 +08:00
e1090d6a63 [Fix](column predicate) separate CHAR primitive type for column predicate (#23581) 2023-09-01 09:41:53 +08:00
hzq
16d6357266 [fix] (mac compile) Fix mac compile error & fe start time related (#23727)
Fix of PR #23582

Some FE code was deleted by [Improvement](pipeline) Cancel outdated query if original fe restarts #23582 and needs to be added back.
Fix the Mac build failure caused by a wrong thrift declaration order.
2023-09-01 08:02:30 +08:00
b16ab0bff7 [Docs] (maint-monitor) when configuring automatic-service-start, JAVA_HOME must first be configured in fe.conf and be.conf (#23610) 2023-09-01 08:01:12 +08:00
65f41f71c1 [pipelineX](refactor) refine codes (#23726) 2023-09-01 07:57:35 +08:00
d0e906f329 [Docs](alter partition) Fix the docs of adding default partition (#23705)
According to https://github.com/apache/doris/pull/15509, adding a default list partition doesn't need the keyword `DEFAULT`.
2023-09-01 00:20:12 +08:00
6b4d1c2d86 [Doc](flink connector) Add new configuration be nodes (#23698) 2023-09-01 00:16:08 +08:00
52e645abd2 [Feature](Nereids): support cte for update and delete statements of Nereids (#23384) 2023-08-31 23:36:27 +08:00
c74ca15753 [pipeline](sink) Support Async Writer Sink of result file sink and memory scratch sink (#23589) 2023-08-31 22:44:25 +08:00
b763bfa17d [Doc](tvf)Added tvf support for reading documents from avro files (#23436) 2023-08-31 21:49:27 +08:00
72fef48f87 [Doc](flink-connector)Flink connector adds schema change related parameter documents (#23439) 2023-08-31 21:48:27 +08:00
e680d42fe7 [feature](information_schema) add metadata_name_ids to quickly get catalogs, dbs and tables, and add profiling table for compatibility with MySQL (#22702)
Add information_schema.metadata_name_ids to quickly get catalogs, databases and tables.

1. Table structure:
```mysql
mysql> desc  internal.information_schema.metadata_name_ids;
+---------------+--------------+------+-------+---------+-------+
| Field         | Type         | Null | Key   | Default | Extra |
+---------------+--------------+------+-------+---------+-------+
| CATALOG_ID    | BIGINT       | Yes  | false | NULL    |       |
| CATALOG_NAME  | VARCHAR(512) | Yes  | false | NULL    |       |
| DATABASE_ID   | BIGINT       | Yes  | false | NULL    |       |
| DATABASE_NAME | VARCHAR(64)  | Yes  | false | NULL    |       |
| TABLE_ID      | BIGINT       | Yes  | false | NULL    |       |
| TABLE_NAME    | VARCHAR(64)  | Yes  | false | NULL    |       |
+---------------+--------------+------+-------+---------+-------+
6 rows in set (0.00 sec) 


mysql> select * from internal.information_schema.metadata_name_ids where CATALOG_NAME="hive1" limit 1 \G;
*************************** 1. row ***************************
   CATALOG_ID: 113008
 CATALOG_NAME: hive1
  DATABASE_ID: 113042
DATABASE_NAME: ssb1_parquet
     TABLE_ID: 114009
   TABLE_NAME: dates
1 row in set (0.07 sec)
```

2. When you create or drop a catalog, there is no need to refresh the catalog.
```mysql
mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.34 sec)


mysql> drop catalog hive2;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 


mysql> create catalog hive3 ... 
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;                                                                        
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.32 sec)
```

3. When you create or drop a table, there is no need to refresh the catalog.
```mysql
mysql> CREATE TABLE IF NOT EXISTS demo.example_tbl ... ;


mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10666
1 row in set (0.04 sec)

mysql> drop table demo.example_tbl;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 

```

4. You can set a query timeout to prevent queries from taking too long.
```

fe.conf :  query_metadata_name_ids_timeout 

the time used to obtain all tables in one database

```
5. Add information_schema.profiling for compatibility with MySQL.

```mysql
mysql> select * from information_schema.profiling;
Empty set (0.07 sec)

mysql> set profiling=1;                                                                                 
Query OK, 0 rows affected (0.01 sec)
```
2023-08-31 21:22:26 +08:00
6fe2418cfc [fix](filter) fix error id in bloomfilter (#23564)
1. "set" may overwrite the original ID.
2. A bloom filter may not necessarily be an IN_OR_BLOOM_FILTER.

Before, the output may be:
RuntimeFilterInfo  id  -1:  [type  =  BF,  input  =  25,  filtered  =  0]
Now:
RuntimeFilterInfo  id  0:  [type  =  BF,  input  =  25,  filtered  =  0]
2023-08-31 21:12:09 +08:00
25b6e4deb2 [fix](daemon) Fix incorrect initialization order of daemon services (#23578)
Current initialization dependency:

      Daemon ───┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo
                │
                │
BackendService ─┘
However, the original code incorrectly initializes Daemon before StorageEngine.
This PR also stops and joins the threads of daemon services in their dtor, to ensure Daemon services release resources in reverse order of initialization via RAII.
2023-08-31 19:46:38 +08:00
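The ordering rule from 25b6e4deb2 (initialize dependencies first, release in reverse) can be illustrated with a small sketch. Python has no destructors with RAII semantics, so `contextlib.ExitStack` is swapped in here to get the same reverse-order teardown; the `service` helper and the `events` list are hypothetical, and only the component names come from the dependency diagram above.

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def service(name):
    """Hypothetical service: records when it starts and stops."""
    events.append(f"start {name}")
    try:
        yield name
    finally:
        events.append(f"stop {name}")


def run_backend():
    # Dependencies come first: Daemon is started last because it
    # depends on StorageEngine, which depends on ExecEnv, and so on.
    init_order = ("Disk/Mem/CpuInfo", "ExecEnv", "StorageEngine", "Daemon")
    with ExitStack() as stack:
        for name in init_order:
            stack.enter_context(service(name))
        events.append("serving")
    # Leaving the with-block unwinds the stack in reverse order.


run_backend()
```

After `run_backend()` returns, `events` shows Daemon stopping first and Disk/Mem/CpuInfo last, mirroring the initialization order in reverse.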
b3a9c247af [refactor](move-memtable) add load stream stub (#23642) 2023-08-31 19:39:34 +08:00
b5e8217743 [opt](Nereids) speed up deepEquals of TreeNode (#23710) 2023-08-31 19:38:44 +08:00
3a34ec95af [FE](function) add date_floor/ceil in FE function (#23539) 2023-08-31 19:26:47 +08:00
e54cd6a35d [fix](regression)fix case test_outfile_orc_max_file_size by replacing table_export_name #23648
fix case test_outfile_orc_max_file_size by replacing table_export_name
2023-08-31 18:51:13 +08:00
f1e43fcaa4 [opt](cache) Support segment cache dynamic opening and closing (#23659)
Clear the cache by dynamically modifying the config; each time the cache is disabled, it is cleared only once.
TODO: support page cache and other caches.

curl -X POST http://xxxx:8040/api/update_config?disable_segment_cache=true
2023-08-31 18:48:26 +08:00
3a2c0d16f7 [fix](parquet) fix potential heap-use-after-free issue and cache issue (#23638)
1. When file meta cache is disabled (by setting `max_external_file_meta_cache_num=0` in be.conf),
the parquet's meta info is owned by parquet reader and will be released when calling `reader->close()`.

But the underlying file reader of this parquet reader will be released after `reader->close()`,
which may cause a `heap-use-after-free` bug because some parts of the meta info may be referenced by the file reader.

This PR fixes it by making sure that the meta info is released after the file reader is released.

2. Add modification time for file meta cache in BE, to avoid parquet read error like:
`Failed to deserialize parquet page header`
2023-08-31 18:23:05 +08:00
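The second point of 3a2c0d16f7, adding the modification time to the file meta cache, can be sketched as a cache that treats a changed mtime as a miss. `FileMetaCache`, its `loads` counter, and the loader callback are hypothetical names for illustration, not the BE implementation.

```python
class FileMetaCache:
    """Cache file metadata, invalidated by modification time."""

    def __init__(self, load_meta):
        self._load_meta = load_meta   # callable(path, mtime) -> meta
        self._entries = {}            # path -> (mtime, meta)
        self.loads = 0                # how often the loader was called

    def get(self, path, mtime):
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        # Missing or stale entry: the file was rewritten, so re-read
        # its metadata instead of using an outdated footer.
        meta = self._load_meta(path, mtime)
        self.loads += 1
        self._entries[path] = (mtime, meta)
        return meta


cache = FileMetaCache(lambda path, mtime: f"footer of {path}@{mtime}")
first = cache.get("f.parquet", 100)
second = cache.get("f.parquet", 100)   # served from the cache
third = cache.get("f.parquet", 200)    # file changed, reloaded
```

Without the mtime check, a file overwritten at the same path would be read with stale metadata, which is one plausible way to hit a page-header deserialization error like the one quoted above.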
da5c78019c [opt](fe-ui) support read hardware info from aarch64 MacOS (#23708)
Update the versions of oshi and jna to support reading hardware info from aarch64 MacOS.
2023-08-31 18:16:33 +08:00
hzq
c083336bbe [Improvement](pipeline) Cancel outdated query if original fe restarts (#23582)
If any FE restarts, queries that were issued from this FE will be cancelled.

Implementation of #23704
2023-08-31 17:58:52 +08:00
f214485733 [fix](regression) try fix regression test no_await (#23661) 2023-08-31 16:22:51 +08:00
cb2515b7c8 [Fix](meta lock) Should not acquire wlock twice (#23666) 2023-08-31 15:53:35 +08:00
7379cdc995 [feature](nereids) support subquery in select list (#23271)
1. add scalar subquery's output to LogicalApply's output
2. for in and exists subqueries, add mark join slot into LogicalApply's output
3. forbid pushing down alias through join if the project list has any mark join slots.
4. move normalize aggregate rule to analysis phase
2023-08-31 15:51:32 +08:00
62c075bf7e [improvement](Block) Replace Block(const PBlock&) with deserialize because it has heavy operations in ctor (#23672) 2023-08-31 14:44:17 +08:00
409640ac46 [Bug](decimal) Prevent invalid decimal value (#23677) 2023-08-31 14:43:10 +08:00
126606cb4d [Fix](cache) fix query cache returns wrong result after deleting partitions. (#23555)
The reason is that the SQL cache only uses partitionKey, latestVersion and latestTime to check whether the cache should be returned. If we delete some partition(s) that are not the latest updated partition, none of these values change, so the cache is still hit.
Use a field to save the partition num of these tables, sum the partition nums and send the sum to BE. There are two situations that contain delete-partition ops:

- only some partition(s) are deleted, so the sum of partition nums will be lower than before.
- deleting some partition(s) coexists with adding some partition(s), so the latest time or latest version will be higher than before.
2023-08-31 14:22:52 +08:00
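To make the reasoning of 126606cb4d concrete, here is a small sketch comparing a validity check without and with the summed partition count. `CacheStamp`, `old_check` and `new_check` are hypothetical names; only the compared fields come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheStamp:
    partition_key: int
    latest_version: int
    latest_time: int
    partition_num_sum: int   # the field added by the fix


def old_check(cached, current):
    """Validity check that ignores the partition count."""
    return (cached.partition_key == current.partition_key
            and cached.latest_version == current.latest_version
            and cached.latest_time == current.latest_time)


def new_check(cached, current):
    """Validity check that also compares the summed partition count."""
    return old_check(cached, current) and (
        cached.partition_num_sum == current.partition_num_sum)


cached = CacheStamp(1, 5, 1000, 3)
# An older partition was dropped: version and time are unchanged.
after_drop = CacheStamp(1, 5, 1000, 2)
# A drop combined with an add: the count matches, version/time moved.
after_drop_and_add = CacheStamp(1, 6, 2000, 3)
```

The first case is caught only by the partition count, and the second only by the version or time, so the two checks together cover both situations listed above.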
46eb0c7796 [Fix](status) fix printing too many logs in VNodeChannel::try_send_and_fetch_status #23693
After #23425, Status::InternalError(...) prints a stacktrace and warning logs, so we can't use it in VNodeChannel::try_send_and_fetch_status.
2023-08-31 13:54:23 +08:00