Commit Graph

3965 Commits

Author SHA1 Message Date
7a758f7944 [enhancement](mysql) Add have_query_cache variable to be compatible with old mysql client (#21701) 2023-07-11 14:05:40 +08:00
8d98f2ac7e [fix](errCode) Change the error code of a read-only variable (#21705) 2023-07-11 14:05:18 +08:00
5ed42705d4 [fix](jdbc scan) 1=1 does not translate to TRUE (#21688)
Most database systems recognize `where 1=1` but not `where true`, so we should send the original `1=1` to the database.
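For illustration, a hedged sketch of the kind of predicate this concerns (the table and column names are hypothetical):

```sql
-- Pushed down to the external database through the JDBC catalog.
-- Keep the original tautology instead of rewriting it to WHERE TRUE,
-- since many databases accept 1=1 but reject a bare TRUE literal.
SELECT id, name
FROM jdbc_catalog_table
WHERE 1 = 1
  AND id > 100;
```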
2023-07-11 14:04:49 +08:00
d3be10ee58 [improvement](column) Support for the default value of current_timestamp in microsecond (#21487) 2023-07-11 14:04:13 +08:00
7b403bff62 [feature](partial update)support insert new rows in non-strict mode partial update with nullable unmentioned columns (#21623)
1. expand the semantics of the strict_mode variable to control the behavior of stream load: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows whose keys are not present in the table
2. when inserting a new row in a non-strict-mode stream load, the unmentioned columns must have a default value or be nullable (see the sketch below)
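A minimal sketch of the table shape this assumes (all names are hypothetical): with strict_mode=false, a partial update that mentions only k and v1 can still insert a new row, because the unmentioned columns are nullable or have a default.

```sql
-- Hypothetical unique-key table for a partial update.
CREATE TABLE t_partial (
    k  INT,
    v1 INT,
    v2 INT NULL,          -- nullable: may be left unmentioned
    v3 INT DEFAULT "0"    -- has a default: may be left unmentioned
)
UNIQUE KEY (k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ('replication_num' = '1');
```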
2023-07-11 09:38:56 +08:00
d59c21e594 [test](spill) disable fuzzy spill variables for now (#21677)
We will rewrite this logic, so it is useless for now; do not test it anymore.
2023-07-10 22:28:41 +08:00
8973610543 [feature](datetime) "timediff" supports calculating microseconds (#21371) 2023-07-10 19:21:32 +08:00
202a5c636f [fix](create table) modify varchar default length 1 to 65533 (#21302)
*Modify the varchar default length from 1 to varchar.max.length when creating a table.*

```mysql
create table t2 (
    k1 CHAR,
    K2 CHAR(10),
    K3 VARCHAR,
    K4 VARCHAR(1024)
)
duplicate key (k1)
distributed by hash(k1) buckets 1
properties('replication_num' = '1');

desc t2;
```

| Field | Type           | Null | Key   | Default | Extra |
|-------|----------------|------|-------|---------|-------|
| k1    | CHAR(1)        | Yes  | true  | NULL    |       |
| K2    | CHAR(10)       | Yes  | false | NULL    | NONE  |
| K3    | VARCHAR(65533) | Yes  | false | NULL    | NONE  |
| K4    | VARCHAR(1024)  | Yes  | false | NULL    | NONE  |
2023-07-10 17:57:21 +08:00
2b04fa604c fix: toCalendar should use Calendar.MONTH instead of MONDAY (#21665) 2023-07-10 16:49:42 +08:00
0be349e250 [feature](jdbc) Support jdbc catalog to read json types (#21341) 2023-07-10 16:21:00 +08:00
a1a8ee8320 [enhancement](stats) Inject partition statistics #21543
Cost estimation can be more accurate if partition statistics are available, but since we run against big data sets (around 1 TB) we cannot really import them.

So now we want to extend this by injecting partition statistics.

Syntax:

ALTER TABLE table_name MODIFY COLUMN column_name SET STATS ('stat_name' = 'stat_value', ...)
  [ PARTITION (partition_name) ];
Explanation:

- Table_name: The table whose statistics are modified. It can be in db_name.table_name form.

- Column_name: The specified target column. It must be a column that exists in table_name. Statistics can only be modified one column at a time.

- Stat_name and stat_value: The name of the statistic and its value. Multiple stats are comma separated. Statistics that can be modified include row_count, ndv, num_nulls, min_value, max_value, and data_size.

- Partition_name: The target partition. It must be a partition existing in table_name. Multiple partitions are separated by commas.
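A concrete usage sketch of the syntax above (the table, column, partition names, and values are hypothetical):

```sql
ALTER TABLE db_name.table_name MODIFY COLUMN col1
SET STATS ('row_count' = '100000', 'ndv' = '1024', 'num_nulls' = '0')
PARTITION (p20230710);
```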
2023-07-10 15:06:25 +08:00
f9c56d59fc [improvement](statistics)Support external table show table stats, modify column stats and drop stats (#21624)
Support showing table stats, modifying column stats, and dropping stats for external tables.
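A hedged sketch of the three operations named in the title (catalog, database, table, and column names are hypothetical; exact statement forms may differ):

```sql
SHOW TABLE STATS hive_catalog.db_name.table_name;

ALTER TABLE hive_catalog.db_name.table_name MODIFY COLUMN col1
SET STATS ('row_count' = '100000');

DROP STATS hive_catalog.db_name.table_name;
```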
2023-07-10 11:33:06 +08:00
77336bff44 [Bug](materialized-view) adjust limit for create materialized view on uniq/agg table (#21580)
Adjust the limits for creating a materialized view on unique/aggregate key tables.
2023-07-10 10:04:17 +08:00
41fb3d5fa4 [opt](Nereids): Join use List<Plan> as children (#21608)
Using List<Plan> as children avoids constructing an extra ImmutableList.
2023-07-09 17:11:55 +08:00
d9974e6337 [Chore](Job) Fix the wrong log when the export job reads fields and add clearer log information (#21490)
* Fix the wrong log when the export job reads fields and add clearer log information

* Add an OriginStatement.toString method
2023-07-09 17:06:38 +08:00
6b945680a7 [Improve](point query) audit point query (#21587) 2023-07-09 16:43:41 +08:00
015426b2b4 [fix](tablet report) fix fe can not update replica's status with be's report #21600 2023-07-09 16:23:18 +08:00
aacb9b9b66 [Enhancement](binlog) Add create/drop table, add/drop partition && alter job, modify columns binlog support (#21544) 2023-07-09 09:11:56 +08:00
f2fb23e98f [pipeline](exec) disable pipeline load in the current version (#21632) 2023-07-09 01:00:06 +08:00
f8a2c66174 [refactor](planner) refactor automatically set instance_num (#21640)
Refactor the automatic setting of instance_num.
2023-07-08 21:59:17 +08:00
aad8043d44 [opt](Nereids) enable parallel scan for local phase agg (#21642)
After we forbid some agg candidate plans, all local-phase aggregations require DistributionSpecAny from their child, so we can enable parallel scan for them.
2023-07-08 21:47:17 +08:00
51b0bbb667 [Feature] (binlog) Add getBinlogLag (#21637)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-07-08 07:41:45 +08:00
499592178e [fix](Nereids) Add alias name for system variable (#21615)
Add an alias name for system variables to fix the issue that the column name becomes the value of the system variable. Before the fix:
```
mysql> select @@character_set_client;
+--------+
| 'utf8' |
+--------+
| utf8   |
+--------+
```

After the fix:

```
mysql> select @@character_set_client;
+------------------------+
| @@character_set_client |
+------------------------+
| utf8                   |
+------------------------+
```
2023-07-07 23:26:01 +08:00
d39bca5ec7 [fix](nereids) don't build cte producer if the consumer is empty relation (#21317)
explain WITH cte_0 AS ( SELECT 1 AS a ) SELECT * from cte_0 t1 join cte_0 t2 on true WHERE false;
before:
```
+----------------------------+
| Explain String             |
+----------------------------+
| PLAN FRAGMENT 0            |
|   OUTPUT EXPRS:            |
|     a[#1]                  |
|     a[#2]                  |
|   PARTITION: UNPARTITIONED |
|                            |
|   VRESULT SINK             |
|                            |
|   1:VEMPTYSET              |
|                            |
| PLAN FRAGMENT 1            |
|   OUTPUT EXPRS:            |
|     a[#0]                  |
|   PARTITION: UNPARTITIONED |
|                            |
|   MultiCastDataSinks       |
|                            |
|   0:VUNION                 |
|      constant exprs:       |
|          1                 |
+----------------------------+
```
after:

```
+----------------------------+
| Explain String             |
+----------------------------+
| PLAN FRAGMENT 0            |
|   OUTPUT EXPRS:            |
|     a[#0]                  |
|     a[#1]                  |
|   PARTITION: UNPARTITIONED |
|                            |
|   VRESULT SINK             |
|                            |
|   0:VEMPTYSET              |
+----------------------------+
```
2023-07-07 18:12:28 +08:00
cad9e8849c [minor](stats) ADD LOG in analyze task (#21362) 2023-07-07 18:04:15 +08:00
2d445bbb6d [opt](Nereids) forbid some bad case on agg plans (#21565)
1. forbid all candidate plans that require a gather step, except where gathering is mandatory
2. forbid doing a local agg after the reshuffle in a two-phase distinct agg
3. forbid a one-phase agg after a reshuffle
4. forbid three- or four-phase distinct aggs if any stage needs a reshuffle
5. forbid multi-distinct for a single distinct agg if no reshuffle is needed
2023-07-07 17:45:55 +08:00
6b1a74af61 [Enhancement](planner&Nereids) support sql_select_limit for master (#21138)
Support sql_select_limit for the original planner and Nereids.
If the variable is enabled:
- in the original planner, add a limit to the top PlanNode;
- in Nereids, add a limit node to the top in the preprocess phase.
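A hedged sketch of the variable in action (the table name is hypothetical):

```sql
SET sql_select_limit = 1000;
-- With the variable enabled, the planner adds the limit at the top of the plan,
-- so this query returns at most 1000 rows even without an explicit LIMIT.
SELECT * FROM t;
```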
2023-07-07 17:18:38 +08:00
dc44345ee4 [Fix](Planner) change non boolean return type to boolean (#20599)
Problem: When a non-boolean type is used as the return type in a WHERE or HAVING clause, the analyzer checks the return type and throws an error. But in some other databases this usage is allowed.

Solution: Cast the return type to boolean in the WHERE and HAVING clauses, e.g. select *** from *** where case when *** then 1 else 0 end;
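A hypothetical example of the accepted pattern (table and column names are made up): the CASE expression yields a numeric 1/0, which is now cast to boolean in the WHERE clause instead of raising an analyzer error.

```sql
SELECT k
FROM t
WHERE CASE WHEN k > 0 THEN 1 ELSE 0 END;
```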
2023-07-07 17:12:41 +08:00
0b7b5dc991 [fix](catalog) wrong required slot info causing BE crash (#21598)
The file scan node has a special field `requiredSlot`, which is set depending on the `isMaterialized` info of the slot.
But the `isMaterialized` info can change during the planning process, so we must update `requiredSlot`
in the `finalize` phase of the scan node; otherwise it may cause a BE crash due to mismatched slot info.
2023-07-07 17:10:50 +08:00
02149ff329 [fix](nereids) Agg on unknown-stats column (#21428) 2023-07-07 17:03:04 +08:00
f908ea5573 [fix](Nereids) union distinct should not prune any column (#21610) 2023-07-07 14:38:28 +08:00
b5f247f73f [Improve](mysql)ensure constant time for computing hash value (#21569) 2023-07-07 14:04:11 +08:00
b70fb4ca8e [fix](test) build internal table for TPCHTest to fix testRank (#21566) 2023-07-07 12:46:07 +08:00
64d0e28ed0 [improvement](multi catalog)Use getPartitionsByNames to retrieve hive partitions (#21562)
Before, we got Hive partitions using the HMS getPartition API, which requires one API call per partition; performance is very poor when the partition count is large. This PR uses getPartitionsByNames to get multiple partitions in one API call.
For 90000 partitions, the time cost drops from 108s to 14s.
2023-07-07 10:37:33 +08:00
9bcf79178e [Improvement](statistics, multi catalog)Support iceberg table stats collection (#21481)
Fetch Iceberg table stats automatically while querying a table.
Collect accurate statistics for Iceberg tables by running analyze SQL in Doris (the collect-by-meta option is removed).
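A hedged sketch of the collection path described above, assuming the usual ANALYZE statement form (catalog, database, and table names are hypothetical):

```sql
-- Runs analyze SQL in Doris to collect accurate column statistics
-- for the Iceberg table, instead of relying on metadata-reported values.
ANALYZE TABLE iceberg_catalog.db_name.table_name;
```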
2023-07-07 09:18:37 +08:00
79221a54ca [refactor](Nereids): remove withLogicalProperties & check children size (#21563) 2023-07-06 20:37:17 +08:00
fba3ae96b9 Revert "[Fix](planner) Set inline view output as non constant after analyze (#21212)" (#21581)
This reverts commit 0c3acfdb7c744decb7b60e372007707a55d14e00.
2023-07-06 20:30:27 +08:00
2e651bbc9a [fix](nereids) fix some planner bugs (#21533)
1. allow casting boolean to date-like types in Nereids; the result is null (see the example below)
2. the PruneOlapScanTablet rule can prune tablets even if an mv index is selected
3. constant conjuncts should not be pushed through the agg node in the old planner
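A hypothetical illustration of item 1 (behavior as described above: the cast is accepted and yields NULL):

```sql
SELECT CAST(TRUE AS DATE);      -- NULL in Nereids after this change
SELECT CAST(FALSE AS DATETIME); -- NULL as well
```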
2023-07-06 16:13:37 +08:00
0c3acfdb7c [Fix](planner) Set inline view output as non constant after analyze (#21212)
Problem:
The select list should be non-constant when the from list has tables or multiple tuples; otherwise the upper query gets isConstant wrong
and performs wrong constant folding.
For example: when using the nullif function with a subquery that results in two alternative constants, the planner would treat it as a constant expr, so the analyzer would report an error that the order by clause cannot be constant.

Solution:
Change the inline view output to non-constant, because for (select 1 a from table) as view, a in the output is not constant when we see
view.a from outside.
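A hypothetical example of the fixed behavior (table and alias names are made up): viewed from outside the inline view, v.a is treated as non-constant, so the ORDER BY no longer errors out as a constant expression.

```sql
SELECT v.a
FROM (SELECT 1 AS a FROM some_table) v
ORDER BY v.a;
```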
2023-07-06 15:37:43 +08:00
068fe44493 [feature](profile) Add important time of legacy planner to profile (#20602)
Add important time points to the planning process:
- queryJoinReorderFinishTime (join reorder end time): after analyze and before join reorder
- queryCreateSingleNodeFinishTime (create single-node plan end time): after join reorder and before finishing the single-node plan
- queryDistributedFinishTime (create distributed plan end time): after creating the single-node plan and before finishing the distributed plan
2023-07-06 15:36:25 +08:00
bb3b6770b5 [Enhancement](multi-catalog) Make meta cache batch loading concurrently. (#21471)
I will enhance the performance of querying the meta cache of HMS tables in 2 steps:
**Step 1**: use concurrent batch loading for the meta cache
**Step 2**: execute some other tasks concurrently as soon as possible

**This PR is mainly for step 1 and does the following:**
- Create a `CacheBulkLoader` for batch loading
- Remove the executor of the previous async cache loader and change the loader's type to `CacheBulkLoader` (we do not set any refresh strategies for the LoadingCache, so the previous executor is not useful)
- Use a `FixedCacheThreadPool` to replace the `CacheThreadPool` (the previous `CacheThreadPool` just logs warnings and will not throw any exceptions when the pool is full)
- Remove parallel streams and use the `CacheBulkLoader` to do batch loading
- Change the value of `max_external_cache_loader_thread_pool_size` to 64, and set the pool size of the HMS client pool to `max_external_cache_loader_thread_pool_size`
- Fix the spelling mistake in `max_hive_table_catch_num`
2023-07-06 15:18:30 +08:00
8839518bfb [Performance](Nereids): add withGroupExprLogicalPropChildren to reduce new Plan (#21477) 2023-07-06 14:10:31 +08:00
013bfc6a06 [Bug](row store) Fix column aggregate info lost when table is unique model (#21506) 2023-07-06 12:06:22 +08:00
b1be59c799 [enhancement](query) enable strong consistency by syncing max journal id from master (#21205)
Add a session variable & config `enable_strong_consistency_read` to solve the problem that load results may be briefly invisible to followers, to meet users' requirements in strong-consistency read scenarios.

It will sync the max journal id from master and wait for replay.
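A minimal sketch using the session variable named above:

```sql
-- After this, reads in the session sync the max journal id from master
-- and wait for replay, so just-loaded data is visible on followers.
SET enable_strong_consistency_read = true;
SELECT * FROM t;  -- hypothetical follower-side read
```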
2023-07-06 10:25:38 +08:00
c1e82ce817 [fix](backup) fix show snapshot cauing mysql connection lost (#21520)
If there is no `info file` in the repository, the MySQL connection may be lost when the user executes `show snapshot on repo`:
```
2023-07-05 09:22:48,689 WARN (mysql-nio-pool-0|199) [ReadListener.lambda$handleEvent$0():60] Exception happened in one session(org.apache.doris.qe.ConnectContext@730797c1).
java.io.IOException: Error happened when receiving packet.
    at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:691) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
```

This is because some fields are missing in the returned result set.
2023-07-05 22:44:57 +08:00
b6a5afa87d [Feature](multi-catalog) support query hive-view for nereids planner. (#21419)
Related PR: #18815. Support querying Hive views in the Nereids planner.
2023-07-05 21:58:03 +08:00
b3db904847 [fix](Nereids): when child is Aggregate, don't infer Distinct for it (#21519) 2023-07-05 19:39:41 +08:00
f868aa9d4a [Enhancement](multi-catalog) Add some checks for ShowPartitionsStmt. (#21446)
1. Add some validations for ShowPartitionsStmt with Hive tables (see the sketch below)
2. Make the behavior consistent with Hive
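A hedged sketch of the statement these checks apply to (catalog, database, and table names are hypothetical):

```sql
SHOW PARTITIONS FROM hive_catalog.db_name.table_name;
```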
2023-07-05 16:28:05 +08:00
0da1bc7acd [Fix](multi-catalog) Fallback to refresh catalog when hms events are missing (#21333)
Fixes #20227; the previous implementation had some problems and could not catch the event-missing exception.
2023-07-05 16:27:01 +08:00
37a52789bd [improvement](statistics, multi catalog)Estimate hive table row count based on file size. (#21207)
Support estimating the table row count based on file size.

With sample size=3000 (total partition number is 87491), load cache time is 45s.
With sample size=100000 (more than total partition number 87505), load cache time is 388s.
2023-07-05 16:07:12 +08:00