Commit Graph

8289 Commits

Author SHA1 Message Date
1a25f110ec [Fix](planner)Fix TupleDescriptor include not materialized slot bug (#18783)
The setOutputSmap function in ScanNode may add non-materialized slots to outputTupleDesc. This PR fixes that.
2023-04-19 14:08:09 +08:00
446db3def6 [opt](nereids) estimate broadcast cost by a new formula (#18744)
Estimate broadcast cost with an empirical formula: beNumber^0.5 * rowCount
1. The sender and receiver counts are not available at the RBO stage yet, so we use beNumber.
2. Senders and receivers work in parallel, which is why we use the square root of beNumber.
2023-04-19 12:14:55 +08:00
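The formula quoted in commit 446db3def6, beNumber^0.5 * rowCount, can be sketched in a few lines. This is only an illustration of the arithmetic, not the Nereids cost model itself; the function name `broadcast_cost` and the input validation are assumptions made for the example.

```python
import math


def broadcast_cost(be_number: int, row_count: float) -> float:
    """Illustrative broadcast cost: sqrt(be_number) * row_count.

    be_number stands in for the sender/receiver counts, which the commit
    message says are not available at the RBO stage.
    """
    if be_number < 1:
        raise ValueError("be_number must be at least 1")
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    return math.sqrt(be_number) * row_count


if __name__ == "__main__":
    # The cost grows with the square root of the cluster size, not linearly.
    for backends in (1, 4, 9, 16):
        print(backends, broadcast_cost(backends, 1000.0))
```

With this shape, quadrupling the number of backends only doubles the estimated cost, which matches the commit's reasoning that senders and receivers work in parallel.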
15529afed8 [minor](decimal) forbid to create table with decimal type exceeds 18 (#18763)
* [minor](decimal) forbid to create table with decimal type exceeds 18

* update
2023-04-19 11:34:27 +08:00
0b379de602 [refactor](scan) optimize the agg function of count(1) (#18739) 2023-04-19 09:10:51 +08:00
d24a8a524e [refactor](fe): Remove resource group which is useless (#18249) 2023-04-18 21:04:30 +08:00
5c076b738b [improvement](resource-group) add test for resource group (#18575)
Co-authored-by: wangbo <youseebiggirl_t_t@qq.com>
2023-04-18 20:20:50 +08:00
4a16eff16d [fix](merge-on-write) enable_unique_key_merge_on_write property should only be used for unique table (#18734) 2023-04-18 18:40:01 +08:00
031d35d4a1 [fix](stats) Stats still in cache after user dropped it (#18720)
1. Evict the dropped stats from the cache
2. Remove the code for partition-level stats collection
3. Disable analyzing a whole database directly
4. Fix the potential infinite loop in the stats cleaner
5. Sleep the thread in each loop when scanning the stats table to avoid excessive IO usage by this task.
2023-04-18 16:41:10 +08:00
c3f808cc06 Revert "[enhancement](Nereids) optimize bloom filter size reducing strategy (#18596)" (#18768)
This reverts commit 3eac53f75d5f3eb05e958403efeb7578ad86e438.
2023-04-18 15:37:19 +08:00
62e4140d17 [fix](olap) fix lost disable_auto_compaction info when fe restart (#18757)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-18 14:11:40 +08:00
6b351a2818 [vectorzied](function) fix array_map function analyzed failed with order by clause (#18676)
* [vectorzied](function) fix array_map function analyzed failed with order by clause

* add test
2023-04-18 12:01:44 +08:00
3a6eae0ec5 [feature](Nereids): infer not null from Agg Count(distinct). (#18599) 2023-04-18 11:22:36 +08:00
98b8efc2c2 [fix](multi-catalog)fix old s3 properties check (#18430)
fix old s3 properties check
fix for #18005 (comment)
2023-04-18 09:58:13 +08:00
10b252856d [feature](Nereids): pullup semiJoin through aggregate. (#18669) 2023-04-18 09:31:07 +08:00
86b8e95045 [fix](Nereids): when GroupExpr already exists, we need to remove ParentExpression (#18749) 2023-04-17 23:12:26 +08:00
575c1620c2 [Improve](fe)Use commons-lang3 uniformly and refactor PatternGenerator#generateTypePattern (#18666)
`commons-lang` (versions 1 and 2) has not been maintained since 2011. The officially recommended replacement is `commons-lang3`, which is a smooth upgrade that stays compatible with `commons-lang`.
We use both dependencies in `fe`, and they can be completely unified.

`PatternGenerator#generateTypePattern` contains many meaningless loops and introduces IntegerRange for them,
which is unnecessary, so I refactored it.
2023-04-17 20:15:17 +08:00
74d424e6d4 [Bug](DECIMAL) Fix bug for arithmatic expr DECIMALV2 / DECIMALV3 (#18723) 2023-04-17 16:43:36 +08:00
d61f52d277 [fix](Nereids): fix sum func in eager agg (#18675) 2023-04-17 15:06:28 +08:00
5300b21db7 [Bug](DECIMALV3) report failure if a decimal value is overflow (#18336) 2023-04-17 13:18:14 +08:00
eb128753ac [Opt](pipeline) opt pipeline shared scan (#18715) 2023-04-17 13:06:39 +08:00
a2278dbc6c [opt](nereids) optimize filter estimation for pattern "col=col" #18716
TPC-H q10 and q5 benefit from this optimization.

For a given hash join condition, A=B, sometimes both A and B are reduced by filters. In this PR, both reductions are counted in join estimation.
2023-04-17 11:44:35 +08:00
b5b0148010 [feature](Nereids): when cost time > 5s, throw timeout Exception (#18316) 2023-04-17 11:21:54 +08:00
3eac53f75d [enhancement](Nereids) optimize bloom filter size reducing strategy (#18596) 2023-04-17 10:50:08 +08:00
ddbff2aa39 [feature](jni) map c++ block to java vector table (#18566)
PR (#17960) introduced the vector table, which can map a Java table to a C++ block.
In some cases (Java UDF and JDBC executor), we need to map a C++ block to a Java table. This PR implements that.

The memory layout of the Java vector table and the C++ block is consistent,
so the implementation doesn't copy the block; it just passes the memory address.
2023-04-17 00:04:53 +08:00
57982ddc46 [Fix](catalog)Fix hudi-catalog get file split error (#18644) (#18673)
`hudi-common` depends on `parquet-avro`, but the dependency scope is `provided`.
When we use `hudi-catalog`, `HoodieAvroWriteSupport` is called. It depends on `parquet-avro`, so a ClassNotFound error is raised.
2023-04-16 21:56:14 +08:00
e6884a3768 [log](fe) add more detail log for master transfer (#17350) (#17485) 2023-04-16 18:35:06 +08:00
1cbbc60822 [feature](config) support "experimental" prefix for FE config (#18699)
Each release of Doris contains some experimental features.
These features may not be stable or mature enough, and users have to enable them through a config or session variable,
e.g. set enable_mtmv = true; otherwise, the features are disabled by default.

We should explicitly tell users which features are experimental, so that they notice this and can decide whether to
use them.

Changes
In this PR, I add support for the experimental_ prefix for FE configs and session variables.

Session Variable

Take enable_nereids_planner as an example.

The Nereids planner is an experimental feature in Doris, so there is an EXPERIMENTAL annotation for it:

@VariableMgr.VarAttr(..., expType = ExperimentalType.EXPERIMENTAL)
private boolean enableNereidsPlanner = false;

For compatibility, users can set it under either name:

set enable_nereids_planner = true;
set experimental_enable_nereids_planner = true;

show variables lists only the experimental_enable_nereids_planner entry.

You can also list all experimental session variables with:

show variables like "%experimental%"

Config

As with session variables, take enable_mtmv as an example.

@ConfField(..., expType = ExperimentalType.EXPERIMENTAL)
public static boolean enable_mtmv = false;

Users can set it in fe.conf or with an ADMIN SET FRONTEND CONFIG statement, under either name:

enable_mtmv
experimental_enable_mtmv

Users can see all experimental FE configs with:

ADMIN SHOW FRONTEND CONFIG LIKE "%experimental%";

TODO
Support this feature for BE config.

So far, the experimental prefix has only been added for these session variables:

enable_pipeline_engine
enable_nereids_planner
enable_single_replica_insert

and these FE configs:

enable_mtmv
enable_ssl
enable_fqdn_mode

Other configs and session variables should be updated as well.
2023-04-16 18:32:10 +08:00
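The naming behavior described in commit 1cbbc60822 (both names accepted on write, only the prefixed name shown on display) can be modeled roughly as follows. This is a standalone sketch, not Doris code: the `ConfigRegistry` class, its methods, and the substring-based `show` filter are all hypothetical and exist only to illustrate the lookup rule.

```python
EXPERIMENTAL_PREFIX = "experimental_"


class ConfigRegistry:
    """Toy registry that accepts an optional experimental_ prefix (hypothetical)."""

    def __init__(self):
        self._values = {}
        self._experimental = set()

    def register(self, name, default, experimental=False):
        self._values[name] = default
        if experimental:
            self._experimental.add(name)

    def _canonical(self, name):
        # The prefixed form is only valid for entries marked experimental.
        if name.startswith(EXPERIMENTAL_PREFIX):
            base = name[len(EXPERIMENTAL_PREFIX):]
            if base in self._experimental:
                return base
        if name in self._values:
            return name
        raise KeyError(name)

    def set(self, name, value):
        self._values[self._canonical(name)] = value

    def get(self, name):
        return self._values[self._canonical(name)]

    def show(self, pattern=""):
        # Experimental entries are displayed only under the prefixed name.
        rows = []
        for name, value in sorted(self._values.items()):
            shown = EXPERIMENTAL_PREFIX + name if name in self._experimental else name
            if pattern in shown:
                rows.append((shown, value))
        return rows


if __name__ == "__main__":
    registry = ConfigRegistry()
    registry.register("enable_nereids_planner", False, experimental=True)
    registry.register("query_timeout", 300)
    registry.set("experimental_enable_nereids_planner", True)
    print(registry.get("enable_nereids_planner"))
    print(registry.show("experimental"))
```

Keeping a single canonical key and treating the prefix as an alias means old scripts that use the unprefixed name keep working, which is the compatibility goal the commit message states.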
afdac1204d [improve](postgresql catalog) support postgresql bytea type to doris string (#18623)
* [improve](postgresql catalog) support postgresql bytea type to doris string

* modify function name

* add case
2023-04-16 18:14:42 +08:00
7bc242d665 [regression-test](prepared statement) Fix connection error when test framework used lower jdbc version (#18665) 2023-04-16 18:13:45 +08:00
c12646382d [feature](multicatalog) enable doris hive/iceberg catalog to read data on tencent GooseFS (#18685) 2023-04-16 18:11:57 +08:00
7dd96bc341 [fix](olap) remove zorder support when create table (#18698) 2023-04-16 09:24:18 +08:00
8f0d4ae625 [Fix](fe)Upgrade hive-catalog-shade version to 1.0.3 (#18690) 2023-04-15 22:10:45 +08:00
bcff3710ca [fix] set execution timeout for brokerload and use query timeout when… (#18694)
We should fall back to the query timeout if the execution timeout is not set, to stay compatible across upgrades.
2023-04-15 20:41:04 +08:00
d2efc619b0 [Enchancement](statistics) Show histogram statistics, show specified column statistics (#18657) 2023-04-14 22:36:40 +08:00
f7e129934e [fix](nereids) only order by slot reference could use topn opt (#18622)
select cast(k1 as INT) as id from tbl1 order by id limit 2; 

is not valid for topN optimization, because 'id' is
a cast expression, not a table column from the scan node.
This PR addresses this issue.
2023-04-14 20:59:06 +08:00
683d64b361 [Refactor](multi catalog)Remove redundant param context for FileQueryScanNode (#18636)
Remove redundant param context for FileQueryScanNode.
Remove duplicated code for QueryScanProviders.
2023-04-14 20:20:21 +08:00
e1b3955e05 [refactor](jdbc) using jvm parameters to init jdbc datasource (#18670)
Use JVM parameters to initialize the JDBC datasource connection pool.
If you don't need to keep connections alive, you can set JDBC_MIN_POOL=0.
2023-04-14 18:45:29 +08:00
f2d75cb492 [fix](Nereids) fix signature precision round for decimalv3 (#18639)
add decimalv3 signatures to the functions below:
ceil
dceil
dfloor
dround
floor
round
round_bankers
truncate
fix ComputePrecisionForRound to get correct signature
2023-04-14 18:18:41 +08:00
5acf764d9c [fix](trino catalog) To specify both catalog and database, run the show table command (#18645)
* [fix](trino catalog) To specify both catalog and database, run the show table command

* fix
2023-04-14 17:51:50 +08:00
362b5a34ae [feat](stats) Support to delete expired stats periodically (#18614)
Support deleting expired stats periodically and manually.

The default cleaner interval is 2 days.

The manual cleanup syntax is:
```sql
DROP EXPIRED STATS
```

TODO:
1. process external catalog's stats
2. run drop at the appointed time
3. sleep a short time after dropping each batch
2023-04-14 17:32:51 +08:00
65f9db90c8 [feature](nereids) forbid unknown col stats #18617
Add the session variable forbid_unknown_col_stats. When this variable is true, Nereids refuses to use unknown column stats.
The main purpose of this PR is to save debugging effort.
2023-04-14 17:13:39 +08:00
5d1abe4507 [Bugfix](Mtmv)Fix mtmv meta load failed (#18605)
MTMV meta loading has failed since the meta was made public to the CI system.
2023-04-14 16:29:18 +08:00
4174d5a707 [opt](nereids) optimze aggregation estimation #18607
`select count(*) from T group by A, B`
suppose `ndv(A) > ndv(B)`
the estimated row count of aggregate is between ndv(A) and ndv(A) * ndv(B)

In the previous version, we chose the upper bound, that is, ndv(A) * ndv(B). The drawback of this choice is that the estimated row count is often bigger than the row count of T.

In this version, we choose the lower bound.
2023-04-14 16:13:25 +08:00
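The bounds discussed in commit 4174d5a707 can be written out as a small sketch. It assumes the lower bound is the largest per-column NDV and the upper bound is the product of all NDVs; extending the two-column case from the commit message to an arbitrary list of columns is an assumption of this example, and the function names are hypothetical.

```python
from math import prod


def group_by_row_bounds(ndvs):
    """Return (lower, upper) bounds for the row count of a GROUP BY.

    lower: the largest NDV among the grouping columns.
    upper: the product of all NDVs (every combination occurs).
    """
    if not ndvs:
        raise ValueError("at least one grouping column is required")
    return max(ndvs), prod(ndvs)


def estimate_group_by_rows(ndvs):
    """Pick the lower bound, as the commit message describes."""
    lower, _upper = group_by_row_bounds(ndvs)
    return lower


if __name__ == "__main__":
    # ndv(A) = 1000, ndv(B) = 50: the upper bound can exceed the size of
    # the table itself, while the lower bound never does.
    print(group_by_row_bounds([1000, 50]))
    print(estimate_group_by_rows([1000, 50]))
```

The lower bound cannot exceed the table's row count, because a column never has more distinct values than the table has rows. That avoids the overestimate the commit message describes for the product.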
73e087d79c [feature](Nereids): support eager agg for Plan inside project. (#18637) 2023-04-14 15:30:33 +08:00
9634d21a28 [fix](info_db) avoid infodb query timeout when external catalog info is too large or is not reachable (#18662)
Queries on tables in the information_schema database may time out because:

1. There are external catalogs with too many tables.
2. The external catalog is unreachable.

So I add a new FE config, infodb_support_ext_catalog.
The default is false, which means that when selecting from tables in the information_schema database,
the result will not contain information about tables in external catalogs.
2023-04-14 14:40:31 +08:00
db5ec6f6b0 [FIX](thrift)Fix with 1.2 version for thrift #18658 2023-04-14 14:07:42 +08:00
4d18ea30f4 [fix](Nereids) get_json_bigint should return bigint type (#18626) 2023-04-14 14:01:44 +08:00
e009c459bf [enhancement](planner) remove date function if its child's type is date (#18593)
If we have an expression like the one below
```
date(c1) -- c1's type is date or datev2
```
the result of the expression is exactly the same as c1, so we should
remove the date function. This optimization simplifies the
expression, speeds up execution, and increases the opportunity to
push filters down to the storage layer.
2023-04-14 14:01:20 +08:00
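The rewrite described in commit e009c459bf can be illustrated on a toy expression tree. This is a simplified sketch rather than the planner's actual rule: the `Column` and `FunctionCall` classes, the type names as strings, and the restriction to column children are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str  # e.g. "date", "datev2", "datetime"


@dataclass(frozen=True)
class FunctionCall:
    name: str
    args: tuple


# Types for which date(x) returns x unchanged.
DATE_TYPES = {"date", "datev2"}


def simplify(expr):
    """Remove date(...) when its single argument is already a date column."""
    if isinstance(expr, FunctionCall):
        # Simplify the children first so nested date(date(c1)) collapses too.
        args = tuple(simplify(arg) for arg in expr.args)
        if (
            expr.name.lower() == "date"
            and len(args) == 1
            and isinstance(args[0], Column)
            and args[0].data_type in DATE_TYPES
        ):
            return args[0]
        return FunctionCall(expr.name, args)
    return expr


if __name__ == "__main__":
    c1 = Column("c1", "datev2")
    c2 = Column("c2", "datetime")
    print(simplify(FunctionCall("date", (c1,))))  # the bare column
    print(simplify(FunctionCall("date", (c2,))))  # unchanged, still a call
```

Once the wrapper is gone, a predicate on the expression refers to the bare column, which is what makes it a candidate for pushdown to the storage layer.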
008ae4984b [feature](Nereids): convert rightSemi to leftSemi for matching more rule. (#18648) 2023-04-14 11:20:22 +08:00
e6b0e05840 [fix](Nerieds) Fix some bugs in binding and type coercion (#18548)
1. fix the ambiguous-slot binding exception raised when the same slot is selected more than once
2. fix SetOperation being bound multiple times because of CTE
3. fix CASE WHEN clauses not being coerced to the same type
4. fix an exception when a set_var hint exists in a subquery or CTE
2023-04-14 11:00:24 +08:00