5755 Commits

Author SHA1 Message Date
097dcf2119 [fix](outfile) unify broker and hdfs path in outfile (#18809)
unify broker and hdfs path in outfile
fix fe ut and add outfile case
2023-04-20 21:01:39 +08:00
94509e51af [fix](editLog) add sufficient replay logic and edit log for altering light schema change (#18746) 2023-04-20 19:20:03 +08:00
c4e469c82c [feature](agg) Support spill to disk in aggregation (#18051) 2023-04-20 18:59:08 +08:00
668c681fbc [Fix](Nereids) Check bound status in analyze straight after bounding (#18581)
Problem:
The analyzer could enter an infinite loop by repeatedly pushing analyze tasks onto the job stack. When the analyze process generates new operators, the same analyze rule is pushed again, which causes the loop. The analyze process generates new operators when it tries to bind ORDER BY keys and aggregate functions.

Solution:
An exception must be thrown before the complex analyze and rewrite process, so the check that all expressions are bound is done twice: once right after binding all expressions, and once after the whole analyze process, in case new expressions and new operators were generated.

Example:
Cases were put in file: regression-test/suites/nereids_p0/except/test_bound_exception.groovy
2023-04-20 18:50:13 +08:00
8e2146f48c [Enhencement](Export) support export with outfile syntax (#18325)
The `Export` syntax provides an asynchronous export function, but `Export` is not vectorized.
The `Outfile` syntax provides a synchronous export function.
So we can reimplement the export syntax on top of the outfile syntax.
2023-04-20 17:27:04 +08:00
ea795b9909 [fix](nereids)disable SelectMaterializedIndexWithAggregate rule (#18380)
* [fix](nereids)disable SelectMaterializedIndexWithAggregate rule

* rebase code

* disable related test cases

* remove failed test cases for now
2023-04-20 17:02:36 +08:00
918a244068 [chore](pom) update apache pom to 29 (#18843) 2023-04-20 16:57:05 +08:00
c659e0bfc7 [Improvement](bloom filter) adjust bloom filter size (#18846) 2023-04-20 16:50:22 +08:00
3644dfa9fd [fix](Nereids) stddev functions not support decimalv3 type arg (#18840) 2023-04-20 14:54:12 +08:00
52d32cccad [enhance](Nereids): check cycle by getParentGroupExpressions(). (#18687) 2023-04-20 11:51:58 +08:00
3328a65b75 [Fix](mutli-catalog) Use decimal v3 type to fix decimal loss issue in multi-catalog module. (#18835)
Fix decimal v3 precision loss issues in the multi-catalog module.
Now it will use decimal v3 to represent decimal type in the multi-catalog module.
Regression Test: `test_load_with_decimal.groovy`
2023-04-20 11:02:53 +08:00
33d4c60570 [RegressionTest](fuzzy) enable set global enable_pipeline_engine (#18832)
enable set global enable_pipeline_engine
2023-04-20 10:38:11 +08:00
Pxl
c40860aba4 [Chore](thrift) generate thrift java code to make code analysis work well (#18793)
generate thrift java code to make code analysis work well
2023-04-19 19:33:17 +08:00
fb377a9da9 [Improvement](functions)Optimized some datetime function's return value (#18369) 2023-04-19 15:51:11 +08:00
1f5f5a12b6 [fix](Nereids): need update parentExpression after replace child. (#18771) 2023-04-19 15:13:42 +08:00
93b35bbfbf [feature](multi-catalog) add catalog comment and create time info (#18778)
add catalog comment and create time info
```
create catalog hms_ctl
comment 'your comment' 
properties (
'type'='hms',
'hive.metastore.uris' = 'thrift://xx:1234' );
```
The create time is generated when the catalog is created.

Use `show catalogs` and `show create catalog` to view this information.
2023-04-19 15:08:42 +08:00
1a25f110ec [Fix](planner)Fix TupleDescriptor include not materialized slot bug (#18783)
The setOutputSmap function in ScanNode may add non-materialized slots to outputTupleDesc. This PR fixes that.
2023-04-19 14:08:09 +08:00
446db3def6 [opt](nereids) estimate broadcast cost by a new formula (#18744)
Estimate the broadcast cost with an empirical formula: beNumber^0.5 * rowCount
1. The sender and receiver counts are not available at the RBO stage yet, so we use beNumber.
2. Senders and receivers work in parallel, which is why we use the square root of beNumber.
2023-04-19 12:14:55 +08:00
15529afed8 [minor](decimal) forbid to create table with decimal type exceeds 18 (#18763)
* [minor](decimal) forbid to create table with decimal type exceeds 18

* update
2023-04-19 11:34:27 +08:00
0b379de602 [refactor](scan) optimize the agg function of count(1) (#18739) 2023-04-19 09:10:51 +08:00
d24a8a524e [refactor](fe): Remove resource group which is useless (#18249) 2023-04-18 21:04:30 +08:00
5c076b738b [improvement](resource-group) add test for resource group (#18575)
Co-authored-by: wangbo <youseebiggirl_t_t@qq.com>
2023-04-18 20:20:50 +08:00
4a16eff16d [fix](merge-on-write) enable_unique_key_merge_on_write property should only be used for unique table (#18734) 2023-04-18 18:40:01 +08:00
031d35d4a1 [fix](stats) Stats still in cache after user dropped it (#18720)
1. Evict dropped stats from the cache
2. Remove the code for partition-level stats collection
3. Disable analyzing a whole database directly
4. Fix a potential infinite loop in the stats cleaner
5. Sleep in each loop iteration while scanning the stats table, to avoid excessive IO usage by this task
2023-04-18 16:41:10 +08:00
c3f808cc06 Revert "[enhancement](Nereids) optimize bloom filter size reducing strategy (#18596)" (#18768)
This reverts commit 3eac53f75d5f3eb05e958403efeb7578ad86e438.
2023-04-18 15:37:19 +08:00
62e4140d17 [fix](olap) fix lost disable_auto_compaction info when fe restart (#18757)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-18 14:11:40 +08:00
6b351a2818 [vectorzied](function) fix array_map function analyzed failed with order by clause (#18676)
* [vectorzied](function) fix array_map function analyzed failed with order by clause

* add test
2023-04-18 12:01:44 +08:00
3a6eae0ec5 [feature](Nereids): infer not null from Agg Count(distinct). (#18599) 2023-04-18 11:22:36 +08:00
98b8efc2c2 [fix](multi-catalog)fix old s3 properties check (#18430)
fix old s3 properties check
fix for #18005 (comment)
2023-04-18 09:58:13 +08:00
10b252856d [feature](Nereids): pullup semiJoin through aggregate. (#18669) 2023-04-18 09:31:07 +08:00
86b8e95045 [fix](Nereids): when GroupExpr already exists, we need to remove ParentExpression (#18749) 2023-04-17 23:12:26 +08:00
575c1620c2 [Improve](fe)Use commons-lang3 uniformly and refactor PatternGenerator#generateTypePattern (#18666)
`commons-lang` (1 and 2) has not been maintained since 2011, and the official recommendation is `commons-lang3`, to which code using `commons-lang` can be migrated smoothly.
Both dependencies are used in `fe`, and they can be unified completely.

`PatternGenerator#generateTypePattern` contains many meaningless loops, and IntegerRange was introduced only for them,
which is unnecessary. So I refactored it.
2023-04-17 20:15:17 +08:00
74d424e6d4 [Bug](DECIMAL) Fix bug for arithmatic expr DECIMALV2 / DECIMALV3 (#18723) 2023-04-17 16:43:36 +08:00
d61f52d277 [fix](Nereids): fix sum func in eager agg (#18675) 2023-04-17 15:06:28 +08:00
5300b21db7 [Bug](DECIMALV3) report failure if a decimal value is overflow (#18336) 2023-04-17 13:18:14 +08:00
eb128753ac [Opt](pipeline) opt pipeline shared scan (#18715) 2023-04-17 13:06:39 +08:00
a2278dbc6c [opt](nereids) optimize filter estimation for pattern "col=col" #18716
TPC-H q10 and q5 benefit from this optimization.

For a given hash join condition A=B, sometimes both A and B are reduced by filters. In this PR, both reductions are counted in the join estimation.
2023-04-17 11:44:35 +08:00
b5b0148010 [feature](Nereids): when cost time > 5s, throw timeout Exception (#18316) 2023-04-17 11:21:54 +08:00
3eac53f75d [enhancement](Nereids) optimize bloom filter size reducing strategy (#18596) 2023-04-17 10:50:08 +08:00
ddbff2aa39 [feature](jni) map c++ block to java vector table (#18566)
PR (#17960) introduced the vector table, which can map a Java table to a C++ block.
In some cases (Java UDF and JDBC executor), we need to map a C++ block to a Java table. This PR implements that.

The memory layout of the Java vector table and the C++ block is consistent,
so the implementation doesn't copy the block; it just passes the memory address.
2023-04-17 00:04:53 +08:00
57982ddc46 [Fix](catalog)Fix hudi-catalog get file split error (#18644) (#18673)
`hudi-common` depends on `parquet-avro`, but the dependency scope is `provided`.
When we use `hudi-catalog`, `HoodieAvroWriteSupport` is called. It depends on `parquet-avro`, so a ClassNotFound error is thrown.
2023-04-16 21:56:14 +08:00
e6884a3768 [log](fe) add more detail log for master transfer (#17350) (#17485) 2023-04-16 18:35:06 +08:00
1cbbc60822 [feature](config) support "experimental" prefix for FE config (#18699)
Each Doris release contains some experimental features.
These features may not be stable or mature enough, and users must enable them through a config or session variable,
e.g. set enable_mtmv = true; otherwise they are disabled by default.

We should explicitly tell users which features are experimental, so that they notice and can decide whether to
use them.

Changes
In this PR, I support the experimental_ prefix for FE config and session variables.

Session Variable

Take enable_nereids_planner as an example.

The Nereids planner is an experimental feature in Doris, so it carries an EXPERIMENTAL annotation:

@VariableMgr.VarAttr(..., expType = ExperimentalType.EXPERIMENTAL)
private boolean enableNereidsPlanner = false;

For compatibility, users can set it with either name:

set enable_nereids_planner = true;
set experimental_enable_nereids_planner = true;

show variables displays only the experimental_enable_nereids_planner entry.

All experimental session variables can be listed with:

show variables like "%experimental%"

Config

As with session variables, take enable_mtmv as an example.

@ConfField(..., expType = ExperimentalType.EXPERIMENTAL)
public static boolean enable_mtmv = false;

Users can set it in fe.conf or with the ADMIN SET FRONTEND CONFIG statement, using either name:

enable_mtmv
experimental_enable_mtmv

All experimental FE configs can be listed with:

ADMIN SHOW FRONTEND CONFIG LIKE "%experimental%";

TODO

Support this feature for BE config.

So far the experimental prefix is only added for these session variables:

enable_pipeline_engine
enable_nereids_planner
enable_single_replica_insert

and these FE configs:

enable_mtmv
enable_ssl
enable_fqdn_mode

Other configs and session variables still need to be updated.
2023-04-16 18:32:10 +08:00
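
The naming behaviour described in the commit above (both the plain and the experimental_ name are accepted, but only the prefixed name is shown) can be sketched as a small registry. This is a hypothetical model for illustration, not the Doris FE code; `VariableRegistry` and all of its methods are invented names.

```python
EXPERIMENTAL_PREFIX = "experimental_"


class VariableRegistry:
    """Toy registry that accepts an optional 'experimental_' prefix."""

    def __init__(self):
        self._values = {}           # canonical name -> current value
        self._experimental = set()  # canonical names marked experimental

    def register(self, name, default, experimental=False):
        self._values[name] = default
        if experimental:
            self._experimental.add(name)

    def _canonical(self, name):
        # Strip the prefix only when the base name is really experimental.
        if name.startswith(EXPERIMENTAL_PREFIX):
            base = name[len(EXPERIMENTAL_PREFIX):]
            if base in self._experimental:
                return base
        if name in self._values:
            return name
        raise KeyError(name)

    def set(self, name, value):
        self._values[self._canonical(name)] = value

    def get(self, name):
        return self._values[self._canonical(name)]

    def show(self, like=""):
        # Experimental entries are displayed under the prefixed name only.
        shown = {}
        for name, value in self._values.items():
            display = EXPERIMENTAL_PREFIX + name if name in self._experimental else name
            if like in display:
                shown[display] = value
        return shown


if __name__ == "__main__":
    reg = VariableRegistry()
    reg.register("enable_nereids_planner", False, experimental=True)
    reg.register("query_timeout", 300)
    reg.set("experimental_enable_nereids_planner", True)
    print(reg.get("enable_nereids_planner"))
    print(reg.show("experimental"))
```

Resolving every name to one canonical key keeps a single stored value per variable, so setting it under either name has the same effect.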
afdac1204d [improve](postgresql catalog) support postgresql bytea type to doris string (#18623)
* [improve](postgresql catalog) support postgresql bytea type to doris string

* modify function name

* add case
2023-04-16 18:14:42 +08:00
7bc242d665 [regression-test](prepared statement) Fix connection error when test framework used lower jdbc version (#18665) 2023-04-16 18:13:45 +08:00
c12646382d [feature](multicatalog) enable doris hive/iceberg catalog to read data on tencent GooseFS (#18685) 2023-04-16 18:11:57 +08:00
7dd96bc341 [fix](olap) remove zorder support when create table (#18698) 2023-04-16 09:24:18 +08:00
8f0d4ae625 [Fix](fe)Upgrade hive-catalog-shade version to 1.0.3 (#18690) 2023-04-15 22:10:45 +08:00
bcff3710ca [fix] set execution timeout for brokerload and use query timeout when… (#18694)
To stay compatible after an upgrade, we should use the query timeout if the execution timeout is not set.
2023-04-15 20:41:04 +08:00
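
The fallback in the commit above can be sketched in a few lines. The function name is hypothetical, and representing "not set" as `None` is an assumption made for this example; the commit does not say how Doris encodes an unset execution timeout.

```python
from typing import Optional


def effective_load_timeout(execution_timeout_s: Optional[int],
                           query_timeout_s: int) -> int:
    """Return the execution timeout, or the query timeout when it is unset.

    Assumption: an unset execution timeout is modelled as None.
    """
    if execution_timeout_s is None:
        return query_timeout_s
    return execution_timeout_s


if __name__ == "__main__":
    print(effective_load_timeout(None, 300))  # falls back to the query timeout
    print(effective_load_timeout(3600, 300))  # explicit execution timeout wins
```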
d2efc619b0 [Enchancement](statistics) Show histogram statistics, show specified column statistics (#18657) 2023-04-14 22:36:40 +08:00