2873 Commits

Author SHA1 Message Date
5e0c34b35a [fix](join) should call getOutputTblRefIds to get child's tuple info (#13227)
* [fix](join) should call getOutputTblRefIds to get child's tuple info
2022-10-14 09:46:14 +08:00
87e5e2b48b [Fix](array-type) Disable schema change between array type columns (#13261)
Currently, we do not support schema change between array type columns.
We should forbid users from doing this operation.
2022-10-13 22:59:09 +08:00
cb300b0b39 [feature](agg) support any,any_value agg functions. (#13228) 2022-10-13 18:31:19 +08:00
fe1524a287 [Enhancement](load) remove load mem limit (#13111)
#12716 removed the memory limit for a single load task. This PR proposes removing the session variable load_mem_limit as well, to avoid confusion.

For compatibility, load_mem_limit in thrift is not removed; the FE sets its value equal to exec_mem_limit.
2022-10-13 17:19:22 +08:00
4a6eb01ccb [refactor](Nereids): refactor UT by using Pattern and rename to remove consecutive (#13337)
* rename

* refactor UT
2022-10-13 16:41:51 +08:00
0ff04e81bc [fix](DynamicPartition) Not check max_dynamic_partition_num when disable DynamicPartition (#13267)
Skip the max_dynamic_partition_num check when DynamicPartition is being disabled by ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false"). If max_dynamic_partition_num is raised and later lowered, the actual number of dynamic partitions may be larger than max_dynamic_partition_num, and DynamicPartition then cannot be disabled.
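
The rule in this message can be sketched as below. This is a minimal Python illustration, not the FE code: the function name and its parameters are hypothetical, and only the idea of skipping the limit check while the feature is being disabled comes from the commit message.

```python
def check_dynamic_partition_num(enable, partition_num, max_dynamic_partition_num):
    """Validate the dynamic partition count (hypothetical helper, not a Doris API)."""
    if not enable:
        # Disabling DynamicPartition should always be possible,
        # so the limit is not checked in this case.
        return
    if partition_num > max_dynamic_partition_num:
        raise ValueError(
            f"dynamic partition num {partition_num} exceeds limit {max_dynamic_partition_num}"
        )
```

With this shape, a table that already holds more partitions than the lowered limit can still be switched off, while enabling or altering an enabled table is still validated.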
2022-10-13 14:37:39 +08:00
db7f955a70 [improve](Nereids): split otherJoinCondition with List. (#13216)
* split otherJoinCondition with List.
2022-10-13 13:49:46 +08:00
4248c6f37c [improve](Nereids): avoid duplicated stats derive. (#13293) 2022-10-13 13:49:21 +08:00
e08ba8d573 [feature](restore) Add new property 'reserve_dynamic_partition_enable' to restore statement (#12498)
Add a new restore property, 'reserve_dynamic_partition_enable'. With it, the restored
table can keep the dynamic_partition_enable value it had before the backup.
Before this commit, a restored table always had the property
'dynamic_partition_enable=false'.
2022-10-13 11:16:15 +08:00
7147c77f22 [Enhancement](broker)Doris support obs broker load (#12781)
1. Upgrade fs_broker module hadoop2.7.3->hadoop2.8.3
2. Support obs broker load

org.apache.doris.broker.hdfs.FileSystemManager add getOBSFileSystem method
2022-10-13 09:44:13 +08:00
1bd14f1d82 [feature-wip](jsonb) jsonb parse function and load (#13129)
add function to parse json string to jsonb format and use it to support stream load.
2022-10-12 13:56:37 +08:00
Pxl
5c68f69362 [improvement](config) set enable_local_exchange default value to true (#13292) 2022-10-12 09:07:24 +08:00
3c5e7e2f24 [feature](nereids) refactor statistics framework and introduce StatsCalculatorV2 (#12987)
* squash

change data type of metrics to double

unit test

add stats for some function

add stats for arithmeticExpr

1. set max/min of ColumnStats to double
2. add stats for binaryExpr/compoundExpr

in predicate

* Add LiteralExpr in ColumnStat just for user display only.
2022-10-11 17:23:49 +08:00
5af1439934 [feature](auth) support user password policy and alter user stmt (#13051) 2022-10-11 16:37:35 +08:00
b5da751c2a [enhancement](Nereids) remove redundant log when fall back to legacy parser (#13243) 2022-10-11 10:53:07 +08:00
f007e0aed0 [fix](statstics) Incorrectly using the number of buckets to determine whether the table is partitioned (#13218) 2022-10-10 17:22:24 +08:00
63903136c4 [refactor](jcup) Format keywords in sql_parser.cup (#13133)
The key keyword definition section of `sql_parser.cup` is unordered and messy:
1. It is almost unreadable
2. There are no rules to format it when we make a change to it
3. **It takes unnecessary effort to resolve conflict caused by the unordered keywords**

We can apply some simple rules to format it:
1. Sort in lexicographical order
2. Break into several "sections", keywords in each section have the same prefix `KW_${first_letter}`
3. Every 2 sections are connected with an empty line containing only 4 white spaces

e.g.

```
terminal String
    KW_A...

    KW_B...

    ...

    KW_Z...
```
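
The formatting rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: `format_keywords` is a hypothetical helper, it returns the indented keyword lines without the surrounding `terminal String` declaration or any punctuation, and it is not the tool used in the PR.

```python
def format_keywords(keywords):
    """Return indented keyword lines, sorted and split into per-letter sections."""
    ordered = sorted(set(keywords))  # rule 1: lexicographical order
    lines = []
    previous_prefix = None
    for kw in ordered:
        prefix = kw[:4]  # "KW_" plus the first letter, e.g. "KW_A"
        if previous_prefix is not None and prefix != previous_prefix:
            # Sections are separated by a line holding exactly four spaces.
            lines.append("    ")
        lines.append("    " + kw)
        previous_prefix = prefix
    return lines
```

For example, `format_keywords(["KW_BY", "KW_AND", "KW_ADD"])` yields the two `KW_A` entries, a separator line of four spaces, and then `KW_BY`.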
2022-10-10 14:34:51 +08:00
375dfedd83 [feature](nereids) dump physical tree and memo (#13091)
Dump memo info and the physical plan to stdout and the log.
Set the `enable_nereids_trace` variable to true/false to turn this dump on/off.

following is a fragment of memo:
```
Group[GroupId#8]
GroupId#8(plan=PhysicalHashJoin ( type=INNER_JOIN, hashJoinCondition=[(r_regionkey#250 = n_regionkey#255)], otherJoinCondition=Optional.empty, stats=null )) children=[GroupId#6 GroupId#7 ] stats=(rows=25, isReduced=false, width=2)
GroupId#8(plan=PhysicalHashJoin ( type=INNER_JOIN, hashJoinCondition=[(r_regionkey#250 = n_regionkey#255)], otherJoinCondition=Optional.empty, stats=null )) children=[GroupId#7 GroupId#6 ] stats=(rows=25, isReduced=false, width=2)
```
2022-10-10 13:05:28 +08:00
e829061614 [fix](sort)should not change resolvedTupleExprs in toThrift method (#13211)
The toThrift method is called multiple times to send data to different BEs, but resolvedTupleExprs should be changed only once. This PR makes sure that resolvedTupleExprs is changed only once.
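
One way to obtain the "changed only once" property is a guard flag, sketched below in Python. Everything here is hypothetical (`SortInfoSketch`, the `adjust` callable, the dictionary standing in for the thrift struct); the PR itself may achieve the same effect differently, for example by moving the change out of toThrift.

```python
class SortInfoSketch:
    """Toy stand-in for a planner object that is serialised once per backend."""

    def __init__(self, resolved_tuple_exprs, adjust):
        self.resolved_tuple_exprs = list(resolved_tuple_exprs)
        self._adjust = adjust      # hypothetical one-time transformation
        self._adjusted = False

    def to_thrift(self):
        # Apply the transformation on the first call only; later calls reuse the result.
        if not self._adjusted:
            self.resolved_tuple_exprs = self._adjust(self.resolved_tuple_exprs)
            self._adjusted = True
        return {"resolved_tuple_exprs": list(self.resolved_tuple_exprs)}
```

Calling `to_thrift()` repeatedly then returns the same payload each time, regardless of how many backends receive it.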
2022-10-10 08:39:58 +08:00
15fc3c2c89 [enhancement](statistics) optimize the default configuration related to statistics, etc. (#13136)
This PR mainly optimizes statistics tasks. It includes the following:
1. No longer generate statistics tasks for empty tables, and move the logic of skipping empty partitions to the process of task generation.
2. Adjust the default statistics-related configuration to improve the efficiency of statistics collection. The parameters include `cbo_concurrency_statistics_task_num`, `statistic_job_scheduler_execution_interval_ms` and `statistic_task_scheduler_execution_interval_ms`.
3. Optimize the display of statistical tasks.
4. In addition, some `org.apache.parquet.Strings` packages are changed to `com.google.common.base.Strings` to avoid the exception that Strings cannot be found in local debug.

etc.
2022-10-09 16:34:20 +08:00
da933ecd21 [fix](Nereids) plan broadcast on right semi join by mistake (#13206) 2022-10-09 16:32:12 +08:00
ece4a6c194 [doc][fix](multi-catalog) add doc for multi catalog and fix refresh bug (#13097)
1. Add all documentation for the multi-catalog feature.
2. Fix a bug where the REFRESH edit log is not handled.
2022-10-09 09:14:44 +08:00
869fe2bc5d [Improvement](outfile) Support ORC format in outfile (#13019) 2022-10-08 20:56:32 +08:00
63f5dc1953 [feature](Nereids): support Alias join reorder and fix bug. (#12890)
* [improve](Nereids): simplify onCondition check.

* feature: support project Alias for join reorder.
2022-10-08 10:45:04 +08:00
7b75c2df54 [fix](BE) fix the stream load error when upgrade BE from 1.1.2 to master (#13058) 2022-10-05 12:13:26 +08:00
4a0b4f1836 [fix](fe-test) TestWithFeService do not clean up dorisHome (#13073) 2022-10-04 21:32:27 +08:00
b083fb6d5f [fix](decimal) retain Decimal trailing zero when select on fe (#13065) 2022-10-04 21:31:18 +08:00
74fc98ceeb [improvement](ResourceTag) support upper case in tag name (#13063) 2022-10-04 21:30:37 +08:00
3f47f67b16 [fix](parquet) fix parquet write setting property is not effective (#12912) 2022-10-04 21:25:57 +08:00
e167aa120f [fix](jdbc) fix insert into date type to oracle using wrong type (#12883)
When inserting into a date-type column in Oracle over JDBC,
the to_date function should be used to convert the string to java.sql.Date.
2022-10-04 21:24:33 +08:00
b53533408b not allow alter mow property (#13108) 2022-10-03 21:31:09 +08:00
d44af5decf [fix](alter-load) fix bug that tablet version may be wrong when doing alter and load (#13070)
The `isRunning()` method of `TransactionState` is missing the `PRE_COMMITTED` status,
which causes `isPreviousTransactionsFinished` to be judged incorrectly.
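
The effect of the missing status can be sketched as follows. This is an illustrative Python model, not the Doris source: only `PRE_COMMITTED`, `isRunning` and `isPreviousTransactionsFinished` are named in the message, while the other status names and the exact set of "running" statuses are assumptions.

```python
from enum import Enum


class TransactionStatus(Enum):
    # Status names other than PRE_COMMITTED are assumed for illustration.
    PREPARE = "PREPARE"
    PRE_COMMITTED = "PRE_COMMITTED"
    COMMITTED = "COMMITTED"
    VISIBLE = "VISIBLE"
    ABORTED = "ABORTED"


# Statuses treated as still running; PRE_COMMITTED is the one the fix adds.
RUNNING = {
    TransactionStatus.PREPARE,
    TransactionStatus.PRE_COMMITTED,
    TransactionStatus.COMMITTED,
}


def is_running(status):
    return status in RUNNING


def is_previous_transactions_finished(statuses):
    # Previous transactions count as finished only if none is still running.
    return not any(is_running(s) for s in statuses)
```

If `PRE_COMMITTED` were left out of `RUNNING`, a pre-committed load would be reported as finished, which matches the wrong judgment described above.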
2022-09-30 23:39:30 +08:00
95561baddd [fix](planner) throw NPE when all group by expr is constant and no agg expr in select list (#13087) 2022-09-30 18:47:01 +08:00
90f11ed7c1 [enhancement](Nereids) remove unnecessary exchange between global and distinct local aggregate node (#13057)
Add partition info to LogicalAggregate and set it to the original group expression list of the aggregate when disassembling an aggregate that has a distinct aggregate function.
2022-09-29 23:12:37 +08:00
31a23baa37 [fix](planner) Add default execution interval time for stats framework (#13044)
Set a default execution interval for stats collection related threads.
2022-09-29 22:40:27 +08:00
7aae98eb71 [fix](comment) sparkload comment mislead which file types it support (#12982) 2022-09-29 20:23:57 +08:00
287ff50a6f [Bug](datev2) Fix compatible error between datev2 and date (#13024) 2022-09-29 18:01:55 +08:00
a7b42a7029 [Fix](Nereids) Fix exception message when can't bind slot. (#13048) 2022-09-29 16:51:07 +08:00
42729786bf [enhancement](Nereids) push filter into join otherJoinCondition (#12842) 2022-09-29 16:19:30 +08:00
1ae9454771 [enhancement](Nereids) planner performance speed up (#12858)
Optimize the planner by:
1. Reducing duplicated calculation in equals, getOutput, computeOutput, etc.
2. getOnClauseUsedSlots: the two sides of equalTo are certainly slots, so there is no need to use a List.
2022-09-29 16:01:10 +08:00
fae7296336 [Enhancement](fe-core) make UT-SelectRollupTest more stable (#13030) 2022-09-29 14:25:01 +08:00
c2fae109c3 [Improvement](outfile) Support output null in parquet writer (#12970) 2022-09-29 13:36:30 +08:00
d53205076e [feature](Nereids) implicit cast StringLiteral to another side type of BinaryOperator if available (#13038)
For the expression 5 > '1': before this PR, it was normalized to '5' > '1'. After this PR, it is normalized to 5 > 1, to be compatible with the legacy planner.
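
A minimal Python sketch of this normalization, assuming integer operands only: `normalize_comparison` is a hypothetical helper, and leaving the string untouched when it cannot be cast is one reading of "if available", not a statement about the Nereids implementation.

```python
def normalize_comparison(left, right):
    """Cast a string operand to the integer type of the other side when possible."""

    def cast(value, other):
        if isinstance(value, str) and isinstance(other, int):
            try:
                return int(value)
            except ValueError:
                # The cast is not available, so keep the literal unchanged.
                return value
        return value

    return cast(left, right), cast(right, left)
```

Here `normalize_comparison(5, '1')` gives `(5, 1)`, while a non-numeric string such as `'abc'` is returned as-is.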
2022-09-28 21:34:25 +08:00
d739aa7c53 [enhancement](Nereids) optimization for star-schema join reorder (#12817)
The basic idea of star-schema support is:
1. fact_table JOIN dimension_table: if the dimension table is filtered, the result can be regarded as applying a filter on the fact table.
2. fact_table JOIN dimension_table: if the dimension table is not filtered, the number of join result tuples equals the number of fact tuples.
3. dimension_table JOIN fact_table: the number of join result tuples is that of the fact table or 2 times that of the dimension table.

If star-schema support is enabled:
1. Nereids regards the duplicate key (unique key/aggregation key) as the primary key.
2. Nereids tries to regard one join key as the primary key and the other join key as the foreign key.
3. If Nereids finds that no join key is a primary key, it falls back to normal estimation.
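
The first two estimation ideas can be sketched as below. This is a hedged Python illustration: the function name, its parameters, and the choice to scale the fact rows proportionally by the dimension filter's selectivity are assumptions, not the formula used in StatsCalculator.

```python
def estimate_fact_dim_join_rows(fact_rows, dim_rows, dim_rows_after_filter):
    """Estimate rows of fact JOIN dimension on a foreign-key/primary-key pair."""
    if dim_rows <= 0:
        return 0.0
    # An unfiltered dimension keeps every fact row; a filtered one acts
    # like a filter on the fact table with the same selectivity.
    return fact_rows * dim_rows_after_filter / dim_rows
```

With an unfiltered dimension table the estimate equals the fact row count, and filtering the dimension table down to a fifth of its rows reduces the estimate by the same factor.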
2022-09-28 21:09:55 +08:00
7019166469 [enhancement](Nereids) let BinaryArithmetic's dataType and nullable match with BE (#13015)
Do type promotion for BinaryArithmetic:
- Add
- Subtract
- Multiply

Do always nullable for:
- Mod
2022-09-28 20:02:27 +08:00
28ce1878ca [fix](planner) fix push down no grouping agg (#12983)
The value column of the agg does not support the zone_map index. This fixes a null pointer caused by pushing the value column down to the zone map.
2022-09-28 17:01:01 +08:00
1b1f13ec84 [optimization](array-type) optimize error prompts when sql parser report error (#12999)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-28 14:35:41 +08:00
eef9367705 [feature](Nereids) use one stage aggregation if available (#12849)
Currently, we always disassemble aggregation into two stages: local and global. However, in some cases one-stage aggregation is enough. One-stage aggregation has two advantages:
1. It avoids an unnecessary exchange.
2. It gives a chance to do a colocate join on top of the aggregation.

This PR moves the AggregateDisassemble rule from the rewrite stage to the optimization stage, and chooses one-stage or two-stage aggregation according to cost.
2022-09-28 10:38:03 +08:00
339877930d [fix](join)report 'natural join is not supported' instead of getting wrong result (#13008)
* [fix](join)report 'natural join is not supported' instead of getting wrong result

* add regression test
2022-09-28 09:08:56 +08:00
d80b7b9689 [feature-wip](new-scan) support more load situation (#12953) 2022-09-27 21:48:32 +08:00