#12716 removed the memory limit for a single load task. In this PR I propose removing the session variable load_mem_limit to avoid confusion.
For compatibility, load_mem_limit is not removed from Thrift; the FE sets its value equal to exec_mem_limit.
Disable the max_dynamic_partition_num check when disabling DynamicPartition via ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false"). When max_dynamic_partition_num is raised and later lowered, the actual number of dynamic partitions may exceed max_dynamic_partition_num, and DynamicPartition cannot be disabled.
Add a new restore property 'reserve_dynamic_partition_enable', which lets you
restore a table whose dynamic_partition_enable property has the same value
as before the backup. Before this commit, a restored table always had the property
'dynamic_partition_enable=false'.
* squash
change the data type of metrics to double
add a unit test
add stats for some functions
add stats for arithmeticExpr
1. set the max/min of ColumnStats to double
2. add stats for binaryExpr/compoundExpr in predicates
* Add LiteralExpr in ColumnStat for user display only.
The keyword definition section of `sql_parser.cup` is unordered and messy:
1. It is almost unreadable.
2. There are no rules for formatting it when we make a change to it.
3. **It takes unnecessary effort to resolve conflicts caused by the unordered keywords.**
We can apply some simple rules to format it:
1. Sort keywords in lexicographical order.
2. Break them into "sections"; keywords in each section share the same prefix `KW_${first_letter}`.
3. Separate every two sections with an empty line containing only 4 white spaces.
e.g.
```
terminal String
KW_A...
KW_B...
...
KW_Z...
```
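The rules above can be sketched as a small formatting script (a minimal sketch; the terminal names below are illustrative, not the actual contents of `sql_parser.cup`):

```python
from itertools import groupby

def format_keywords(keywords):
    """Sort KW_ terminals lexicographically, group them into sections by
    first letter, and join sections with a line containing only 4 spaces."""
    ordered = sorted(set(keywords))
    sections = []
    # Group by the "KW_" prefix plus the first letter, e.g. "KW_A", "KW_B".
    for _, group in groupby(ordered, key=lambda kw: kw[:4]):
        sections.append("\n".join(group))
    return "\n    \n".join(sections)

# Illustrative terminals only, not the real grammar.
print(format_keywords(["KW_BY", "KW_AND", "KW_AS", "KW_BETWEEN"]))
```

Sorting before grouping makes the output deterministic, so re-running the script on a modified keyword list always yields the same layout and merge conflicts stay local to one section.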
Dump memo info and the physical plan to stdout and the log.
Set the `enable_nereids_trace` session variable to true/false to enable/disable this dump.
The following is a fragment of the memo:
```
Group[GroupId#8]
GroupId#8(plan=PhysicalHashJoin ( type=INNER_JOIN, hashJoinCondition=[(r_regionkey#250 = n_regionkey#255)], otherJoinCondition=Optional.empty, stats=null )) children=[GroupId#6 GroupId#7 ] stats=(rows=25, isReduced=false, width=2)
GroupId#8(plan=PhysicalHashJoin ( type=INNER_JOIN, hashJoinCondition=[(r_regionkey#250 = n_regionkey#255)], otherJoinCondition=Optional.empty, stats=null )) children=[GroupId#7 GroupId#6 ] stats=(rows=25, isReduced=false, width=2)
```
The toThrift method will be called multiple times when sending data to different BEs, but the changes to resolvedTupleExprs should be made only once. This PR makes sure resolvedTupleExprs can be changed only once.
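The fix follows a standard run-at-most-once guard. Here is a minimal Python sketch of the pattern (the real change is in the FE Java code; the class and field names here are illustrative):

```python
class SinkSketch:
    """Illustrates an idempotent toThrift: the stateful substitution of
    resolvedTupleExprs runs at most once, even though toThrift is called
    once per backend the data is sent to."""

    def __init__(self, exprs):
        self.resolved_tuple_exprs = exprs
        self._resolved = False  # guard flag: has the substitution run yet?

    def to_thrift(self):
        if not self._resolved:
            # Mutate resolvedTupleExprs only on the first call.
            self.resolved_tuple_exprs = [e.upper() for e in self.resolved_tuple_exprs]
            self._resolved = True
        return list(self.resolved_tuple_exprs)

sink = SinkSketch(["a", "b"])
first = sink.to_thrift()   # performs the one-time substitution
second = sink.to_thrift()  # reuses the already-resolved exprs unchanged
```

Without the guard, calling the method once per BE would apply the substitution repeatedly to already-substituted expressions.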
This PR mainly optimizes statistics tasks. It includes the following:
1. No longer generate statistics tasks for empty tables, and move the logic of skipping empty partitions into task generation.
2. Adjust the default statistics-related configuration to improve the efficiency of statistics collection; the parameters include `cbo_concurrency_statistics_task_num`, `statistic_job_scheduler_execution_interval_ms` and `statistic_task_scheduler_execution_interval_ms`.
3. Optimize the display of statistics tasks.
4. In addition, change some `org.apache.parquet.Strings` imports to `com.google.common.base.Strings` to avoid the "Strings cannot be found" exception during local debugging.
Add partition info to LogicalAggregate and set it as the original group expression list of the aggregate when we disassemble an aggregate with a distinct aggregate function.
Optimize the planner by:
1. Reducing duplicated calculation in equals, getOutput, computeOutput, etc.
2. getOnClauseUsedSlots: the two sides of an equalTo are certainly slots, so there is no need to use a List.
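Point 2 can be illustrated as follows (a hypothetical sketch, not the actual Nereids code): since an equality join conjunct has exactly one slot on each side, a fixed-size pair is cheaper and clearer than a per-side list.

```python
def on_clause_used_slots(equal_to_conjuncts):
    """Each conjunct is a (left_slot, right_slot) pair, because an equalTo
    on-clause condition always has exactly one slot per side. Returning
    fixed-size pairs avoids allocating a List per side of each conjunct."""
    left_slots, right_slots = [], []
    for left, right in equal_to_conjuncts:
        left_slots.append(left)
        right_slots.append(right)
    return left_slots, right_slots

# Hypothetical join condition: t1.id = t2.id AND t1.k = t2.k
lefts, rights = on_clause_used_slots([("t1.id", "t2.id"), ("t1.k", "t2.k")])
```

Encoding the invariant (exactly two slots) in the type also makes violations fail fast instead of silently producing an over-long list.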
The basic idea of star-schema support is:
1. For fact_table JOIN dimension_table, if the dimension table is filtered, the result can be regarded as applying a filter on the fact table.
2. For fact_table JOIN dimension_table, if the dimension table is not filtered, the number of join result tuples equals the number of fact tuples.
3. For dimension_table JOIN fact_table, the number of join result tuples is that of the fact table, or 2 times that of the dimension table.
If star-schema support is enabled:
1. Nereids regards duplicate keys (unique keys/aggregation keys) as primary keys.
2. Nereids tries to regard one join key as a primary key and the other join key as a foreign key.
3. If Nereids finds that no join key is a primary key, it falls back to normal estimation.
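The estimation rules above can be sketched as follows (a simplified illustration with assumed inputs and a toy selectivity parameter; the real implementation works on Nereids statistics objects):

```python
def estimate_join_rows(fact_rows, dim_filtered, dim_selectivity=1.0):
    """Rough star-schema join cardinality, following the rules above:
    - filtered dimension table: the join acts as a filter on the fact table,
      scaled by the dimension filter's selectivity (an assumed parameter);
    - unfiltered dimension table: each fact tuple matches exactly one
      dimension tuple via the foreign key, so output == fact row count."""
    if dim_filtered:
        return fact_rows * dim_selectivity
    return fact_rows

# A filtered 25-row dimension keeping 20% of keys prunes the fact side too.
rows = estimate_join_rows(fact_rows=1_000_000, dim_filtered=True, dim_selectivity=0.2)
```

The primary-key/foreign-key assumption is what justifies the "one match per fact tuple" step; that is why the implementation falls back to normal estimation when no join key can be treated as a primary key.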
Currently, we always disassemble aggregation into two stages: local and global. However, in some cases one-stage aggregation is enough, and it has two advantages:
1. It avoids an unnecessary exchange.
2. It gives a chance to do a colocate join on top of the aggregation.
This PR moves the AggregateDisassemble rule from the rewrite stage to the optimization stage, and chooses one-stage or two-stage aggregation according to cost.