Many features controlled by FE session variables are disabled by default, so those features are never actually exercised by the GitHub workflow tests. This PR adds a fuzzy-test config to fe.conf: when it is set to true, fuzzy session variables are used for every connection, so each feature developer can define fuzzy values for their feature's variables.
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. Reduce the configuration options for the statistics framework, and add comments for the remaining ones.
2. Move the logic for creating analysis jobs into `StatisticsRepository`, which defines all the functions used to interact with the internal statistics table.
3. Move AnalysisJobScheduler to the statistics package
4. Support displaying and manually injecting statistics.
boost::stacktrace::stacktrace() has a memory leak, so use glog's internal function to print the stack trace instead.
The leak occurs because boost::stacktrace saves state in each thread's thread_local storage and never actively releases it. Testing found that each thread leaked about 100 MB after calling boost::stacktrace.
Refer to: boostorg/stacktrace#118, boostorg/stacktrace#111
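A minimal sketch that reproduces the pattern described above, assuming only that each worker thread captures one stacktrace; it is meant for observing per-thread retained memory, not as production code:
```
// Spawn threads that each capture a boost stacktrace once; per the linked
// issues, the state kept in each thread's thread_local is not actively
// released, so retained memory grows with the number of threads.
#include <boost/stacktrace.hpp>
#include <sstream>
#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 64; ++i) {
        workers.emplace_back([] {
            std::ostringstream oss;
            oss << boost::stacktrace::stacktrace();  // triggers the per-thread state
            (void)oss.str();
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    // Inspect the process RSS here (e.g. /proc/self/status) to observe the
    // retained memory described in the issues above.
    return 0;
}
```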
In the original algorithm, the penalty is abs(leftRowCount - rightRowCount). This lets some right-deep trees escape the penalty, because the subtraction is almost zero when both sides have similar row counts. Penalizing by rightRowCount avoids this escape.
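For illustration only (the real cost model lives in the FE planner; these helper names are hypothetical), the two penalty terms side by side:
```
#include <cmath>

// Old penalty: |leftRowCount - rightRowCount|. A right-deep tree whose two
// inputs have similar cardinalities gets a near-zero penalty and escapes.
double penalty_old(double left_row_count, double right_row_count) {
    return std::fabs(left_row_count - right_row_count);
}

// New penalty: charge the right child's row count directly, so a large right
// subtree is penalized no matter how close the two sides are.
double penalty_new(double /*left_row_count*/, double right_row_count) {
    return right_row_count;
}
```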
1. When translating a colocated join, we lost the runtime filter (RF) information attached to the right child, so the BE would not generate those RFs.
2. When an RF is useless, we pruned all RFs on the scan node by mistake (see the sketch after this list).
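A hedged sketch of bug 2, with hypothetical types (not the actual BE structures), contrasting the mistaken pruning with the intended one:
```
#include <algorithm>
#include <vector>

struct RuntimeFilterDesc {
    int id;
    bool useless;  // e.g. the filter cannot reduce the scan at all
};

// Buggy behavior: one useless RF caused every RF on the scan node to be dropped.
void prune_buggy(std::vector<RuntimeFilterDesc>& rfs) {
    for (const auto& rf : rfs) {
        if (rf.useless) {
            rfs.clear();  // wrong: useful filters are pruned as well
            return;
        }
    }
}

// Intended behavior: drop only the useless filters and keep the rest.
void prune_fixed(std::vector<RuntimeFilterDesc>& rfs) {
    rfs.erase(std::remove_if(rfs.begin(), rfs.end(),
                             [](const RuntimeFilterDesc& rf) { return rf.useless; }),
              rfs.end());
}
```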
Fix three bugs:
1. The EOF of the lazy-read columns may not equal the EOF of the predicate columns (see the sketch after this list).
(For example: the predicate column has 3 pages with 400 rows each, but the last page
is filtered out by the page index. With batch_size=992, the EOF of the predicate column is true.
However, the lazy-read columns should be read with batch_size=800, so their EOF may still be false.)
2. The array column did not count the number of nulls.
3. A wrong NullMap was generated for the array column.
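As referenced in item 1, a self-contained sketch of the batch-size adjustment using the 3 x 400-row example; the reader functions and names are hypothetical, not the actual parquet reader API:
```
#include <algorithm>
#include <cstddef>
#include <iostream>

struct ReadResult {
    size_t rows_read;
    bool eof;
};

// Simulate the example above: 3 pages x 400 rows, last page pruned by the
// page index, so only 800 rows remain readable in the predicate column.
ReadResult read_predicate_columns(size_t batch_size, size_t remaining = 800) {
    size_t rows = std::min(batch_size, remaining);
    return {rows, rows == remaining};
}

// The sketch just echoes the requested row count; the point is that the
// caller must pass the rows the predicate columns actually produced.
ReadResult read_lazy_columns(size_t batch_size) {
    return {batch_size, false};
}

int main() {
    const size_t batch_size = 992;
    ReadResult pred = read_predicate_columns(batch_size);  // 800 rows, eof=true

    // Read the lazy columns with 800 rows (not 992) and take the EOF flag
    // from the predicate side so the two can never disagree.
    ReadResult lazy = read_lazy_columns(pred.rows_read);
    lazy.eof = pred.eof;

    std::cout << "rows=" << lazy.rows_read << " eof=" << lazy.eof << std::endl;
    return 0;
}
```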
When a SELECT reads remote files, download them into local cache files on disk.
When an ALTER TABLE reads remote files, read them directly from remote storage, so that an overly large tablet does not consume too much local disk space for cache files.
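A rough sketch of the read-path selection described above, assuming hypothetical reader types (the actual BE classes differ):
```
#include <memory>
#include <string>
#include <utility>

enum class ReadPurpose { QUERY, SCHEMA_CHANGE };

struct FileReader {
    virtual ~FileReader() = default;
};

// Downloads the remote file into the local cache before serving reads.
struct CachedRemoteReader : FileReader {
    explicit CachedRemoteReader(std::string p) : path(std::move(p)) {}
    std::string path;
};

// Streams directly from remote storage without creating local cache files.
struct DirectRemoteReader : FileReader {
    explicit DirectRemoteReader(std::string p) : path(std::move(p)) {}
    std::string path;
};

std::unique_ptr<FileReader> open_remote_file(const std::string& path, ReadPurpose purpose) {
    if (purpose == ReadPurpose::QUERY) {
        // SELECT path: cache locally so repeated reads are fast.
        return std::make_unique<CachedRemoteReader>(path);
    }
    // ALTER TABLE / schema change path: read remote data directly so a large
    // tablet does not fill the local disk with cache files.
    return std::make_unique<DirectRemoteReader>(path);
}
```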
Create partitions using:
```
PARTITION BY RANGE(event_day)(
FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR,
FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH,
FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK,
FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY,
PARTITION p_20230114 VALUES [('2023-01-14'), ('2023-01-15'))
)
PARTITION BY RANGE(event_time)(
FROM ("2023-01-03 12") TO ("2023-01-14 22") INTERVAL 1 HOUR
)
```
can create yearly/monthly/weekly/daily/hourly date partitions in a batch,
and it is also compatible with the existing single-partition syntax.
## Problem summary
This PR supports:
1. the `numbers` table-valued function for Nereids tests, e.g. `select * from numbers(number = 10, backend_num = 1)`
2. bitmap/hll aggregate functions
3. looking up variable-length (variadic) functions, such as `coalesce`, in the function registry
4. a fix for a bug where printing the Nereids trace threw an exception because a RewriteRule (e.g. `AggregateDisassemble`) was used in ApplyRuleJob; introduced by #13957