1.some encrypt and decrypt functions have wrong blockEncryptionMode
2.topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id
3.must keep the limit if it's an uncorrelated in-subquery with limit on sort, like select a from t1 where a in ( select b from t2 order by xx limit yy )
We shouldn't push top Project of JoinCluster in PushdownProjectThroughJoin
like
```
* Project (id + 1) if this project is top project of Join Cluster
* |
* Join
* / \
* Join Join
* / ....
* Join
```
Fix some hive partition issues.
1. Fix be will crash when using hive partitions field of `date`, `timestamp`, `decimal` type.
2. Fix hdfs uri decode error when using `timestamp` partition filed which will cause some url-encoding for special chars, such as `%3A` will encode `:`.
1. fixed some error regressions (results error with big nstance_num due to incorrect order by).
2. if set parallel_fragment_exec_instance_num to 0, the concurrency in the Pipeline execution engine will automatically be set to half of the number of CPU cores.
3. add limit to parallel_fragment_exec_instance_num that it cannot be set to more than fe.conf::max_instance_num(Default: 128)
```
mysql [(none)]>set parallel_fragment_exec_instance_num = 514;
ERROR 1231 (42000): errCode = 2, detailMessage = Variable 'parallel_fragment_exec_instance_num' can't be set to the value of '514(Should not be set to more than 128)'
```
change tpcds q67 testcase name to q67_ignore_temporarily since the error should be ignored temporarily
due to precision problem that will be fixed further.
Add table level statistics, support SHOW TABLE STATS statement to show table level statistics.
Implement automatically analyze statistics, support ANALYZE... WITH AUTO ... statement to automatically analyze statistics.
TODO:
collate relevant p0 tests
Supplement the design description to README.md
Issue Number: close #xxx
This pr does three things:
1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first.
2. add p2 test for export
3. modify docs
This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following contents:
support periodic collection of statistics.
Change the type of Date in statistics p0 to DateV2(see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) for test locally. complement cases(remove Chinese characters, optimize code, etc) , improve stability.
Supports setting whether to keep records of statistics synchronization job info, convenient for use in p0 testing.
The statistics job table was modified, and some auxiliary judgments were added to avoid the user perceiving the modification. This function was removed when the table schema is stable.
In my scene, We need to specify databases that are excluded to synchronize to doris,
like some databases store temporary table.
Since #17803 introduce `specified_database_list` to specify 'include databases',
this pr introduce new config `exclude_database_list` to specify 'exclude databases',
and rename `specified_database_list` to `include_database_list` for naming symmetry.
BTW, when `include_database_list` and `exclude_database_list` specify overlapping databases, `exclude_database_list` would take effect with higher privilege over `include_database_list`.
Sometimes the dict is not initialized when run comparison predicate here, for example, the full page is null, then the reader will skip read, so that the dictionary is not inited. The cached code is wrong during this case, because the following page maybe not null, and the dict should have items in the future.
This will result the dict string column query return wrong result, if there are many null values in the column.
I also add some regression test for dict column's equal query, larger than query, less than query.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. Added test p0 for sampling collection statistics
2. Modify the uniqueKeys of table analysis_jobs for deletion based on relevant conditions
3. Solve the problem that incremental statistics p0 is less stable
This pr mainly optimizes the following items:
- the collection of statistics: clear up invalid historical statistics before collecting them, so as not to affect the final table statistics.
- the incremental collection of statistics: in the case of incremental collection, only the corresponding partition statistics need to be collected.
TODO: Supports incremental collection of materialized view statistics.
Dynamic mode used in array type when serialize it to mysql row buffer using dynamic mode, when combine binary row format with dynamic mode,something goes wrong, and lead to invalid binary row format.
since we cannot do stats derive and cost estimate on agg very good.
this PR remove some aggregate pattern that usually not good.
1. one stage agg after exchange. this pattern is good only when process very few rows.
2. three stage distinct agg with gather middle merge.
Hive support create partition with a specific location. In this case, the file path for the create partition may not contain the partition name and value. Which will cause Doris fail to query the the hive partition.
This pr is to fix this bug.
segcompaction_p1 contains fairly large load jobs, which will exceed
memlimit or timeout in pipeline under such heavy loads.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
Fix bug when reading array type in parquet file:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed,
reason = [IO_ERROR]Decode too many values in current page
```
When reading normal columns, `ScalarColumnReader::_read_values` still calls `ColumnSelectVector::set_run_length_null_map` to initialize select vector, but `ScalarColumnReader::_read_nested_column` hasn't do this, making the number of values wrong.
The situation where this error occurs is particularly extreme: The column pages have remaining values to be read,
but all of them are null values at ancestor level, so there's no actual read operation, just skipping null values at ancestor level.