To be more compatible with MySQL, rename JSONB type name and function name to JSON.
The old JSONB type name and jsonb_xx function can still be used for backward compatibility.
There is a function jsonb_extract remained since json_extract is used by json string function and more work need to change it. It will be changed further.
1. add json_unquote and json_extract functions
2. remove mv releated code in visitPhysicalOlapScan
3. forbid bitmap and hll type for topn node's sort exprs
4. HashDistributionInfo of olap scan node should use the slots from output not the full schema
5. SelectMaterializedIndexWithoutAggregate should use the filter node's output together with the predicate to get the correct mv
6. forbid SimplifyArithmeticRule for decimal type
7. make DecimalLiteral's type and value consistent with each other if the value is decimalv2
8. json_array need support empty argument
Fix threes bugs of timestampv2 precision:
1. Hive catalog doesn't set the precision of timestampv2, and can't get the precision from hive metastore, so set the largest precision for timestampv2;
2. Jdbc catalog use datetimev1 to parse timestamp, and convert to timestampv2, so the precision is lost.
3. TVF doesn't use the precision from meta data of file format.
We can support this by add a new properties for tvf, like :
`select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)`
User can:
Specify compression explicitly by setting `"compression" = "xxx"`.
Doris can infer the compression type by the suffix of file name(e.g. `file1.gz`)
Currently, we only support reading compress file in `csv` format, and on BE side, we already support.
All need to do is to analyze the `"compress_type"` on FE side and pass it to BE.
1. support export `LARGEINT` data type to parquet/orc file format.
2. Export the DORIS `DATE/DATETIME` type to the `Date/Timestamp` logic type of parquet file format.
3. Fix that the data is not correct when the DATE type data is exported to ORC.
1.some encrypt and decrypt functions have wrong blockEncryptionMode
2.topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id
3.must keep the limit if it's an uncorrelated in-subquery with limit on sort, like select a from t1 where a in ( select b from t2 order by xx limit yy )
We shouldn't push top Project of JoinCluster in PushdownProjectThroughJoin
like
```
* Project (id + 1) if this project is top project of Join Cluster
* |
* Join
* / \
* Join Join
* / ....
* Join
```
Fix some hive partition issues.
1. Fix be will crash when using hive partitions field of `date`, `timestamp`, `decimal` type.
2. Fix hdfs uri decode error when using `timestamp` partition filed which will cause some url-encoding for special chars, such as `%3A` will encode `:`.
1. fixed some error regressions (results error with big nstance_num due to incorrect order by).
2. if set parallel_fragment_exec_instance_num to 0, the concurrency in the Pipeline execution engine will automatically be set to half of the number of CPU cores.
3. add limit to parallel_fragment_exec_instance_num that it cannot be set to more than fe.conf::max_instance_num(Default: 128)
```
mysql [(none)]>set parallel_fragment_exec_instance_num = 514;
ERROR 1231 (42000): errCode = 2, detailMessage = Variable 'parallel_fragment_exec_instance_num' can't be set to the value of '514(Should not be set to more than 128)'
```
change tpcds q67 testcase name to q67_ignore_temporarily since the error should be ignored temporarily
due to precision problem that will be fixed further.
Add table level statistics, support SHOW TABLE STATS statement to show table level statistics.
Implement automatically analyze statistics, support ANALYZE... WITH AUTO ... statement to automatically analyze statistics.
TODO:
collate relevant p0 tests
Supplement the design description to README.md
Issue Number: close #xxx
This pr does three things:
1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first.
2. add p2 test for export
3. modify docs
This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following contents:
support periodic collection of statistics.
Change the type of Date in statistics p0 to DateV2(see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) for test locally. complement cases(remove Chinese characters, optimize code, etc) , improve stability.
Supports setting whether to keep records of statistics synchronization job info, convenient for use in p0 testing.
The statistics job table was modified, and some auxiliary judgments were added to avoid the user perceiving the modification. This function was removed when the table schema is stable.
In my scene, We need to specify databases that are excluded to synchronize to doris,
like some databases store temporary table.
Since #17803 introduce `specified_database_list` to specify 'include databases',
this pr introduce new config `exclude_database_list` to specify 'exclude databases',
and rename `specified_database_list` to `include_database_list` for naming symmetry.
BTW, when `include_database_list` and `exclude_database_list` specify overlapping databases, `exclude_database_list` would take effect with higher privilege over `include_database_list`.