Fix broker load p2 test case error.
1. Move test data from the COS Hong Kong region to the Beijing region.
2. Move the broker load test to the p2 group.
3. Fix the error message mismatch.
* Add unexpected/result support
* Rename result.hpp -> result.h && Add NOLINT in expected.hpp
* Add NOLINT in result.h to avoid clang-tidy checker
* Rename result.h to expected.h
* Add Apache License for be/src/util/expected.hpp
* Disable clang-format in be util/expected.hpp
1. Support collecting statistics by sampling
2. Improve the syntax for collecting statistics
3. Support specifying the number of buckets for histograms
4. Tweak some code structure
---
The syntax supports WITH and PROPERTIES, using the same syntax as before.
Column Statistics Collection Syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
[ (column_name [, ...]) ]
[ [WITH SYNC] | [WITH INCREMENTAL] | [WITH SAMPLE PERCENT | ROWS ] ]
[ PROPERTIES ('key' = 'value', ...) ];
```
Column histogram collection syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
[ (column_name [, ...]) ]
UPDATE HISTOGRAM
[ [ WITH SYNC ][ WITH INCREMENTAL ][ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] ]
[ PROPERTIES ('key' = 'value', ...) ];
```
Explanation:
- sync: Collect statistics synchronously. Return after collection finishes.
- incremental: Collect statistics incrementally. Incremental collection of histogram statistics is not supported.
- sample percent | rows: Collect statistics by sampling. Either a percentage or a number of rows can be sampled.
- buckets: Specifies the maximum number of buckets generated when collecting histogram statistics.
- table_name: The target table for collecting statistics. Can be of the form `db_name.table_name`.
- column_name: The specified target column must exist in `table_name`. Multiple column names are separated by commas.
- properties: Properties used to configure the statistics task. Currently only the following configurations are supported (equivalent to the corresponding WITH clauses):
  - 'sync' = 'true'
  - 'incremental' = 'true'
  - 'sample.percent' = '50'
  - 'sample.rows' = '1000'
  - 'num.buckets' = 10
---
TODO:
- Add complete p0 tests
- `Incremental` statistics see #18653
Add the two-phase read TopN optimization. The legacy planner's PRs are:
- #15642
- #16460
- #16848
TODO:
We forbid limit(sort(project(scan))) because BE core dumps when the plan has a project on the scan.
We need to remove this restriction after the BE bug is fixed.
Disallow calling new explicitly.
Force the use of create_shared or create_unique to obtain a shared or unique pointer.
Placement new is still allowed.
See https://abseil.io/tips/42 for the reasoning behind adding a factory method to every class.
I think we should follow this guide because if an exception is thrown in new, the program will terminate.
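A minimal sketch of the idea, not the actual code in this change: the plain `operator new` is private, so `new Widget(...)` fails to compile outside the class, while the factory methods and placement new remain usable. `Widget` and its members are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical class; only the pattern matters, not the names.
class Widget {
public:
    explicit Widget(std::string name) : _name(std::move(name)) {}
    const std::string& name() const { return _name; }

    // Factory methods are the only way to heap-allocate a Widget.
    template <typename... Args>
    static std::shared_ptr<Widget> create_shared(Args&&... args) {
        return std::shared_ptr<Widget>(new Widget(std::forward<Args>(args)...));
    }
    template <typename... Args>
    static std::unique_ptr<Widget> create_unique(Args&&... args) {
        return std::unique_ptr<Widget>(new Widget(std::forward<Args>(args)...));
    }

    // Placement new stays public, so constructing into caller-owned storage works.
    void* operator new(std::size_t, void* ptr) noexcept { return ptr; }
    void operator delete(void*, void*) noexcept {}
    // Public delete so the smart pointers can destroy the object.
    void operator delete(void* ptr) noexcept { ::operator delete(ptr); }

private:
    // Private plain new: `new Widget("x")` outside the class is a compile error.
    void* operator new(std::size_t size) { return ::operator new(size); }

    std::string _name;
};
```

Callers write `Widget::create_shared("a")` or `Widget::create_unique("b")`; a bare `new Widget("c")` outside the class is rejected at compile time.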
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Includes the following functions:
1. broker load
2. export
3. select into outfile
4. create repo and backup to gfs
After configuring the environment, use gfs like any other HDFS system.
Problem:
A dead loop is caused by analyze tasks being pushed onto the job stack repeatedly. When the analyze process generates new operators, the same analyze rule is pushed again, which causes the dead loop. The analyze process generates new operators when it tries to bind order by keys and aggregate functions.
Solution:
We need to throw an exception before the complex analyze and rewrite process, so the check that all expressions are bound should be done twice: once after binding all expressions, and once after the whole analyze process, in case new expressions and operators were generated.
Example:
Cases were put in file: regression-test/suites/nereids_p0/except/test_bound_exception.groovy
The `Export` syntax provides an asynchronous export function, but `Export` is not vectorized.
The `Outfile` syntax provides a synchronous export function.
So we can reimplement the export syntax on top of the outfile syntax.
Fix decimal v3 precision loss issues in the multi-catalog module.
Now it will use decimal v3 to represent decimal type in the multi-catalog module.
Regression Test: `test_load_with_decimal.groovy`
Optimize the instr and locate functions for constant arguments.
With constant arguments, instr and locate show a 58%~200% performance improvement.
Refactor locate(substr, str, pos) to use standardized argument processing.
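A rough sketch of why a constant argument helps, not the implementation in this change: when the substring is constant, the search state can be built once and reused for every row. `ConstLocate` is a hypothetical name, and the sketch assumes byte positions (ASCII input), a 1-based `pos`, and a return value of 0 when nothing is found.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: precomputes a searcher for a constant needle.
class ConstLocate {
public:
    explicit ConstLocate(std::string needle)
            : _needle(std::move(needle)),
              _searcher(_needle.data(), _needle.data() + _needle.size()) {}

    // The searcher points into _needle, so copying would leave it dangling.
    ConstLocate(const ConstLocate&) = delete;
    ConstLocate& operator=(const ConstLocate&) = delete;

    // locate(needle, haystack, pos): 1-based byte position, 0 if not found.
    // instr(haystack, needle) is the same search with pos = 1.
    std::size_t locate(std::string_view haystack, std::size_t pos = 1) const {
        if (pos == 0 || pos > haystack.size()) return 0;
        const char* begin = haystack.data();
        const char* end = begin + haystack.size();
        const char* hit = std::search(begin + (pos - 1), end, _searcher);
        return hit == end ? 0 : static_cast<std::size_t>(hit - begin) + 1;
    }

private:
    std::string _needle;  // must be declared before _searcher
    std::boyer_moore_horspool_searcher<const char*> _searcher;
};
```

The object is constructed once per query, for example `ConstLocate loc("bar");`, and `loc.locate(row)` or `loc.locate(row, pos)` is then called for each row without rebuilding the search table.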
Currently, there are some unused includes in the codebase. We can use a tool named include-what-you-use to clean them up. A strict include-what-you-use policy brings benefits such as faster compilation and fewer unnecessary recompilations.