Add this option to the FE config:
```
/**
 * If set to false, users cannot submit ANALYZE statements and the FE
 * won't allocate any statistics-related resources.
 */
@ConfField
public static boolean enable_stats = true;
```
It is checked when analyzing statistics-related statements and when initializing the analyze manager.
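A hedged usage sketch: if the flag were additionally registered as a mutable config (an assumption; the snippet above does not show it), it could be toggled at runtime as below; otherwise it has to be set in fe.conf before the FE starts.
```SQL
-- Hypothetical: toggling the flag at runtime. This only works if
-- enable_stats is a mutable config; otherwise put `enable_stats = false`
-- in fe.conf and restart the FE.
ADMIN SET FRONTEND CONFIG ("enable_stats" = "false");
```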
Fix a bug when reading array-typed columns from Parquet files:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed,
reason = [IO_ERROR]Decode too many values in current page
```
When reading normal columns, `ScalarColumnReader::_read_values` calls `ColumnSelectVector::set_run_length_null_map` to initialize the select vector, but `ScalarColumnReader::_read_nested_column` doesn't do this, so the number of values goes wrong.
The situation that triggers this error is quite extreme: the column's pages still have remaining values to read,
but all of them are null at an ancestor level, so there is no actual read operation, only skipping of the ancestor-level nulls.
Refactor the FileQueryScanNode init and finalize methods: handle schema-related initialization in the init method, and scan range generation in the finalize method.
Introduced from #18609.
When setting a global variable from a non-Master FE, an error like the following occurs:
`Variable 'password_history' is a GLOBAL variable and should be set with SET GLOBAL`
This is because a non-Master FE executes a SetStmt in the following steps:
1. Forward the SetStmt to the Master FE to execute.
2. Change the SetStmt to "SESSION" level and execute it again on the non-Master FE.
But a "GLOBAL only" variable such as "password_history" is not allowed to be set at the SESSION level,
so step 2 executes "set password_history=xxx" without the "GLOBAL" keyword and throws an exception.
In this case, we should simply ignore the exception and return. An example follows.
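For example, the statement below, issued against a non-Master FE, used to hit this path (the variable name comes from the error above; the value is arbitrary):
```SQL
-- Executed on a non-Master FE: step 1 forwards the statement to the
-- Master FE; step 2 replays it at SESSION level, which raised the error.
SET GLOBAL password_history = 1;
```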
1. Remove the unnecessary project node above the scan node.
2. Fix a bug where an IN subquery may be recognized as a scalar subquery.
3. Fix the return type of some Quantile-related functions.
PR1: Add a new storage file system template and move the old storage code to a new package.
PR2: Extract some methods from the old storage into the new file system.
PR3: Use storages to access remote object storage, and file systems to access files in local or remote locations. Some unit tests will be added.
---------
Co-authored-by: jinzhe <jinzhe@selectdb.com>
TabletSink and LoadChannel in the BE have an M:N relationship.
Every once in a while, a LoadChannel returns its own runtime profile to a random TabletSink, so each TabletSink usually holds the runtime profiles of all LoadChannels, and the freshness of the same LoadChannel's profile differs across TabletSinks. Each TabletSink periodically reports all the LoadChannel profiles it holds to the FE, where the latest profile of each LoadChannel is kept according to its timestamp.
A very simple implementation for querying Hive views; it is an EXPERIMENTAL feature.
We parse the DDL of a Hive view and execute the query, relying on the fact that HiveQL
is very similar to Doris SQL. If the view's DDL uses complicated or incompatible grammar,
the query may fail.
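A hypothetical usage sketch (the catalog, database, and view names below are made up; the view is resolved by parsing its HiveQL definition as Doris SQL):
```SQL
-- `hive_catalog`, `hive_db` and `sales_view` are hypothetical names.
-- This works only when the view's definition uses grammar that Doris
-- SQL can also parse.
SWITCH hive_catalog;
SELECT * FROM hive_db.sales_view LIMIT 10;
```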
After PR #18670, JVM parameters can be used to initialize the JDBC datasource.
With JDBC_MIN_POOL=0 set, idle connections can be closed immediately;
there is no need to wait for the recycling timer.
1. Support sampling when collecting statistics.
2. Improve the syntax for collecting statistics.
3. Support specifying the number of buckets for histograms.
4. Tweak some code structure.
---
The syntax supports WITH and PROPERTIES, using the same syntax as before.
Column Statistics Collection Syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
[ (column_name [, ...]) ]
[ [WITH SYNC] | [WITH INCREMENTAL] | [WITH SAMPLE PERCENT | ROWS ] ]
[ PROPERTIES ('key' = 'value', ...) ];
```
Column histogram collection syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
[ (column_name [, ...]) ]
UPDATE HISTOGRAM
[ [ WITH SYNC ][ WITH INCREMENTAL ][ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] ]
[ PROPERTIES ('key' = 'value', ...) ];
```
Explanation (usage examples follow after this list):
- sync: Collect statistics synchronously; return once collection finishes.
- incremental: Collect statistics incrementally. Incremental collection of histogram statistics is not supported.
- sample percent | rows: Collect statistics by sampling, either a percentage or a number of rows.
- buckets: The maximum number of buckets generated when collecting histogram statistics.
- table_name: The target table for collecting statistics; can be of the form `db_name.table_name`.
- column_name: The specified columns must exist in `table_name`; multiple column names are separated by commas.
- properties: Properties of the statistics task. Currently only the following configurations are supported (equivalent to the WITH clause):
  - 'sync' = 'true'
  - 'incremental' = 'true'
  - 'sample.percent' = '50'
  - 'sample.rows' = '1000'
  - 'num.buckets' = '10'
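For example, the grammar above allows statements like the following (table and column names are hypothetical):
```SQL
-- Sample 10 percent of the rows to collect statistics for two columns.
ANALYZE TABLE db1.tbl1 (c1, c2) WITH SAMPLE PERCENT 10;

-- Synchronously collect a histogram with at most 128 buckets.
ANALYZE TABLE db1.tbl1 (c1) UPDATE HISTOGRAM WITH SYNC WITH BUCKETS 128;

-- The same sampling request expressed through PROPERTIES.
ANALYZE TABLE db1.tbl1 (c1, c2) PROPERTIES ('sample.percent' = '10');
```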
---
TODO:
- Add the complete p0 tests.
- For `Incremental` statistics, see #18653.
Add the two-phase-read TopN optimization. The legacy planner's PRs are:
- #15642
- #16460
- #16848
TODO:
We forbid limit(sort(project(scan))) because the BE cores when the plan has a project on the scan.
We need to remove this restriction after the BE bug is fixed.
The following functions are included:
1. Broker load
2. Export
3. SELECT INTO OUTFILE
4. Creating a repo and backing up to gfs
After configuring the environment, gfs can be used like any other HDFS-compatible system, as sketched below.
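A minimal sketch of case 3 above, assuming gfs is addressed through an HDFS-style URI via a broker (the table name, broker name, and path are hypothetical):
```SQL
-- `tbl1`, `my_broker` and the gfs:// path are hypothetical.
SELECT * FROM tbl1
INTO OUTFILE "gfs://host:port/user/doris/result_"
FORMAT AS CSV
PROPERTIES ("broker.name" = "my_broker");
```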