1. fix function define of `Retention` inconsist, this function return tinyint on `FE` and return uint8 on `BE`
2. make assert_cast support cast to derived
3. change some static cast to assert cast
4. support sum(bool)/avg(bool)
1. fixed some error regressions (results error with big nstance_num due to incorrect order by).
2. if set parallel_fragment_exec_instance_num to 0, the concurrency in the Pipeline execution engine will automatically be set to half of the number of CPU cores.
3. add limit to parallel_fragment_exec_instance_num that it cannot be set to more than fe.conf::max_instance_num(Default: 128)
```
mysql [(none)]>set parallel_fragment_exec_instance_num = 514;
ERROR 1231 (42000): errCode = 2, detailMessage = Variable 'parallel_fragment_exec_instance_num' can't be set to the value of '514(Should not be set to more than 128)'
```
This pr does three things:
1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first.
2. add p2 test for export
3. modify docs
This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following contents:
support periodic collection of statistics.
Change the type of Date in statistics p0 to DateV2(see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) for test locally. complement cases(remove Chinese characters, optimize code, etc) , improve stability.
Supports setting whether to keep records of statistics synchronization job info, convenient for use in p0 testing.
The statistics job table was modified, and some auxiliary judgments were added to avoid the user perceiving the modification. This function was removed when the table schema is stable.
The document of configs(FE and BE) and session variables is hard to maintain.
Because developer need to modify both code and document.
And you can see that some of config's document is missing.
So I plan to write the document of config or variables directly in code, and using
script to generate document automatically.
How To
This CL mainly changes:
Add field in Config and Session Variables' annaotion
description: The description of the config or variable item. It is a String array. And first element is in Chinese, second is in English
options: the valid options if the config or variable is enum.
Add a scripts docs/generate-config-and-variable-doc.sh
Simple run sh docs/generate-config-and-variable-doc.sh and it will generate docs of FE config and variables,
And save it under docs/admin-manual/config/fe-config.md and docs/advanced/variables.md,
both in Chinese and in English.
And there are template markdowns for this script to read and replace with real doc content.
TODO
Too many description need to be filled. I will finish them in next PR. And now the origin doc remain unchanged.
Find a way to check the description field of config and variables, to make sure we won't missing it.
Generate doc for BE config.
Add this option in conf:
/**
* If set false, user couldn't submit analyze SQL and FE won't allocate any related resources.
*/
@ConfField
public static boolean enable_stats = true;
It will be checked during analyze of analyze related stmt and init analyze manager
A very simple implementation to query hive views, it is an EXPERIMENTAL feature.
We can try to parse the ddl of hive views and try to execute the query relies on the fact that HiveQL
is very similar to Doris SQL. But if the ddl of hive views use some complicated or incompatible grammar,
the query might fail.
`Export` syntax provides asynchronous export function, but `Export` does not achieve vectorization.
`Outfile` syntax provides synchronous export function`.
So we can reimplement the export syntax with oufile syntax.
For each release of Doris, there are some experimental features.
These feature may not stable or qualified enough, and user need to use it by setting config or session variables,
eg, set enable_mtmv = true, otherwise, these feature is disable by default.
We should explicitly tell user which features are experimental, so that user will notice that and decide whether to
use it.
Changes
In this PR, I support the experimental_ prefix for FE config and session variables.
Session Variable
Given enable_nereids_planner as an example.
The Nereids planner is an experimental feature in Doris, so there is an EXPERIMENTAL annotation for it:
@VariableMgr.VarAttr(..., expType = ExperimentalType.EXPERIMENTAL)
private boolean enableNereidsPlanner = false;
And for compatibility, user can set it by:
set enable_nereids_planner = true;
set experimental_enable_nereids_planner = true;
And for show variables, it will only show experimental_enable_nereids_planner entry.
And you can also see all experimental session variables by:
show variables like "%experimental%"
Config
Same as session variable, give enable_mtmv as an example.
@ConfField(..., expType = ExperimentalType.EXPERIMENTAL)
public static boolean enable_mtmv = false;
User can set it in fe.conf or ADMIN SET FRONTEND CONFIG stmt with both names:
enable_mtmv
experimental_enable_mtmv
And user can see all experimental FE configs by:
ADMIN SHOW FRONTEND CONFIG LIKE "%experimental%";
TODO
Support this feature for BE config
Only add experimental for:
enable_pipeline_engine
enable_nereids_planner
enable_single_replica_insert
and FE config:
enable_mtmv
enabel_ssl
enable_fqdn_mode
Should modify other config and session vars
When query tables in information_schema databases, it may timeout due to:
There are external catalog with too many tables.
The external catalog is unreachable
So I add a new FE config infodb_support_ext_catalog.
The default is false, which means that when select from tables in information_schema database,
the result will not contain the information of the table in external catalog.
Describe your changes.
Previously in Doris FE, there is no specific thread pool for grpc-client-channel,
by default the underlying netty logic would use one dynamic unbounded cache threadpool.
The workload for this grpc threadpool is unseen.
Use ThreadpoolMgr to create one customized threadpool to get Prometheus-compatible metric data.
Split ExternalFileScanNode into FileQueryScanNode and FileLoadScanNode.
Remove some useless code in FileLoadScanNode.
Remove unused config item: enable_vectorized_load and enable_new_load_scan_node
1. remove TypeCoercion and CharacterLiteralTypeCoercion
2. Nereids Cast do not relay on legacy planner's analyze()
3. fix below problem in legacy planner, after this PR
a. BOOLEAN can cast to DECIMALV2 explicitly
b. compare between BOOLEAN and DATE will cast both side to DOUBLE
c. HLL cannot be implicitly cast to any other type
1. Support the show load warnings for mysql load to get the detail error message.
2. Fix fillByteBufferAsync not mark the load as finished in same data load
3. Fix drain data only in client mode.
Co-authored-by: ByteYue <[yj976240184@gmail.com](mailto:yj976240184@gmail.com)>
This PR is an optimization for https://github.com/apache/doris/pull/17478:
1. Change the buffer size of `LineReader` to 4MB to align with the size of prefetch buffer.
2. Lazily prefetch data in the first read to prevent wasted reading.
3. S3 block size is 32MB only, which is too small for a file split. Set 128MB as default file split size.
4. Add `_end_offset` for prefetch buffer to prevent wasted reading.
The query performance of reading data on object storage is improved by more than 3x+.
1. Organize http documents
2. Add http interface authentication for FE
3. **Support https interface for FE**
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
Inspired by c++ function `std::vector::emplace_back()`, we can use variadic template for this issue.
e.g.
```
[['struct'], 'STRUCT<TYPES>', ['TYPES'], 'ALWAYS_NOT_NULLABLE', ['TYPES...']]
```
`...TYPES` in template_types defines a variadic template `TYPE`. Then the variadic template will be expanded to multiple normal templates based on actual input arguments at runtime in FE.
But make sure `TYPES...` is placed on the last position in all template type arguments.
BTW, the origin template function logic is not affected.
Sometimes the competition of lock is fierce in DatabaseTransactionMgr, which may lead to publish time out, i think we should have a log to hint these lock competition.
1. close (https://github.com/apache/doris/issues/16458) for nereids
2. varchar and string type should be treated as same type in bucket shuffle join scenario.
```
create table shuffle_join_t1 ( a varchar(10) not null )
create table shuffle_join_t2 ( a varchar(5) not null, b string not null, c char(3) not null )
```
the bellow 2 sqls can use bucket shuffle join
```
select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.a;
select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.b;
```
3. PushdownExpressionsInHashCondition should consider both hash and other conjuncts
4. visitPhysicalProject should handle MarkJoinSlotReference
When setting FE config default_storage_medium to SSD, and set all BE storage path as SSD.
And table will be stored with storage medium SSD.
But there is a FE config storage_cooldown_second and its default value is 30 days.
So after 30 days, the storage medium of table will be changed to HDD, which is unexpected.
This PR removes the storage_cooldown_second, and use a max value to set the cooldown time of SSD
storage medium when the default_storage_medium is SSD.