A query with multiple where conditions, such as `where dt in (20210926,20210919) and hour<=13`,
can cause an int * int product to overflow. As a result, the function `extend_scan_key` mistakenly
calls `range.convert_to_fixed_value()`. For a large `range[_low_value, _high_value)`,
a huge number of values is then inserted into `_fixed_values`, which finally results in OOM.
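To make the failure mode concrete, here is a minimal sketch of how a wrapped 32-bit product can pass a size check that the true product would fail. The function names, the `limit` parameter, and the shape of the check are assumptions for illustration only; the source does not show the actual code in `extend_scan_key`. Python integers do not overflow, so the sketch simulates two's-complement wraparound explicitly.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return the value a signed 32-bit integer would hold after wraparound."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def should_expand_buggy(existing_keys: int, range_size: int, limit: int) -> bool:
    """Hypothetical check that trusts a product computed in 32-bit arithmetic."""
    return wrap_int32(existing_keys * range_size) <= limit


def should_expand_safe(existing_keys: int, range_size: int, limit: int) -> bool:
    """Same check with the product computed without wraparound."""
    return existing_keys * range_size <= limit


# Two fixed values for `dt` combined with a very wide range for `hour`.
existing_keys = 2
range_size = 2**31
limit = 1024  # hypothetical cap on the number of scan keys

print(should_expand_buggy(existing_keys, range_size, limit))  # wrapped product passes the check
print(should_expand_safe(existing_keys, range_size, limit))   # true product does not
```

With these inputs the wrapped product is 0, so the buggy check approves expanding the range into fixed values even though the real count is far above the cap.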
Remove some dynamic_cast calls to reduce the overhead caused by type conversion,
which probably reduces the CPU consumption of parquet file import by about 10%.
1. This bug is introduced from #6582
2. Optimize the error log of the "Address used" error msg.
3. Add some documents about compilation.
    1. Add a custom thirdparty download url.
    2. Add a custom com.alibaba maven jar package for DataX.
4. Fix bug that BE crashes when closing scan node, introduced from #6622.
1. Fix bug of UNKNOWN Operation Type 91
2. Support using resource_tag property of user to limit the usage of BE
3. Add new FE config `disable_tablet_scheduler` to disable tablet scheduler.
4. Add documents for resource tag.
5. Modify the default value of FE config `default_db_data_quota_bytes` to 1PB.
6. Add a new BE config `disable_compaction_trace_log` to disable the trace log of compaction time cost.
7. Modify the default value of BE config `remote_storage_read_buffer_mb` to 16MB
8. Fix incorrect results of `show backends`.
9. Add new BE config `external_table_connect_timeout_sec` to set the timeout when connecting to ODBC and MySQL tables.
10. Modify issue template to enable blank issue, for release note or other specific usage.
11. Fix a bug in alpha_row_set split_range() function.
This reverts commit dedb57f87e31305db3e2a13e374ba4fd58043fca.
Reverts #6252
This commit may cause a tablet whose segments are all empty to never be compacted, which results in a -235 error.
I will revert this commit, and the problem will be solved in #6671
In some cases, the query plan thrift structure of a query may be very large
(for example, when there are many columns in SQL), resulting in a large number
of "send fragment timeout" errors.
This PR adds an FE config to control whether to transmit the query plan in a compressed format.
Using compressed format transmission can reduce the size by ~50%, but it may reduce
the concurrency by ~10%. Therefore, in the high concurrency small query scenario,
you can choose to turn off compression.
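The trade-off can be sketched as an optional compression step controlled by a switch. This is only an illustration: the source does not name the compression algorithm or the FE config, so `zlib`, the one-byte flag, and the function names below are assumptions, and the byte string is a stand-in for a serialized plan.

```python
import zlib


def encode_plan(plan_bytes: bytes, use_compression: bool) -> bytes:
    """Prefix the payload with a flag byte and compress it only when enabled."""
    if use_compression:
        return b"\x01" + zlib.compress(plan_bytes)
    return b"\x00" + plan_bytes


def decode_plan(message: bytes) -> bytes:
    """Reverse encode_plan by inspecting the flag byte."""
    flag, body = message[:1], message[1:]
    return zlib.decompress(body) if flag == b"\x01" else body


# Stand-in for a plan with many similar column descriptors.
plan = b"column_desc(name=c, type=INT);" * 1000

compressed = encode_plan(plan, use_compression=True)
plain = encode_plan(plan, use_compression=False)
print(len(plain), len(compressed))
```

Compression saves bytes on the wire but costs CPU on both sides, which is why a switch makes sense for workloads dominated by many small queries.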
Support hdfs in select outfile clause without broker.
This PR implements an HDFS writer in BE, which is used to write HDFS files directly without using broker.
A syntax check for the hdfs outfile clause has also been added in FE.
The syntax:
```
select * from xx into outfile "hdfs://user/outfile_" format as csv
properties ("hdfs.fs.defaultFS" = "xxx", "hdfs.hdfs_user" = "xxx");
```
Note that all hdfs configurations need to carry a prefix `hdfs.`.
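As a rough illustration of the prefix convention, the sketch below separates `hdfs.`-prefixed properties from the rest. Whether the prefix is stripped before the values reach the writer is an assumption, and the function name and the sample values are hypothetical.

```python
from typing import Dict, Tuple

HDFS_PREFIX = "hdfs."


def split_hdfs_properties(
    properties: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split properties into HDFS settings (prefix stripped) and all others."""
    hdfs_conf: Dict[str, str] = {}
    other: Dict[str, str] = {}
    for key, value in properties.items():
        if key.startswith(HDFS_PREFIX):
            # Assumption: the prefix is only a namespace and is removed here.
            hdfs_conf[key[len(HDFS_PREFIX):]] = value
        else:
            other[key] = value
    return hdfs_conf, other


props = {
    "hdfs.fs.defaultFS": "hdfs://namenode:8020",  # sample value
    "hdfs.hdfs_user": "doris",                    # sample value
    "column_separator": ",",                      # a non-HDFS property, for contrast
}
hdfs_conf, other = split_hdfs_properties(props)
print(hdfs_conf)
print(other)
```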
1. Fix a memory leak in `collect_iterator.cpp` (Fix #6700)
2. Add a new BE config `max_segment_num_per_rowset` to limit the number of segments in a new rowset. (Fix #6701)
3. Make the error msg of stream load more friendly.
1. Fix a potential BE coredump of sending batch when loading data. (Fix #6656)
2. Fix a potential BE coredump when doing schema change. (Fix #6657)
3. Optimize the metric of base_compaction_request_failed.
4. Add Order column in show tablet result. (Fix #6658)
5. Fix bug that tablet repair slot not being released. (Fix #6659)
6. Fix bug that REPLICA_MISSING error can not be handled. (Fix #6660)
7. Modify column name of SHOW PROC "/cluster_balance/cluster_load_stat"
8. Optimize the result of SHOW PROC "/statistic" to show COLOCATE_MISMATCH tablets (Fix #6663)
9. Fix bug that show load where state='pending' can not be executed. (Fix #6664)
1. Support the boolean data type in spark-doris-connector, since Doris already supports the boolean data type
2. Fix a Doris BE core dump when Spark requests data from BE
* [BUG][Profile] Fix the problem that BE's profile could not add a child profile at the specified location
bug:
`runtime_profile()->add_child(build_phase_profile, false, nullptr);`
the child profile is added at the second location
* Update runtime_profile.cpp
1. inserting a very large string value may coredump
2. some analytic function and agg function results may be incorrect
3. string compare may coredump when the string value is too large
4. string type in delete condition can not be processed correctly
5. add text/blob as aliases of string to be compatible with mysql
6. fix string type min/max agg that may be processed incorrectly
There are many historical job records in Doris, such as load jobs, alter jobs, export jobs and so on.
These historical jobs are generally cleaned up periodically by the cleanup thread, to avoid taking too much memory.
This PR reorganized the cleanup logic of historical jobs and optimized the cleanup logic of some historical jobs
to reduce the memory usage of historical jobs.
The following FE configuration items are related to historical job cleaning:
1. `label_keep_max_second`
Used to determine whether LoadJob, LoadJobV2, RoutineLoadJob or TransactionState are expired.
2. `streaming_label_keep_max_second`
Used to determine whether InsertJob, DeleteJob or TransactionState are expired.
Different from `label_keep_max_second`, this config is used to clean up these frequently submitted jobs or load transactions.
3. `history_job_keep_max_second`
Used to determine whether AlterJob, ExportJob are expired.
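The mapping from job type to retention config can be sketched as a lookup plus an age check. This is a simplified model, not the FE implementation: the function name, the comparison rule, and the retention values in the example are assumptions, and TransactionState is left out because it is governed by two of the configs depending on its origin.

```python
from typing import Dict

# Job type -> name of the FE config that decides its expiry (from the list above).
KEEP_CONFIG_BY_JOB_TYPE: Dict[str, str] = {
    "LoadJob": "label_keep_max_second",
    "LoadJobV2": "label_keep_max_second",
    "RoutineLoadJob": "label_keep_max_second",
    "InsertJob": "streaming_label_keep_max_second",
    "DeleteJob": "streaming_label_keep_max_second",
    "AlterJob": "history_job_keep_max_second",
    "ExportJob": "history_job_keep_max_second",
}


def is_expired(job_type: str, finish_time: int, now: int, config: Dict[str, int]) -> bool:
    """Return True when a finished job is older than its retention window."""
    keep_seconds = config[KEEP_CONFIG_BY_JOB_TYPE[job_type]]
    return now - finish_time > keep_seconds


# Illustrative retention values in seconds, chosen only for this example.
config = {
    "label_keep_max_second": 300,
    "streaming_label_keep_max_second": 60,
    "history_job_keep_max_second": 600,
}
print(is_expired("InsertJob", finish_time=0, now=100, config=config))
print(is_expired("LoadJob", finish_time=0, now=100, config=config))
```

A shorter window for the streaming config lets frequently submitted jobs be dropped sooner than ordinary load jobs of the same age.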
This PR mainly supports:
1. Exporting query result sets concurrently
2. Exporting query result sets over the s3 protocol
There are several preconditions for exporting query result sets concurrently:
1. The concurrent export variable is enabled
2. The query itself can be exported concurrently
(some queries containing sort nodes at the top level cannot be exported concurrently)
3. The export uses the s3 protocol instead of the broker
When the result set is exported concurrently,
the file prefix is changed to `outfile_{query_instance_id}_filenumber.{file_format}`
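The naming pattern can be sketched as a small helper. The function name and the sample instance ids are hypothetical; only the `outfile_{query_instance_id}_filenumber.{file_format}` layout comes from the description above.

```python
def outfile_name(prefix: str, query_instance_id: str, file_number: int, file_format: str) -> str:
    """Build a result file name that is unique per query instance."""
    return f"{prefix}{query_instance_id}_{file_number}.{file_format}"


# Two hypothetical instances writing their first file at the same time.
first = outfile_name("outfile_", "instance_a", 0, "csv")
second = outfile_name("outfile_", "instance_b", 0, "csv")
print(first, second)
```

Including the instance id in the name keeps concurrent writers from producing the same file name.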
* fix bugs with string type
1. not support string with agg type min/max
2. agg_update with large string may coredump
3. stringval with large string may coredump
4. not support string as partition key
1. Add license/total line/release badges.
2. Add monthly active contributor and contributor growth graph
3. fix a pom.xml bug
4. Modify some routine load log on BE side
This CL mainly changes:
1. The `storage_page_cache_limit` is based on config `mem_limit`;
the default is 20% of `mem_limit`.
2. The `buffer_pool_limit` is based on config `mem_limit`;
the default is 20% of `mem_limit`.
3. The `buffer_pool_clean_pages_limit` is based on config `buffer_pool_limit`;
the default is 50% of `buffer_pool_limit`.
4. Fix some show bugs of lru cache hit ratio and usage ratio
5. Fix a create view bug that `notEvalNondeterministicFunction` should be reset after analyze.
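The default derivation in items 1 to 3 above can be worked through with a short sketch. The function name and the returned dictionary are illustrative; only the percentages and the config names come from the list, and a 10 GiB `mem_limit` is an arbitrary example value.

```python
from typing import Dict


def derive_default_limits(mem_limit_bytes: int) -> Dict[str, int]:
    """Compute the default byte limits that follow from mem_limit."""
    storage_page_cache_limit = mem_limit_bytes * 20 // 100       # 20% of mem_limit
    buffer_pool_limit = mem_limit_bytes * 20 // 100              # 20% of mem_limit
    buffer_pool_clean_pages_limit = buffer_pool_limit * 50 // 100  # 50% of buffer_pool_limit
    return {
        "storage_page_cache_limit": storage_page_cache_limit,
        "buffer_pool_limit": buffer_pool_limit,
        "buffer_pool_clean_pages_limit": buffer_pool_clean_pages_limit,
    }


GIB = 1024**3
limits = derive_default_limits(10 * GIB)
for name, value in limits.items():
    print(name, value / GIB, "GiB")
```

For a 10 GiB `mem_limit` this gives 2 GiB for the page cache, 2 GiB for the buffer pool, and 1 GiB for clean pages.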
Fix #6269
In outline, these changes reduce memory usage to avoid OOM in BE and speed up the calculation.
1. We do not need to do Aggregation in load, which has already been done in the ETL spark job.
2. Based on 1, we do not need to serialize/deserialize bitmap/HLL objects.
Encapsulate some http interfaces for better management and maintenance of Doris clusters.
The http interfaces cover getting cluster connection information, node information, and node configuration information, batch modifying node configuration, and getting query profiles.
For details, please refer to the document:
`docs/zh-CN/administrator-guide/http-actions/fe/manager/`