Currently there are two status code types in the BE: one is common/Status.h,
and the other is OLAPStatus in olap/olap_define.h.
OLAPStatus is just an enum type; it is very simple and cannot carry much information.
This change unifies these codes into common/Status.
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G
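As a rough illustration of the unification direction, a small adapter like the one below could translate the legacy enum into the richer Status type. This is a minimal sketch: `Status::OK`/`Status::InternalError` and `OLAP_SUCCESS` mirror the existing Doris code, but the helper `to_status` is hypothetical, and the real migration rewrites call sites rather than wrapping them.

```cpp
#include <string>

#include "common/status.h"
#include "olap/olap_define.h"

// Hypothetical helper: map the legacy enum onto the unified Status.
// OLAPStatus only encodes an error number, so a message is synthesized
// here; the actual migration replaces OLAPStatus at the call sites.
doris::Status to_status(OLAPStatus olap_status) {
    if (olap_status == OLAP_SUCCESS) {
        return doris::Status::OK();
    }
    return doris::Status::InternalError(
            "olap error, code=" + std::to_string(static_cast<int>(olap_status)));
}
```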
Implement a new way of memory statistics based on the TCMalloc New/Delete hooks,
MemTracker, and TLS; the expectation is that every new/delete/malloc/free
in the BE process can be counted.
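A minimal sketch of the hook mechanism, assuming gperftools' `MallocHook::AddNewHook`/`AddDeleteHook` API; the thread-local tracker pointer and `init_memory_hooks` are hypothetical names, and a real implementation must also guard against hook re-entrancy and threads with no attached task.

```cpp
#include <gperftools/malloc_extension.h>
#include <gperftools/malloc_hook.h>

#include "runtime/mem_tracker.h"

// Hypothetical TLS slot: the MemTracker of the task (query, load,
// compaction, ...) the current thread is executing for.
static thread_local doris::MemTracker* tls_tracker = nullptr;

static void new_hook(const void* ptr, size_t /*requested_size*/) {
    if (tls_tracker != nullptr) {
        // Charge the size TCMalloc actually allocated, not the size
        // the caller requested.
        tls_tracker->consume(MallocExtension::instance()->GetAllocatedSize(ptr));
    }
}

static void delete_hook(const void* ptr) {
    if (tls_tracker != nullptr && ptr != nullptr) {
        tls_tracker->release(MallocExtension::instance()->GetAllocatedSize(ptr));
    }
}

void init_memory_hooks() {
    MallocHook::AddNewHook(&new_hook);
    MallocHook::AddDeleteHook(&delete_hook);
}
```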
Due to the unlimited queues in OlapScanNode and NodeChannel, memory usage can be
very large when reading and writing large tables, e.g. 'insert into tableB select * from tableA'.
Modify the implementation of MemTracker:
1. Simplify the logic, removing much that was useless;
2. Add MemTrackerTaskPool as the ancestor of all query and load trackers; it is used to track the local memory usage of all executing tasks;
3. Add a consume/release cache that only triggers a real consume/release once the accumulated amount exceeds the parameter mem_tracker_consume_min_size_bytes (see the sketch after this list);
4. Add a new memory-leak detection mode (experimental feature), controlled by the parameter memory_leak_detection: throw an exception when the remaining tracked value is outside the specified range when the MemTracker is destructed, and expose the exact tracked value via HTTP;
5. Add a virtual MemTracker whose consume/release is not synced to its parent; it will be used to record specified memory independently when the TCMalloc hooks are introduced later;
6. Modify the GC logic: register the buffers cached in DiskIoMgr as a GC function, with more GC functions to be added later;
7. Change the global root node from the root MemTracker to the process MemTracker, and remove the process MemTracker from exec_env;
8. Modify the macro that detects whether memory has reached the upper limit, the parameters and default behavior of creating a MemTracker, and the error message format in mem_limit_exceeded; extend and apply transfer_to; remove Metric from MemTracker; etc.
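The consume/release cache from item 3 can be sketched as follows. This is a minimal illustration: the threshold variable follows the parameter named above, while `consume_cached` and the TLS variables are hypothetical names.

```cpp
#include <cstdint>
#include <cstdlib>

#include "runtime/mem_tracker.h"

// Mirrors the config parameter named above (value here is illustrative).
static int64_t mem_tracker_consume_min_size_bytes = 1048576;

// Bytes observed locally but not yet pushed into the tracker tree.
static thread_local int64_t tls_untracked_mem = 0;
static thread_local doris::MemTracker* tls_tracker = nullptr;

// Called with a positive size on alloc and a negative size on free.
// The tracker tree (with its atomics) is only touched once the local
// accumulation is large enough, which keeps the hot path cheap.
void consume_cached(int64_t size) {
    tls_untracked_mem += size;
    if (std::abs(tls_untracked_mem) >= mem_tracker_consume_min_size_bytes) {
        if (tls_tracker != nullptr) {
            // MemTracker::consume with a negative value acts as a release.
            tls_tracker->consume(tls_untracked_mem);
        }
        tls_untracked_mem = 0;
    }
}
```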
Modify where MemTracker is used:
1. MemPool adds a constructor that creates a temporary tracker, avoiding a lot of redundant code;
2. Add trackers for global objects such as ChunkAllocator and StorageEngine;
3. Add more fine-grained trackers, such as in ExprContext;
4. RuntimeState removes FragmentMemTracker (the PlanFragmentExecutor mem_tracker), which was previously used to independently track scan-process memory; it is replaced with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later.
Support a thread pool per disk for scanners, to prevent the whole pool's performance from being dragged down by a few disks with high I/O utilization.
Key points (see the sketch after this list):
1. Each disk has its own thread pool for scanners.
2. Whenever the thread pool of one disk runs out of local work, it steals tasks from the other disks' pools, in round-robin order.
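A minimal sketch of per-disk queues with round-robin stealing; the class and member names here are hypothetical, and the real scanner scheduling carries much more state (priorities, blocking, shutdown).

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

using ScanTask = std::function<void()>;

class PerDiskScannerPool {
public:
    explicit PerDiskScannerPool(int num_disks)
            : _queues(num_disks), _mutexes(num_disks) {}

    void submit(int disk_id, ScanTask task) {
        std::lock_guard<std::mutex> l(_mutexes[disk_id]);
        _queues[disk_id].push_back(std::move(task));
    }

    // Worker for `disk_id`: prefer local work, otherwise probe the
    // other disks' queues round-robin starting from the next disk.
    bool get_next_task(int disk_id, ScanTask* task) {
        int n = static_cast<int>(_queues.size());
        for (int i = 0; i < n; ++i) {
            int target = (disk_id + i) % n;
            std::lock_guard<std::mutex> l(_mutexes[target]);
            if (!_queues[target].empty()) {
                *task = std::move(_queues[target].front());
                _queues[target].pop_front();
                return true;
            }
        }
        return false; // no work anywhere; caller may block or exit
    }

private:
    std::vector<std::deque<ScanTask>> _queues;
    std::vector<std::mutex> _mutexes;
};
```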
Performance testing:
vec version: 25% faster than a single thread pool in a high-I/O-util disk test case
normal version: 8% faster than a single thread pool in a high-I/O-util disk test case
# Proposed changes
Issue Number: close #6238
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
Co-authored-by: wangbo <506340561@qq.com>
Co-authored-by: emmymiao87 <522274284@qq.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: thinker <zchw100@qq.com>
Co-authored-by: Zeno Yang <1521564989@qq.com>
Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
Co-authored-by: xinghuayu007 <1450306854@qq.com>
Co-authored-by: weizuo93 <weizuo@apache.org>
Co-authored-by: yiguolei <guoleiyi@tencent.com>
Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
Co-authored-by: awakeljw <993007281@qq.com>
Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>
## Problem Summary:
### 1. Some code from ClickHouse
**ClickHouse is an excellent vectorized execution engine database,
so we have referenced and learned a lot from its excellent implementations
of data structures and functions.
Our work is based on ClickHouse v19.16.2.2, and we would like to thank the ClickHouse community and developers.**
The following comment has been added to code copied from ClickHouse, e.g.:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris
### 2. Supported exec nodes and queries:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node
You can run the SSB/TPC-H query sets and about 70% of the TPC-DS standard query set on the vectorized exec engine.
### 3. Data Model
The vectorized exec engine supports **Dup/Agg/Unq** tables, and the Block Reader is vectorized.
Vectorized segment reading is a work in progress.
### 4. How to use
1. Set the session variable `set enable_vectorized_engine = true;` (required)
2. Set the session variable `set batch_size = 4096;` (recommended)
### 5. Some differences from the original exec engine
https://github.com/doris-vectorized/doris-vectorized/issues/294
## Checklist(Required)
1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
Support merging IN predicates when a remote target exists (e.g. shuffle hash join).
Remove the code that implicitly converts an IN predicate to a Bloom filter when a remote target exists.
Close related #7546
1. Delete useless variables
2. Add the const modifier to read-only functions
3. Delete empty destructors; the compiler will generate them automatically. See the rule of three/five/zero:
https://en.cppreference.com/w/cpp/language/rule_of_three
4. Add the override keyword (instead of repeating the virtual keyword) to virtual functions in subclasses, as shown below.
override lets the compiler check the signature and improves safety; this is also why C++11 introduced it.
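A small illustration of item 4 (generic C++, not Doris code): with `override`, a signature mismatch becomes a compile error instead of silently declaring a new, unrelated function.

```cpp
class ExecNode {
public:
    virtual ~ExecNode() = default;
    virtual void close(bool force) {}
};

class ScanNode : public ExecNode {
public:
    // OK: the signature matches the base class function.
    void close(bool force) override {}

    // With override, this would be a compile error: there is no base
    // function with this signature (the parameter type differs), so
    // the bug is caught immediately instead of hiding at runtime.
    // void close(int force) override {}
};
```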
1. Replace all boost::shared_ptr with std::shared_ptr
2. Replace all boost::scoped_ptr with std::unique_ptr
3. Replace all boost::scoped_array with std::unique_ptr<T[]> (see the example after this list)
4. Replace all boost::thread with std::thread
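For example, the scoped_array replacement in item 3 looks like this (a generic illustration, not a specific Doris call site):

```cpp
#include <memory>

// Before: boost::scoped_array<char> buffer(new char[1024]);
// After: the array form of std::unique_ptr frees with delete[].
std::unique_ptr<char[]> buffer(new char[1024]);
```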
1. Optimize the error message when using batch delete
2. Rename the session variable is_report_success to enable_profile
3. Add the table name to the OlapScanner profile
Remove part of the dynamic_casts to reduce the overhead of type conversion;
this reduces the CPU consumption of Parquet file import by roughly 10%.
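The typical pattern is replacing dynamic_cast with static_cast where the concrete type is already guaranteed by the caller (a generic illustration; the class names here are hypothetical):

```cpp
#include <cassert>

struct ColumnReader { virtual ~ColumnReader() = default; };
struct ParquetColumnReader : ColumnReader { int read_batch() { return 0; } };

int read(ColumnReader* base) {
    // The caller only ever passes a ParquetColumnReader here, so the
    // RTTI lookup of dynamic_cast is pure overhead on the hot path;
    // keep the check as a debug-only assertion.
    assert(dynamic_cast<ParquetColumnReader*>(base) != nullptr);
    auto* reader = static_cast<ParquetColumnReader*>(base);
    return reader->read_batch();
}
```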
1. This bug was introduced by #6582.
2. Improve the "Address already in use" error message.
3. Add some documentation about compilation.
1. Add a custom thirdparty download URL.
2. Add a custom com.alibaba maven jar package for DataX.
3. Fix a bug where the BE crashes when closing a scan node, introduced by #6622.
This is part of the array type support and is not yet fully complete.
The following functionality is implemented:
1. FE array type support and implementation of array functions, including array syntax analysis and planning
2. Importing array-type data through insert into
3. Selecting array-type data
4. The array type is only supported on the value columns of the duplicate table
This PR merges some code from #4655, #4650, #4644, #4643, #4623, #2979
1. Support in/bloomfilter/minmax runtime filters (see the sketch after this list)
2. Support broadcast/shuffle/bucket shuffle/colocate join
3. Optimize memory use and CPU cache misses while building runtime filters
4. Optimize memory use in left semi join (works well on TPC-DS query 95)
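As a rough sketch of the simplest of the three filter kinds, a min-max runtime filter built on the join's build side and probed on the scan side could look like this (a hypothetical, simplified interface, not the actual Doris class):

```cpp
#include <algorithm>
#include <limits>

// Collects the value range on the hash join build side; the scan side
// then skips any row (or whole data range) outside [min, max].
template <typename T>
class MinMaxRuntimeFilter {
public:
    void insert(const T& value) {
        _min = std::min(_min, value);
        _max = std::max(_max, value);
        _has_data = true;
    }

    bool find(const T& value) const {
        return _has_data && value >= _min && value <= _max;
    }

private:
    bool _has_data = false;
    T _min = std::numeric_limits<T>::max();
    T _max = std::numeric_limits<T>::lowest();
};
```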
The cause of the problem is that, after a query is cancelled, OlapScanNode::transfer_thread still keeps scheduling
OlapScanNode::scanner_thread until all tasks have been scheduled.
Although each task does not scan data and exits quickly, it still consumes a lot of resources.
(Guess) This may be the cause of the bug (#5767) where the I/O becomes saturated.
So after the query is cancelled, immediately exit the scheduling loop in transfer_thread; after waiting for
all scanner_threads to finish, transfer_thread will also exit.
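The fix boils down to a cancellation check at the top of the scheduling loop, roughly like this (a simplified sketch around the real transfer_thread; `schedule_next_scanner_batch` and `wait_for_running_scanners` are placeholders for the real dispatch/join logic):

```cpp
#include <atomic>

#include "runtime/runtime_state.h"

// Placeholders for the real dispatch/join logic in OlapScanNode.
void schedule_next_scanner_batch();
void wait_for_running_scanners();

// Sketch of the scheduling loop inside OlapScanNode::transfer_thread.
void transfer_thread_loop(doris::RuntimeState* state,
                          std::atomic<bool>& scanner_done) {
    while (!scanner_done.load()) {
        if (state->is_cancelled()) {
            // Stop handing out new scanner tasks immediately instead
            // of draining the remaining task queue after cancellation.
            break;
        }
        schedule_next_scanner_batch();
    }
    // Wait until every in-flight scanner_thread has returned before
    // exiting, so no scanner touches freed OlapScanNode state.
    wait_for_running_scanners();
}
```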
1. Add a timer to count the time the transfer thread waits for the scanner thread to return a row batch.
2. Add a timer to count the time the scanner thread waits for an available worker thread in the thread pool.
Co-authored-by: chenmingyu <chenmingyu@baidu.com>
At present, the use of VLOG in the code is quite confusing.
There is the VLOG_XX format inherited from Impala, and there is also the VLOG(number) format.
The VLOG(number) format has no unified specification, so this PR standardizes the use of VLOG.
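One way to standardize is to keep only named level macros so call sites never hard-code a number. The macro names and level assignments below are illustrative of the idea, not necessarily the exact set chosen by this PR:

```cpp
#include <glog/logging.h>

// Named verbosity levels; raising -v at runtime enables deeper levels.
#define VLOG_CRITICAL VLOG(1)  // rare, important events
#define VLOG_NOTICE   VLOG(3)  // notable state changes
#define VLOG_DEBUG    VLOG(7)  // per-operation debugging
#define VLOG_TRACE    VLOG(10) // very hot paths

// Usage: VLOG_DEBUG << "tablet " << tablet_id << " compaction picked";
```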
Scanner threads may still be running and using the member variables of OlapScanNode
after the OlapScanNode has already been destroyed.
We can use `_running_thread` as the last member variable each scanner thread accesses,
and have `transfer_thread` wait for `_running_thread == 0`.
After `transfer_thread` is joined, `OlapScanNode::close()` can continue.
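A sketch of the counter protocol (simplified; in the real code these live in OlapScanNode, and the condition-variable names are placeholders):

```cpp
#include <condition_variable>
#include <mutex>

// Shared between the node and its scanner threads (sketch).
static std::mutex _scan_mutex;
static std::condition_variable _scan_cv;
static int _running_thread = 0; // incremented when a scanner starts

void scanner_thread_exit() {
    // Decrementing the counter is the LAST access to any member of
    // the node; once it reaches zero, close() may destroy the node.
    {
        std::lock_guard<std::mutex> l(_scan_mutex);
        --_running_thread;
    }
    _scan_cv.notify_one();
}

void wait_for_scanners() { // called by transfer_thread before close() proceeds
    std::unique_lock<std::mutex> l(_scan_mutex);
    _scan_cv.wait(l, [] { return _running_thread == 0; });
}
```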
Introduced by PR #5051.
As @liutang123 said, when PlanFragmentExecutor is destructed, it calls
`close -> ExecNode::close -> OlapScanNode::close`. OlapScanNode waits for `_transfer_thread`,
and `_transfer_thread` waits for all OlapScanners to finish processing.
Each OlapScanner is processed by a scanner thread. When the last scanner finishes,
`_transfer_thread` breaks out of its loop, and PlanFragmentExecutor continues to destruct.
Once it is destructed, its RuntimeProfile::Counter is destructed as well.
At this point, a ScopedTimer in a scan thread may still use this Counter when the timer is destructed.
So we must make sure the timer is destructed before the runtime profile.
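The essence of the fix is a scoping rule: the timer's lifetime must end strictly before the profile's. An illustrative sketch with simplified control flow (the counter name and the ADD_TIMER/ScopedTimer usage follow the style of Doris' runtime_profile.h, but this is not the exact patched code):

```cpp
#include "util/runtime_profile.h"

void scanner_thread_body(doris::RuntimeProfile* profile) {
    doris::RuntimeProfile::Counter* wait_timer =
            ADD_TIMER(profile, "ScannerWaitTime"); // counter name is illustrative
    {
        // The ScopedTimer lives in this inner scope, so it is destructed
        // (flushing into the counter) strictly before anything below can
        // signal completion and unblock destruction of the RuntimeProfile
        // that owns the counter.
        doris::ScopedTimer<doris::MonotonicStopWatch> timer(wait_timer);
        // ... scan work ...
    }
    // Only after the timer is gone: mark this scanner as finished.
}
```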
And refactor ColumnRangeValue and OlapScanNode.
This patch mainly does the following:
- Fix issue #5071
- Change type_min in ColumnRangeValue to static
- Add a type_limit class to make the code clearer (see the sketch below)
- Refactor the function normalize_in_and_eq_predicate
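A type_limit class along these lines centralizes the per-type min/max constants that were previously scattered through the range code. A minimal sketch; the real class would also cover Doris-specific types such as dates and decimals:

```cpp
#include <limits>

// One place to ask for the representable range of a column type,
// instead of hard-coding per-type constants at every call site.
template <typename T>
struct type_limit {
    static T min() { return std::numeric_limits<T>::lowest(); }
    static T max() { return std::numeric_limits<T>::max(); }
};

// Usage inside a ColumnRangeValue-style class:
//   _low_value  = type_limit<T>::min();
//   _high_value = type_limit<T>::max();
```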