Commit Graph

112 Commits

Author SHA1 Message Date
c8418d13b5 [improvement](config)Use session variable to replace configuration for 'enable_function_pushdown' (#11641) 2022-08-10 19:25:02 +08:00
4f5e1601df [bug](scanner) Improve limit query performance on olapScannode and avoid infinite loop (#11301)
1. Fix a bug that query large column table may cause infinite loop
2. Optimize the query logic with limit, for the case where the limit value is relatively small, reduce the parallelism of the scanner, reduce unnecessary resource consumption, and increase the number of similar queries that the system can carry at the same time, and increase the query speed by more than 60%
2022-08-01 13:50:12 +08:00
d6f937cb01 (performance)[scanner] Isolate local and remote queries using different scanner… (#11006) 2022-07-29 19:14:46 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
f6cb7a838b [Optimize] Improve performance like/not like filter through pushdown function to storage engine (#10355)
* support like/not like conjuncts push down to storage engine
* vectorized engine support like/not like conjuncts push down to storage engine
* support both evaluate and evaluate_vec method in like predicate
* reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts
* change #ifndef to pragma once as per comments
* change enable_function_pushdown default to false
Co-authored-by: heguangnan <heguangnan@bytedance.com>
2022-07-19 08:33:04 +08:00
ca5dbb1bcc Fix olap scan node normalize_in_and_eq_predicate infinite loop bug. (#10817) 2022-07-14 14:54:57 +08:00
a044b5dcc5 [refactor](predicate) refactor predicates in scan node (#10701)
* [reafactor](predicate) refactor predicates in scan node

* update
2022-07-11 09:21:01 +08:00
a2f74bf260 [Improvement] remove profile with poor readability (#10581) 2022-07-05 11:09:23 +08:00
4ec6e3ee81 [refactor] Remove debug action since it is never used. (#10484)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-29 20:37:51 +08:00
139cd3d11a [Improvement] remove olap filters when use in key ranges (#10278) 2022-06-23 09:12:29 +08:00
Pxl
fd0bd395ac [Enhancement] Remove some unused include (#10035) 2022-06-17 10:47:25 +08:00
4b9d500425 [improvement](profile) Add table name and predicates (#10093) 2022-06-16 10:59:31 +08:00
35c3e4e33c [Bug] runtime filter is not used as expected (#10001)
* [Bug] runtime filter is not used as expected

* update
2022-06-08 11:10:39 +08:00
f377c26bf7 [refactor][be] Optimize headers (#9708) 2022-05-30 16:12:10 +08:00
4af2493c42 [Improvement] optimize scannode concurrency query performance in vectorized engine. (#9792) 2022-05-30 16:04:40 +08:00
a96b41db7a [Improvement] Simplify expressions for _vconjunct_ctx_ptr (#9816) 2022-05-29 23:05:21 +08:00
f4dd3bf013 [bugfix] fix memleak in olapscannode(#9736) 2022-05-26 15:06:54 +08:00
a9831f87f2 [refactor]refactor lazy materialized (#8834)
[refactor]refactor lazy materialized (#8834)
2022-05-06 19:16:35 +08:00
832338c55e [improvement] set name for scanner threads and fix compile error in clang (#9336) 2022-05-05 09:53:43 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
f3dce9a6c1 [fix](planner) fix is-null predicate in where statement cannot be pushed down to the storage layer (#9035) 2022-04-18 19:35:02 +08:00
e5e0dc421d [refactor] Change ALL OLAPStatus to Status (#8855)
Currently, there are 2 status code in BE, one is common/Status.h,
and the other is olap/olap_define.h called OLAPStatus.
OLAPStatus is just an enum type, it is very simple and could not save many informations,
I will unify these code to common/Status.
2022-04-14 11:43:49 +08:00
519305cb22 [feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669)
Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
2022-04-08 09:02:26 +08:00
98cab78320 [refactor](schema_hash) remove schema_hash since every tablet id in be is unique (#8574) 2022-04-07 08:37:45 +08:00
c31c6ae91a [improvement](storage) Add more detailed timer on SegmentIter in profile (#8768)
* [improvement](storage) Add more detailed timer on SegmentIter in profile

* add OutputColumnTime
2022-04-02 10:35:28 +08:00
eeae516e37 [Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476)
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G

Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
2022-03-20 23:06:54 +08:00
e807e8b108 [improvement](memory) fix olap table scan and sink memory usage problem (#8451)
Due to unlimited queue in OlapScanNode and NodeChannel, memory usage can be
very large for reading and writing large table, e.g 'insert into tableB select * from tableA'.
2022-03-13 22:12:15 +08:00
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
3eedd15f9c [optimize] optimze tablet read, avoid to create too much scanner for small tablet (#8096) 2022-03-08 13:59:45 +08:00
Pxl
668188b91f [improvement][vectorized] support es node predicate peel (#8174) 2022-02-26 17:02:54 +08:00
936da4f10a [feature](thread-pool) Support thread pool per disk for scanners (#7994)
Support thread pool per disk for scanners to prevent pool performance from some high ioutil disks happening

key point:
1. each disk has a thread pool for scanners
2. whenever a thread pool of one disk runs out of local work, tasks can be retrieved from other threads(disks). This is done round-robin.

performance testing: 
vec version: 25% faster than single thread pool in a high io util disk test case
normal version: 8% faster than single thread pool in a high io util disk test case
2022-02-18 09:40:58 +08:00
aea3e4e59b [refactor] Remove version hash from BE and related test in BE (#8027) 2022-02-14 09:29:27 +08:00
4e783afa7a [feature] add Generic debug timer for debugging or profiling (#7923)
add a group of debug-timer for the purpose of profiling or testing
you can use these timers for custom meaning purpose unlike the specific named timer
2022-01-31 22:15:43 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
563545475e [Optimize](Runtime Filter) Support merge in runtime filter(#7546) (#7547)
Support merge IN predicate when exist remote target(e.g. shuffle hash join).
Remote the code that IN predicate implicit conversion to Bloom filter then exist  remote target.

Close related #7546
2022-01-06 19:08:35 +08:00
85521944dd [refactor](olap-scan-node) Refactor olap scannode (#7131)
1. Delete useless variables
2. Add const modifier for read-only function
3. Delete the empty destructor, the compiler will automatically generate it, refer to the 3/5/0 rule: 
    [https://en.cppreference.com/w/cpp/language/rule_of_three]
4. It is recommended to add the override keyword (instead of the virtual keyword) to the subclass virtual function. 
    Override will let the compiler help check and improve security. This is also the reason why C++11 introduces override
2021-12-16 10:33:41 +08:00
6c6380969b [refactor] replace boost smart ptr with stl (#6856)
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
2021-11-17 10:18:35 +08:00
088a16d33b Chinese annotation modification (#6958)
* Modify Chinese comment (#6951)
2021-11-09 18:00:14 +08:00
c3b133bdb3 [Refactor] Refactor the reader code (#6866)
1. Removed useless redundant code logic
2. Change reader to interface, add tuple reader to simplify the structure of reader
2021-10-30 18:15:28 +08:00
4170aabf83 [Optimize] optimize some session variable and profile (#6920)
1. optimize error message when using batch delete
2. rename session variable is_report_success to enable_profile
3. add table name to OlapScanner profile
2021-10-27 18:03:12 +08:00
adb6bfdf74 [Bug] Fix bug that truncate table may change the storage medium property (#6905) 2021-10-25 10:07:27 +08:00
7297b275f1 [Optimize] Optimize cpu consumption when importing parquet files (#6782)
Remove part of dynamic_cast, reduce the overhead caused by type conversion,
and probably reduce the cpu consumption of parquet file import by about 10%
2021-10-03 12:14:35 +08:00
ad3c9390a2 [Bug] Fix bdbje getDatabaseNames() bug and scan node close bug (#6769)
1. This bug is introduced from #6582
2. Optimize the error log of Address used used error msg.
3. Add some document about compilation.
    1. Add a custom thirdparty download url.
    2. Add a custom com.alibaba maven jar package for DataX.
4. Fix bug that BE crash when closing scan node, introduced from #6622.
2021-09-29 11:11:28 +08:00
850cf10991 [Refactor] refactor olap_scan_node: discard boost, remove dynamic_cast (#6622)
1. refactor olap_scan_node: discard boost, remove dynamic_cast
2. use move instead of copy version for push_back
2021-09-27 10:32:57 +08:00
5c45e26644 Fixed zone map init error for string type (#6667)
Fixed the problem that the StringValue memory generated by Expr may be released before use
Fixed from_string for String type may overflow
2021-09-23 09:44:22 +08:00
74ddea8d83 [Optimize] Remove some unused code to reduce lock contention (#6566)
1. Remove global runtime profile counter
2. Remove unused thread token register
2021-09-07 11:56:12 +08:00
3f2fdd236f Add scan thread token (#6443) 2021-08-27 10:56:17 +08:00
8738ce380b Add long text type STRING, with a maximum length of 2GB. Usage is similar to varchar, and there is no guarantee for the performance of storing extremely long data (#6391) 2021-08-18 09:05:40 +08:00
2030c44dba [Log] Modify some log level on BE side (#6381) 2021-08-14 10:25:45 +08:00
34af66bf1d [BUG][Memory] fix memory tracker DCHECK fail in debug mode and Fix Process Memory limit fail (#6438) 2021-08-14 10:24:33 +08:00