Commit Graph

41 Commits

Author SHA1 Message Date
8370115cf6 [enhancement](memtracker) Improve performance of tracking real physical memory of PODArray #12168 2022-08-30 10:22:12 +08:00
1fc5515a78 [enhancement](memory) Remove unused reservation tracker (#11969) 2022-08-24 08:49:34 +08:00
fcae979798 [fix](memtracker) Fix PartitionedAggregationNode DCHECK when mem exceed limit (#11902) 2022-08-19 09:56:49 +08:00
ecbf87d77b [bugfix](memtracker)fix exceed memory limit log (#11485) 2022-08-04 10:22:20 +08:00
ad31b6c902 [bugfix and improvement]fix mem tracker for load and simplify some macros (#11125) 2022-07-22 21:59:36 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
4ec6e3ee81 [refactor] Remove debug action since it is never used. (#10484)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-29 20:37:51 +08:00
Pxl
fd0bd395ac [Enhancement] Remove some unused include (#10035) 2022-06-17 10:47:25 +08:00
ae680b4248 [UDF] support RPC udaf part 1: support create RPC udaf in fe (#8510) 2022-04-21 17:38:58 +08:00
869fdff2f0 [refactor] add reference path for source file from impala (#9115)
According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.
2022-04-20 12:29:57 +08:00
519305cb22 [feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669)
Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
2022-04-08 09:02:26 +08:00
eeae516e37 [Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476)
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G

Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
2022-03-20 23:06:54 +08:00
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
50864aca7d [refactor] fix warings when compile with clang (#8069) 2022-02-19 11:29:02 +08:00
7a73645eee [refactor] remove some unused code (#8022) 2022-02-12 15:17:28 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
fb5adaf18e [fix](mem-tracker) Fix mem limit -1 in partition aggregate node (#7181)
Make error message more clear.
2021-11-24 10:43:35 +08:00
6c6380969b [refactor] replace boost smart ptr with stl (#6856)
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
2021-11-17 10:18:35 +08:00
6674af6001 [BUG] fix streaming_preagg dcheck cause be down (#6873)
in debug mode,query memory not enough, may cause be down
fe set useStreamingPreagg true, but be function CreateHashPartitions check is_streaming_preagg_ should false.

then casue core dump.
```
*** Check failure stack trace: ***
    @          0x2aa48ad  google::LogMessage::Fail()
    @          0x2aa6734  google::LogMessage::SendToLog()
    @          0x2aa43d4  google::LogMessage::Flush()
    @          0x2aa7169  google::LogMessageFatal::~LogMessageFatal()
    @          0x24703be  doris::PartitionedAggregationNode::CreateHashPartitions()
    @          0x2468fd6  doris::PartitionedAggregationNode::open()
    @          0x1e3b153  doris::PlanFragmentExecutor::open_internal()
    @          0x1e3af4b  doris::PlanFragmentExecutor::open()
    @          0x1d81b92  doris::FragmentExecState::execute()
    @          0x1d840f7  doris::FragmentMgr::_exec_actual()
```

we should remove DCHECK(!is_streaming_preagg_)
2021-11-12 10:37:46 +08:00
4f744333c2 fix some core in local test: (#6594)
1. insert very large string value may coredump
    2. some analitic functiuon and agg function result may be incorrect
    3. string compare may be coredump when string type is too large
    4. string type in delete condition can not process correctly
    5. add text/blob as alias of string to compitable with mysql
    6. fix string type min/max agg may  process incorrectly
2021-09-10 09:52:03 +08:00
9216735cfa [New Featrue] Support Vectorization Execution Engine Interface For Doris (#6329)
1. FE vectorized plan code
2. Function register vec function
3. Diff function nullable type
4. New thirdparty code and new thrift struct
2021-08-11 14:54:06 +08:00
a1bce25677 [BUG] Fix Memory Leak in SchemaChange And Fix some DCHECK error (#5491) 2021-03-17 09:27:05 +08:00
93a4c7efc1 [LOG] Standardize the use of VLOG in code (#5264)
At present, the application of vlog in the code is quite confusing.
It is inherited from impala VLOG_XX format, and there is also VLOG(number) format.
VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG
2021-01-21 12:09:09 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
10f822eb43 [MemTracker] make all MemTrackers shared (#4135)
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec 
     before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared 
    too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
2020-07-31 21:57:21 +08:00
7bf926eba8 [Profile] Improve the running profile
1. Delete Invalid Counter In Data_Stream_Sender. (#3598)
2. Add Counter For PartitionHashTable of PartitionAggregationNode:
     * Hash Probe Method
     * Row processed by Aggregation
     * HashFilledBuckets: Counter How Many FilledBuckets in Aggragation
     * HTResize: Counter How Many Resize of HashTable
     * HashProbe: Counter Probe of HashTable
     * HashFailedProbe: Counter Failed Probe of HashTable
     * HashTravelLength: Total TravelLength for Probe
     * HashCollisions: Counter of HashCollision
3. Del some unecessary code in PartitionHashTable by template
2020-05-16 21:35:30 +08:00
2dc995df7b [CodeStyle] Rename new_partition_aggregation_node and new_partitioned_hash_table (#3166) 2020-03-20 19:59:01 +08:00
5a8fcd263f [CodeStyle] Delete obsolete code of partition_aggregation_node and partitioned_hash_table (#3162) 2020-03-20 16:25:29 +08:00
2d3dbc2c42 Revert "[CodeStyle] Del obsolete code of partition_aggregation_node (#3154)" (#3160)
This reverts commit dae013d797c1c2c9e54246d5ace4bdd90b297d43.
2020-03-20 14:47:25 +08:00
dae013d797 [CodeStyle] Del obsolete code of partition_aggregation_node (#3154) 2020-03-20 11:33:55 +08:00
839ec45197 Remove llvm relative code from be/src/exec (#2955)
Remove unused LLVM related codes of directory:be/src/exec (#2910)

there are many LLVM related codes in code base, but these codes are not really used.
The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris.
The PR delete all LLVM related code of directory: be/src/exec.
2020-02-20 20:43:26 +08:00
9f5e5717d4 Unify the msg of 'Memory exceed limit' (#1737)
The new msg of limit exceed: "Memory exceed limit. %msg, Backend:%ip, fragment:%id Used:% , Limit:%. xxx".
This commit unifies the msg of 'Memory exceed limit' such as check_query_state, RETURN_IF_LIMIT_EXCEEDED and LIMIT_EXCEEDED.
2019-09-03 10:42:16 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
c34b306b4f Decimal optimize branch #695 (#727) 2019-03-22 17:22:16 +08:00
a2b299e3b9 Reduce UT binary size (#314)
* Reduce UT binary size

Almost every module depend on ExecEnv, and ExecEnv contains all
singleton, which make UT binary contains all object files.

This patch seperate ExecEnv's initial and destory to anthor file to
avoid other file's dependence. And status.cc include debug_util.h which
depend tuple.h tuple_row.h, and I move get_stack_trace() to
stack_util.cpp to reduce status.cc's dependence.

I add USE_RTTI=1 to build rocksdb to avoid linking librocksdb.a

Issue: #292

* Update
2018-11-15 16:17:23 +08:00
37b4cafe87 Change variable and namespace name in BE (#268)
Change 'palo' to 'doris'
2018-11-02 10:22:32 +08:00
2868793b6b Change license to Apache License 2.0 (#262) 2018-11-01 09:06:01 +08:00
051aced48d Missing many files in last commit
In last commit, a lot of files has been missed
2018-10-31 16:19:21 +08:00
2419384e8a push 3.3.19 to github (#193)
* push 3.3.19 to github

* merge to 20ed420122a8283200aa37b0a6179b6a571d2837
2018-05-15 20:38:22 +08:00
e2311f656e baidu palo 2017-08-11 17:51:21 +08:00