Commit Graph

105 Commits

Author SHA1 Message Date
eebfbd0c91 Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323)" (#10424)
This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.
2022-06-25 22:18:08 +08:00
2cc670dba6 [fix](vectorized) Support outer join for vectorized exec engine (#10323)
In a vectorized scenario, the query plan will generate a new tuple for the join node.
This tuple mainly describes the output schema of the join node.
Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema.
For example:
1. The case where the null side column caused by outer join is converted to nullable.
2. The projection of the outer tuple.
2022-06-24 08:59:30 +08:00
Pxl
fd0bd395ac [Enhancement] Remove some unused include (#10035) 2022-06-17 10:47:25 +08:00
0683181fef [API changed](parser) Remove merge join syntax (#9795)
Remove merge join sql and merge join node
2022-05-30 09:04:21 +08:00
cd105bee0a [refactor](es) Clean es tcp scannode and related thrift definitions (#9553)
PaloExternalSourcesService is designed for es_scan_node using tcp protocol.
But es tcp protocol need deploy a tcp jar into es code. Both es version and lucene version are upgraded,
and the tcp jar is not maintained any more.

So that I remove all the related code and thrift definitions.
2022-05-14 10:03:55 +08:00
718a51a388 [refactor][style] Use clang-format to sort includes (#9483) 2022-05-10 21:25:35 +08:00
d330bc3806 [Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709) (#9280)
Implement vectorized stream load.
Added fe configuration option `enable_vectorized_load` to enable vectorized stream load.

    Co-authored-by: tengjp@outlook.com
    Co-authored-by: mrhhsg@gmail.com
    Co-authored-by: minghong.zhou@163.com
    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
2022-04-29 09:50:51 +08:00
869fdff2f0 [refactor] add reference path for source file from impala (#9115)
According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.
2022-04-20 12:29:57 +08:00
519305cb22 [feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669)
Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
2022-04-08 09:02:26 +08:00
aaaaae53b5 [feature] (memory) Switch TLS mem tracker to separate more detailed memory usage (#8605)
In pr #8476, all memory usage of a process is recorded in the process mem tracker,
and all memory usage of a query is recorded in the query mem tracker,
and it is still necessary to manually call `transfer to` to track the cached memory size.

We hope to separate out more detailed memory usage based on Hook TCMalloc new/delete + TLS mem tracker.

In this pr, the more detailed mem tracker is switched to TLS, which automatically and accurately
counts more detailed memory usage than before.
2022-03-24 14:29:34 +08:00
Pxl
a824c3e489 [feature](vectorized) support lateral view (#8448) 2022-03-17 10:04:24 +08:00
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
50864aca7d [refactor] fix warings when compile with clang (#8069) 2022-02-19 11:29:02 +08:00
5fc0a9f40d [improvement](Load) Cancel the load job ASAP when encounter unqualified data (#6319)
This PR mainly changes:

1. Help to Cancel the load job ASAP when encounter unqualified data.
    Solution is described in #6318 .
    Also replace some std::stringstream with fmt::memory_buffer to avoid performance issues.

2. fix a NPE bug when create user with empty host
3. fix compile warning after rebasing the master(vectorization)
2022-01-18 13:13:55 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
0499b2211b [feat](lateral-view) Support execution of lateral view stmt (#7255)
1. Add table function node
2. Add 3 table functions: explode_split, explode_bitmap and explode_json_array
2021-12-16 10:46:15 +08:00
85521944dd [refactor](olap-scan-node) Refactor olap scannode (#7131)
1. Delete useless variables
2. Add const modifier for read-only function
3. Delete the empty destructor, the compiler will automatically generate it, refer to the 3/5/0 rule: 
    [https://en.cppreference.com/w/cpp/language/rule_of_three]
4. It is recommended to add the override keyword (instead of the virtual keyword) to the subclass virtual function. 
    Override will let the compiler help check and improve security. This is also the reason why C++11 introduces override
2021-12-16 10:33:41 +08:00
Pxl
3fcb3db57a [fix](vectorized-engine) fix core when enable_vectorized_engine open (#7159) 2021-11-24 10:42:12 +08:00
6c6380969b [refactor] replace boost smart ptr with stl (#6856)
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
2021-11-17 10:18:35 +08:00
9216735cfa [New Featrue] Support Vectorization Execution Engine Interface For Doris (#6329)
1. FE vectorized plan code
2. Function register vec function
3. Diff function nullable type
4. New thirdparty code and new thrift struct
2021-08-11 14:54:06 +08:00
98e80aa65e [refactor] Replace boost::function with std::function (#5700)
Replace boost::function with std::function
2021-05-09 22:00:48 +08:00
0131c33966 [Enhance] Improve the readability of memtrackers' name (#5455)
Improve the readability of memtrackers' name, then you will be happy to read website be_ip:port/mem_tracker
2021-03-11 22:33:31 +08:00
93a4c7efc1 [LOG] Standardize the use of VLOG in code (#5264)
At present, the application of vlog in the code is quite confusing.
It is inherited from impala VLOG_XX format, and there is also VLOG(number) format.
VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG
2021-01-21 12:09:09 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
a61d0de173 [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs (#4438) 2020-09-25 10:19:50 +08:00
10f822eb43 [MemTracker] make all MemTrackers shared (#4135)
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec 
     before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared 
    too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
2020-07-31 21:57:21 +08:00
51367abce7 [Bug] Fix bug that BE crash when doing Insert Operation (#3872)
Mainly change:
1. Fix the bug in `update_status(status)` of `PlanFragmentExecutor`.
2. When the FE Coordinator executes `execRemoteFragmentAsync()`, if it finds an RPC error, return a Future with an error code instead of exception.
3. Protect the `_status` in RuntimeState with lock
4. Move the `_runtime_profile` of RuntimeState before the `_obj_pool`, so that the profile will be
deconstructed after the object pool.
5. Remove the unused `ObjectPool` param in RuntimeProfile constructor. If I don't remove it,
RuntimeProfile will depends on the `_obj_pool` in RuntimeProfile.
2020-06-19 17:09:04 +08:00
f4c03fe8e2 1. Delete the code of Sort Node we do not use now. (#3666)
Optimize the quick sort by find_the_median and try to reduce recursion level of quick sort.
2020-05-26 10:20:57 +08:00
d0fe7e4d94 [Profile] Make running profile clearer and more intuitive to improve usability (#3405)
This CL mainly made the following modifications:
    1. Delete Invalid MemoryUsed Counter and Add PeakMemUsage in each exec node and datastreamsender
    2. Add intent in child execnode profile,make it is easily to know the relationship between execnode
    3. Del _is_result_order we not support any more in olap_scan_node.h and olap_scan_node.cpp
    4. Add scan_disk method to olap_scanner to fix the counter _num_disks_accessed_counter
    5. Now we do not use buffer pool to read and write disk, so annotation eadio counter and 
    6. Delete the MemUsed counter in exec node.
2020-04-30 14:57:21 +08:00
4eb27bc7e3 [Profile] Make running profile clearer and more intuitive to improve usability (#3365) (#3383)
This CL mainly made the following modifications:
1. Delete Invalid method in Running Profile Class.
2. Move Memlimit Counter from blockmgr to fragment and add PeakMemUsage Counter
3. Fix the bug of buffer pool memlimit counter
4. Call compute_time_in_profile() before pretty_print() to show the _local_time_percent without child running profile
5. Add TransferThread ThreadToken count in AveThreadToken Counter
2020-04-24 21:38:55 +08:00
2dc995df7b [CodeStyle] Rename new_partition_aggregation_node and new_partitioned_hash_table (#3166) 2020-03-20 19:59:01 +08:00
5a8fcd263f [CodeStyle] Delete obsolete code of partition_aggregation_node and partitioned_hash_table (#3162) 2020-03-20 16:25:29 +08:00
2d3dbc2c42 Revert "[CodeStyle] Del obsolete code of partition_aggregation_node (#3154)" (#3160)
This reverts commit dae013d797c1c2c9e54246d5ace4bdd90b297d43.
2020-03-20 14:47:25 +08:00
dae013d797 [CodeStyle] Del obsolete code of partition_aggregation_node (#3154) 2020-03-20 11:33:55 +08:00
b286f4271b Remove unused PreAggregtionNode (#3151) 2020-03-20 09:19:47 +08:00
0959abc1dc [ExceptNode] Implement except node (#3056)
implement except node,
support  statement like:

``` 
select a from t1 except select b from t2
```
2020-03-17 10:54:40 +08:00
dc07182bd4 [Intersect] Implements intersect node (#3034)
imlement of the intersect node
now can support statement like `select a from t intersect select b from t1 intersect select 1;`
2020-03-09 10:52:55 +08:00
a340bc7a00 Remove unused LLVM related codes of directory:be/src/runtime (#2910) (#2985)
Remove unused LLVM related codes of directory (step 4):be/src/runtime (#2910)

there are many LLVM related codes in code base, but these codes are not really used.
The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris.
The PR delete all LLVM related code of directory: be/src/runtime
2020-02-25 13:47:20 +08:00
839ec45197 Remove llvm relative code from be/src/exec (#2955)
Remove unused LLVM related codes of directory:be/src/exec (#2910)

there are many LLVM related codes in code base, but these codes are not really used.
The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris.
The PR delete all LLVM related code of directory: be/src/exec.
2020-02-20 20:43:26 +08:00
fc55423032 [SQL] Support Grouping Sets, Rollup and Cube to extend group by statement
Support Grouping Sets, Rollup and Cube to extend group by statement
support GROUPING SETS syntax 
```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```
cube  or rollup like 
```
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP|CUBE(a,b,c)
```

[ADD] support grouping functions in expr like grouping(a) + grouping(b) (#2039)
[FIX] fix analyzer error in window function(#2039)
2020-01-17 16:24:02 +08:00
27d6794b81 Support subquery with non-scalar result in Binary predicate and Between-and predicate (#2360)
This commit add a new plan node named AssertNumRowsNode
which is used to determine whether the number of rows exceeds the limit.
The subquery in Binary predicate and Between-and predicate should be added a AssertNumRowsNode
which is used to determine whether the number of rows in subquery is more than 1.
If the number of rows in subquery is more than 1, the query will be cancelled.

For example:
There are 4 rows in table t1.
Query: select c1 from t1 where c1=(select c2 from t1);
Result: ERROR 1064 (HY000): Expected no more than 1 to be returned by expression select c2 from t1

ISSUE-2270
TPC-DS 6,54,58
2019-12-05 21:27:33 +08:00
b246d93128 Avoid SerDe for aggregation query with object pool (#1854) 2019-09-26 13:51:13 +08:00
9f5e5717d4 Unify the msg of 'Memory exceed limit' (#1737)
The new msg of limit exceed: "Memory exceed limit. %msg, Backend:%ip, fragment:%id Used:% , Limit:%. xxx".
This commit unifies the msg of 'Memory exceed limit' such as check_query_state, RETURN_IF_LIMIT_EXCEEDED and LIMIT_EXCEEDED.
2019-09-03 10:42:16 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
934ca2481a Make MySQL support optional (#1248) 2019-06-05 12:28:15 +08:00
9c82d41981 Support Doris query ES by HTTP way (#925) 2019-04-28 17:14:44 +08:00
397747af2c Fix bug that push down the predicates past AggregateNode (#658) 2019-02-26 10:55:14 +08:00
4d5f92cce7 Add EsScanNode (#450) 2019-01-17 17:59:33 +08:00
0e5b193243 Add cpu and io indicates to audit log (#531) 2019-01-17 12:43:15 +08:00