Commit Graph

18 Commits

Author SHA1 Message Date
201cd207f9 [Enhancement][Vectorized] Improve hash table build efficiency (#9250)
1. MAP_POPULATE is missing for mmap in Allocator, because macro OS_LINUX is not defined in allocator.h;
2. MAP_POPULATE has no effect for mremap as for mmap, zero-fill enlarged memory range explicitly to pre-fault the pages
2022-04-29 14:26:33 +08:00
26bc462e1c [feature-wip] (memory tracker) (step5) Fix track bthread, fix track vectorized query (#9145)
1. fix track bthread
- Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS).
- This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker.
Ref: 731730da85/docs/en/server.md (bthread-local)

2. fix track vectorized query
- Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine.
- Refactored ThreadContext to avoid dependency conflicts and make it easier to debug.
- Fix some bugs.
2022-04-27 20:34:02 +08:00
6ed59bb98b [refactor](code_style) remove useless inline #8933
1.Member functions defined in a class are inline by default (implicitly), and do not need to be added
2.inline is a keyword used for implementation, which has no effect when placed before the function declaration
2022-04-10 18:29:55 +08:00
7fb4b6a6e2 [chore](tsan) add file mremap_fallback for tsan (#8665) 2022-04-08 09:01:53 +08:00
92feb9c6c8 [fix] Fix error crc32 method to cal uint128 and int128 (#8577) 2022-03-23 10:33:32 +08:00
a76889b319 [improvement] Avoid print large string in error log (#8436)
1. Avoid print large string in error log
    If user load a unqualified large string, the all string will be saved in error log,
    so the error log is too big that can not be shown be using `show load warnings on "url"`.
    Err: `Got packet bigger than 'max_allowed_packet' bytes`

2. Remove duplicate help doc
    Do not allow doc with same title, or error thrown when starting FE:
    `java.lang.IllegalArgumentException: Multiple entries with same key:`
2022-03-11 17:23:47 +08:00
Pxl
cd8694e532 [feature][vectorized] support replace() (#8384) 2022-03-08 18:57:12 +08:00
ada39dd9ad [improvement][vec] better memequal impl to speed up string compare (#8229)
like #8214

faster string compare operator in vec engine.
2022-03-01 11:25:12 +08:00
01fb25a498 [UT] Fix the UT of column_nullable_test (#8180)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-23 15:37:40 +08:00
50864aca7d [refactor] fix warings when compile with clang (#8069) 2022-02-19 11:29:02 +08:00
Pxl
143c4085ee [Feature][Vectorized] support aggregate function ndv()/approx_count_distinct() (#8044) 2022-02-16 14:30:13 +08:00
Pxl
b26e7e3c28 [feature](function)(vec) support locate function (#7988)
* support function locate in vectorized engine

* add ut and fix some bug
2022-02-12 16:00:37 +08:00
358bd79fb1 [improvement](vec)(Join) Mem reuse to speed up join operator (#7905)
1. Reuse the mem of output block in vec join node
2. Add the function `replicate` in column
2022-01-31 22:14:12 +08:00
1ba20b1dbb [improvement](storage) improving Column inserter (#7855)
* optimize Column inserter

* DCHECK

* DCHECK

Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-01-27 14:18:15 +08:00
015371ac72 [fix](grouping-set) Fix the bug of grouping set core in both vec and non vec query engine (#7800) 2022-01-26 16:15:30 +08:00
f227472db2 [chore] fix error while compiling with -O3 (#7890) 2022-01-26 12:53:56 +08:00
800a36343a [chore] Prolog of hermetic build with GCC 11 and Clang 13. (#7712)
Prepare to generate hermetic build using GCC 11 and Clang 13.
The ideal toolchain would be ldb toolchain generated by [ldb_toolchain_gen.sh](https://github.com/amosbird/ldb_toolchain_gen/releases/download/v0.3/ldb_toolchain_gen.sh)

To kick off a clang build, set `DORIS_TOOLCHAIN=clang` before running any build scripts.
2022-01-21 12:12:04 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00