Commit Graph

17 Commits

Author SHA1 Message Date
1f30e563a7 [refactor][vectorized] refactor first/last value agg functions (#10661)
* refactor first and last
[refactor][vectorized] refactor first/last value agg functions

* add some change

* remove first/last about always nullable

* remove always nullable and register it

* refactor value remove bool null flag

* refactor win first last to ptr and pos
2022-07-30 18:38:56 +08:00
babab5d535 [feature-wip] support datetimev2 (#11085) 2022-07-23 16:07:59 +08:00
00c9455f16 [fix](array-type) fix arrow column to doris array column (#10855)
* support merge array column, while convert from arrow column to doris array column

* fix typo

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-16 11:49:42 +08:00
ba1c527a23 [improvement](arrow) Avoid parse timezone for each datetime value (#10869)
* [improvement](arrow) Avoid parse timezone for each datetime value

Convert arrow batch to doris block is too slow when there are datetime values.
Because we call `TimezoneUtils::find_cctz_time_zone` for each values.

After modify, the tpch-100 q1 with external table cost from 40s -> 9s

Co-authored-by: morningman <morningman@apache.org>
2022-07-15 21:19:36 +08:00
Pxl
6a566ccb74 [Enhancement][Vectorized] add constexpr_loop_match (#10283) 2022-06-29 14:58:50 +08:00
ca94867b4e [Feature-wip] add date v2 type (#9916) 2022-06-26 16:07:56 +08:00
0e404edf54 [improvement] Change array offset type from UInt32 to UInt64 (#10070)
Now column `Array<T>` contains column `offsets` and `data`, and type of column `offsets` is UInt32 now.
If we call array_union to merge arrays repeatedly, the size of array may overflow.
So we need to extend it before `Array Data Type` release.
2022-06-19 10:24:08 +08:00
19bc14cf8d [feature-wip](array-type) Add array type support for vectorized parquet-orc scanner (#9856)
Only support one level array now.
for example:
- nullable(array(nullable(tinyint))) is **support**.
- nullable(array(nullable(array(xx))) is **not support**.
2022-06-09 12:11:47 +08:00
35f99faa0a [Bug][Vectorized] fix core dump on vcase_expr::close (#9893)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-01 08:05:09 +08:00
f4dd3bf013 [bugfix] fix memleak in olapscannode(#9736) 2022-05-26 15:06:54 +08:00
bee5c2f8aa [feature-wip](parquet-vec) Support parquet scanner in vectorized engine (#9433) 2022-05-17 09:37:17 +08:00
Pxl
0ee53be883 [fix][improvement](runtime-filter) fix string type length limit error && add runtime filter decimal support (#8282) 2022-03-03 22:44:49 +08:00
Pxl
668188b91f [improvement][vectorized] support es node predicate peel (#8174) 2022-02-26 17:02:54 +08:00
Pxl
90a8ca808a [Bug][Vectorized] fix bitmap_min(empty) not return null (#8190) 2022-02-24 11:06:27 +08:00
Pxl
87e555c27d [Feature][Vectorized] support function json_array/json_object/json_quote (#8158) 2022-02-22 09:29:56 +08:00
Pxl
143c4085ee [Feature][Vectorized] support aggregate function ndv()/approx_count_distinct() (#8044) 2022-02-16 14:30:13 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00