Commit Graph

24 Commits

Author SHA1 Message Date
Pxl
81bab55d43 [Bug](function) catch function calculation error on aggregate node to avoid core dump (#15903) 2023-01-16 11:21:28 +08:00
fe562bc3e7 [Bug](Agg) fix crash when encountering not supported agg function like last_value(bitmap) (#15257)
The former logic inside aggregate_function_window.cpp would shutdown BE once encountering agg function with complex type like BITMAP. This pr makes it don't crash and would return one more concrete error message which tells the unsupported function signature to user.
2022-12-23 14:23:21 +08:00
8c0e13ab51 [improvement](profile) add detail memory counter for exec nodes (#14806)
* [improvement](profile) improve accuraccy of memory usage and add detail memory counter

* fix
2022-12-05 11:51:52 +08:00
b04ec41c1d [Vectorized](udaf) fix java-udaf couldn't get jar core dump (#14393)
fix java-udaf couldn't get jar core dump
2022-11-22 20:49:02 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
12652ebb0e [UDF](java udf) using config to enable java udf instead of macro at compile time (#14062)
* [UDF](java udf) useing config to enable java udf instead of macro at compile time
2022-11-11 09:03:52 +08:00
c14277e587 [fix](analytic) fix coredump cause by empty analytic parameter types (#13808)
* fix fe compile error
2022-11-01 17:25:36 +08:00
Pxl
2fab0c45c7 [Feature](runtime-filter) add runtime filter breaking change adapt (#13246)
add runtime filter breaking change adapt
2022-10-28 10:59:28 +08:00
dc8f64b3e3 [improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801) 2022-08-22 10:12:06 +08:00
288b440b14 [improvement](vectorized) Improve count distinct performance by using fastunion (#11516)
Improve count distinct performance by using fastunion.
Testing our user real data has a 10-40% performance improvement.
2022-08-16 12:18:46 +08:00
092a394782 [improvement](agg)limit the output of agg node (#11461)
* [improvement](agg)limit the output of agg node
2022-08-05 07:53:55 +08:00
0b1d06bfd6 [Vectorized] Support order by aggregate function (#11187)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-07-28 09:12:58 +08:00
b7c9007776 [improvement][agg]Process aggregated results in the vectorized way (#11084) 2022-07-22 22:04:43 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
6736e06679 [feature](udf) Vectorization support remote udaf #10683 (#10685) 2022-07-18 17:15:34 +08:00
18348a83ad [chore][compile] fix java udf compile error (#10841) 2022-07-14 20:51:44 +08:00
Pxl
f58a071605 [Bug][Function] pass intermediate argument list to be (#10650) 2022-07-08 20:50:05 +08:00
c9f86bc7e2 [refactor] Refactoring Status static methods to format message using fmt(#9533) 2022-07-02 18:58:23 +08:00
4c24586865 [Vectorized][UDF] support java-udaf (#9930) 2022-06-15 10:53:44 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
a498463ab5 [feature-wip](array-type)support select ARRAY data type on vectorized engine (#8217) (#8584)
Usage Example:
1. create table for test;
```
`CREATE TABLE `array_test` (
  `k1` tinyint(4) NOT NULL COMMENT "",
  `k2` smallint(6) NULL COMMENT "",
  `k3` ARRAY<int(11)> NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 5
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);`
```

2. insert some data
```
`insert into array_test values(1, 2, [1, 2]);`
`insert into array_test values(2, 3, null);`
`insert into array_test values(3, null, null);`
`insert into array_test values(4, null, []);`
```

3. open vectorized
`set enable_vectorized_engine=true;`

4. query array data
`select * from array_test;`
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    4 | NULL | []     |
|    2 |    3 | NULL   |
|    1 |    2 | [1, 2] |
|    3 | NULL | NULL   |
+------+------+--------+
4 rows in set (0.061 sec)

Code Changes include:
1. add column_array, data_type_array codes;
2. codes about data_type creation by Field, TabletColumn, TypeDescriptor, PColumnMeta move to DataTypeFactory;
3. support create data_type for ARRAY date type;
4. RowBlockV2::convert_to_vec_block support ARRAY date type;
5. VMysqlResultWriter::append_block support ARRAY date type;
6. vectorized::Block serialize and deserialize support ARRAY date type;
2022-03-22 15:21:44 +08:00
Pxl
e0dbf48682 [Vectorized] [AggFunction] Support group_concat (#8086) 2022-02-17 14:19:07 +08:00
071be928f9 [fix](vectorized) fix bug multi distinct function get wrong type (#7900) 2022-01-28 22:31:41 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00