Commit Graph

48 Commits

Author SHA1 Message Date
b92a764665 [feature](function) Support for aggregate function foreach combiner for some error function (#31913)
Support for aggregate function foreach combiner for some error function
2024-03-21 14:07:49 +08:00
Pxl
493385c2c7 [Bug](AggState) fix not match function when agg combinator function has alias (#31101) 2024-02-20 16:23:53 +08:00
8b225c6c3c [pipelineX](fix) Fix core dump if cancelled (#29138) 2023-12-28 10:04:51 +08:00
e1587537bc [Fix](status) fix unhandled status in exprs #28218
which marked static_cast<void> in https://github.com/apache/doris/pull/23395/files
partially fixed #28160
2023-12-11 11:04:58 +08:00
fd1db4da3d [agg](profile) fix incorrent profile (#28004) 2023-12-05 20:48:10 +08:00
693982fd1a [feature](decimal) support decimal256 (#25386) 2023-10-25 15:47:51 +08:00
c21eb315b0 [feature](thrift api) support expr in MemoryScratchSink and make arrow::Schema recalculate with block info (#24603) 2023-10-18 07:51:56 -05:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
c04e5bac39 [bug](pipelineX) fix java-udaf failed with open pipelineX (#24939) 2023-09-27 13:14:10 +08:00
dcd6c3c022 [pipelineX](refactor) propose a new pipeline execution model (#22562) 2023-08-21 15:38:45 +08:00
b41fcbb783 [feature](agg) add the aggregation function 'mag_agg' (#22043)
New aggregation function: map_agg.

This function requires two arguments: a key and a value, which are used to build a map.

select map_agg(column1, column2) from t group by column3;
2023-07-25 11:21:03 +08:00
Pxl
88cbea2b56 [Bug](agg-state) fix core dump on not nullable argument for aggstate's nested argument (#21331)
fix core dump on not nullable argument for aggstate's nested argument
2023-06-30 18:20:25 +08:00
0a59580aa4 [Enhancement](function) fix compatibility issues of sum/count during upgrade process (#20890)
in order to solve agg of sum/count is not compatibility during the upgrade process.
in PR [refactor](agg_state) refactor agg_state type to support fixed length object type #20370 have changed the serialize type and serialize column of sum/count
before is ColumnVector, now sum/count change to use ColumnFixedLengthObject
so during the upgrade process, will be not compatible if exist Old BE and Newer BE
2023-06-17 12:51:01 +08:00
31a4f96f01 [refactor](exprcontext) move close to expr context's dector method (#20747)
The close method does nothing. But I am not sure we could remove it. So that I add it to dector method and remove many many calls.
2023-06-14 18:01:07 +08:00
Pxl
7f8c5c81e7 [Feature](agg_state) support agg_state combinator on nereids (#20164)
support agg_state combinator on nereids
2023-06-12 12:49:26 +08:00
d03bd73795 [bug](udaf) fix java-udaf can't exectue add function (#20554)
In some case of agg function, maybe running as streaming agg firstly,
this will call the add function when serialize, so need implement add function also.
2023-06-09 08:44:12 +08:00
Pxl
8e39f0cf6b [Enchancement](Agg State) storage function name and result is nullable in agg state type (#20298)
storage function name and result is nullable in agg state type
2023-06-04 22:44:48 +08:00
Pxl
bbb3af6ce6 [Feature](agg_state) support agg_state combinators (#19969)
support agg_state combinators state/merge/union
2023-05-29 13:07:29 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
2023-05-29 11:47:31 +08:00
8d7a9fd21b [refactor](exceptionsafe) add factory creator to some class (#18978)
make vexprecontext,vexpr,function,query context,runtimestate thread safe.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-24 10:32:11 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
4e1cdb9ce7 [fix](agg_sort)fix bug of agg sort group concat with order by(#18447) 2023-04-07 08:42:36 +08:00
7d92bf095a [fix](expr) refractor create_tree_from_thrift to avoid stack overflow (#18214) 2023-03-31 10:38:20 +08:00
Pxl
2bc014d83a [Enchancement](function) remove unused params on aggregate function (#16886)
remove unused params on aggregate function
2023-02-20 11:08:45 +08:00
Pxl
81bab55d43 [Bug](function) catch function calculation error on aggregate node to avoid core dump (#15903) 2023-01-16 11:21:28 +08:00
fe562bc3e7 [Bug](Agg) fix crash when encountering not supported agg function like last_value(bitmap) (#15257)
The former logic inside aggregate_function_window.cpp would shutdown BE once encountering agg function with complex type like BITMAP. This pr makes it don't crash and would return one more concrete error message which tells the unsupported function signature to user.
2022-12-23 14:23:21 +08:00
8c0e13ab51 [improvement](profile) add detail memory counter for exec nodes (#14806)
* [improvement](profile) improve accuraccy of memory usage and add detail memory counter

* fix
2022-12-05 11:51:52 +08:00
b04ec41c1d [Vectorized](udaf) fix java-udaf couldn't get jar core dump (#14393)
fix java-udaf couldn't get jar core dump
2022-11-22 20:49:02 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
12652ebb0e [UDF](java udf) using config to enable java udf instead of macro at compile time (#14062)
* [UDF](java udf) useing config to enable java udf instead of macro at compile time
2022-11-11 09:03:52 +08:00
c14277e587 [fix](analytic) fix coredump cause by empty analytic parameter types (#13808)
* fix fe compile error
2022-11-01 17:25:36 +08:00
Pxl
2fab0c45c7 [Feature](runtime-filter) add runtime filter breaking change adapt (#13246)
add runtime filter breaking change adapt
2022-10-28 10:59:28 +08:00
dc8f64b3e3 [improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801) 2022-08-22 10:12:06 +08:00
288b440b14 [improvement](vectorized) Improve count distinct performance by using fastunion (#11516)
Improve count distinct performance by using fastunion.
Testing our user real data has a 10-40% performance improvement.
2022-08-16 12:18:46 +08:00
092a394782 [improvement](agg)limit the output of agg node (#11461)
* [improvement](agg)limit the output of agg node
2022-08-05 07:53:55 +08:00
0b1d06bfd6 [Vectorized] Support order by aggregate function (#11187)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-07-28 09:12:58 +08:00
b7c9007776 [improvement][agg]Process aggregated results in the vectorized way (#11084) 2022-07-22 22:04:43 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
6736e06679 [feature](udf) Vectorization support remote udaf #10683 (#10685) 2022-07-18 17:15:34 +08:00
18348a83ad [chore][compile] fix java udf compile error (#10841) 2022-07-14 20:51:44 +08:00
Pxl
f58a071605 [Bug][Function] pass intermediate argument list to be (#10650) 2022-07-08 20:50:05 +08:00
c9f86bc7e2 [refactor] Refactoring Status static methods to format message using fmt(#9533) 2022-07-02 18:58:23 +08:00
4c24586865 [Vectorized][UDF] support java-udaf (#9930) 2022-06-15 10:53:44 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
a498463ab5 [feature-wip](array-type)support select ARRAY data type on vectorized engine (#8217) (#8584)
Usage Example:
1. create table for test;
```
`CREATE TABLE `array_test` (
  `k1` tinyint(4) NOT NULL COMMENT "",
  `k2` smallint(6) NULL COMMENT "",
  `k3` ARRAY<int(11)> NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 5
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);`
```

2. insert some data
```
`insert into array_test values(1, 2, [1, 2]);`
`insert into array_test values(2, 3, null);`
`insert into array_test values(3, null, null);`
`insert into array_test values(4, null, []);`
```

3. open vectorized
`set enable_vectorized_engine=true;`

4. query array data
`select * from array_test;`
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    4 | NULL | []     |
|    2 |    3 | NULL   |
|    1 |    2 | [1, 2] |
|    3 | NULL | NULL   |
+------+------+--------+
4 rows in set (0.061 sec)

Code Changes include:
1. add column_array, data_type_array codes;
2. codes about data_type creation by Field, TabletColumn, TypeDescriptor, PColumnMeta move to DataTypeFactory;
3. support create data_type for ARRAY date type;
4. RowBlockV2::convert_to_vec_block support ARRAY date type;
5. VMysqlResultWriter::append_block support ARRAY date type;
6. vectorized::Block serialize and deserialize support ARRAY date type;
2022-03-22 15:21:44 +08:00
Pxl
e0dbf48682 [Vectorized] [AggFunction] Support group_concat (#8086) 2022-02-17 14:19:07 +08:00
071be928f9 [fix](vectorized) fix bug multi distinct function get wrong type (#7900) 2022-01-28 22:31:41 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00