doris

Author	SHA1	Message	Date
HappenLee	2cc670dba6	[fix](vectorized) Support outer join for vectorized exec engine (#10323) In the vectorized engine, the query plan generates a new tuple for the join node that describes the join node's output schema. This tuple mainly solves the problem of the join node's input schema differing from its output schema, for example: 1. columns on the null-supplying side of an outer join are converted to nullable; 2. the outer tuple is projected.	2022-06-24 08:59:30 +08:00
Adonis Ling	5e47b03595	[feature-wip](array-type) Add array aggregation functions (#10108)	2022-06-17 11:07:49 +08:00
Xinyi Zou	d58e00c49c	[fix](brpc) Embed serialized request into the attachment and transmit it through http brpc (#9803) When the `Tuple/Block data` is larger than 2GB, serialize the protobuf request, embed the `Tuple/Block data` in the controller attachment, and transmit it through http brpc. This avoids the error raised when a protobuf request exceeds 2GB: `Bad request, error_text=[E1003]Fail to compress request`. In #7164, `Tuple/Block data` was put into the attachment and sent via the default `baidu_std brpc`, but attachments larger than 2GB are truncated; sending via `http brpc` has no 2GB limit. #7921 also considered putting `Tuple/Block data` into the attachment by default, since in theory this saves one serialization and improves performance. However, testing showed no performance gain, while peak memory increased because of an additional memory copy.	2022-06-13 20:41:48 +08:00
HappenLee	c426c2e4b1	[Vectorized-Load] Support vectorized load table with materialized view (#9923) * [Vectorized-Load] Support vectorized load table with materialized view * fix ut Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-02 14:59:01 +08:00
Adonis Ling	f377c26bf7	[refactor][be] Optimize headers (#9708)	2022-05-30 16:12:10 +08:00
Adonis Ling	2a11a4ab99	[feature-wip][array-type] Support more sub types. (#9466) Please refer to #9465	2022-05-26 08:41:34 +08:00
spaces-x	73e31a2179	[stream-load-vec]: memtable flush only if necessary after aggregated (#9459) Co-authored-by: weixiang <weixiang06@meituan.com>	2022-05-25 21:12:24 +08:00
Gabriel	8470543144	[Improvement] fix typo (#9743)	2022-05-25 19:29:01 +08:00
Zhengguo Yang	f5bef328fe	[fix] Disable transferring data larger than 2GB via brpc (#9770) Because brpc and protobuf cannot transfer data larger than 2GB (the length overflows), add a check before sending.	2022-05-25 18:41:13 +08:00
HappenLee	8fa677b59c	[Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner (#9666) * [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner 1. Fix vjson scanner not supporting `range_from_file_path`. 2. Fix vjson/vbroker scanner core dump when the nullability of src/dest slots differs. 3. Fix vparquet `filter_block` when the column reference count is not 1. 4. Refactor to simplify the code. Only the vectorized load is changed; the original row-based load is not. Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-05-20 11:43:03 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483)	2022-05-10 21:25:35 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305) - run `sh build-support/clang-format.sh` to clang-format all C++ code	2022-04-29 16:14:22 +08:00
HappenLee	d330bc3806	[Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709) (#9280) Implement vectorized stream load. Added FE configuration option `enable_vectorized_load` to enable vectorized stream load. Co-authored-by: tengjp@outlook.com Co-authored-by: mrhhsg@gmail.com Co-authored-by: minghong.zhou@163.com Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>	2022-04-29 09:50:51 +08:00
Pxl	951c2a90eb	[fix](Lateral-View)(Vectorized) core dump on lateral-view with nullable column (#9191)	2022-04-26 10:24:11 +08:00
Zhengguo Yang	ae680b4248	[UDF] support RPC udaf part 1: support create RPC udaf in fe (#8510)	2022-04-21 17:38:58 +08:00
Pxl	dda7604e16	[Bug][Storage-vectorized] fix core dump on outer join with non-nullable column (#9112)	2022-04-21 11:02:04 +08:00
Amos Bird	0bf72caf68	[Bug][Vectorized] Fix UB when doing ORDER BY. (#9023)	2022-04-15 14:02:29 +08:00
zbtzbtzbt	6ed59bb98b	[refactor](code_style) remove useless inline #8933 1. Member functions defined inside a class are implicitly inline, so the keyword does not need to be added. 2. `inline` is a keyword used for the implementation (definition); it has no effect when placed before the function declaration.	2022-04-10 18:29:55 +08:00
HappenLee	fcefed7c1c	[Bug][Vectorized] Fix core bug of segment vectorized (#8800) * [Bug][Vectorized] Fix core bug of segment vectorized 1. Read table with delete condition 2. Read table with default value HLL/Bitmap Column * refactor some code Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-04-03 19:50:25 +08:00
Pxl	2760bcbcc1	[fix] fix core dump on deep_copy_tuple when data is null (#8620)	2022-03-24 09:15:38 +08:00
Pxl	7fc22c2456	[fix][vectorized] fix core on get_predicate_column_ptr && fix double copy on _read_columns_by_rowids (#8581)	2022-03-24 09:12:42 +08:00
Adonis Ling	2580da4f72	[feature-wip](array-type) Support insertion for vectorized engine. (#8494) (#8590) Please refer to #8493	2022-03-22 15:48:13 +08:00
camby	a498463ab5	[feature-wip](array-type)support select ARRAY data type on vectorized engine (#8217) (#8584) Usage Example: 1. Create a table for testing: ``` CREATE TABLE `array_test` ( `k1` tinyint(4) NOT NULL COMMENT "", `k2` smallint(6) NULL COMMENT "", `k3` ARRAY<int(11)> NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`k1`) COMMENT "OLAP" DISTRIBUTED BY HASH(`k1`) BUCKETS 5 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" ); ``` 2. Insert some data: ``` insert into array_test values(1, 2, [1, 2]); insert into array_test values(2, 3, null); insert into array_test values(3, null, null); insert into array_test values(4, null, []); ``` 3. Enable the vectorized engine: `set enable_vectorized_engine=true;` 4. Query the array data: `select * from array_test;` +------+------+--------+ | k1 | k2 | k3 | +------+------+--------+ | 4 | NULL | [] | | 2 | 3 | NULL | | 1 | 2 | [1, 2] | | 3 | NULL | NULL | +------+------+--------+ 4 rows in set (0.061 sec) Code changes include: 1. add column_array and data_type_array code; 2. move the code that creates data_type from Field, TabletColumn, TypeDescriptor and PColumnMeta to DataTypeFactory; 3. support creating data_type for the ARRAY data type; 4. RowBlockV2::convert_to_vec_block supports the ARRAY data type; 5. VMysqlResultWriter::append_block supports the ARRAY data type; 6. vectorized::Block serialize and deserialize support the ARRAY data type.	2022-03-22 15:21:44 +08:00
HappenLee	c18717df53	[fix](vectorized) Fix core dump when mutable block differs from block (#8280)	2022-03-05 15:27:36 +08:00
Pxl	0ee53be883	[fix][improvement](runtime-filter) fix string type length limit error && add runtime filter decimal support (#8282)	2022-03-03 22:44:49 +08:00
HappenLee	b241bc4e9d	[fix][Vectorized] Fix wrong null-first order in exchange node merge sort (#8291)	2022-03-02 10:19:06 +08:00
wangbo	d17ed5e27a	[vectorization](storage)support seq column in storage layer (#8186)	2022-02-23 12:23:31 +08:00
awakeljw	b1e7343532	[Vectorized] [HashJoin] Opt HashJoin Performance (#8119) Co-authored-by: lihaopeng <happenlee@hotmail.com>	2022-02-23 10:28:16 +08:00
Zhengguo Yang	50864aca7d	[refactor] fix warnings when compiling with clang (#8069)	2022-02-19 11:29:02 +08:00
HappenLee	bcde1f265a	[Function][Vectorized] Support least/greatest functions (#8107) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-02-18 11:57:07 +08:00
wangbo	b9f0b5565c	[refactor](storage) refactor some interfaces of storage layer column (#8064) 1. format binary plain 2. remove batch_set_null_bitmap 3. fix segiter return value 4. set insert_many_binary_data args	2022-02-18 10:54:51 +08:00
zhangstar333	0003822da7	[feature](vec) add ColumnHLL to support hll type (#7828)	2022-02-17 10:44:42 +08:00
Mingyu Chen	3048ce8a4f	[improvement][refactor](vec) Refactor serde of vec block and using brpc attachment (#7939) This PR mainly changes: 1. Change the definition of PBlock. The new PBlock consists of a set of PColumnMeta and a binary buffer. The PColumnMeta entries record the metadata of all columns in the Block, while the buffer stores the serialized binary data of all columns. 2. Refactor the serialize/deserialize methods of data types. Rewrite `serialize()/deserialize()` of IDataType, and add a new method `get_uncompressed_serialized_bytes()` that returns the total length of a column's uncompressed serialized data. 3. Rewrite the serialize/deserialize methods of Block. When serializing a Block to PBlock, it first computes the total length of the uncompressed serialized data of all columns in the Block, then allocates memory and writes the serialized data to the buffer. 4. Use the brpc attachment to transmit the serialized column data.	2022-02-08 11:11:42 +08:00
HappenLee	ef233701b3	[feature](vec)(load) Support vtablet sink to enable insert into by using vec query engine (#7957) Support vtablet sink to enable INSERT INTO queries in the vec query engine	2022-02-08 11:04:09 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from ClickHouse ClickHouse is an excellent vectorized execution engine database, and we have referenced and learned a lot from its data structures and function implementations. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to code taken from ClickHouse, e.g.: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Supported exec nodes and queries: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node The engine can run the SSB and TPC-H standard query sets and 70% of the TPC-DS standard query set. ### 3. Data Model The vectorized exec engine supports Dup/Agg/Unq tables and a vectorized block reader. Segment vectorization is a work in progress. ### 4. How to use 1. Set the session variable `set enable_vectorized_engine = true;` (required) 2. Set the session variable `set batch_size = 4096;` (recommended) ### 5. Some differences from the original exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Have unit tests been added: (Yes) 3. Has documentation been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00

35 Commits