doris

Author	SHA1	Message	Date
lihangyu	c21eb315b0	[feature](thrift api) support expr in MemoryScratchSink and make arrow::Schema recalculate with block info (#24603 )	2023-10-18 07:51:56 -05:00
Gabriel	b38b8b4494	[pipelineX](fix) Fix BE crash caused by join and constant expr (#24862 )	2023-09-25 21:01:09 +08:00
zhangstar333	911bd0e818	[bug](if) fix if function not handle const nullable value (#22823 ) fix if function not handle const nullable value	2023-08-15 10:16:48 +08:00
Pxl	85c5d7c6a9	[Chore](materialized-view) add ssb_flat mv test case (#20869 ) add ssb_flat mv test case	2023-06-19 10:51:50 +08:00
yiguolei	31a4f96f01	[refactor](exprcontext) move close to expr context's dector method (#20747 ) The close method does nothing. But I am not sure we could remove it. So that I add it to dector method and remove many many calls.	2023-06-14 18:01:07 +08:00
Pxl	8e39f0cf6b	[Enchancement](Agg State) storage function name and result is nullable in agg state type (#20298 ) storage function name and result is nullable in agg state type	2023-06-04 22:44:48 +08:00
Pxl	bbb3af6ce6	[Feature](agg_state) support agg_state combinators (#19969 ) support agg_state combinators state/merge/union	2023-05-29 13:07:29 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
YueW	08ec5e2eb5	[fix](function) fix result column is nullable type when fast execute (#19889 )	2023-05-24 10:27:50 +08:00
Pxl	d64be9565d	[Bug](function) fix function in get wrong result when input const column (#19791 ) fix function in get wrong result when input const column	2023-05-22 10:58:29 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
zclllyybb	fb377a9da9	[Improvement](functions)Optimized some datetime function's return value (#18369 )	2023-04-19 15:51:11 +08:00
zhangstar333	2519931a04	[vectorized](function) support time_to_sec function (#18354 ) support time_to_sec function	2023-04-13 19:31:12 +08:00
yiguolei	77ab2fac20	[refactor](functioncontext) remove function context impl class (#17715 ) * [refactor](functioncontext) remove function context impl class Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --------- Co-authored-by: yiguolei <yiguolei@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-03-14 11:21:45 +08:00
HappenLee	7d035486ad	[Opt](vec) opt the fast execute logic to remove useless function call (#16532 )	2023-02-09 14:12:40 +08:00
HappenLee	c3a6eb4f9a	[Refactor](function) remove useless function get to create column (#16333 ) remove unless create_column to redurce the unless new operator	2023-02-04 22:54:14 +08:00
HappenLee	aaae1497cd	[Refactor](function) opt the exec of function with null column (#16256 )	2023-02-01 15:56:31 +08:00
yiguolei	e49766483e	[refactor](remove unused code) remove many xxxVal structure (#16143 ) remove many xxxVal structure remove BetaRowsetWriter::_add_row remove anyval_util.cpp remove non-vectorized geo functions remove non-vectorized like predicate Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-28 14:17:43 +08:00
yiguolei	79ad74637d	[refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136 )	2023-01-24 10:45:35 +08:00
Gabriel	d062ca2944	[refactor](vectorized) remove unnecessary vectorization check (#15984 )	2023-01-17 12:21:46 +08:00
YueW	edecc2e706	[feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211 ) * [feature-wip](inverted index)inverted index api: reader * [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL * [feature-wip](inverted index) Adapt to index meta * [enhance] add more metrics * [enhance] add fulltext match query check for column type and index parser * [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node	2022-12-30 21:48:14 +08:00
YueW	305dd15fea	[improvement](index) Support bitmap index can be applied with compound predicate when enable vectorized engine query (#13035 ) Current bitmap index only can apply pushed down predicates which in AND conditions. When predicates in OR conditions and other complex compound conditions, it will not be pushed down to the storage layer, this leads to read more data. Based on that situation, this pr will do: 1. this pr in order to support bitmap index apply compound predicates, query sql like: select * from tb where a > 'hello' or b < 100; select * from tb where a > 'hello' or b < 100 or c > 'ok'; select * from tb where (a > 'hello' or b <100) and (a < 'world' or b > 200); select * from tb where (not a> 'hello') or b < 100; ... above sql，column a and b and c has created bitmap_index. 2. this optimization can reduce reading data by index 3. set config enable_index_apply_compound_predicates to use this optimization	2022-12-28 20:08:57 +08:00
Pxl	c18a471303	[Optimize](predicate) update inplace on VcompoundPred (#14402 ) select count(*) from lineorder where lo_orderkey<100000000 OR lo_orderkey>100000000 AND lo_orderkey<200000000 OR lo_orderkey >200000000; 0.6s -> 0.5s	2022-11-21 09:12:30 +08:00
Zhengguo Yang	12652ebb0e	[UDF](java udf) using config to enable java udf instead of macro at compile time (#14062 ) * [UDF](java udf) useing config to enable java udf instead of macro at compile time	2022-11-11 09:03:52 +08:00
TengJianPing	5a700223fe	[fix](function) fix coredump cause by return type mismatch of vectorized repeat function (#13868 ) Will not support repeat function during upgrade in vectorized engine.	2022-11-03 09:53:02 +08:00
Gabriel	17b809210a	[Bug](runtime filter) fix bug for late-arrival runtime filters (#12049 )	2022-08-26 09:13:10 +08:00
AlexYue	8b10a1a3f7	[enhancement](VSlotRef) enhance column_id check in execute function during runtime (#11862 ) The column id check in VSlotRef::execute function before is too strict for fuzzy test to continuously produce random query. Temporarily loosen the check logic. Moreover, there exists some careless call to VExpr::get_const_col, it might return a nullptr but not every function call checks if it's valid. It's an underlying problem.	2022-08-18 09:12:26 +08:00
Tiewei Fang	c9f86bc7e2	[refactor] Refactoring Status static methods to format message using fmt(#9533 )	2022-07-02 18:58:23 +08:00
Zhengguo Yang	ae680b4248	[UDF] support RPC udaf part 1: support create RPC udaf in fe (#8510 )	2022-04-21 17:38:58 +08:00
Gabriel	b89e4c7bba	[feature-wip](java-udf) support java UDF with fixed-length input and output (#8516 ) This feature is propsoed in [DSIP-1](https://cwiki.apache.org/confluence/display/DORIS/DSIP-001%3A+Java+UDF). This PR support fixed-length input and output Java UDF. Phase I in DIP-1 is done after this PR. To support Java UDF effeciently, I use no data copy in JNI call and all compute operations are off-heap in Java. To achieve that, I use a UdfExecutor instead. For users, a UDF class must have a public evaluate method.	2022-03-23 10:32:50 +08:00
Zhengguo Yang	f8d086d87f	[feature](rpc) (experimental)Support implement UDF through GRPC protocol. (#7519 ) Support implement UDF through GRPC protocol. This brings several benefits: 1. The udf implementation language is not limited to c++, users can use any familiar language to implement udf 2. UDF is decoupled from Doris, udf will not cause doris coredump, udf computing resources are separated from doris, and doris services are not affected But RPC's UDF has a fixed overhead, so its performance is much slower than C++ UDF, especially when the amount of data is large. Create function like ``` CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES ( "SYMBOL"="add_int", "OBJECT_FILE"="127.0.0.1:9999", "TYPE"="RPC" ); ``` Function service need to implement `check_fn` and `fn_call` methods Note: THIS IS AN EXPERIMENTAL FEATURE, THE INTERFACE AND DATA STRUCTURE MAY BE CHANGED IN FUTURE !!!	2022-02-08 09:25:09 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00

32 Commits