454 Commits

Author SHA1 Message Date
e08de52ee7 [chore](compile) using PCH for compilation acceleration under clang (#19303) 2023-05-08 19:51:06 +08:00
28016c53f0 [profile](rf) refactor profile of runtime filters (#19134)
* [profile](rf) refactor profile of runtime filters


---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-04-28 08:46:42 +08:00
aabcab9dbe [Improvement](runtime filter) Improve merge phase (#18828) 2023-04-26 21:01:20 +08:00
8d7a9fd21b [refactor](exceptionsafe) add factory creator to some class (#18978)
Make vexprcontext, vexpr, function, query context, and runtimestate thread safe.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-24 10:32:11 +08:00
b81b470d4f [fix](planner) fix pr "using crchash replace murmurhash in the runtime filter" (#18759) 2023-04-23 10:33:35 +08:00
293e115536 [Improvement](bloom filter) initialize bloom filter with adaptive size (#18785) 2023-04-20 10:06:40 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
3de4d64657 [chore](hashtable) Use doris' Allocator to replace std::allocator in phmap (#18735) 2023-04-18 09:58:28 +08:00
c904384672 Revert "[refactor](planner) using crchash replace murmurhash in the runtime filter (#18472)" (#18730)
This reverts commit a8315b86ca5543a6cc5b3eab97e4f0953b984247.
2023-04-17 20:25:18 +08:00
9e960f4c4f [chore](build) Use include-what-you-use to optimize includes (#18681)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-17 11:44:58 +08:00
4cde3d4f21 [Enhancement](Expr) Change small fix container size of In set to 8. (#18492)
In #17976, we introduced a small fixed container to optimize the IN expr. This PR changes the small fixed container size of the IN set to 8, which, according to the perf test, performs better when size > 8.
2023-04-14 18:19:45 +08:00
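The idea of a small fixed container for an IN set can be sketched as follows. This is a minimal illustration only, not the code from the PR: the class name `InSet`, the `uses_fixed` flag, and the rule "use the flat container when there are at most 8 distinct values" are all assumptions made for the example. Only the number 8 comes from the commit message.

```python
SMALL_SET_LIMIT = 8  # size mentioned in the commit message; how it is applied here is assumed


class InSet:
    """Membership set that scans a small flat tuple, else falls back to hashing."""

    def __init__(self, values):
        unique = list(dict.fromkeys(values))  # de-duplicate while keeping order
        self.uses_fixed = len(unique) <= SMALL_SET_LIMIT
        # Hypothetical layout: a flat tuple for small sets, a hash set otherwise.
        self._items = tuple(unique) if self.uses_fixed else frozenset(unique)

    def __contains__(self, value):
        if self.uses_fixed:
            # Linear scan over a handful of values, no hashing involved.
            return any(value == item for item in self._items)
        return value in self._items


small = InSet([1, 2, 3])
large = InSet(range(100))
print(small.uses_fixed, 2 in small, 9 in small)     # True True False
print(large.uses_fixed, 42 in large, 100 in large)  # False True False
```

Both paths answer the same membership question; the sketch only shows where a size threshold would switch between them.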
4335c9998f [chore](ARM) Add some vectorization compatibility code on aarch64 (#18553)
Update sse2neon to support more SSE code on ARM CPUs.
2023-04-13 10:15:33 +08:00
a8315b86ca [refactor](planner) using crchash replace murmurhash in the runtime filter (#18472)
When be_exec_version is less than 2, murmurhash is still used; otherwise crc32 is used. Once be_exec_version has been upgraded to 2, please remove this compatibility check.
2023-04-10 14:12:39 +08:00
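The version gate described in this commit might look roughly like the sketch below. The function names and the constant `CRC_MIN_EXEC_VERSION` are hypothetical, and `legacy_hash` is only a stand-in for the old murmurhash path (it is not a MurmurHash implementation). The CRC path uses Python's `zlib.crc32`, which is not necessarily the same CRC variant the backend uses.

```python
import hashlib
import zlib

CRC_MIN_EXEC_VERSION = 2  # from the commit message: versions below 2 keep the old hash


def legacy_hash(data: bytes) -> int:
    # Stand-in for the old murmurhash path; NOT a real MurmurHash implementation.
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "little")


def crc_hash(data: bytes) -> int:
    # New path: a 32-bit CRC of the key bytes.
    return zlib.crc32(data)


def pick_hash(be_exec_version: int):
    """Choose the hash function from the exec version the whole cluster agreed on."""
    return crc_hash if be_exec_version >= CRC_MIN_EXEC_VERSION else legacy_hash


def runtime_filter_hash(data: bytes, be_exec_version: int) -> int:
    return pick_hash(be_exec_version)(data)


print(pick_hash(1).__name__)  # legacy_hash
print(pick_hash(2).__name__)  # crc_hash
```

Gating on a shared version number matters because the nodes that build a filter and the nodes that probe it must hash keys the same way; mixing the two functions during a rolling upgrade would make lookups miss.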
Pxl
c9b4eaea76 [Chore](storage) change FieldType to enum class #18500 2023-04-10 08:53:44 +08:00
a01d824256 [Improvement](bloom filter) inline function call (#18396) 2023-04-06 10:21:48 +08:00
a724443eb9 [Improvement](predicate) optimize short-circuit predicates (#18278)
For scan node with no vectorized predicate, the input column for the first short-circuit predicate is dense and we don't need to access the selector column.

This PR improves performance by ~30% on TPCH Q3.
2023-04-04 10:21:41 +08:00
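A rough sketch of the dense-first-predicate idea, under simplifying assumptions: columns are plain Python lists, predicates are ordinary callables, and the "selector column" is a list of surviving row indices. The name `filter_rows` is made up for the example and is not taken from the PR.

```python
def filter_rows(columns, predicates):
    """Return the indices of rows that pass every predicate, applied in order."""
    if not predicates:
        first = next(iter(columns.values()), [])
        return list(range(len(first)))

    first_col, first_pred = predicates[0]
    # Dense pass: no row has been filtered yet, so walk the column directly
    # instead of going through a selector of row indices.
    selector = [i for i, value in enumerate(columns[first_col]) if first_pred(value)]

    for col_name, pred in predicates[1:]:
        if not selector:
            break  # short-circuit: nothing left to test
        column = columns[col_name]
        # Sparse pass: only the rows still listed in the selector are visited.
        selector = [i for i in selector if pred(column[i])]
    return selector


columns = {"qty": [5, 12, 30, 7], "price": [1.0, 9.5, 3.0, 8.0]}
predicates = [("qty", lambda q: q > 6), ("price", lambda p: p > 5.0)]
print(filter_rows(columns, predicates))  # [1, 3]
```

Only the first predicate can skip the selector, since every later predicate sees an input that has already been thinned out.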
a77921d767 [refactor](typesystem) remove unused rpc common file and using function rpc (#18270)
rpc common is a duplicate; all of its methods are included in function rpc, so I removed it.
get_field_type is never used, so remove it.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-31 18:13:25 +08:00
b7af110f61 [Bug](bloomfilter) Fix bloom filter for date type (#18205) 2023-03-30 14:15:06 +08:00
6b6682cd96 [Enhancement](Expr) Opt In Set by small size fixed container to improve performance. (#17976) 2023-03-28 23:10:39 +08:00
77c9550420 [fix](bitmapfilter) fix bitmap filter timeout unit error (#18110) 2023-03-25 21:46:32 +08:00
7ae51c856e [refactor](unify exception) unify exception definition and error code (#18006)
* [refactor](unify exception) unify exception definition and error code


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-25 12:41:07 +08:00
bd8e3e6405 [refactor](date) unify DateTimeValue and VecDateTimeValue (#17670) 2023-03-20 16:27:08 +08:00
77ab2fac20 [refactor](functioncontext) remove function context impl class (#17715)
* [refactor](functioncontext) remove function context impl class


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-03-14 11:21:45 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. For now, the quantile column only supports non-nullable values
2. add some regression test cases
3. set default enable_quantile_state_type = true
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
Pxl
16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
remove some unused static on inline function to reduce compile time
2023-03-13 11:11:59 +08:00
39b5682d59 [Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine 2023-03-13 10:33:57 +08:00
Pxl
e2ac06d6d6 [Chore](execution) change PipelineTaskState to enum class && remove some row-based code (#17300)
1. change PipelineTaskState to enum class
2. remove some row-based code on FoldConstantExecutor::_get_result
3. reduce memcpy on minmax runtime filter function (now we can guarantee that the input data is aligned)
4. add Wunused-template check, remove some unused functions, and change some static functions to inline functions.
2023-03-08 12:41:15 +08:00
4692d6764c [refactor](remove string val) remove string val structure, it is same with string ref (#17461)
remove stringval, decimalv2val, bigintval
2023-03-08 10:42:20 +08:00
9477c48ef8 [refactor](functioncontext) remove duplicate type definition in function context (#17421)
remove duplicate type definition in function context
remove unused method in function context
no need for stale state in vexpr context, because vexpr is stateless; function context saves the state, and they are cloned.
remove useless slot_size in all tuple or slot descriptor.
remove doris_udf namespace, it is useless.
remove some unused macro definitions.
init v_conjuncts in vscanner, so there is no need to write the same code in every scanner.
use unique ptr to manage function context, since it can only belong to a single expr context.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-06 16:07:09 +08:00
17f4990bd3 [enhancement](functioncontext) function context should use shared ptr and simply function context (#17311)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-02 16:23:54 +08:00
b8ebcdff78 [Bug](bloomfilter) Fix wrong result using bloomfilter with date type (#17225) 2023-03-01 12:29:20 +08:00
113023fb86 (Enhancement)[load-json] support simdjson in new json reader (#16903)
be config:
enable_simdjson_reader=true

related PR #11665
2023-02-21 11:31:00 +08:00
e04c13b7a6 [enhancement](exception safe) make function state exception safe (#16771) 2023-02-20 23:01:45 +08:00
e1ef03b9d3 [Improvement](static variable) Fix exprs/MathFunctions static variable (#16687)
Use static constexpr variable in impl file to avoid multi-addressing
Remove unused my_double_round in vec/functions/math.cpp
2023-02-14 14:46:29 +08:00
cf739e7496 [Enhancement](Stmt) Set insert_into timeout session variable separately (#16343) 2023-02-12 16:56:10 +08:00
0142ef8b95 [improvement](scanner) Supports bthread scanner (#16031) 2023-02-09 10:24:56 +08:00
9114896178 [DecimalV3](opt) opt the function of decimalv3 to_string logic (#16427) 2023-02-07 13:28:07 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
7cf7706eb1 [Bug](runtimefilter) Fix wrong runtime filter on datetime (#16102) 2023-01-28 18:16:06 +08:00
e49766483e [refactor](remove unused code) remove many xxxVal structure (#16143)
remove many xxxVal structure
remove BetaRowsetWriter::_add_row
remove anyval_util.cpp
remove non-vectorized geo functions
remove non-vectorized like predicate
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-28 14:17:43 +08:00
adb758dcac [refactor](remove non vec code) remove json functions string functions match functions and some code (#16141)
remove json functions code
remove string functions code
remove math functions code
move MatchPredicate to olap since it is only used in storage predicate process
remove some code in tuple, Tuple structure should be removed in the future.
remove much of the code in the collection value structure, it is useless
2023-01-26 16:21:12 +08:00
615a5e7b51 [refactor](remove non vec code) remove non vec functions and AggregateInfo (#16138)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-25 12:53:05 +08:00
6e8eedc521 [refactor](remove unused code) remove storage buffer and orc reader (#16137)
remove olap storage byte buffer
remove orc reader
remove time operator
remove read_write_util
remove aggregate funcs
remove compress.h and cpp
remove bhp_lib

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-24 22:29:32 +08:00
79ad74637d [refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136) 2023-01-24 10:45:35 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
6485221ffb [Feature-WIP](inverted index)(bkd) Support try query before query bkd to improve query efficiency (#16075) 2023-01-20 11:19:36 +08:00
3894de49d2 [Enhancement](topn) support two phase read for topn query (#15642)
This PR optimizes TopN queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

TopN is composed of a SortNode and a ScanNode. When the user table is wide (for example 100+ columns), the ORDER BY clause usually covers just a few columns, but the ScanNode still needs to scan all data from the storage engine even if the limit is very small. This may lead to a lot of read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase we only need to read `columnA`'s data from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode because it is the central node for TopN nodes in the cluster. The ExchangeNode spawns an RPC to the other nodes using the RowIds (sorted and limited by the SortNode) read in the first phase, and reads row by row from the storage engine.

After the second-phase read, the Block contains all the data needed for the query.
2023-01-19 10:01:33 +08:00
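The two phases can be illustrated with a small in-memory sketch. It assumes rows are Python dicts held in a list and uses the list position as a stand-in for the row id; the function name `topn_two_phase` is hypothetical. There is no RPC or storage engine here, so it only mirrors the data flow, not the actual implementation.

```python
import heapq


def topn_two_phase(rows, sort_col, limit, descending=False):
    """Return the top `limit` full rows ordered by `sort_col`."""
    # Phase 1: read only the sort key plus a row id (the list position here,
    # standing in for the extra row id column), then sort and limit.
    keyed = ((row[sort_col], row_id) for row_id, row in enumerate(rows))
    pick = heapq.nlargest if descending else heapq.nsmallest
    top = pick(limit, keyed)

    # Phase 2: fetch the remaining columns row by row, only for the winners.
    return [rows[row_id] for _, row_id in top]


table = [
    {"columnA": 30, "name": "c"},
    {"columnA": 10, "name": "a"},
    {"columnA": 20, "name": "b"},
]
print(topn_two_phase(table, "columnA", 2))
# [{'columnA': 10, 'name': 'a'}, {'columnA': 20, 'name': 'b'}]
```

The wide columns are touched only for the `limit` surviving rows, which is where the reduction in read amplification comes from.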
95c91fab2e [refactor](vec) delete non-vec runtime filter (#16016)
* [refactor](vec) delete non-vec runtime filter

* update
2023-01-18 17:49:20 +08:00
0fbdf8e3e1 [Refactor](table function) Decouple vectorized table functions from non-vectorized ones (#15772) 2023-01-12 15:08:21 +08:00
d0e8f84279 [feature](vectorized) Support MemoryScratchSink on vectorized engine (#15612) 2023-01-10 10:38:35 +08:00