Commit Graph

53 Commits

Author SHA1 Message Date
ae8a26335c [opt](hive)opt select count(*) stmt push down agg on parquet in hive . (#22115)
Optimization "select count(*) from table" stmtement , push down "count" type to BE.
support file type : parquet ,orc in hive .

1. 4kfiles , 60kwline num 
    before:  1 min 37.70 sec 
    after:   50.18 sec

2. 50files , 60kwline num
    before: 1.12 sec
    after: 0.82 sec
2023-07-29 00:31:01 +08:00
d86c67863d Remove unused code (#21735) 2023-07-12 14:48:13 +08:00
90dd8716ed [refactor](multicast) change the way multicast do filter, project and shuffle (#21412)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>

1. Filtering is done at the sending end rather than the receiving end
2. Projection is done at the sending end rather than the receiving end
3. Each sender can use different shuffle policies to send data
2023-07-04 16:51:07 +08:00
b86dd11a7d [fix](pipeline) refactor olap table sink close (#20771)
For pipeline, olap table sink close is divided into three stages, try_close() --> pending_finish() --> close()
only after all node channels are done or canceled, pending_finish() will return false, close() will start.
this will avoid block pipeline on close().

In close, check the index channel intolerable failure status after each node channel failure,
if intolerable failure is true, the close will be terminated in advance, and all node channels will be canceled to avoid meaningless blocking.
2023-07-04 11:27:51 +08:00
81abdeffbc [Improvement](pipeline) Improve shared scan performance (#20785) 2023-06-21 14:36:05 +08:00
d6b7640cf0 [fix](inverted index) fix check failed for block erase temp column (#20924) 2023-06-18 19:27:48 +08:00
ab32299ba4 [feature](nereids) Support multi target rf #20714
Support multi target runtime filter, mainly for set operation, such as union/intersect/except.
2023-06-16 20:26:00 +08:00
Pxl
17a395f5e3 [Bug](runtime-filter) fix runtime filter not register on vdata_gen_scan_node (#20787)
fix runtime filter not register on vdata_gen_scan_node
2023-06-15 14:06:14 +08:00
a6f625676b [profile](remove child) child is for node, should not be used to organize counters (#20676)
Currently, there are many profiles using add child profile to orgnanize profile into blocks. But it is wrong. Child profile will have a total time counter. Actually, what we should use is just a label.

                          -  MemoryUsage:  
                              -  HashTable:  23.98  KB
                              -  SerializeKeyArena:  446.75  KB
Add a new macro ADD_LABEL_COUNTER to add just a label in the profile.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-12 10:00:35 +08:00
1fc48e83f2 [fix](executor)Fix duplicate timer and add open timer #20448
1 Currently, Node's total timer couter has timed twice(in Open and alloc_resource), this may cause timer in profile is not correct.
2 Add more timer to find more code which may cost much time.
2023-06-06 08:55:52 +08:00
f0513a861d [Improve](Scan) add a session variable to make scan run serial (#20220)
Parallel scanning can result in some read amplification, for example, select * from xx where limit 1 actually requires only one row of data. However, due to parallel scanning of multiple tablets, read amplification occurs, leading to performance bottlenecks in high-concurrency scenarios. This PR Adding a SessionVariable to enforce serial scanning can help mitigate this issue.
2023-06-01 15:06:35 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
2023-05-29 11:47:31 +08:00
6efe6ef6e8 [Enhancement](scanner) allocate blocks in scanner_context on demand and free them on close (#19389)
Firstly, to reduce memory usage, we do not pre-allocate blocks, instead we lazily allocate block when upper call get_free_block. And when upper call return_free_block to return free block, we add the block to a queue for memory reuse, and we will free the blocks in the queue when the scanner_context was closed instead of destructed.
Secondly, to limit the memory usage of the scanner, we introduce a variable _free_blocks_capacity to indicate the current number of free blocks available to the scanners. The number of scanners that can be scheduled will be calculated based on this value.

ssb flat test
previous
lineorder 1.2G:
load time: 3s, query time: 0.355s
lineorder 5.8G:
load time: 330s, query time: 0.970s
load time: 349s, query time: 0.949s
load time: 349s, query time: 0.955s
load time: 360s, query time: 0.889s (pipeline enabled)
after
lineorder 1.2G:
load time: 3s, query time: 0.349s
lineorder 5.8G:
load time: 342s, query time: 0.929s
load time: 337s, query time: 0.913s
load time: 345s, query time: 0.946s
load time: 346s, query time: 0.865s (pipeline enabled)
2023-05-23 18:17:21 +08:00
4e4fb33995 [refactor](conjuncts) simplify conjuncts in exec node (#19254)
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, exec node save exprcontext**, but the object is in object pool, the code is very unclear. we could just use exprcontext*.
2023-05-04 18:04:32 +08:00
a262f42a28 [refactor](exceptionsafe) make scanner and scancontext exception safe (#19057) 2023-04-27 09:23:01 +08:00
3736530585 [refactor](query context) rename query fragments context to query context and make query context safe (#18950)
* [refactor](query context) rename query fragments context to query context and make query context safe

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-23 22:53:56 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
eb128753ac [Opt](pipeline) opt pipeline shared scan (#18715) 2023-04-17 13:06:39 +08:00
40a352959d [Pipeline](exec) Support shared scan in colo agg (#18457) 2023-04-13 17:25:41 +08:00
b4b126b817 [Feature](parquet-reader) Implements dict filter functionality parquet reader. (#17594)
Implements dict filter functionality parquet reader to improve performance.
2023-03-16 20:29:27 +08:00
39b5682d59 [Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine 2023-03-13 10:33:57 +08:00
f9baf9c556 [improvement](scan) Support pushdown execute expr ctx (#15917)
In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. scan process:

Read part of the column first, and calculate the row ids with a simple push-down predicate.
Use row ids to read the remaining columns and pass them to the scanner, and the scanner filters the remaining predicates.
This pr will also push-down the remaining predicates (functions, nested predicates...) in the scanner to the storage layer for filtering. scan process:

Read part of the column first, and use the push-down simple predicate to calculate the row ids, (same as above)
Use row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again.
Use row ids to read the remaining columns and pass them to the scanner.
2023-03-10 08:35:32 +08:00
4692d6764c [refactor](remove string val) remove string val structure, it is same with string ref (#17461)
remove stringval, decimalv2val, bigintval
2023-03-08 10:42:20 +08:00
9477c48ef8 [refactor](functioncontext) remove duplicate type definition in function context (#17421)
remove duplicate type definition in function context
remove unused method in function context
not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned.
remove useless slot_size in all tuple or slot descriptor.
remove doris_udf namespace, it is useless.
remove some unused macro definitions.
init v_conjuncts in vscanner, not need write the same code in every scanner.
using unique ptr to manage function context since it could only belong to a single expr context.
Issue Number: close #xxx
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-06 16:07:09 +08:00
f3ab55d27d [Optimization](index) Optimization for no need to read raw data for index column that only in where clause (#16569) 2023-02-14 00:12:45 +08:00
646ba2cc88 [bugfix](scannode) 1. make rows_read correct 2. use single scanner if has limit clause (#16473)
make rows_read correct so that the scheduler could using this correctly.
use single scanner if has limit clause. Move it from fragment context to scannode.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-02-09 14:12:18 +08:00
0142ef8b95 [improvement](scanner) Supports bthread scanner (#16031) 2023-02-09 10:24:56 +08:00
7d34512501 [Bug](pipeline) Fix DCHECK failure (#15928) 2023-01-17 12:01:20 +08:00
b1caa68706 [Feature-WIP](inverted index) inverted index reader's implementation, and add mysql_fulltext regression case to test fulltext query (#15823)
Issue Number: Step2 of DSIP-023: Add inverted index for full text search
implementation of inverted index reader

dependency pr: #14211 #15807 #15821
2023-01-17 09:13:56 +08:00
c4475a8dbc [Enhencement](jdbc scanner) add profile for jdbc scanner (#15914) 2023-01-14 10:28:59 +08:00
edecc2e706 [feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211)
* [feature-wip](inverted index)inverted index api: reader

* [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL

* [feature-wip](inverted index) Adapt to index meta

* [enhance] add more metrics

* [enhance] add fulltext match query check for column type and index parser

* [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node
2022-12-30 21:48:14 +08:00
305dd15fea [improvement](index) Support bitmap index can be applied with compound predicate when enable vectorized engine query (#13035)
Current bitmap index only can apply pushed down predicates which in AND conditions. When predicates in OR conditions and other complex compound conditions, it will not be pushed down to the storage layer, this leads to read more data.

Based on that situation, this pr will do:

1. this pr in order to support bitmap index apply compound predicates, query sql like:
select * from tb where a > 'hello' or b < 100;
select * from tb where a > 'hello' or b < 100 or c > 'ok';
select * from tb where (a > 'hello' or b <100) and (a < 'world' or b > 200);
select * from tb where (not a> 'hello') or b < 100;
...
above sql,column a and b and c has created bitmap_index.

2. this optimization can reduce reading data by index
3. set config enable_index_apply_compound_predicates to use this optimization
2022-12-28 20:08:57 +08:00
0e651365ca [profile](scanner) add per scanner running time profile (#15321)
* [profile](scanner) add per scanner running time profile


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-12-26 08:55:07 +08:00
b085ff49f0 [refactor](non-vec) delete non-vec data sink (#15283)
* [refactor](non-vec) delete non-vec data sink

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-12-23 14:10:47 +08:00
2445ac9520 [Bug](runtimefilter) Fix BE crash due to init failure (#15228) 2022-12-21 15:36:22 +08:00
13bc8c2ef8 [Pipeline](runtime filter) Support runtime filters on pipeline engine (#15040) 2022-12-18 21:48:00 +08:00
wxy
af50461211 [fix](statistics) fix CpuTimeMS in audit log when enable_vectorized_engine=true. (#14853)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-12-09 21:13:05 +08:00
fcea89bcf4 [fix](const_expr) fix coredump caused by unsupported cast const expr (#14825) 2022-12-06 10:31:15 +08:00
b30cd86e9e [Refactor](pipeline) Refactor operator and builder code of pipeline (#14787) 2022-12-05 18:35:00 +08:00
8c0e13ab51 [improvement](profile) add detail memory counter for exec nodes (#14806)
* [improvement](profile) improve accuraccy of memory usage and add detail memory counter

* fix
2022-12-05 11:51:52 +08:00
12304bc0ee [Pipeline](exec) Support pipeline exec engine (#14736)
Co-authored-by: Lijia Liu <liutang123@yeah.net>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: shee <13843187+qzsee@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>

## Problem Summary:

### 1. Design

DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine

### 2. How to use:

Set the environment variable `set enable_pipeline_engine = true; `
2022-12-02 17:11:34 +08:00
4728e75079 [feature](bitmap) Support in bitmap syntax and bitmap runtime filter (#14340)
1.Support in bitmap syntax, like 'where k1 in (select bitmap_column from tbl)';
2.Support bitmap runtime filter. Generate a bitmap filter using the right table bitmap and push it down to the left table storage layer for filtering.
2022-11-25 15:22:44 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
enlarge runtime filter in predicate threshold
2022-11-10 15:48:46 +08:00
Pxl
245490d6b7 [Enhancement](runtime filter) optimize for runtime filter (#12856)
optimize for runtime filter
2022-10-09 14:11:03 +08:00
287ff50a6f [Bug](datev2) Fix compatible error between datev2 and date (#13024) 2022-09-29 18:01:55 +08:00
820ec435ce [feature-wip](parquet-reader) refactor parquet_predicate (#12896)
This change serves the  following purposes:
1.  use ScanPredicate instead of TCondition for external table, it can reuse old code branch.
2. simplify and delete some useless old code
3.  use ColumnValueRange to save predicate
2022-09-28 21:27:13 +08:00
d80b7b9689 [feature-wip](new-scan) support more load situation (#12953) 2022-09-27 21:48:32 +08:00
ec2b3bf220 [feature-wip](new-scan)Refactor VFileScanner, support broker load, remove unused functions in VScanner base class. (#12793)
Refactor of scanners. Support broker load.
This pr is part of the refactor scanner tasks. It provide support for borker load using new VFileScanner.
Work still in progress.
2022-09-21 12:49:56 +08:00
5978fd9647 [refactor](file scanner)Refactor file scanner. (#12602)
Refactor the scanners for hms external catalog, work in progress.
Use VFileScanner, will remove NewFileParquetScanner, NewFileOrcScanner and NewFileTextScanner after fully tested.
Query for parquet file has been tested, still need to add readers for orc file, text file and load logic as well.
2022-09-19 15:23:51 +08:00