Commit Graph

64 Commits

Author SHA1 Message Date
ae8a26335c [opt](hive)opt select count(*) stmt push down agg on parquet in hive . (#22115)
Optimization "select count(*) from table" stmtement , push down "count" type to BE.
support file type : parquet ,orc in hive .

1. 4kfiles , 60kwline num 
    before:  1 min 37.70 sec 
    after:   50.18 sec

2. 50files , 60kwline num
    before: 1.12 sec
    after: 0.82 sec
2023-07-29 00:31:01 +08:00
21ea0055fc [improvement](scanner) use batch size of session instead of limit to improve performance of reading (#22240) 2023-07-26 18:57:42 +08:00
103c473b96 [Bug](pipeline) fix pipeline shared scan + topn optimization (#21940) 2023-07-25 12:48:27 +08:00
9cad929e96 [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query. (#21741)
* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.

related pr #20732

There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first reason is to consolidate the logic and unify the delayed operations. The second reason is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, It may cause rowset metadata too large.

* not use unused rowsets
2023-07-13 11:46:12 +08:00
d86c67863d Remove unused code (#21735) 2023-07-12 14:48:13 +08:00
7f0e37069f [improvement](olap) filter the whole segment by dictionary (#21239) 2023-06-29 10:34:29 +08:00
0f470fec0e [Bug](topn opt) Fix Two-Phase read when some rowset swept (#20732)
* [Bug](topn opt) Fix Two-Phase read when some rowset swept

If this is a Two-Phase read query, and we need to delay the release of Rowset by row->update_delayed_expired_timestamp() to expand the lifespan of rowsets. This is necessary to avoid data loss during the second phase reading, where some stale rowsets may be swept and result in missing data.
2023-06-14 15:46:29 +08:00
2dddab03a1 [compatibility](schema cache) ensure schema version when using schema cache (#20729)
When FE is old version, be is new version, issue a schema change(add column) and
then query, old version of FE query without schema version could result in reading
stale schema from schema cache
2023-06-13 15:19:26 +08:00
841094960f [fix](olapscanner) fix coredump caused by concurrent acccess of olap scan node _conjuncts (#20534)
=3073084==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60601897db80 at pc 0x55b2c993666e bp 0x7d1fbbfb66b0 sp 0x7d1fbbfb66a8
READ of size 8 at 0x60601897db80 thread T610 (_scanner_scan)
    #0 0x55b2c993666d in std::__shared_ptr<doris::vectorized::VExprContext, (__gnu_cxx::_Lock_policy)2>::get() const /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1291:16
    #1 0x55b2dae86ec5 in doris::vectorized::VExprContext::clone(doris::RuntimeState*, std::shared_ptr<doris::vectorized::VExprContext>&) /mnt/disk2/tengjianping/doris-master/be/src/vec/exprs/vexpr_context.cpp:98:5
    #2 0x55b2e757b6d8 in doris::vectorized::VScanner::prepare(doris::RuntimeState*, std::vector<std::shared_ptr<doris::vectorized::VExprContext>, std::allocator<std::shared_ptr<doris::vectorized::VExprContext>>> const&) /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/vscanner.cpp:47:13
    #3 0x55b2e78e8155 in doris::vectorized::NewOlapScanner::init() /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/new_olap_scanner.cpp:109:5
    #4 0x55b2e7551c81 in doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/scanner_scheduler.cpp:279:27
    #5 0x55b2e7554d5e in doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()::operator()() const /mnt/disk2/tengjianping/doris-master/be/src/vec/exec/scan/scanner_scheduler.cpp:202:31
    #6 0x55b2e7554c14 in void std::__invoke_impl<void, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&>(std::__invoke_other, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #7 0x55b2e7554bb4 in std::enable_if<is_invocable_r_v<void, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&>, void>::type std::__invoke_r<void, doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&>(doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
    #8 0x55b2e7554a1c in std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_0::operator()() const::'lambda0'()>::_M_invoke(std::_Any_data const&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291:9
    #9 0x55b2c80f2cd2 in std::function<void ()>::operator()() const /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
    #10 0x55b2e755f3e4 in doris::PriorityWorkStealingThreadPool::work_thread(int) /mnt/disk2/tengjianping/doris-master/be/src/util/priority_work_stealing_thread_pool.hpp:135:17
    #11 0x55b2e7563c72 in void std::__invoke_impl<void, void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&>(std::__invoke_memfun_deref, void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74:14
    #12 0x55b2e7563b44 in std::__invoke_result<void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&>::type std::__invoke<void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&>(void (doris::PriorityWorkStealingThreadPool::* const&)(int), doris::PriorityWorkStealingThreadPool*&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14
    #13 0x55b2e7563b14 in decltype(std::__invoke((*this)._M_pmf, std::forward<doris::PriorityWorkStealingThreadPool*&>(fp), std::forward<int&>(fp))) std::_Mem_fn_base<void (doris::PriorityWorkStealingThreadPool::*)(int), true>::operator()<doris::PriorityWorkStealingThreadPool*&, int&>(doris::PriorityWorkStealingThreadPool*&, int&) const /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:131:11
    #14 0x55b2e7563ae4 in void std::__invoke_impl<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&>(std::__invoke_other, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #15 0x55b2e7563a54 in std::enable_if<is_invocable_r_v<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&>, void>::type std::__invoke_r<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&>(std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)>&, doris::PriorityWorkStealingThreadPool*&, int&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
    #16 0x55b2e75639c3 in void std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>::__call<void, 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:570:11
    #17 0x55b2e756382d in void std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>::operator()<>() /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:629:17
    #18 0x55b2e7563744 in void std::__invoke_impl<void, std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>(std::__invoke_other, std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>&&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #19 0x55b2e7563704 in std::__invoke_result<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>::type std::__invoke<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>(std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>&&) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14
    #20 0x55b2e75636dc in void std:🧵:_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13
    #21 0x55b2e75636b4 in std:🧵:_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>>::operator()() /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11
    #22 0x55b2e7563638 in std:🧵:_State_impl<std:🧵:_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::PriorityWorkStealingThreadPool::*)(int)> (doris::PriorityWorkStealingThreadPool*, int)>>>>::_M_run() /mnt/disk2/tengjianping/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13
    #23 0x55b2eb41d0ef in execute_native_thread_routine /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/thread.cc:82:18
    #24 0x7f1dfd4e1179 in start_thread pthread_create.c
    #25 0x7f1dfdd7bdf2 in clone (/lib64/libc.so.6+0xfcdf2) (BuildId: 20ee73ce1b6ac38a52440bab82ec7e28f0f5c5b9)
2023-06-07 17:00:29 +08:00
de08c4a57b [enhance](match) Support match query without inverted index (#19936) 2023-05-30 15:02:57 +08:00
ab8125d56f [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037)
* [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema

1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot.Therefore, the introduction of a SchemaCache is implemented to cache these resources for reuse.

2. Make some variables wrapped with std::unique<unique_ptr>

Performance:
| 状态              | QPS | 平均响应时间 (avg) | P99 响应时间 |
|------------------|-----|------------------|-------------|
| 开启 SchemaCache | 501 | 20ms             | 34ms        |
| 关闭 SchemaCache | 321 | 31ms             | 61ms        |

* handle schema change with schema version

* remove useless header

* rebase
2023-05-29 17:34:53 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
2023-05-29 11:47:31 +08:00
Pxl
15a7420661 [Chore](ub) fix some undefined behaviors (#19986)
/home/zcp/repo_center/doris_master/doris/be/src/olap/rowset/segment_v2/column_reader.cpp:895:21: runtime error: load of value 423208544, which is not a valid value for type 'doris::ReaderType'

/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_decimal.cpp:260:33: runtime error: load of misaligned address 0x7fa3348b301c for type 'int64_t' (aka 'long'), which requires 8 byte alignment

/home/zcp/repo_center/doris_master/doris/be/src/olap/block_column_predicate.cpp:82:24: runtime error: variable length array bound evaluates to non-positive value 0

/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:225:26: runtime error: null pointer passed as argument 2, which is declared to never be null
2023-05-26 14:08:40 +08:00
92a6122f74 [feature](profile)Add the filtering information of the Bloom filter in profile. (#19789) 2023-05-26 10:56:58 +08:00
92bf485abd [Bug] Fix doris pipeline shared scan and top n opt (#19599) 2023-05-15 10:00:44 +08:00
4e4fb33995 [refactor](conjuncts) simplify conjuncts in exec node (#19254)
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, exec node save exprcontext**, but the object is in object pool, the code is very unclear. we could just use exprcontext*.
2023-05-04 18:04:32 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
60c0bbe272 [fix](profile) fix show load query profile (#18487)
Sometimes, `show load profile` will only show part of the insert opertion's profile.
This is because we assume that for all load operation(including insert), there is only one fragment in the plan.
But actually, there will be more than 1 fragment in plan. eg:

`insert into tbl1 select * from tbl1 limit 1` will have 2 fragments.

This PR mainly changes:

1. modify the `show load profile`
   Before:  `show load profile "/queryid/taskid/instanceid";`
   After: `show load profile "/queryid/taskid/fragmentid/instanceid";`

2. Modify the display of `ReadColumns` in OlapScanNode
    Because for wide table, the line of `ReadColumns` may be too long for show in profile.
    So I wrap it and each line contains at most 10 columns names.

3. Fix tvf not working with pipeline engine, follow up #18376
2023-04-09 08:41:18 +08:00
af80e65094 [Improve](FileCahe) Support the file cache profile in olap scan node and Update the profile (#17710)
We want to use file cache for caching cold data in S3.
When reading them, we want to know where the data come from and the time taken to read the datas.
So we support the metrics in olap scan node.
And for clearing the information, i also update the fields about the metrics.
2023-04-04 10:18:30 +08:00
4e1e0ce06d [bugfix](topn) fix topn optimzation wrong result for NULL values (#18121)
1. add PassNullPredicate to fix topn wrong result for NULL values
2. refactor RuntimePredicate to avoid using TCondition
3. refactor using ordering_exprs in fe and vsort_node
2023-03-31 10:01:34 +08:00
fd5dd9a391 [Opt](Pipeline) opt pipeline code in mult tablet (#17999) 2023-03-27 10:02:48 +08:00
e8b9587fe6 [Improvement](dict) compute hash only if needed (#18058) 2023-03-24 11:45:58 +08:00
c29582bd57 [pipeline](split by segment)support segment split by scanner (#17738)
* support segment split by scanner

* change code by cr
2023-03-16 15:25:52 +08:00
f9baf9c556 [improvement](scan) Support pushdown execute expr ctx (#15917)
In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. scan process:

Read part of the column first, and calculate the row ids with a simple push-down predicate.
Use row ids to read the remaining columns and pass them to the scanner, and the scanner filters the remaining predicates.
This pr will also push-down the remaining predicates (functions, nested predicates...) in the scanner to the storage layer for filtering. scan process:

Read part of the column first, and use the push-down simple predicate to calculate the row ids, (same as above)
Use row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again.
Use row ids to read the remaining columns and pass them to the scanner.
2023-03-10 08:35:32 +08:00
3a877857ae [improvement](inverted index)Remove searcher bitmap timer to improve query speed (#17407)
Timer becomes a bottleneck when the query hit volume is very high.
2023-03-08 14:03:36 +08:00
9477c48ef8 [refactor](functioncontext) remove duplicate type definition in function context (#17421)
remove duplicate type definition in function context
remove unused method in function context
not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned.
remove useless slot_size in all tuple or slot descriptor.
remove doris_udf namespace, it is useless.
remove some unused macro definitions.
init v_conjuncts in vscanner, not need write the same code in every scanner.
using unique ptr to manage function context since it could only belong to a single expr context.
Issue Number: close #xxx
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-06 16:07:09 +08:00
707f814fc2 [fix](inverted index) fix still execute match query after drop inverted index (#17293)
background:
At the moment, match query must with inverted index,

problem description:
After drop inverted index which is the only index in table, there still can use match query for this index column.

fix it:
The index should be updated on BE regardless of whether the indexes_desc from FE is empty.
2023-03-02 11:12:54 +08:00
84413f33b8 [enhancement](merge-on-write) add skip_delete_bitmap session variable for debug purpose (#17127) 2023-02-27 23:31:28 +08:00
c0bb2e33a8 [improvement](scan) separate scanner into local and remote scanner pool (#16891)
There are 2 kinds for scanner thread pool, local and remote.
Local is for local file read, specially for olap scanner.
Remote is for other external data source, such as file scanner, jdbc scanner.

This PR mainly changes:

For olap scanner, use cold or hot rowset to decide whether to use local or remote pool.
For other scanner, user remote pool by default.
Add a new BE config doris_max_remote_scanner_thread_pool_thread_num, default is 512,
indicate the max thread number of the remote scanner thread pool

This will alleviate the problem of interaction between olap queries with load job and external queries.
2023-02-21 14:13:09 +08:00
Pxl
ea78184551 [Feature](Materialized-View) support multiple slot on one column in materialized view (#16378) 2023-02-14 16:10:50 +08:00
f3ab55d27d [Optimization](index) Optimization for no need to read raw data for index column that only in where clause (#16569) 2023-02-14 00:12:45 +08:00
09b7c22f6b [Opt](exec) remove unless null key when no split in convert key range (#16624) 2023-02-11 15:44:35 +08:00
aba843bb2b [Improvement](inverted index) inverted index query match bitmap cache (#16578)
Add cache for inverted index query match bitmap to accelerate common query keyword, especially for keyword matching many rows. 

Tests result:
- large result: matching 99% out of 247 million rows shows 8x speed up.
- small result: matching 0.1% out of 247 million rows shows 2x speed up.
2023-02-11 13:38:58 +08:00
43eca4f209 [Feature-WIP](inverted index) Implementation for alter inverted index. (#16371)
implementation for add/drop inverted index.
2023-02-10 17:56:17 +08:00
737c73dcf0 [Improvement](topn) order by key topn query optimization (#15663) 2023-02-06 15:36:05 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
bb0d4ba787 [BugFix](sort) use correct agg function when using 2 phase sort for agg table (#16185) 2023-02-01 20:07:43 +08:00
4b6a4b3cf7 [refactor](remove unused code) Remove unused mempool declare or function params (#16222)
* Remove unused mempool declare or function params

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-30 13:03:18 +08:00
3894de49d2 [Enhancement](topn) support two phase read for topn query (#15642)
This PR optimize topn query like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

TopN is is compose of SortNode and ScanNode, when user table is wide like 100+ columns the order by clause is just a few columns.But ScanNode need to scan all data from storage engine even if the limit is very small.This may lead to lots of read amplification.So In this PR I devide TopN query into two phase:
1. The first phase we just need to read `columnA`'s data from storage engine along with an extra RowId column called `__DORIS_ROWID_COL__`.The other columns are pruned from ScanNode.
2. The second phase I put it in the ExchangeNode beacuase it's the central node for topn nodes in the cluster.The ExchangeNode will spawn a RPC to other nodes using the RowIds(sorted and limited from SortNode) read from the first phase and read row by row from storage engine.

After the second phase read, Block will contain all the data needed for the query
2023-01-19 10:01:33 +08:00
b1caa68706 [Feature-WIP](inverted index) inverted index reader's implementation, and add mysql_fulltext regression case to test fulltext query (#15823)
Issue Number: Step2 of DSIP-023: Add inverted index for full text search
implementation of inverted index reader

dependency pr: #14211 #15807 #15821
2023-01-17 09:13:56 +08:00
bdec4d5ac2 [enhancement](profile) add read columns to scanner profile (#15902) 2023-01-16 19:32:46 +08:00
3fec5ff0f5 [refactor](scan-pool) move scan pool from env to scanner scheduler (#15604)
The origin scan pools are in exec_env.
But after enable new_load_scan_node by default, the scan pool in exec_env is no longer used.
All scan task will be submitted to the scan pool in scanner_scheduler.

BTW, reorganize the scan pool into 3 kinds:

local scan pool
For olap scan node

remote scan pool
For file scan node

limited scan pool
For query which set cpu resource limit or with small limit clause

TODO:
Use bthread to unify all IO task.

Some trivial issues:

fix bug that the memtable flush size printed in log is not right
Add RuntimeProfile param in VScanner
2023-01-11 09:38:42 +08:00
Pxl
1514b5ab5c [Feature](Materialized-View) support advanced Materialized-View (#15212) 2023-01-09 09:53:11 +08:00
9d1f02c580 [Improvement](topn) runtime prune for topn query (#15558) 2023-01-05 20:10:12 +08:00
edecc2e706 [feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211)
* [feature-wip](inverted index)inverted index api: reader

* [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL

* [feature-wip](inverted index) Adapt to index meta

* [enhance] add more metrics

* [enhance] add fulltext match query check for column type and index parser

* [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node
2022-12-30 21:48:14 +08:00
305dd15fea [improvement](index) Support bitmap index can be applied with compound predicate when enable vectorized engine query (#13035)
Current bitmap index only can apply pushed down predicates which in AND conditions. When predicates in OR conditions and other complex compound conditions, it will not be pushed down to the storage layer, this leads to read more data.

Based on that situation, this pr will do:

1. this pr in order to support bitmap index apply compound predicates, query sql like:
select * from tb where a > 'hello' or b < 100;
select * from tb where a > 'hello' or b < 100 or c > 'ok';
select * from tb where (a > 'hello' or b <100) and (a < 'world' or b > 200);
select * from tb where (not a> 'hello') or b < 100;
...
above sql,column a and b and c has created bitmap_index.

2. this optimization can reduce reading data by index
3. set config enable_index_apply_compound_predicates to use this optimization
2022-12-28 20:08:57 +08:00
Pxl
bba77fa9dd [Enhancement](profile) enhance column predicates display on profile (#14664) 2022-12-01 13:07:12 +08:00
7873bc95a6 [Enhancement](bitmapfilter) Support bitmap filter to apply zone_map index to filter pages (#14635) 2022-12-01 10:41:09 +08:00
4728e75079 [feature](bitmap) Support in bitmap syntax and bitmap runtime filter (#14340)
1.Support in bitmap syntax, like 'where k1 in (select bitmap_column from tbl)';
2.Support bitmap runtime filter. Generate a bitmap filter using the right table bitmap and push it down to the left table storage layer for filtering.
2022-11-25 15:22:44 +08:00
d55faa7f6a [feature](remote)Only query can use local cache when reading remote files. (#13865)
When calling select on remote files, download cache files to local disk.
When calling alter table on remote files, read files directly from remote storage. So if tablet is too large, it will not take up too many local disk when creating local cache file.
2022-11-14 10:30:15 +08:00