Add a cache for the inverted index query match bitmap to accelerate common query keywords, especially keywords that match many rows.
Test results:
- large result: matching 99% of 247 million rows shows an 8x speedup.
- small result: matching 0.1% of 247 million rows shows a 2x speedup.
Issue Number: close #16351
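A minimal standalone sketch of the caching idea, assuming an LRU policy and a key built from the index and the keyword; the class and member names are illustrative, not the actual Doris implementation, and a real cache would also need a mutex and memory accounting:

```cpp
#include <cstdint>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the matched-row set; the real cache would hold a compressed
// bitmap (e.g. a roaring bitmap) of matching row ids.
using RowBitmap = std::vector<uint32_t>;

// Minimal LRU cache keyed by "<index id>|<keyword>". It only shows why
// repeated keywords get cheaper: the second lookup returns the cached bitmap
// instead of re-running the inverted index search.
class QueryMatchBitmapCache {
public:
    explicit QueryMatchBitmapCache(size_t capacity) : _capacity(capacity) {}

    std::shared_ptr<RowBitmap> lookup(const std::string& key) {
        auto it = _index.find(key);
        if (it == _index.end()) return nullptr;       // miss: caller queries the index
        _lru.splice(_lru.begin(), _lru, it->second);  // hit: mark as most recently used
        return it->second->second;
    }

    void insert(const std::string& key, std::shared_ptr<RowBitmap> bitmap) {
        if (auto it = _index.find(key); it != _index.end()) {
            it->second->second = std::move(bitmap);
            _lru.splice(_lru.begin(), _lru, it->second);
            return;
        }
        _lru.emplace_front(key, std::move(bitmap));
        _index[key] = _lru.begin();
        if (_index.size() > _capacity) {              // evict the least recently used entry
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
    }

private:
    using Entry = std::pair<std::string, std::shared_ptr<RowBitmap>>;
    size_t _capacity;
    std::list<Entry> _lru;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> _index;
};
```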
A dynamic schema table is a special type of table whose schema changes with the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema information from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.
Make rows_read correct so that the scheduler can use it correctly.
Use a single scanner if the query has a limit clause. Move this logic from the fragment context to the scan node.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1. When mapping columns from an external datasource, use date/datetimev2 as the default type.
2. Check `is_cancelled` when reading data, to avoid an endless loop after the query is cancelled (see the sketch below).
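A tiny sketch of the pattern, with hypothetical names (the real check lives in the scanner's read path):

```cpp
#include <atomic>
#include <functional>

// Minimal sketch (illustrative names, not the Doris classes): the read loop
// re-checks the cancellation flag on every batch so a cancelled query stops
// promptly instead of looping until the data source is exhausted.
void read_until_eof(const std::function<bool()>& read_next_batch,  // returns true at EOF
                    const std::atomic<bool>& is_cancelled) {
    bool eof = false;
    while (!eof) {
        if (is_cancelled.load(std::memory_order_relaxed)) {
            break;  // cancelled: bail out instead of spinning forever
        }
        eof = read_next_batch();
    }
}
```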
Issue Number: close #xxx
This PR fixes two bugs:
`_jdbc_scanner` may be nullptr in vjdbc_connector.cpp, so we use another method to count JDBC statistics. Closes "[Enhencement](jdbc scanner) add profile for jdbc scanner #15914".
In the batch insertion scenario, the Oracle database does not support the syntax `insert into table values (...),(...);`. What it supports is:
insert all
into table(col1,col2) values(c1v1, c2v1)
into table(col1,col2) values(c1v2, c2v2)
SELECT 1 FROM DUAL;
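A hedged sketch of how a batch of rows could be folded into that form; the function and its parameters are hypothetical, and a real connector must also bind or escape the values:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Build Oracle's "INSERT ALL ... SELECT 1 FROM DUAL" statement for a batch,
// instead of the multi-row VALUES syntax that Oracle rejects.
std::string build_oracle_batch_insert(const std::string& table,
                                      const std::string& columns,  // e.g. "col1,col2"
                                      const std::vector<std::string>& value_tuples) {  // e.g. "(c1v1, c2v1)"
    std::ostringstream sql;
    sql << "insert all\n";
    for (const auto& tuple : value_tuples) {
        sql << "into " << table << "(" << columns << ") values" << tuple << "\n";
    }
    sql << "SELECT 1 FROM DUAL;";
    return sql.str();
}
```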
Support Iceberg schema evolution for the parquet file format.
Iceberg uses a unique id for each column to support schema evolution.
To support this feature in Doris, the FE side needs to get the current column id for each column and send the ids to the BE side.
The BE reads column ids from the parquet key_value_metadata, sets the changed column names in the Block to match the names in the parquet file before reading data, and sets the names back after reading data.
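A standalone sketch of the id-based remapping; the map names and the helper are illustrative, not the Doris or parquet reader API:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// The FE sends <table column name -> iceberg field id>; the BE reads
// <field id -> column name in this parquet file> from the file metadata.
// Because columns are matched by id, a renamed column still resolves to the
// right data: the block column is read under the file's name and renamed back
// to the table name after the batch is read.
std::unordered_map<std::string, std::string> resolve_parquet_names(
        const std::unordered_map<std::string, int32_t>& table_name_to_id,
        const std::unordered_map<int32_t, std::string>& file_id_to_name) {
    std::unordered_map<std::string, std::string> table_to_file_name;
    for (const auto& [table_name, field_id] : table_name_to_id) {
        auto it = file_id_to_name.find(field_id);
        if (it != file_id_to_name.end()) {
            table_to_file_name.emplace(table_name, it->second);
        }
        // Columns missing from the file (added after the file was written)
        // are filled with default values / nulls elsewhere.
    }
    return table_to_file_name;
}
```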
This PR optimizes topn queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.
A TopN query is composed of a SortNode and a ScanNode. When the user table is wide (100+ columns) while the order by clause covers only a few columns, the ScanNode still has to scan all data from the storage engine even if the limit is very small, which can lead to a lot of read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase we only read `columnA`'s data from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode, because it is the central node for the topn nodes in the cluster. The ExchangeNode spawns an RPC to the other nodes, using the RowIds (sorted and limited by the SortNode) read in the first phase, and reads the rows from the storage engine row by row.
After the second phase read, the Block contains all the data needed for the query.
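A conceptual sketch of the two phases with illustrative types (not the actual SortNode/ExchangeNode code); phase 1 keeps only the sort key and a row id, phase 2 materializes the full rows for the surviving ids:

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

struct KeyAndRowId {
    int64_t key;      // value of the ORDER BY column (columnA)
    uint64_t row_id;  // __DORIS_ROWID_COL__-style row locator
};

// Phase 1: scan only (key, row_id), keep the top-N row ids in sorted order.
std::vector<uint64_t> phase_one_top_n(std::vector<KeyAndRowId> scanned, size_t n) {
    n = std::min(n, scanned.size());
    std::partial_sort(scanned.begin(), scanned.begin() + n, scanned.end(),
                      [](const KeyAndRowId& a, const KeyAndRowId& b) { return a.key < b.key; });
    std::vector<uint64_t> ids;
    for (size_t i = 0; i < n; ++i) ids.push_back(scanned[i].row_id);
    return ids;  // SortNode output: at most N row ids
}

// Phase 2: fetch the remaining columns for exactly these row ids
// (in the real system this is an RPC from the ExchangeNode to the other nodes).
template <typename Row>
std::vector<Row> phase_two_fetch(const std::vector<uint64_t>& row_ids,
                                 const std::function<Row(uint64_t)>& fetch_row_by_id) {
    std::vector<Row> rows;
    rows.reserve(row_ids.size());
    for (uint64_t id : row_ids) rows.push_back(fetch_row_by_id(id));
    return rows;
}
```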
The original scan pools are in exec_env.
But after enabling new_load_scan_node by default, the scan pool in exec_env is no longer used.
All scan tasks will be submitted to the scan pool in scanner_scheduler.
BTW, the scan pool is reorganized into 3 kinds (see the sketch after this list):
- local scan pool: for the olap scan node
- remote scan pool: for the file scan node
- limited scan pool: for queries that set a cpu resource limit or have a small limit clause
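A minimal sketch of the routing rule only, with hypothetical names (the real pools are thread pools owned by scanner_scheduler):

```cpp
// Route every scan task to one of the three pools described above.
enum class ScanPool { LOCAL, REMOTE, LIMITED };

struct ScanTaskTraits {
    bool reads_remote_file = false;       // file scan node reading external storage
    bool has_cpu_resource_limit = false;  // query sets a cpu resource limit
    bool has_small_limit = false;         // query with a small limit clause
};

ScanPool choose_pool(const ScanTaskTraits& t) {
    if (t.has_cpu_resource_limit || t.has_small_limit) return ScanPool::LIMITED;
    if (t.reads_remote_file) return ScanPool::REMOTE;  // file scan node
    return ScanPool::LOCAL;                            // olap scan node
}
```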
TODO:
Use bthread to unify all IO tasks.
Some trivial issues:
- Fix a bug where the memtable flush size printed in the log was not right.
- Add a RuntimeProfile param in VScanner.
Support a new table value function `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`.
We can use the sql `select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")` to get the snapshot info of a table. Other iceberg metadata will be supported later when needed.
One usage example:
Before using the following sql to time travel:
`select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11"`;
`select * from ice_table FOR VERSION AS OF "snapshot_id"`;
we can use the snapshots metadata to get the `committed time` or `snapshot_id`,
and then use it as the time or version in the time travel clause.
The main purpose of this PR is to introduce a `fileCache` for lakehouse reads of remote files.
The local disk is used as a cache for reading a remote file, so the next time this file is read,
the data can be obtained directly from the local disk.
In addition, this PR includes a few other minor changes.
Introduce File Cache:
1. The introduced `fileCache` is called `block_file_cache`, which uses an LRU replacement policy.
2. Implement a new FileReader, `CachedRemoteFileReader`, so that the `file cache` logic is hidden inside `CachedRemoteFileReader`.
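A standalone sketch of the block-cache idea, assuming fixed-size 1 MB blocks and using an in-memory map as a stand-in for the local-disk cache; the class and its layout are illustrative, not the real `CachedRemoteFileReader` or `block_file_cache` (which also applies LRU eviction):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

class CachedRemoteReaderSketch {
public:
    static constexpr size_t kBlockSize = 1 << 20;  // 1 MB cache block

    explicit CachedRemoteReaderSketch(std::vector<uint8_t> remote_bytes)
            : _remote(std::move(remote_bytes)) {}

    // Read [offset, offset + len): serve each touched block from the cache
    // when possible, otherwise fetch it once and remember it for next time.
    size_t read_at(size_t offset, uint8_t* out, size_t len) {
        size_t copied = 0;
        while (copied < len && offset + copied < _remote.size()) {
            size_t pos = offset + copied;
            const std::vector<uint8_t>& block = load_block(pos / kBlockSize);
            size_t in_block = pos % kBlockSize;
            size_t n = std::min(len - copied, block.size() - in_block);
            std::memcpy(out + copied, block.data() + in_block, n);
            copied += n;
        }
        return copied;
    }

private:
    const std::vector<uint8_t>& load_block(size_t block_idx) {
        auto it = _cache.find(block_idx);
        if (it != _cache.end()) return it->second;  // cache hit: no remote read
        size_t begin = block_idx * kBlockSize;
        size_t end = std::min(begin + kBlockSize, _remote.size());
        // Cache miss: this is where the real reader fetches from remote
        // storage and writes the block to the local disk cache.
        return _cache.emplace(block_idx,
                              std::vector<uint8_t>(_remote.begin() + begin,
                                                   _remote.begin() + end)).first->second;
    }

    std::vector<uint8_t> _remote;                    // stand-in for the remote file
    std::map<size_t, std::vector<uint8_t>> _cache;   // stand-in for the local disk cache
};
```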
Other changes:
1. Add a new interface `fs()` to `FileReader`.
2. `IOContext` adds some statistics to record how the `FileCache` is used.
Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
Currently, in ScannerContext::push_back_scanner_and_reschedule, `_num_running_scanners--` happens before `_num_scheduling_ctx++`.
In PipScannerContext::can_finish, we check `_num_running_scanners == 0 && _num_scheduling_ctx == 0` without obtaining `_transfer_lock`.
In the following case, PipScannerContext::can_finish returns the wrong result:
1. `_num_running_scanners--`
2. Check `_num_running_scanners == 0 && _num_scheduling_ctx == 0` returns true.
3. `_num_scheduling_ctx++`
So we can move `_num_running_scanners--` to the end of this function.
Changes:
- Make PipScannerContext::get_block_from_queue non-blocking.
- Move `_num_running_scanners--` to the end of ScannerContext::push_back_scanner_and_reschedule.
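A minimal sketch of the intended ordering and locking, using only the member names mentioned above (the real ScannerContext has much more state):

```cpp
#include <mutex>

// can_finish() reads both counters, so push_back_scanner_and_reschedule()
// must never leave a window where both look like zero: bump
// _num_scheduling_ctx first, decrement _num_running_scanners last, and read
// both counters under the same lock.
struct ScannerContextSketch {
    std::mutex _transfer_lock;
    int _num_running_scanners = 0;
    int _num_scheduling_ctx = 0;

    void push_back_scanner_and_reschedule() {
        std::lock_guard<std::mutex> lock(_transfer_lock);
        _num_scheduling_ctx++;    // schedule the context first
        // ... hand the scanner back / enqueue the rescheduling work ...
        _num_running_scanners--;  // decrement last, never before the ++ above
    }

    bool can_finish() {
        std::lock_guard<std::mutex> lock(_transfer_lock);  // do not read without the lock
        return _num_running_scanners == 0 && _num_scheduling_ctx == 0;
    }
};
```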
A delete file may belong to multiple data files. Each data file would read the full set of delete files,
so a delete file may be read repeatedly. The delete files can be cached, and multiple data files
can reuse the content from the first read.
The performance is improved by 60% in the single-thread case, and by 30% in the multi-threaded case.
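A hedged sketch of the sharing idea, with hypothetical names; the loader callback stands in for actually reading and parsing the delete file:

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// All data files that reference the same delete file share one parsed copy
// instead of re-reading it from storage.
class DeleteFileCacheSketch {
public:
    using DeleteRows = std::vector<int64_t>;  // stand-in for the parsed delete content

    std::shared_ptr<const DeleteRows> get_or_load(
            const std::string& delete_file_path,
            const std::function<DeleteRows(const std::string&)>& load_from_storage) {
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _cache.find(delete_file_path);
        if (it != _cache.end()) return it->second;  // reuse the first read
        auto rows = std::make_shared<const DeleteRows>(load_from_storage(delete_file_path));
        _cache.emplace(delete_file_path, rows);
        return rows;
    }

private:
    std::mutex _mutex;
    std::unordered_map<std::string, std::shared_ptr<const DeleteRows>> _cache;
};
```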
* [feature-wip](inverted index) inverted index api: reader
* [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ANY/MATCH_ALL
* [feature-wip](inverted index) Adapt to index meta
* [enhance] add more metrics
* [enhance] add fulltext match query check for column type and index parser
* [feature-wip](inverted index) Support applying inverted index in compound predicates, except for leaf nodes of AND nodes
**Optimize**
PR #14470 used `Expr` to filter delete rows to match the current data file,
but the rows in a delete file are [sorted by file_path then position](https://iceberg.apache.org/spec/#position-delete-files)
to optimize filtering while scanning, so this PR removes the `Expr` and uses binary search to filter delete rows.
In addition, delete files are likely to be dictionary-encoded, and it is time-consuming to decode the `file_path`
column into `ColumnString`, so this PR uses `ColumnDictionary` to read the `file_path` column.
After testing, the performance of iceberg v2's MOR is improved by 30%+.
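A standalone sketch of the lookup, assuming the delete entries are materialized as sorted (file_path, position) pairs; the real reader works on columns and reads `file_path` through `ColumnDictionary`:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using DeleteEntry = std::pair<std::string, int64_t>;  // (file_path, row position)

// Because position-delete rows are sorted by (file_path, position), the
// deletes that apply to one data file form a contiguous range that
// lower_bound/upper_bound can find without evaluating an Expr per row.
std::vector<int64_t> positions_deleted_in_file(const std::vector<DeleteEntry>& sorted_deletes,
                                               const std::string& data_file_path) {
    auto begin = std::lower_bound(
            sorted_deletes.begin(), sorted_deletes.end(), data_file_path,
            [](const DeleteEntry& e, const std::string& path) { return e.first < path; });
    auto end = std::upper_bound(
            sorted_deletes.begin(), sorted_deletes.end(), data_file_path,
            [](const std::string& path, const DeleteEntry& e) { return path < e.first; });
    std::vector<int64_t> positions;
    for (auto it = begin; it != end; ++it) positions.push_back(it->second);
    return positions;  // already sorted by position within the file
}
```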
**Fix Bug**
The lazy-read block may not have the filter column if the whole group is filtered by `Expr`
and the batch_eof is generated from the next batch.
Currently, the column default value is used only for load tasks. But in the case of Iceberg schema change,
a query task may also need to read the default value for columns that do not exist in the old schema.
This PR supports default values for query tasks.
Manually tested the broker load and external EMR regression cases.