Problem:
1. FE will split the parquet file into split. So a file can have several splits.
2. BE will scan each split, read the footer of the parquet file.
3. If 2 splits belongs to a same parquet file, the footer of this file will be read twice.
This PR mainly changes:
1. Use kv cache to cache the footer of parquet file.
2. The kv cache is belong to a scan node, so all parquet reader belong to this scan node will share same kv cache.
3. In cache, the key is "meta_file_path", the value is parsed thrift footer.
The KV Cache is sharded into mutlti sub cache.
So that different file can use different sub cache, avoid blocking each other
In my test, a query with 26 splits can reduce the footer parse time from 4s -> 1s
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system
Not test:
- cold & host data separation case.
There are many type definitions in BE. Should unify the type system and simplify the development.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. scan process:
Read part of the column first, and calculate the row ids with a simple push-down predicate.
Use row ids to read the remaining columns and pass them to the scanner, and the scanner filters the remaining predicates.
This pr will also push-down the remaining predicates (functions, nested predicates...) in the scanner to the storage layer for filtering. scan process:
Read part of the column first, and use the push-down simple predicate to calculate the row ids, (same as above)
Use row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again.
Use row ids to read the remaining columns and pass them to the scanner.
The _src_block_mem_reuse variable actually not work, since the _src_block is cleared each time when we call get_block.
But current code may cause core dump, see issue #17587. Because we insert some result column generated by expr into dest block, and such a column holds a pointer to some column in original schema. When clearing the data of _src_block, some column's data in dest block is also cleared.
e.g. coalesce will return a result column which holds a pointer to some original column, see issue #17588
the max_pushdown_conditions_per_column limit, the column will not perform predicate pushdown, but if there are subsequent columns that need to be pushed down, the subsequent column pushdown will be misplaced in _scan_keys and it causes query results to be wrong
Co-authored-by: tongyang.hty <hantongyang@douyu.tv>
remove duplicate type definition in function context
remove unused method in function context
not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned.
remove useless slot_size in all tuple or slot descriptor.
remove doris_udf namespace, it is useless.
remove some unused macro definitions.
init v_conjuncts in vscanner, not need write the same code in every scanner.
using unique ptr to manage function context since it could only belong to a single expr context.
Issue Number: close #xxx
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
fix heap-use-after-free
The OrcReader has a internal FileInputStream, If the file is empty, the memory of FileInputStream will leak.
Besides, there is a Statistics instance in FileInputStream. FileInputStream maybe delete if the orc reader
is inited failed, but Statistics maybe used when orc reader is closed, causing heap-use-after-free error.
Potential memory leak
When init file scanner in file scan node, the file scanner prepare failed, the memory of file scanner will leak.
background:
At the moment, match query must with inverted index,
problem description:
After drop inverted index which is the only index in table, there still can use match query for this index column.
fix it:
The index should be updated on BE regardless of whether the indexes_desc from FE is empty.
In previous implementation, when querying tvf, FE will get schema from BE.
And BE will try to open the first file to get its schema info, but for orc or parquet format,
if the file is empty, it will return error.
But even for an empty file, we can still get schema info from file's footer.
So we should handle the empty file to get schema info correctly.
Also modify the catalog doc to add some FAQ.
There are 2 kinds for scanner thread pool, local and remote.
Local is for local file read, specially for olap scanner.
Remote is for other external data source, such as file scanner, jdbc scanner.
This PR mainly changes:
For olap scanner, use cold or hot rowset to decide whether to use local or remote pool.
For other scanner, user remote pool by default.
Add a new BE config doris_max_remote_scanner_thread_pool_thread_num, default is 512,
indicate the max thread number of the remote scanner thread pool
This will alleviate the problem of interaction between olap queries with load job and external queries.
* [improvement](block exception safe) make block queue exception safe
This is part of exception safe: #16366.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Add cache for inverted index query match bitmap to accelerate common query keyword, especially for keyword matching many rows.
Tests result:
- large result: matching 99% out of 247 million rows shows 8x speed up.
- small result: matching 0.1% out of 247 million rows shows 2x speed up.
Issue Number: close#16351
Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
make rows_read correct so that the scheduler could using this correctly.
use single scanner if has limit clause. Move it from fragment context to scannode.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1. When mapping column from external datasource, use date/datetimev2 as default type
2. check `is_cancelled` when read data, to avoid endless loop after query is cancelled