Commit Graph

2684 Commits

Author SHA1 Message Date
370b92a2ea [Bug](compile) fix clang build for be (#12183) 2022-08-30 17:58:54 +08:00
9a74ad1702 [feature](Nereids)add the ability of projection on each ExecNode and add column prune on OlapScan (#11842)
We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE.
This PR:
- add projections on each ExecNode in BE
- translate PhysicalProject into projections on PlanNode in FE
- do column prune on ScanNode in FE

Co-authored-by: HappenLee <happenlee@hotmail.com>
2022-08-30 16:17:10 +08:00
a16cf0e2c8 [feature-wip](scan) add profile for new olap scan node (#12042)
Copy most of profiles from VOlapScanNode and VOlapScanner to NewOlapScanNode and NewOlapScanner.
Fix some blocking bug of new scan framework.
TODO:

Memtracker
Opentelemetry spen
The new framework is still disabled by default, so it will not effect other feature.
2022-08-30 10:55:48 +08:00
8370115cf6 [enhancement](memtracker) Improve performance of tracking real physical memory of PODArray #12168 2022-08-30 10:22:12 +08:00
2f192019d3 [bugfix](delete hanlder) delete predicate is merged and could not find schema cause core dump (#12161)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-30 09:18:21 +08:00
Pxl
67e94d2aea [Enhancement](compaction) add compaction use time count (#12141) 2022-08-30 09:18:02 +08:00
09b8d32421 [fix](memtracker) Fix mem limit exceed return wrong format (#12139) 2022-08-29 21:07:02 +08:00
ed131b8eb0 [Bugfix](coredump) fix coredump cause by fmt::format param malformt (#12138)
fix coredump cause by fmt::format param malformt
2022-08-29 12:45:22 +08:00
af09c1f4eb [Improvement](window funnel) restrict timestamp to datetime type in window funnel (#12123) 2022-08-29 12:14:04 +08:00
5f7d6e8f2b [Refactor](predicate) Unify Conditions and ColumnPredicate (#11985) 2022-08-29 12:11:22 +08:00
62e3bd338e [refactor](BE) return error status when vslot_ref contains invalid slot_id (#12106)
In current implementation, we detect invalid slot at execute phase. At execute phase, it is hard to get useful information for further debug. This pr moves error detection ahead to prepare phase, so that we can log related tuple descriptors.
2022-08-29 12:07:08 +08:00
db07e51cd3 [refactor](status) Refactor status handling in agent task (#11940)
Refactor TaggableLogger
Refactor status handling in agent task:
Unify log format in TaskWorkerPool
Pass Status to the top caller, and replace some OLAPInternalError with more detailed error message Status
Premature return with the opposite condition to reduce indention
2022-08-29 12:06:01 +08:00
ac425d4bf3 [fix](remote)Fix bug for cache reader (#12104) 2022-08-29 11:28:17 +08:00
44c4a45f72 [fix](array-type) fix the wrong data when use stream load to import '\N' (#12102)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-29 09:53:37 +08:00
dec576a991 [feature-wip](parquet-reader) generate null values and NullMap for parquet column (#12115)
Generate null values and NullMap for the nullable column by analyzing the definition levels.
2022-08-29 09:30:32 +08:00
6e6269c682 [Improvement](load) accelerate streamload and compaction (#12119)
* [Improvement](load) accelerate streamload and compaction
2022-08-28 23:10:47 +08:00
a6e2e2f3bc [feature](remote)Add cache files cleaner for remote olap files (#11959) 2022-08-26 23:59:36 +08:00
0b5bb565a7 [feature-wip](parquet-reader) parquet dictionary decoder (#11981)
Parse parquet data with dictionary encoding.

Using the PLAIN_DICTIONARY enum value is deprecated in the Parquet 2.0 specification.
Prefer using RLE_DICTIONARY in a data page and PLAIN in a dictionary page for Parquet 2.0+ files.
refer: https://github.com/apache/parquet-format/blob/master/Encodings.md
2022-08-26 19:24:37 +08:00
f3f17eb222 [Bugfix](load) fix be will coredump when parsing malformed json file using simdjson (#12062)
* [Bugfix](load) fix be will coredump when parsing malformed json file using simdjson
2022-08-26 18:01:19 +08:00
fba2658a1d [fix](array-type) fix the be core dump when use collect_list result to insert (#12045)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-26 18:00:43 +08:00
Pxl
3af0745c8f [Bug](function) fix aggFnParams set not correct (#12006) 2022-08-26 14:29:56 +08:00
22157077e9 [fix](memtracker) Optimize the return msg of process memory limit exceed #12086
Return the real process memory information when the process exceeds mem limit
Optimize the memory exceed limit log printing logic
process tracker does not participate in process memory limit.
2022-08-26 14:28:46 +08:00
9caaa4bfbd [fix](memory) fix set disable_chunk_allocator_in_vec=false performance #12092 2022-08-26 14:28:12 +08:00
ccff3f5711 [bugfix](light weight schema change) support delete condition in schema change (#11869)
* [bugfix](light weight schema change) support delete condition in schema change


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-26 11:45:55 +08:00
82ca62dfcc [fix](memory) Fix disable_mem_pools to disable cache #12087 2022-08-26 11:43:19 +08:00
0f4a1e811b [Enhancement](table_function) table function node enhancement (#12038)
* table function node enhancement

* also avoid copy for non-vec table function node

* fix table function node output slots calculation while lateral view involves subquery

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-26 10:37:15 +08:00
ba11d8dc67 [feature-wip](unique-key-merge-on-write) fix bugs on tablet clone #12067 2022-08-26 10:37:00 +08:00
0c16740f5c [feature-wip](parquet-reader) parquert scanner can read data (#11970)
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-08-26 09:43:46 +08:00
721d418a2f [feature-wip](unique-key-merge-on-write) fix that version is awlays 0 when update delete bitmap (#12044) 2022-08-26 09:41:55 +08:00
e5bfbbe761 [feature-wip](unique-key-merge-on-write) support alter table column for MoW (#12052) 2022-08-26 09:40:11 +08:00
17b809210a [Bug](runtime filter) fix bug for late-arrival runtime filters (#12049) 2022-08-26 09:13:10 +08:00
e3ab2caef8 [improvement](sink) Support local exchange for multi fragment instances (#12017) 2022-08-25 19:28:23 +08:00
588dc5f12a [feature](cold_on_s3) Show remote data usage via SHOW BACKENDS and SHOW TABLETS statements (#11450) 2022-08-25 15:36:15 +08:00
003fdf2b36 [fix](scan) use serial scan thread token only for scan node (#12058)
Only the scan node's limit is less than 1024, we can use serial thread token to submit scanners.
Or it will slow down the query.
2022-08-25 14:54:02 +08:00
Pxl
620d33a763 [Enchancement](optimize) set result_size_hint to filter_block (#11972) 2022-08-25 11:42:52 +08:00
73a3471fbd [minor](conjuncts) remove row-based conjuncts from vectorized engine (#12053) 2022-08-25 10:13:20 +08:00
54fc038dc5 [Fix](remote) Fix thread safety issue in cache (#11984) 2022-08-24 18:14:14 +08:00
f875684345 [fix](agg) Crashing caused by serialization in streaming aggregation (#12027) 2022-08-24 14:38:25 +08:00
1304a17600 [fix](memtracker) Improve performance of tracking real physical memory of PodArray #12021 2022-08-24 14:24:14 +08:00
fb3c00c943 [Improvement](storage) reuse schema and rowblockv2 on single scanner_thread (#11392)
* support reuse rowblockv2 on single thread
2022-08-24 13:42:10 +08:00
ba85c06a68 [feature-wip](unique-key-merge-on-write) fix that IndexedColumnIterator next batch may return empty result (#11928) 2022-08-24 08:53:44 +08:00
3abc4f357f [Bug](bitmap) intersect_count function use in string cause ASAN error (#11936) 2022-08-24 08:51:53 +08:00
5d627e41a4 [fix](array-type) fix the be core dump when import number larger than uint64 (#11853)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-24 08:51:12 +08:00
1fc5515a78 [enhancement](memory) Remove unused reservation tracker (#11969) 2022-08-24 08:49:34 +08:00
d06edd4b8b [minor](runtime-filter) add DCHECK for runtimefilter bug (#11996)
Not a fix, just add debug info to try find root cause of #11995
2022-08-24 07:53:30 +08:00
cbbf4e10ff [fix](array-type) fix be occasional coredump when use stream load (#11997)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-23 21:54:00 +08:00
1056a6d8c7 [bug](compaction) fix bug of coredump of filter delete chose wrong filter column (#12002)
* [bug](compaction) fix bug of coredump of filter delete chose wrong filter column

* clang format
2022-08-23 21:52:11 +08:00
55fdb555be [bugfix](dict) fix coredump of dict colum range predicate when there is null value (#11967) 2022-08-23 16:07:48 +08:00
60fddd56e7 [feature-wip](unique-key-merge-on-write) opt lock and only save valid delete_bitmap (#11953)
1. use rlock in most logic instead of wrlock
2. filter stale rowset's delete bitmap in save meta
3. add a delete_bitmap lock to handle compaction and publish_txn confict

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-23 14:43:40 +08:00
05da3d947f [feature-wip](new-scan) add scanner scheduling framework (#11582)
There are currently many types of ScanNodes in Doris. And most of the logic of these ScanNodes is the same, including:

Runtime filter
Predicate pushdown
Scanner generation and scheduling
So I intend to unify the common logic of all ScanNodes.
Different data sources only need to implement different Scanners for data access.
So that the future optimization for scan can be applied to the scan of all data sources,
while also reducing the code duplication.

This PR mainly adds 4 new class:

VScanner
All Scanners' parent class. The subclasses can inherit this class to implement specific data access methods.

VScanNode
The unified ScanNode, and is responsible for common logic including RuntimeFilter, predicate pushdown, Scanner generation and scheduling.

ScannerContext
ScannerContext is responsible for recording the execution status
of a group of Scanners corresponding to a ScanNode.
Including how many scanners are being scheduled, and maintaining
a producer-consumer blocks queue between scanners and scan nodes.

ScannerContext is also the scheduling unit of ScannerScheduler.
ScannerScheduler schedules a ScannerContext at a time,
and submits the Scanners to the scanner thread pool for data scanning.

ScannerScheduler
Unified responsible for all Scanner scheduling tasks

Test:
This work is still in progress and default is disabled.
I tested it with jmeter with 50 concurrency, but currently the scanner is just return without data.
The QPS can reach about 9000.
I can't compare it to origin implement because no data is read for now. I will test it when new olap scanner is ready.
Co-authored-by: morningman <morningman@apache.org>
2022-08-23 08:45:18 +08:00