Commit Graph

649 Commits

Author SHA1 Message Date
1fc5515a78 [enhancement](memory) Remove unused reservation tracker (#11969) 2022-08-24 08:49:34 +08:00
cbbf4e10ff [fix](array-type) fix be occasional coredump when use stream load (#11997)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-23 21:54:00 +08:00
05da3d947f [feature-wip](new-scan) add scanner scheduling framework (#11582)
Doris currently has many types of ScanNodes, and most of their logic is the same, including:

* Runtime filter
* Predicate pushdown
* Scanner generation and scheduling

So I intend to unify the common logic of all ScanNodes.
Different data sources then only need to implement their own Scanners for data access.
Future scan optimizations can be applied to the scans of all data sources,
and code duplication is reduced as well.

This PR mainly adds 4 new classes:

VScanner
The parent class of all Scanners. Subclasses inherit from it to implement specific data access methods.

VScanNode
The unified ScanNode. It is responsible for the common logic, including RuntimeFilter, predicate pushdown, and Scanner generation and scheduling.

ScannerContext
ScannerContext records the execution status
of the group of Scanners that corresponds to a ScanNode,
including how many scanners are being scheduled. It also maintains
a producer-consumer block queue between the scanners and the scan node.

ScannerContext is also the scheduling unit of ScannerScheduler.
ScannerScheduler schedules one ScannerContext at a time
and submits its Scanners to the scanner thread pool for data scanning.

ScannerScheduler
Responsible for all Scanner scheduling tasks in a unified way.
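
A minimal sketch of how the four classes described above could fit together, written in Python for brevity rather than in the C++ of the actual PR. The method names (get_block, run_scanner, read_blocks, submit), the ListScanner toy source, the sentinel, and the queue size are all assumptions made for illustration; they are not the interfaces this PR adds.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel a scanner task pushes once its source is exhausted


class VScanner:
    """Hypothetical base scanner: subclasses implement the data access."""

    def get_block(self):
        """Return the next block, or None when there is no more data."""
        raise NotImplementedError


class ListScanner(VScanner):
    """Toy data source (illustration only) that serves blocks from a list."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)

    def get_block(self):
        return next(self._blocks, None)


class ScannerContext:
    """Holds one scan node's scanners and the block queue they share with it."""

    def __init__(self, scanners, max_queued_blocks=8):
        self.scanners = list(scanners)
        # Bounded queue: producers block when the consumer falls behind.
        self.blocks = queue.Queue(maxsize=max_queued_blocks)

    def run_scanner(self, scanner):
        """Producer side: drain one scanner into the shared queue."""
        try:
            while (block := scanner.get_block()) is not None:
                self.blocks.put(block)
        finally:
            self.blocks.put(_DONE)

    def read_blocks(self):
        """Consumer side: yield blocks until every scanner has finished."""
        remaining = len(self.scanners)
        while remaining:
            item = self.blocks.get()
            if item is _DONE:
                remaining -= 1
            else:
                yield item


class ScannerScheduler:
    """Submits the scanners of a context to a shared thread pool."""

    def __init__(self, num_threads=4):
        self._pool = ThreadPoolExecutor(max_workers=num_threads)

    def submit(self, ctx):
        for scanner in ctx.scanners:
            self._pool.submit(ctx.run_scanner, scanner)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def scan_all(scanners, num_threads=4):
    """Play the scan node's role: schedule the scanners and collect all blocks."""
    scheduler = ScannerScheduler(num_threads)
    ctx = ScannerContext(scanners)
    scheduler.submit(ctx)
    try:
        return list(ctx.read_blocks())
    finally:
        scheduler.shutdown()
```

In this sketch a new data source only has to subclass VScanner; the queueing and scheduling code is shared. The order of blocks across scanners is not deterministic, since the scanners run concurrently.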

Test:
This work is still in progress and is disabled by default.
I tested it with jmeter at a concurrency of 50, but currently the scanner just returns without data.
The QPS can reach about 9000.
I can't compare it to the original implementation because no data is read for now. I will test it when the new olap scanner is ready.
Co-authored-by: morningman <morningman@apache.org>
2022-08-23 08:45:18 +08:00
c22d097b59 [improvement](compress) Support compress/decompress block with lz4 (#11955) 2022-08-22 17:35:43 +08:00
b1fd701493 [fix](memtracker) Improve memory tracking accuracy for exec nodes (#11947) 2022-08-22 08:56:05 +08:00
f66e42f848 [optimization](array-type) support the decimal/datetime as the nest type of array in print_value (#11784)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-19 17:59:09 +08:00
8eb9ac3b04 [impovement](sink) print load_id when sink fails (#11893) 2022-08-19 08:48:02 +08:00
b300b4faa0 [enhancement](memtracker) Optimize readability of mem exceed limit error message #11877 2022-08-18 14:39:41 +08:00
50ef6e35be [enhancement](RowDescriptor) enhance tuple_idx check during runtime (#11835) 2022-08-17 17:50:48 +08:00
c715209a7e [refactor](dpp) remove original dpp writer (#11838)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-17 10:42:29 +08:00
fadc78c6cf [fix](str_to_date) str_to_date support format without leading zero (#11817)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-16 18:23:16 +08:00
c124470408 [enhancement](memory) Fix too much cache leads to less memory available for queries (#11751)
Disable the Chunk Allocator in the vectorized Allocator, which reduces the memory cache.

For highly concurrent queries, using the Chunk Allocator with the vectorized Allocator can reduce the impact of the gperftools tcmalloc central lock.

Jemalloc and Google tcmalloc have a core cache, so the Chunk Allocator may no longer be needed once gperftools tcmalloc is replaced.
2022-08-16 14:35:57 +08:00
7d836cf0c7 [fix](memtracker) Fix flush memtable to reduce load channel mem not executed (#11771)
The memory value automatically tracked by the tcmalloc hook in the DeltaWriter is smaller than the value recorded manually in the memtable, because the first 4096-byte Chunk requested by each MemPool when the memtable is initialized is not tracked to the DeltaWriter by the hook.

The two values are not equal, so the mem_consumption() == _mem_table->memory_usage branch condition fails.
2022-08-16 14:30:45 +08:00
2a1803c646 [enhancement](memtracker) Optimize query memory accuracy (#11740)
Currently, only the virtual memory used by the query can be tracked through the tcmalloc hook. When memory is not fully used after it is allocated, the recorded virtual memory is larger than the physical memory.

At present, this is mainly because PODArray does not memset the memory to 0 when allocating it, and blocks allocated through PODArray in places such as VOlapScanNode::_free_blocks are usually kept for memory reuse and are not fully used.
2022-08-16 14:23:28 +08:00
d2bb3ad08e [fix](memtracker) Fix core in logout task mem tracker (#11797) 2022-08-16 11:28:06 +08:00
5104982614 [enhancement](tracing) append the profile counter to trace. (#11458)
1. append the profile counter and infos to span attributes.
2. output traceid to audit log.
2022-08-15 21:36:38 +08:00
805c13aaa1 [fix](backup) fix backup restore raise Storage backend not initialized. error (#11736)
2022-08-15 13:24:38 +08:00
ab9529f6b5 [enhancement](array-type) support export files in 'select into outfile' (#11703)
This PR supports exporting the array type in 'select into outfile'.
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-15 12:34:31 +08:00
ec5d4e3d17 print physical memory and virtual memory separately. (#11747) 2022-08-13 13:56:49 +08:00
4047c3577d [enhancement](Status) Optimize Status implementation 2022-08-12 11:39:35 +08:00
9b9ed1aef1 [data lake](arrow scanner)Fix file arrow scanner column index out of range core. (#11691) 2022-08-12 11:34:29 +08:00
2068bf2dea [Refactor](predicate) Use primitive type as template argument for predicate (#11647) 2022-08-11 12:06:44 +08:00
c8418d13b5 [improvement](config)Use session variable to replace configuration for 'enable_function_pushdown' (#11641) 2022-08-10 19:25:02 +08:00
Pxl
ec3c911f97 [Feature][Materialized-View] support materialized view on vectorized engine (#10792) 2022-08-04 14:07:48 +08:00
ecbf87d77b [bugfix](memtracker)fix exceed memory limit log (#11485) 2022-08-04 10:22:20 +08:00
1db8a2d136 [bugfix](runtimefilter)fix runtimefilter access violation when stub is nullptr (#11180) 2022-08-02 16:57:17 +08:00
f730a048b1 [feature-wip](load) Support single replica load (#10298)
During the load process, the same operations, such as sort and aggregation, are performed on all replicas,
and they are resource-intensive.
Concurrent data loads consume a lot of CPU and memory.
It is better to perform the write process (writing data into the MemTable and then flushing it) on a single replica
and synchronize the data files to the other replicas before the transaction finishes.
2022-08-02 11:44:18 +08:00
abbf75d302 [doc][refactor](metrics) Reorganize FE and BE metrics and add document (#11307) 2022-08-02 11:34:06 +08:00
bd6e3cf132 [improvement]lock_times_limit (#11404)
Co-authored-by: songning03 <songning03@meituan.com>
2022-08-02 10:59:58 +08:00
4f5e1601df [bug](scanner) Improve limit query performance on olapScannode and avoid infinite loop (#11301)
1. Fix a bug where querying a large column table may cause an infinite loop
2. Optimize the query logic for limit: when the limit value is relatively small, reduce the parallelism of the scanner. This reduces unnecessary resource consumption, increases the number of similar queries that the system can handle at the same time, and increases query speed by more than 60%
2022-08-01 13:50:12 +08:00
73d8f5901d fix mem tracker limiter (#11376) 2022-08-01 09:44:04 +08:00
18864ab7fe weak relationship between MemTracker and MemTrackerLimiter (#11347) 2022-07-30 18:33:54 +08:00
d6f937cb01 (performance)[scanner] Isolate local and remote queries using different scanner… (#11006) 2022-07-29 19:14:46 +08:00
19b34c09b1 [fix] (mem tracker) Fix runtime instance tracker null pointer (#11272) 2022-07-28 14:58:13 +08:00
72d2feae99 [feature-wip] Support all date functions for datev2/datetimev2 (#11265)
* [feature-wip] (datetimev2) support convert_tz function

* [feature-wip] Support all date functions for datev2/datetimev2
2022-07-28 08:18:59 +08:00
Pxl
4e6a59df4c [Improvement][chore] add const to all operator== (#11251) 2022-07-27 21:46:47 +08:00
b6bdb3bdbc [fix] (mem tracker) Fix MemTracker accuracy (#11190) 2022-07-27 18:59:24 +08:00
829d534e12 [Improvement] Replace switch with constexpr to boost date functions (#11134) 2022-07-23 22:58:59 +08:00
babab5d535 [feature-wip] support datetimev2 (#11085) 2022-07-23 16:07:59 +08:00
ad31b6c902 [bugfix and improvement]fix mem tracker for load and simplify some macros (#11125) 2022-07-22 21:59:36 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
d9b6e07e9d [Vectorized] Support ODBC sink for vec exec engine (#11045)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-07-20 19:09:41 +08:00
e5663f9872 [Bug](array-type) Fix the core dump caused by unaligned __int128 (#11020)
Fix the core dump caused by unaligned __int128 and change DEFAULT_ALIGNMENT
2022-07-20 16:37:27 +08:00
56e036e68b [feature-wip](multi-catalog) Support runtime filter for file scan node (#11000)
* [feature-wip](multi-catalog) Support runtime filter for file scan node

Co-authored-by: morningman <morningman@apache.org>
2022-07-20 12:36:57 +08:00
d5fa66d9a3 [Enhancement] [Memory] Limit memory usage use process actual physical memory (#10924) 2022-07-19 11:08:39 +08:00
842ff2b1e2 [refactor] Refactor time LUT (#10982) 2022-07-19 08:23:29 +08:00
ec5996f1f8 [improvement]do not acquire mutex in metric hook (#10941) 2022-07-18 08:52:24 +08:00
ad4751972c [feature-wip] Support in predicate for datev2 type (#10810) 2022-07-15 14:32:40 +08:00
3d52bff8d1 [improvement]output query_id when be core dumped. (#10822) 2022-07-14 10:55:28 +08:00
3b46242483 [feature-wip] Optimize Decimal type (#10794)
* [feature-wip](decimalv3) support decimalv3

* [feature-wip] Optimize Decimal type

Co-authored-by: liaoxin <liaoxinbit@126.com>
2022-07-14 10:50:50 +08:00