Commit Graph

2541 Commits

Author SHA1 Message Date
73a3471fbd [minor](conjuncts) remove row-based conjuncts from vectorized engine (#12053) 2022-08-25 10:13:20 +08:00
54fc038dc5 [Fix](remote) Fix thread safety issue in cache (#11984) 2022-08-24 18:14:14 +08:00
f875684345 [fix](agg) Crashing caused by serialization in streaming aggregation (#12027) 2022-08-24 14:38:25 +08:00
1304a17600 [fix](memtracker) Improve performance of tracking real physical memory of PodArray #12021 2022-08-24 14:24:14 +08:00
fb3c00c943 [Improvement](storage) reuse schema and rowblockv2 on single scanner_thread (#11392)
* support reuse rowblockv2 on single thread
2022-08-24 13:42:10 +08:00
ba85c06a68 [feature-wip](unique-key-merge-on-write) fix that IndexedColumnIterator next batch may return empty result (#11928) 2022-08-24 08:53:44 +08:00
3abc4f357f [Bug](bitmap) intersect_count function use in string cause ASAN error (#11936) 2022-08-24 08:51:53 +08:00
5d627e41a4 [fix](array-type) fix the be core dump when import number larger than uint64 (#11853)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-24 08:51:12 +08:00
1fc5515a78 [enhancement](memory) Remove unused reservation tracker (#11969) 2022-08-24 08:49:34 +08:00
d06edd4b8b [minor](runtime-filter) add DCHECK for runtimefilter bug (#11996)
Not a fix; this only adds debug info to help find the root cause of #11995
2022-08-24 07:53:30 +08:00
cbbf4e10ff [fix](array-type) fix be occasional coredump when use stream load (#11997)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-23 21:54:00 +08:00
1056a6d8c7 [bug](compaction) fix bug of coredump of filter delete chose wrong filter column (#12002)
* [bug](compaction) fix bug of coredump of filter delete chose wrong filter column

* clang format
2022-08-23 21:52:11 +08:00
55fdb555be [bugfix](dict) fix coredump of dict column range predicate when there is null value (#11967) 2022-08-23 16:07:48 +08:00
60fddd56e7 [feature-wip](unique-key-merge-on-write) opt lock and only save valid delete_bitmap (#11953)
1. use rlock in most logic instead of wrlock
2. filter stale rowset's delete bitmap in save meta
3. add a delete_bitmap lock to handle compaction and publish_txn conflict

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-23 14:43:40 +08:00
05da3d947f [feature-wip](new-scan) add scanner scheduling framework (#11582)
Doris currently has many types of ScanNodes, and most of their logic is the same, including:

* Runtime filter
* Predicate pushdown
* Scanner generation and scheduling

So I intend to unify the common logic of all ScanNodes.
Different data sources then only need to implement their own Scanners for data access,
so that future scan optimizations apply to the scans of all data sources
while also reducing code duplication.

This PR mainly adds 4 new classes:

VScanner
All Scanners' parent class. The subclasses can inherit this class to implement specific data access methods.

VScanNode
The unified ScanNode, responsible for common logic including RuntimeFilter, predicate pushdown, and Scanner generation and scheduling.

ScannerContext
ScannerContext records the execution status
of the group of Scanners that belong to one ScanNode,
including how many scanners are being scheduled, and maintains
a producer-consumer block queue between the scanners and the scan node.

ScannerContext is also the scheduling unit of ScannerScheduler.
ScannerScheduler schedules a ScannerContext at a time,
and submits the Scanners to the scanner thread pool for data scanning.

ScannerScheduler
Responsible for scheduling all Scanners in a unified way.
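
The relationship between these four classes can be illustrated with a minimal sketch. This is not the Doris implementation (which is C++ in the BE); it is a Python illustration under stated assumptions: the class names mirror the ones above, but every method name, the queue size, and the `None` end-of-data sentinel are hypothetical choices made for the example.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class Scanner:
    """Stand-in for a VScanner subclass: yields blocks from one data source."""

    def __init__(self, blocks):
        self._blocks = list(blocks)

    def scan(self):
        yield from self._blocks


class ScannerContext:
    """Tracks the scanners of one scan node and the block queue between them."""

    def __init__(self, scanners, queue_size=4):
        self.scanners = list(scanners)
        # Bounded producer-consumer queue: scanners put, the scan node gets.
        self.blocks = queue.Queue(maxsize=queue_size)
        self._running = len(self.scanners)
        self._lock = threading.Lock()

    def scanner_finished(self):
        """Called once per scanner; the last one signals end of data."""
        with self._lock:
            self._running -= 1
            last = self._running == 0
        if last:
            self.blocks.put(None)  # hypothetical end-of-data sentinel


class ScannerScheduler:
    """Submits the scanners of a context to a shared scanner thread pool."""

    def __init__(self, num_threads=2):
        self._pool = ThreadPoolExecutor(max_workers=num_threads)

    def submit(self, ctx):
        if not ctx.scanners:
            ctx.blocks.put(None)  # nothing to scan: signal end immediately
            return
        for scanner in ctx.scanners:
            self._pool.submit(self._run, ctx, scanner)

    @staticmethod
    def _run(ctx, scanner):
        try:
            for block in scanner.scan():
                ctx.blocks.put(block)  # blocks while the queue is full
        finally:
            ctx.scanner_finished()

    def shutdown(self):
        self._pool.shutdown(wait=True)


def scan_node_read_all(ctx):
    """Scan-node side: consume blocks until the end-of-data sentinel."""
    out = []
    while True:
        block = ctx.blocks.get()
        if block is None:
            return out
        out.append(block)
```

A caller would build a `ScannerContext` from a list of scanners, pass it to `ScannerScheduler.submit`, and then drain it with `scan_node_read_all`. Blocks from different scanners may arrive interleaved, so the consumer should not rely on their order.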

Test:
This work is still in progress and is disabled by default.
I tested it with jmeter at a concurrency of 50, but currently the scanner just returns without data.
The QPS can reach about 9000.
I can't compare it to the original implementation because no data is read for now. I will test it when the new olap scanner is ready.
Co-authored-by: morningman <morningman@apache.org>
2022-08-23 08:45:18 +08:00
b55195bd80 [FixAssist](compaction) add DCHECK in BlockReader::_unique_key_next_block to reason problem (#11951) 2022-08-22 22:33:31 +08:00
c22d097b59 [improvement](compress) Support compress/decompress block with lz4 (#11955) 2022-08-22 17:35:43 +08:00
0b33824eef [fix][Vectorized] Fix nullptr deref in data sink (#11473)
brpc cache may return nullptr.
2022-08-22 11:44:55 +08:00
92cef580f3 [enhancement](memory) Reduce virtual memory used by PaddedPODArray (#11816) 2022-08-22 11:33:07 +08:00
6d925054de [feature-wip](parquet-reader) decode parquet time & datetime & decimal (#11845)
1. Spark can set the timestamp precision by the following configuration:
spark.sql.parquet.outputTimestampType = INT96(NANOS), TIMESTAMP_MICROS, TIMESTAMP_MILLIS
DATETIME V1 only keeps the second precision, DATETIME V2 keeps the microsecond precision.
2. If using DECIMAL V2, the BE saves the value as decimal128, and keeps the precision of decimal as (precision=27, scale=9). DECIMAL V3 can maintain the right precision of decimal
2022-08-22 10:15:35 +08:00
dc8f64b3e3 [improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801) 2022-08-22 10:12:06 +08:00
915d8989c5 [feature](spark-load)Spark load supports string type data import (#11927) 2022-08-22 08:56:59 +08:00
b1fd701493 [fix](memtracker) Improve memory tracking accuracy for exec nodes (#11947) 2022-08-22 08:56:05 +08:00
83ea4ea984 [refractor](bitmap) bitmap serialize and deserialize refractor (#11921)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-22 08:52:20 +08:00
5eb5444476 [fix](memtracker) Remove useless memory exceed check #11939 2022-08-22 08:40:19 +08:00
982c5f06b5 [fix](build) Resolve the conflicts when building be with java-udf (#11938) 2022-08-20 18:24:32 +08:00
Pxl
64dc3b360f [Bug](function) fix dcheck fail on close vexpr ctx (#11908) 2022-08-19 19:11:10 +08:00
f66e42f848 [optimization](array-type) support the decimal/datetime as the nest type of array in print_value (#11784)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-19 17:59:09 +08:00
1b0b5b5f09 [Enhancement](load) add hidden_columns in stream load param (#11625)
Stream load ignores invisible columns if no columns are specified in the
HTTP header, but in some cases the user cannot list all columns because
the columns change frequently.
Add a hidden_columns header to support importing hidden columns. The user can
set hidden_columns to a column such as __DORIS_DELETE_SIGN__ and include that
column in the stream load data so that the row can be deleted.
For example:
curl -u root -v --location-trusted \
  -H "hidden_columns: __DORIS_DELETE_SIGN__" \
  -H "format: json" \
  -H "strip_outer_array: true" \
  -H "jsonpaths: [\"$.id\", \"$.name\",\"$.__DORIS_DELETE_SIGN__\"]" \
  -T 1.json \
  http://{beip}:{be_port}/api/test/test1/_stream_load

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-19 14:57:11 +08:00
01bd7f224b [bugifx](compaction) fix filter_delete if schema has sequence column (#11909)
Introduced in #11721, which uses the last column as the delete sign;
this is wrong if a sequence column exists.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-19 14:56:06 +08:00
1f9eec5462 [Regression](datev2) Add test cases for datev2/datetimev2 (#11831) 2022-08-19 10:57:55 +08:00
Pxl
089fe01aea [Feature](vectorized alter table) set vectorized alter table to default open (#11897) 2022-08-19 10:57:00 +08:00
7a505cf040 [remote-udaf](optimize) Optimize RPC exception handling logic (#11680) 2022-08-19 10:25:01 +08:00
fcae979798 [fix](memtracker) Fix PartitionedAggregationNode DCHECK when mem exceed limit (#11902) 2022-08-19 09:56:49 +08:00
8eb9ac3b04 [impovement](sink) print load_id when sink fails (#11893) 2022-08-19 08:48:02 +08:00
124b4f7694 [feature-wip](parquet-reader) row group reader ut finish (#11887)
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-08-18 17:18:14 +08:00
Pxl
c0dc51b453 [Bug](Vectorzed alter table)modify schema change cast validate (#11864) 2022-08-18 16:05:48 +08:00
1da39771e3 [Bug](runtime filter) Fix bug for runtime filter in concurrent scanners (#11848) 2022-08-18 14:47:08 +08:00
b8a33d2629 [Improvement](load) turn enable_vectorized_load on by default (#11833) 2022-08-18 14:43:09 +08:00
Pxl
cac317430f [Bug](aggregation) fix core dump on 2nd phase aggregate (#11843) 2022-08-18 14:42:34 +08:00
b300b4faa0 [enhancement](memtracker) Optimize readability of mem exceed limit error message #11877 2022-08-18 14:39:41 +08:00
d505d1a5ae [Vectorized](compaction) filter delete data in base compaction (#11721)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-08-18 14:22:59 +08:00
e1a1a04c2f [Enhancement](Doe) Be query es use fe generate dsl. (#11840) 2022-08-18 10:31:17 +08:00
cfb90b39c7 (vec-stream-load-json) simdjson throw execption lead to core dump (#11880)
When config::enable_simdjson_parser=true in vectorized stream load, a core dump may occur if the JSON input is an invalidly formatted string such as '{ "a', or if all fields are null, as in '{}'. In these cases the simdjson library may throw an unhandled exception such as `Objects and arrays can only be iterated when they are first encountered`. We should handle these cases.

Signed-off-by: eldenmoon <15605149486@163.com>
2022-08-18 10:27:34 +08:00
881670566c [fix]Fix the coredump when an IOError occurs in be (#11857) 2022-08-18 09:13:41 +08:00
8b10a1a3f7 [enhancement](VSlotRef) enhance column_id check in execute function during runtime (#11862)
The previous column id check in the VSlotRef::execute function was too strict for fuzz tests, which continuously produce random queries. Temporarily loosen the check logic.
Moreover, there are some careless calls to VExpr::get_const_col: it may return a nullptr, but not every caller checks whether the result is valid. This is an underlying problem.
2022-08-18 09:12:26 +08:00
582be130dd [Feature] (ODBC) support read/write emoji of utf16 via odbc table (#11863)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-08-18 09:09:02 +08:00
11dc5cad83 [feature-wip](unique-key-merge-on-write) add min/max key in segment (#11830)
some feature:
1. add min max key in segment footer to speed up get_row_ranges_by_keys
2. do not load pk bloom filter in query

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-17 18:11:39 +08:00
50ef6e35be [enhancement](RowDescriptor) enhance tuple_idx check during runtime (#11835) 2022-08-17 17:50:48 +08:00
7df8c6f493 [vectorized](improvement) improve agg function of bitmap_union with f… (#11822)
* [vectorized](improvement) improve agg function of bitmap_union with fastuinon
2022-08-17 14:13:01 +08:00