doris

Author	SHA1	Message	Date
yixiutt	60fddd56e7	[feature-wip](unique-key-merge-on-write) opt lock and only save valid delete_bitmap (#11953 ) 1. use rlock in most logic instead of wrlock 2. filter stale rowset's delete bitmap in save meta 3. add a delete_bitmap lock to handle compaction and publish_txn conflict Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-23 14:43:40 +08:00
Mingyu Chen	05da3d947f	[feature-wip](new-scan) add scanner scheduling framework (#11582 ) There are currently many types of ScanNodes in Doris, and most of their logic is the same, including: runtime filter; predicate pushdown; scanner generation and scheduling. So I intend to unify the common logic of all ScanNodes. Different data sources only need to implement different Scanners for data access, so that future scan optimizations apply to all data sources while also reducing code duplication. This PR mainly adds 4 new classes: VScanner: the parent class of all Scanners. Subclasses inherit from it to implement specific data access methods. VScanNode: the unified ScanNode, responsible for common logic including RuntimeFilter, predicate pushdown, and Scanner generation and scheduling. ScannerContext: records the execution status of the group of Scanners corresponding to a ScanNode, including how many scanners are being scheduled, and maintains a producer-consumer block queue between scanners and scan nodes. ScannerContext is also the scheduling unit of ScannerScheduler. ScannerScheduler schedules one ScannerContext at a time and submits its Scanners to the scanner thread pool for data scanning. ScannerScheduler: responsible for all Scanner scheduling tasks. Test: This work is still in progress and is disabled by default. I tested it with jmeter at a concurrency of 50, but currently the scanner just returns without data. The QPS can reach about 9000. I can't compare it to the original implementation because no data is read for now. I will test it when the new olap scanner is ready. Co-authored-by: morningman <morningman@apache.org>	2022-08-23 08:45:18 +08:00
Yongqiang YANG	b55195bd80	[FixAssist](compaction) add DCHECK in BlockReader::_unique_key_next_block to reason problem (#11951 )	2022-08-22 22:33:31 +08:00
Jerry Hu	c22d097b59	[improvement](compress) Support compress/decompress block with lz4 (#11955 )	2022-08-22 17:35:43 +08:00
Amos Bird	0b33824eef	[fix][Vectorized] Fix nullptr deref in data sink (#11473 ) brpc cache may return nullptr.	2022-08-22 11:44:55 +08:00
Xinyi Zou	92cef580f3	[enhancement](memory) Reduce virtual memory used by PaddedPODArray (#11816 )	2022-08-22 11:33:07 +08:00
Ashin Gau	6d925054de	[feature-wip](parquet-reader) decode parquet time & datetime & decimal (#11845 ) 1. Spark can set the timestamp precision with the following configuration: spark.sql.parquet.outputTimestampType = INT96(NANOS), TIMESTAMP_MICROS, TIMESTAMP_MILLIS. DATETIME V1 only keeps second precision; DATETIME V2 keeps microsecond precision. 2. If using DECIMAL V2, the BE saves the value as decimal128 and keeps the decimal precision as (precision=27, scale=9). DECIMAL V3 can maintain the correct decimal precision.	2022-08-22 10:15:35 +08:00
Jerry Hu	dc8f64b3e3	[improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801 )	2022-08-22 10:12:06 +08:00
jiafeng.zhang	915d8989c5	[feature](spark-load)Spark load supports string type data import (#11927 )	2022-08-22 08:56:59 +08:00
Xinyi Zou	b1fd701493	[fix](memtracker) Improve memory tracking accuracy for exec nodes (#11947 )	2022-08-22 08:56:05 +08:00
camby	83ea4ea984	[refractor](bitmap) bitmap serialize and deserialize refractor (#11921 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-22 08:52:20 +08:00
Xinyi Zou	5eb5444476	[fix](memtracker) Remove useless memory exceed check #11939	2022-08-22 08:40:19 +08:00
Adonis Ling	982c5f06b5	[fix](build) Resolve the conflicts when building be with java-udf (#11938 )	2022-08-20 18:24:32 +08:00
Pxl	64dc3b360f	[Bug](function) fix dcheck fail on close vexpr ctx (#11908 )	2022-08-19 19:11:10 +08:00
carlvinhust2012	f66e42f848	[optimization](array-type) support the decimal/datetime as the nest type of array in print_value (#11784 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-19 17:59:09 +08:00
yixiutt	1b0b5b5f09	[Enhancement](load) add hidden_columns in stream load param (#11625 ) Stream load ignores invisible columns if the columns HTTP header is not specified, but in some cases the user cannot list all columns because they change frequently. Add a hidden_columns header to support importing hidden columns. The user can set hidden_columns to a column such as __DORIS_DELETE_SIGN__ and include that column in the stream load data so that the corresponding row can be deleted. For example: curl -u root -v --location-trusted -H "hidden_columns: __DORIS_DELETE_SIGN__" -H "format: json" -H "strip_outer_array: true" -H "jsonpaths: [\"$.id\", \"$.name\",\"$.__DORIS_DELETE_SIGN__\"]" -T 1.json http://{beip}:{be_port}/api/test/test1/_stream_load Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-19 14:57:11 +08:00
yixiutt	01bd7f224b	[bugifx](compaction) fix filter_delete if schema has sequence column (#11909 ) Bug introduced in #11721, which uses the last column as the delete sign; that is wrong if a sequence column exists. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-19 14:56:06 +08:00
Gabriel	1f9eec5462	[Regression](datev2) Add test cases for datev2/datetimev2 (#11831 )	2022-08-19 10:57:55 +08:00
Pxl	089fe01aea	[Feature](vectorized alter table) set vectorized alter table to default open (#11897 )	2022-08-19 10:57:00 +08:00
chenlinzhong	7a505cf040	[remote-udaf](optimize) Optimize RPC exception handling logic (#11680 )	2022-08-19 10:25:01 +08:00
Xinyi Zou	fcae979798	[fix](memtracker) Fix PartitionedAggregationNode DCHECK when mem exceed limit (#11902 )	2022-08-19 09:56:49 +08:00
Yongqiang YANG	8eb9ac3b04	[impovement](sink) print load_id when sink fails (#11893 )	2022-08-19 08:48:02 +08:00
slothever	124b4f7694	[feature-wip](parquet-reader) row group reader ut finish (#11887 ) Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-08-18 17:18:14 +08:00
Pxl	c0dc51b453	[Bug](Vectorzed alter table)modify schema change cast validate (#11864 )	2022-08-18 16:05:48 +08:00
Gabriel	1da39771e3	[Bug](runtime filter) Fix bug for runtime filter in concurrent scanners (#11848 )	2022-08-18 14:47:08 +08:00
Gabriel	b8a33d2629	[Improvement](load) turn `enable_vectorized_load` on by default (#11833 )	2022-08-18 14:43:09 +08:00
Pxl	cac317430f	[Bug](aggregation) fix core dump on 2nd phase aggregate (#11843 )	2022-08-18 14:42:34 +08:00
Xinyi Zou	b300b4faa0	[enhancement](memtracker) Optimize readability of mem exceed limit error message #11877	2022-08-18 14:39:41 +08:00
HappenLee	d505d1a5ae	[Vectorized](compaction) filter delete data in base compaction (#11721 ) * [Vectorized](compaction) filter delete data in base compaction Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-08-18 14:22:59 +08:00
Stalary	e1a1a04c2f	[Enhancement](Doe) Be query es use fe generate dsl. (#11840 )	2022-08-18 10:31:17 +08:00
lihangyu	cfb90b39c7	(vec-stream-load-json) simdjson throw exception lead to core dump (#11880 ) When config::enable_simdjson_parser=true in vec stream load, a core dump may occur when the JSON input is an invalid string such as '{ "a', or when all the fields are null, as in '{}'. In these cases the simdjson library may throw an unhandled exception such as `Objects and arrays can only be iterated when they are first encountered`. We should handle these cases. Signed-off-by: eldenmoon <15605149486@163.com>	2022-08-18 10:27:34 +08:00
zxealous	881670566c	[fix]Fix the coredump when an IOError occurs in be (#11857 )	2022-08-18 09:13:41 +08:00
AlexYue	8b10a1a3f7	[enhancement](VSlotRef) enhance column_id check in execute function during runtime (#11862 ) The column id check in VSlotRef::execute was too strict for fuzz testing to continuously produce random queries, so the check logic is temporarily loosened. Moreover, there are some careless calls to VExpr::get_const_col: it may return nullptr, but not every caller checks that the result is valid. This is an underlying problem.	2022-08-18 09:12:26 +08:00
HappenLee	582be130dd	[Feature] (ODBC) support read/write emoji of utf16 via odbc table (#11863 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-08-18 09:09:02 +08:00
yixiutt	11dc5cad83	[feature-wip](unique-key-merge-on-write) add min/max key in segment (#11830 ) some feature: 1. add min max key in segment footer to speed up get_row_ranges_by_keys 2. do not load pk bloom filter in query Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-17 18:11:39 +08:00
AlexYue	50ef6e35be	[enhancement](RowDescriptor) enhance tuple_idx check during runtime (#11835 )	2022-08-17 17:50:48 +08:00
zhangstar333	7df8c6f493	[vectorized](improvement) improve agg function of bitmap_union with f… (#11822 ) * [vectorized](improvement) improve agg function of bitmap_union with fastunion	2022-08-17 14:13:01 +08:00
Gabriel	18b84b2dfe	[Bug](compile) fix compiling problem (#11851 ) fix compiling problem	2022-08-17 13:44:57 +08:00
Gabriel	ba3e0b3f96	[feature](compaction) allow to set disable_auto_compaction for tables (#11743 )	2022-08-17 11:05:47 +08:00
Xin Liao	12c4d1f4dd	[feature-wip](unique-key-merge-on-write) unique key table with MOW supports sequence column (#11808 )	2022-08-17 10:56:14 +08:00
Xin Liao	c3e6a841c1	[feature-wip](unique-key-merge-on-write) fix that sort segments by segment id in descending order (#11811 )	2022-08-17 10:54:30 +08:00
wangbo	3a49156e30	[performance] (vectorization)optimize In Expr (#11826 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-08-17 10:46:37 +08:00
yiguolei	c715209a7e	[refactor](dpp) remove original dpp writer (#11838 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-17 10:42:29 +08:00
Lightman	3e13b7d2c2	[Bugfix](light-shema-change) fix _finish_clone dead lock (#11823 ) In engine_clone_task.cpp, tablet->tablet_schema() is used to create the rowset, but that method needs a lock that is already held at engine_clone_task.cpp:514. The code originally used cloned_tablet_meta->tablet_schema(), but this was changed in #11131. Revert to using cloned_tablet_meta->tablet_schema().	2022-08-17 09:10:08 +08:00
camby	fadc78c6cf	[fix](str_to_date) str_to_date support format without leading zero (#11817 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-16 18:23:16 +08:00
plat1ko	30175010c7	Fix nullptr in perform_remote_tablet_gc (#11820 )	2022-08-16 16:50:21 +08:00
slothever	f39f57636b	[feature-wip](parquet-reader) update column read model and add page index (#11601 )	2022-08-16 15:04:07 +08:00
lihangyu	01383c3217	[Enhancement](stream-load-json) using simdjson to parse json (#11665 ) Currently we use rapidjson to parse JSON documents. It's fast, but not fast enough compared to simdjson. I found that simdjson has a parsing front-end called simdjson::ondemand, which parses JSON as fields are accessed and can strip the field token from the original document. Using this feature, we can reduce the cost of string copies (e.g. we convert everything to a string literal in _write_data_to_column via sprintf, and I saw a hotspot in this function in the flame graph; simdjson::to_json_string strips the token (a string piece) as a std::string_view, which is exactly what we need). Second, in _set_column_value we can iterate through the JSON document with for (auto field: object_val) {xxx}, which is much faster than looking up a field by its name, as in objectValue.FindMember("k1"). The third optimization is the at_pointer interface that simdjson provides, which can directly get a JSON field from the original document.	2022-08-16 14:49:50 +08:00
plat1ko	fecfdd78bf	[enhancement](status) Fix Status related macros to enable RVO or move ctor (#11753 )	2022-08-16 14:40:35 +08:00
Xinyi Zou	c124470408	[enhancement](memory) Fix too much cache leads to less memory available for queries (#11751 ) Disable Chunk Allocator in Vectorized Allocator, this will reduce memory cache. For high concurrent queries, using Chunk Allocator with vectorized Allocator can reduce the impact of gperftools tcmalloc central lock. Jemalloc or google tcmalloc have core cache, Chunk Allocator may no longer be needed after replacing gperftools tcmalloc.	2022-08-16 14:35:57 +08:00
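
Note on 6d925054de (#11845): the message says DATETIME V1 keeps second precision and DATETIME V2 keeps microsecond precision when reading Parquet timestamps. The following is a minimal sketch in Python, not the BE's C++ code, of how a Parquet INT96 timestamp (8 little-endian bytes of nanoseconds within the day, then 4 little-endian bytes of Julian day) could be decoded and reduced to either precision. The function names are hypothetical, and truncating rather than rounding is an assumption; the commit message does not say which the BE does.

```python
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
NANOS_PER_DAY = 86_400 * 1_000_000_000


def int96_to_epoch_nanos(raw: bytes) -> int:
    """Decode a 12-byte Parquet INT96 timestamp to nanoseconds since the Unix epoch."""
    if len(raw) != 12:
        raise ValueError("INT96 timestamp must be exactly 12 bytes")
    # "<qi": little-endian, 8-byte nanoseconds of day, then 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


def to_second_precision(epoch_nanos: int) -> int:
    """Hypothetical DATETIME V1 view: keep whole seconds (truncation is assumed)."""
    return epoch_nanos // 1_000_000_000


def to_microsecond_precision(epoch_nanos: int) -> int:
    """Hypothetical DATETIME V2 view: keep whole microseconds (truncation is assumed)."""
    return epoch_nanos // 1_000
```

For a value one day and 1.5 seconds after the epoch, the second-precision view drops the half second while the microsecond-precision view retains it.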


2636 Commits
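
Note on 1b0b5b5f09 (#11625): the curl example in that message can be mirrored by a small helper that assembles the same URL and headers. This is only a sketch: `build_stream_load_request` is a hypothetical name, it performs no network I/O, and joining several hidden columns with commas is an assumption, since the commit message shows only a single hidden column.

```python
import json


def build_stream_load_request(be_host, be_port, db, table, columns, hidden_columns):
    """Return (url, headers) matching the shape of the curl example in #11625."""
    all_columns = list(columns) + list(hidden_columns)
    headers = {
        # Assumption: multiple hidden columns are comma-separated.
        "hidden_columns": ",".join(hidden_columns),
        "format": "json",
        "strip_outer_array": "true",
        # One JSON path per column, visible columns first, hidden columns last.
        "jsonpaths": json.dumps([f"$.{name}" for name in all_columns]),
    }
    url = f"http://{be_host}:{be_port}/api/{db}/{table}/_stream_load"
    return url, headers
```

The returned pair can be handed to any HTTP client that supports authenticated PUT requests with redirects, which is what `curl -u root --location-trusted -T 1.json` does in the original message.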