doris

Author	SHA1	Message	Date
yixiutt	60fddd56e7	[feature-wip](unique-key-merge-on-write) opt lock and only save valid delete_bitmap (#11953 ) 1. use rlock in most logic instead of wrlock 2. filter stale rowset's delete bitmap in save meta 3. add a delete_bitmap lock to handle compaction and publish_txn conflict Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-23 14:43:40 +08:00
Jerry Hu	c22d097b59	[improvement](compress) Support compress/decompress block with lz4 (#11955 )	2022-08-22 17:35:43 +08:00
Ashin Gau	6d925054de	[feature-wip](parquet-reader) decode parquet time & datetime & decimal (#11845 ) 1. Spark can set the timestamp precision by the following configuration: spark.sql.parquet.outputTimestampType = INT96(NANOS), TIMESTAMP_MICROS, TIMESTAMP_MILLIS DATETIME V1 only keeps the second precision, DATETIME V2 keeps the microsecond precision. 2. If using DECIMAL V2, the BE saves the value as decimal128, and keeps the precision of decimal as (precision=27, scale=9). DECIMAL V3 can maintain the right precision of decimal	2022-08-22 10:15:35 +08:00
camby	83ea4ea984	[refactor](bitmap) bitmap serialize and deserialize refactor (#11921 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-22 08:52:20 +08:00
slothever	124b4f7694	[feature-wip](parquet-reader) row group reader ut finish (#11887 ) Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-08-18 17:18:14 +08:00
HappenLee	d505d1a5ae	[Vectorized](compaction) filter delete data in base compaction (#11721 ) * [Vectorized](compaction) filter delete data in base compaction Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-08-18 14:22:59 +08:00
yixiutt	11dc5cad83	[feature-wip](unique-key-merge-on-write) add min/max key in segment (#11830 ) some feature: 1. add min max key in segment footer to speed up get_row_ranges_by_keys 2. do not load pk bloom filter in query Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-17 18:11:39 +08:00
Gabriel	ba3e0b3f96	[feature](compaction) allow to set disable_auto_compaction for tables (#11743 )	2022-08-17 11:05:47 +08:00
Xin Liao	12c4d1f4dd	[feature-wip](unique-key-merge-on-write) unique key table with MOW supports sequence column (#11808 )	2022-08-17 10:56:14 +08:00
slothever	f39f57636b	[feature-wip](parquet-reader) update column read model and add page index (#11601 )	2022-08-16 15:04:07 +08:00
lihangyu	01383c3217	[Enhancement](stream-load-json) using simdjson to parse json (#11665 ) Currently we use rapidjson to parse JSON documents. It is fast, but not as fast as simdjson. simdjson has a parsing front-end called simdjson::ondemand, which parses JSON lazily as fields are accessed and can strip the field token from the original document. Using this feature reduces the cost of string copies (e.g. we convert everything to a string literal in _write_data_to_column by sprintf, and the flame graph shows a hotspot in this function; simdjson::to_json_string strips the token (a string piece) as a std::string_view, which is exactly what we need). Second, in _set_column_value we can iterate through the JSON document with for (auto field: object_val) {xxx}, which is much faster than looking up a field by its name, as in objectValue.FindMember("k1"). The third optimization is the at_pointer interface that simdjson provides, which can get a JSON field directly from the original document.	2022-08-16 14:49:50 +08:00
ZenoYang	288b440b14	[improvement](vectorized) Improve count distinct performance by using fastunion (#11516 ) Improve count distinct performance by using fastunion. Tests on real user data show a 10-40% performance improvement.	2022-08-16 12:18:46 +08:00
Ashin Gau	0b9bfd15b7	[feature-wip](parquet-reader) parquet physical type to doris logical type (#11769 ) Two improvements have been added: 1. Translate parquet physical type into doris logical type. 2. Decode parquet column chunk into doris ColumnPtr, and add unit tests to show how to use related API.	2022-08-15 16:08:11 +08:00
Gabriel	77e241cbb0	[refactor](date) Use uint32 as predicate type for date type (#11708 ) Use uint32 as predicate type for date type	2022-08-15 11:12:33 +08:00
pengxiangyu	e5c2bb9699	[fix](remote)Fix bug for Cache Reader (#11629 )	2022-08-12 13:40:32 +08:00
plat1ko	4047c3577d	[enhancement](Status) Optimize Status implementation	2022-08-12 11:39:35 +08:00
yiguolei	ea57bf6370	[refactor](delete predicate) Unify delete to segmentiterator (#11650 ) * remove seek columns and unify delete columns in rowset reader Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-11 15:12:43 +08:00
Gabriel	2068bf2dea	[Refactor](predicate) Use primitive type as template argument for predicate (#11647 )	2022-08-11 12:06:44 +08:00
Ashin Gau	8f5aed27ec	[feature-wip](parquet-reader)read and decode parquet physical type (#11637 ) # Proposed changes Read and decode parquet physical type. 1. The encoding type of boolean is bit-packing; this PR introduces the implementation of bit-packing from Impala. 2. Create a parquet file including all the primitive types supported by Hive. ## Remaining Problems 1. At present, only physical types are decoded, and there are no corresponding conversion methods to Doris logical types. 2. Decimal / Timestamp / Date types are not yet parsed or processed. 3. Int_8 / Int_16 is stored as Int_32; how to resolve these types is still open.	2022-08-11 10:17:32 +08:00
Xin Liao	aaaf6915e4	[feature-wip](unique-key-merge-on-write) fix rowid conversion ut that may create a directory under an incorrect path (#11628 )	2022-08-10 08:17:47 +08:00
Kang	f9b151744d	optimize topn query if order by columns is prefix of sort keys of table (#10694 ) * [feature](planner): push limit to olapscan when meet sort. * if olap_scan_node's sort_info is set, push sort_limit, read_orderby_key and read_orderby_key_reverse for olap scanner * There is a common query pattern to find the latest time series data, e.g. SELECT * from t_log WHERE t>t1 AND t<t2 ORDER BY t DESC LIMIT 100 If the ORDER BY columns are a prefix of the sort key of the table, the query can be greatly optimized to read much less data instead of reading all data between t1 and t2. By leveraging the shared order of the ORDER BY columns and the table's sort key, just read the LIMIT N rows from each related segment and merge them. 1. set read_orderby_key to true for read_params and _reader_context if olap_scan_node's sort info is set. 2. set read_orderby_key_reverse to true for read_params and _reader_context if is_asc_order is false. 3. rowset reader force merge read segments if read_orderby_key is true. 4. block reader and tablet reader force merge read rowsets if read_orderby_key is true. 5. for ORDER BY DESC, read and compare in reverse order 5.1 segment iterator reads backward using a new BackwardBitmapRangeIterator and reverses the result block before returning to the caller. 5.2 VCollectIterator::LevelIteratorComparator, VMergeIteratorContext return the opposite result for _is_reverse order in their compare functions. Co-authored-by: jackwener <jakevingoo@gmail.com>	2022-08-09 09:08:44 +08:00
yixiutt	0a5fd99d02	[feature-wip](unique-key-merge-on-write) speed up publish_txn (#11557 ) In our original design, we calculated the delete bitmap in publish_txn. This operation costs too much time because it loads segment data and looks up row keys in previous rowsets and segments. Publish version tasks must also run in order, so this leads to timeouts in publish_txn. In this PR, we separate the delete_bitmap calculation into two parts. One part is done when flushing the memtable, so this work can run in parallel. The final delete_bitmap is calculated in publish_txn: get the set of rowset_ids that should be included and remove rowsets that have been compacted. The rowset difference between memtable_flush and publish_txn is really small, so publish_txn becomes very fast. In our test, publish_txn costs about 10ms. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-08 18:57:55 +08:00
Ashin Gau	37d1180cca	[feature-wip](parquet-reader)decode parquet data (#11536 )	2022-08-08 12:44:06 +08:00
Xin Liao	1e6a3610a7	[feature-wip](unique-key-merge-on-write) optimize rowid conversion and add ut (#11541 )	2022-08-08 10:41:44 +08:00
slothever	e8a344b683	[feature-wip](parquet-reader) add predicate filter and column reader (#11488 )	2022-08-08 10:21:24 +08:00
yiguolei	321107cb40	[refactor](schema change) Using tablet schema shared ptr instead of raw ptr (#11475 ) * Using tabletschema shared ptr instead of raw ptrs Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-05 11:04:38 +08:00
yiguolei	de4466624d	[refactor](schema change)Remove delete from sc (#11441 ) * not need call delete handler to filter rows since they are filtered in rowset reader * need not call delete eval in schema change and remove related code Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-03 03:29:41 +08:00
weizuo93	f730a048b1	[feature-wip](load) Support single replica load (#10298 ) During the load process, the same operations, such as sort and aggregation, are performed on all replicas, which is resource-intensive. Concurrent data loads consume a lot of CPU and memory. It's better to perform the write process (writing data into MemTable and then flushing it) on a single replica and synchronize the data files to the other replicas before the transaction finishes.	2022-08-02 11:44:18 +08:00
Mingyu Chen	abbf75d302	[doc][refactor](metrics) Reorganize FE and BE metrics and add document (#11307 )	2022-08-02 11:34:06 +08:00
Ashin Gau	44a1a20e65	[feature-wip](parquet-reader)parse parquet schema (#11381 ) Analyze schema elements in parquet FileMetaData, and generate the hierarchy of nested fields. For example: 1. primitive type ``` // thrift: optional int32 <column-name>; // sql definition: <column-name> int32; ``` 2. nested type ``` // thrift: optional group <column-name> (LIST) { repeated group bag { optional group array_element (LIST) { repeated group bag { optional int32 array_element } } } } // sql definition: <column-name> array<array<int32>> ```	2022-08-02 10:56:13 +08:00
weizuo93	5c1cd058f2	[Feature] Add interface to check tablet segment lost (#10711 ) Co-authored-by: weizuo <weizuo@xiaomi.com>	2022-08-02 09:40:04 +08:00
Lightman	b35daf0a04	[improvement](light-schema-change) Support tablet schema cache (#11131 )	2022-08-01 12:18:00 +08:00
Xinyi Zou	73d8f5901d	fix mem tracker limiter (#11376 )	2022-08-01 09:44:04 +08:00
zhannngchen	9333e79ae0	[feature-wip](unique-key-merge-on-write) Add support for tablet migration, DSIP-018[5/3] (#11283 )	2022-07-30 19:50:11 +08:00
plat1ko	a6537a90cd	[Enhancement] Garbage collection of unused data on remote storage backend (#10731 ) * [Feature](cold_on_s3) support unused remote rowset gc * return aborted when skip drop tablet * perform unused remote rowset gc	2022-07-29 14:38:39 +08:00
slothever	e4bc3f6b6f	[feature-wip] (parquet-reader) add parquet reader impl template (#11285 )	2022-07-29 14:30:31 +08:00
spaces-x	b260a02215	[fix](be): fix stack overflow in unhex function (#11204 ) * [fix](be): fix stack overflow in unhex function	2022-07-28 14:59:54 +08:00
Gabriel	328a225050	[feature-wip] (datetimev2) support window funnel and modify valid dat… (#11277 ) * [feature-wip] (datetimev2) support window funnel and modify valid date range	2022-07-28 14:06:26 +08:00
Pxl	1b4a2c287e	[Improvement][chore] replace from_decv2_to_packed128 to decv2.value (#11261 )	2022-07-28 10:41:27 +08:00
Gabriel	72d2feae99	[feature-wip] Support all date functions for datev2/datetimev2 (#11265 ) * [feature-wip] (datetimev2) support convert_tz function * [feature-wip] Support all date functions for datev2/datetimev2	2022-07-28 08:18:59 +08:00
Pxl	4e6a59df4c	[Improvement][chore] add const to all operator== (#11251 )	2022-07-27 21:46:47 +08:00
Xinyi Zou	b6bdb3bdbc	[fix] (mem tracker) Fix MemTracker accuracy (#11190 )	2022-07-27 18:59:24 +08:00
Xin Liao	d4fb27125a	[feature-wip](unique-key-merge-on-write) row id conversion for compaction (#11149 )	2022-07-27 16:32:13 +08:00
yixiutt	01e108cb7b	[feature-wip](unique-key-merge-on-write) update delete bitmap while publish version (#11195 ) 1. make version publish work in version order 2. update delete bitmap while publish version, load current version rowset primary key and search in pre rowsets 3. speed up publish version task by parallel tablet publish task Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-07-27 16:26:42 +08:00
Xin Liao	eab8382b4a	[feature-wip](unique-key-merge-on-write) add the implementation of primary key index update, DSIP-018 (#11057 )	2022-07-27 14:17:56 +08:00
Gabriel	d67029c830	[feature-wip] (datetimev2) support `cast` between datetimev2 with different scales (#11198 ) * [feature-wip] (datetimev2) support `cast` between datetimev2 with different scale	2022-07-26 22:36:13 +08:00
Gabriel	823088a9eb	[FOLLOW-UP] (datetimev2) complete date function ut and built-in function declaration (#11154 )	2022-07-26 17:48:57 +08:00
Adonis Ling	bbe08b34ba	[Bug](be-ut) Fix the timezone dependency in UT (#11148 )	2022-07-25 18:15:05 +08:00
Gabriel	829d534e12	[Improvement] Replace `switch` with `constexpr` to boost date functions (#11134 )	2022-07-23 22:58:59 +08:00
Gabriel	babab5d535	[feature-wip] support datetimev2 (#11085 )	2022-07-23 16:07:59 +08:00
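
The top-N idea described in commit f9b151744d (read only the LIMIT N rows from each segment, then merge) can be illustrated with a minimal Python sketch. This is not the Doris implementation: the function name `top_n_desc` and the representation of segments as plain Python lists sorted ascending by key are assumptions made purely for illustration.

```python
import heapq
from itertools import islice


def top_n_desc(segments, n):
    """Return the n largest keys in descending order.

    Hypothetical helper: `segments` is a list of lists, each already
    sorted ascending by key (standing in for a table's sort key order).
    """
    # Read at most n rows from the tail of each segment, walking backward,
    # instead of scanning every row in the range.
    per_segment = [list(islice(reversed(seg), n)) for seg in segments]
    # Each per-segment run is already in descending order, so a k-way
    # merge in reverse order yields the global descending order.
    merged = heapq.merge(*per_segment, reverse=True)
    # Stop after n rows, mirroring ORDER BY key DESC LIMIT n.
    return list(islice(merged, n))


if __name__ == "__main__":
    segs = [[1, 4, 9], [2, 3, 10, 11], [5]]
    print(top_n_desc(segs, 3))
```

With k segments, at most k * n rows are read before the merge, regardless of how many rows fall between the range bounds.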


773 Commits