doris

Author	SHA1	Message	Date
lihangyu	9b7596f1c6	[Feature](Dynamic schema table) step1 support schema change expression (#17494 ) 1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` that performs schema changes, for extensibility	2023-03-13 15:12:42 +08:00
lihangyu	fcd25b53bf	[Optimize](Random distribution) Improve the performance of tablet sin… (#17389 ) The current distribution model for Doris is as follows: OlapTableSink separates the original Block into several subblocks, one per node (BE), according to the tablet distribution, and sends the subblocks to the storage engine of the backends; the storage engine then separates each subblock across multiple tablet channels, and each delta writer handles part of the block. This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute whole blocks without splitting them. The advantage of doing so is to reduce memory copying and improve write locality, similar to appending the entire block to the memtable. This optimization could save 10% ~ 20% of the CPU cost of loading into a RANDOM distribution table when load_to_single_tablet is enabled	2023-03-10 10:52:40 +08:00
Xin Liao	849b5b7b8f	[fix](sequence) fix that the result is wrong when load multiple duplicate keys (#17575 )	2023-03-09 20:59:23 +08:00
Pxl	2bc014d83a	[Enhancement](function) remove unused params on aggregate function (#16886 ) remove unused params on aggregate function	2023-02-20 11:08:45 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 Dynamic schema table is a special type of table whose schema changes during the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema info from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.	2023-02-11 13:37:50 +08:00
lihangyu	1d8265c5a3	[refactor](row-store) make row store column a hidden column in meta (#16251 ) This could simplify storage engine logic and make the code more readable, and we could analyze the hidden `__DORIS_ROW_STORE_COL__` length, etc.	2023-02-02 20:56:13 +08:00
zhannngchen	69f34cd1c3	[fix](load) sequence column does not compare correctly in memtable (#16211 )	2023-02-02 11:00:23 +08:00
yiguolei	90b12143a3	[refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237 )	2023-01-30 16:45:14 +08:00
yiguolei	4b6a4b3cf7	[refactor](remove unused code) Remove unused mempool declare or function params (#16222 ) * Remove unused mempool declare or function params --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-30 13:03:18 +08:00
yiguolei	0b5e71d3b4	[refactor](refactor field) remove unused method (#16068 )	2023-01-19 10:16:09 +08:00
Xinyi Zou	97fcad76f8	[enhancement](memtracker) Improve readability (#15716 )	2023-01-16 16:30:35 +08:00
zbtzbtzbt	fe5e5d2bf4	[refactor] separate agg and flush in memtable (#15713 )	2023-01-11 10:07:34 +08:00
Mingyu Chen	3fec5ff0f5	[refactor](scan-pool) move scan pool from env to scanner scheduler (#15604 ) The original scan pools are in exec_env. But after new_load_scan_node was enabled by default, the scan pool in exec_env is no longer used. All scan tasks are submitted to the scan pool in scanner_scheduler. BTW, reorganize the scan pool into 3 kinds: 1. local scan pool, for olap scan node 2. remote scan pool, for file scan node 3. limited scan pool, for queries which set a cpu resource limit or have a small limit clause. TODO: Use bthread to unify all IO tasks. Some trivial issues: 1. fix bug that the memtable flush size printed in the log is not right 2. add RuntimeProfile param in VScanner	2023-01-11 09:38:42 +08:00
zbtzbtzbt	ba54634d55	[refactor] delete non vec load from memtable (#15667 ) * [refactor] delete non vec load from memtable delete non vec load from memtable totally. remove function keys_type() in memtable. Co-authored-by: zhoubintao <1229701101@qq.com>	2023-01-09 08:41:58 +08:00
yiguolei	b23d068281	[refactor](remove-non-vec) Remove non vec load from memtable and delta writer (#15517 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-12-30 21:22:58 +08:00
yiguolei	06d0035c02	[refactor](non-vec)remove schema change related non-vec code (#15313 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-12-23 18:33:04 +08:00
Xin Liao	efdc73777a	[enhancement](load) verify the number of rows between different replicas when load data to avoid data inconsistency (#15101 ) It is very difficult to investigate the data inconsistency of multiple replicas. When loading data, the number of rows between replicas is checked to avoid some data inconsistency problems.	2022-12-21 09:50:13 +08:00
xueweizhang	c4de619110	[fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable (#15018 ) In some cases (which require modifying be.conf), a memtable may flush many segments and then calculate the delete bitmap against the new data. Currently it loads only the one segment with the max segment id, so the delete bitmap is not calculated over all the data in all segments of the memtable, and a merge-on-write table can end up with many rows that have the same key.	2022-12-15 20:44:49 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable custom err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
Xinyi Zou	e1f0fa069c	[enhancement](memory) Refactored process memory statistics periodically refresh, and fix catch bad_alloc (#14580 )	2022-11-29 10:15:25 +08:00
Xinyi Zou	a73f4dfdc1	[fix](memtracker) Fix scanner thread ending after fragment thread causing mem tracker null pointer #14143	2022-11-10 15:42:53 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compaction task etc. 4)exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated. other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phmap is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.	2022-11-08 09:52:33 +08:00
Xinyi Zou	32a029d9dc	[enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795 )	2022-11-03 09:47:12 +08:00
xy720	f329d33666	[chore](fix) Fix some spell errors in be's comments. #13452	2022-10-20 08:56:01 +08:00
Xin Liao	9e42804298	[feature-wip](unique-key-merge-on-write) unique key with merge on write table support schema change (#12886 )	2022-10-09 11:31:53 +08:00
Xinyi Zou	c55d08fa2f	[fix](memtracker) Refactor load channel mem tracker to improve accuracy (#12791 ) The mem hook record tracker cannot guarantee that the final consumption is 0, nor can it guarantee that the memory alloc and free are recorded in a one-to-one correspondence. In the life cycle of a memtable from insert to flush, the memory free of hook is more than that of alloc, resulting in tracker consumption less than 0. In order to avoid the cumulative error of the upper load channel tracker, the memtable tracker consumption is reset to zero on destructor.	2022-09-21 20:16:19 +08:00
Xin Liao	bac58a4774	[feature-wip](unique-key-merge-on-write) fix calculate delete bitmap when flush memtable (#12668 )	2022-09-17 17:04:03 +08:00
Xin Liao	554ba40b13	[feature-wip](unique-key-merge-on-write) update delete bitmap when incremental clone (#12364 )	2022-09-09 17:03:27 +08:00
yixiutt	60fddd56e7	[feature-wip](unique-key-merge-on-write) opt lock and only save valid delete_bitmap (#11953 ) 1. use rlock in most logic instead of wrlock 2. filter stale rowset's delete bitmap in save meta 3. add a delete_bitmap lock to handle compaction and publish_txn conflict Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-23 14:43:40 +08:00
yixiutt	0a5fd99d02	[feature-wip](unique-key-merge-on-write) speed up publish_txn (#11557 ) In our original design, we calculated the delete bitmap in publish_txn, and this operation costs too much time because it loads segment data and looks up row keys in previous rowsets and segments. Publish version tasks must also run in order, so this leads to timeouts in publish_txn. In this PR, we separate the delete_bitmap calculation into two parts: one is done when flushing the memtable, so this work can run in parallel; then we calculate the final delete_bitmap in publish_txn, getting the set of rowset ids that should be included and removing rowsets that have been compacted. The rowset difference between memtable_flush and publish_txn is really small, so publish_txn becomes very fast. In our test, publish_txn costs about 10ms. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-08 18:57:55 +08:00
Xinyi Zou	18864ab7fe	weak relationship between MemTracker and MemTrackerLimiter (#11347 )	2022-07-30 18:33:54 +08:00
zhannngchen	70c7e3d7aa	[feature-wip](unique-key-merge-on-write) remove AggType on unique table with MoW, enable preAggregation, DSIP-018[5/2] (#11205 ) remove AggType on unique table with MoW, enable preAggregation	2022-07-28 17:03:05 +08:00
Xinyi Zou	b6bdb3bdbc	[fix] (mem tracker) Fix MemTracker accuracy (#11190 )	2022-07-27 18:59:24 +08:00
HappenLee	8551ceaa1b	[Bug][Vectorized] Fix use-after-free bug of memtable shrink (#11197 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-07-26 16:10:44 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Yongqiang YANG	f78db1d773	release memory allocated in agg function in vec stream load (#10739 ) release memory allocated in agg function in vec stream load When a load is cancelled, memory allocated by agg functions should be freed.	2022-07-16 15:32:53 +08:00
Pxl	4190f7354c	[Bug][Memtable] fix core dump on int128 because not aligned by 16 byte (#10775 ) * fix core dump on int128 because not aligned by 16 byte * update	2022-07-13 08:30:58 +08:00
HappenLee	502ac4e76b	[Load][Vectorized] opt the mem use of aggregate function in load to speed up (#10448 ) opt the mem use of aggregate function in load to speed up	2022-07-10 13:34:25 +08:00
yiguolei	89e56ea67f	[refactor] remove alpha rowset related code and vectorized row batch related code (#10584 )	2022-07-05 20:33:34 +08:00
Xinyi Zou	6ad024a2bf	[fix] (mem tracker) Refactor memtable mem tracker, fix flush memtable DCHECK failed (#10156 ) 1. Added memory leak detection for `DeltaWriter` and `MemTable` mem tracker 2. Modify memtable mem tracker to virtual to avoid frequent recursive consumption of parent tracker. 3. Disable memtable flush thread attach memtable tracker, ensure that memtable mem tracker is completely accurate. 4. Modify `memory_verbose_track=false`. At present, frequently switching the thread mem tracker has a performance problem. - Because the mem tracker exists as a shared_ptr in the thread local, each switch decrements the atomic use_count in the shared_ptr of the current tracker by 1 and increments the use_count of the replacement tracker by 1, and frequent multi-threaded changes to the same tracker shared_ptr are slow. - TODO: 1. Reduce unnecessary thread mem tracker switch, 2. Consider using raw pointers for mem tracker in thread local.	2022-06-19 16:48:42 +08:00
Pxl	f2aa5f32b8	[Feature] [Vectorized] Some pre-refactorings or interface additions for schema change (#9811 ) Some pre-refactorings or interface additions for schema change	2022-06-07 15:04:57 +08:00
HappenLee	c426c2e4b1	[Vectorized-Load] Support vectorized load table with materialized view (#9923 ) * [Vectorized-Load] Support vectorized load table with materialized view * fix ut Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-02 14:59:01 +08:00
HappenLee	7199102d7c	[Opt][VecLoad] Opt the vec stream load performance (#9772 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-05-31 11:53:32 +08:00
Pxl	13c1d20426	[Bug] [Vectorized] add padding when load char type data (#9734 )	2022-05-26 16:51:01 +08:00
spaces-x	73e31a2179	[stream-load-vec]: memtable flush only if necessary after aggregated (#9459 ) Co-authored-by: weixiang <weixiang06@meituan.com>	2022-05-25 21:12:24 +08:00
Shuangchi He	73c4ec7167	Fix some typos in be/. (#9681 )	2022-05-19 20:55:39 +08:00
yixiutt	c9ab5e22fe	[fixbug](vec-load) fix core of segment_writer while it is not thread-safe (#9569 ) Introduced in stream-load-vec #9280: it causes multiple threads to operate on the same segment_writer. BetaRowset enables multi-threaded memtable flush, and memtable flush calls rowset_writer.add_block, which writes through the member variable _segment_writer, so multiple threads end up writing through the same segment writer. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-05-18 11:29:15 +08:00
plat1ko	4cd579b155	[refactor] Check status precise_code instead of construct OLAPInternalError (#9514 ) * check status precise_code instead of construct OLAPInternalError * move is_io_error to Status	2022-05-12 15:39:29 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483 )	2022-05-10 21:25:35 +08:00
Zhengguo Yang	2ccaa6338c	[enhancement](load) optimize load string data and dict page write (#9123 ) * [enhancement](load) optimize load string data and dict page write	2022-05-07 10:27:27 +08:00
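
The layering described in commit 0b945fe361 (#13585) can be illustrated with a minimal sketch. Only the `Type` enum values come from that commit message; `TaskTracker`, `Snapshot`, and `aggregate` are hypothetical names invented for this illustration and are not the Doris implementation. The sketch assumes that task-level trackers record only their type (no pointer to a parent tracker) and that a periodic pass sums them into per-type and process totals.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Type values as listed in the commit message for #13585.
enum class Type : int {
    GLOBAL = 0,
    QUERY = 1,
    LOAD = 2,
    COMPACTION = 3,
    SCHEMA_CHANGE = 4,
    CLONE = 5,
    BATCHLOAD = 6,
    CONSISTENCY = 7
};
constexpr std::size_t kTypeCount = 8;

// Hypothetical task-level tracker (layer 3). It stores its type only,
// so no object pointer links it to the type or process layers.
struct TaskTracker {
    Type type;
    std::atomic<int64_t> consumption{0};

    explicit TaskTracker(Type t) : type(t) {}

    // Positive bytes record an allocation, negative bytes a release.
    void consume(int64_t bytes) {
        consumption.fetch_add(bytes, std::memory_order_relaxed);
    }
};

// Hypothetical result of one periodic refresh.
struct Snapshot {
    std::array<int64_t, kTypeCount> per_type{};  // layer 2 totals
    int64_t process = 0;                         // layer 1 total
};

// Sum all task trackers into per-type totals and a process total.
Snapshot aggregate(const std::vector<const TaskTracker*>& tasks) {
    Snapshot snap;
    for (const TaskTracker* t : tasks) {
        const auto idx = static_cast<std::size_t>(t->type);
        assert(idx < kTypeCount);
        const int64_t v = t->consumption.load(std::memory_order_relaxed);
        snap.per_type[idx] += v;
        snap.process += v;
    }
    return snap;
}
```

In this sketch the upper layers are only as fresh as the last call to `aggregate`, which is the trade-off the commit message describes when it says the process and type values are periodically aggregated.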


92 Commits