doris

Author	SHA1	Message	Date
Pxl	bbb3af6ce6	[Feature](agg_state) support agg_state combinators (#19969 ) support agg_state combinators state/merge/union	2023-05-29 13:07:29 +08:00
Yongqiang YANG	e0d9f7f955	[enhancement](load) add some profile items for load (#20141 )	2023-05-29 09:54:03 +08:00
Xinyi Zou	56360ba04a	[fix](memory) Load flush memtable no check memory exceed #20036	2023-05-26 09:57:00 +08:00
sinemora	e3929820d9	[performance](load) use vector instead of skiplist when insert agg keys (#19099 )	2023-05-23 20:11:50 +08:00
lihangyu	fd4fa5c64e	[Optimize](row store) optimize serialization and deserialization (#19691 ) 1. Get DataTypeSerde in advance to avoid get temporary DataTypeSerde iterate each column 2. Iterate the original row once is enoungh for deserializing by introducing a map for record the index of each column's unique id	2023-05-18 16:22:38 +08:00
yangshijie	ed8a4b4120	[feature-wip](duplicate_no_keys) skip sort function if the table is duplicate without keys (#19483 )	2023-05-11 14:44:16 +08:00
Pxl	dfad7b6b38	[Feature](generic-aggregation) some prowork of generic aggregation (#19343 ) some prowork of generic aggregation	2023-05-09 21:42:21 +08:00
DeadlineFen	e08de52ee7	[chore](compile) using PCH for compilation acceleration under clang (#19303 )	2023-05-08 19:51:06 +08:00
yixiutt	aef9355cd3	[feature-wip](partial update) PART1: support basic partial write (#17542 )	2023-04-28 17:17:57 +08:00
Xinyi Zou	f23c93b3c6	[fix](memory) Fix AggFunc memory leak due to incorrect destroy (#19126 )	2023-04-27 14:58:32 +08:00
huanghaibin	9756be6bf0	[improvement](stream-load) use vector instead of skiplist when insert dup keys (#18686 )	2023-04-23 09:40:09 +08:00
lihangyu	8cc0af150a	[Fix](dynamic table) fix dynamic table with insert into and column al… (#18808 ) 1. The num_rows should be correctly set 2. insert into has no dynamic column	2023-04-21 11:19:00 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
lihangyu	3c3364ba27	[chore](row store) ignore serialize block to row column if no row store column (#18601 )	2023-04-13 10:02:33 +08:00
Xinyi Zou	dd78001cc1	[fix](memory) Fix memtable flush mem tracker #18330	2023-04-03 20:37:14 +08:00
Xinyi Zou	d9fe5f7b67	[enhancement](memory) Remove MemPool and replace it with Arena (#17820 ) Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not. Some comparisons between MemPool and Arena: 1. Expansion Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression; MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K. 2. Alignment MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment; Arena has no default alignment; 3. Memory reuse Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time. MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation 4. Realloc Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is: 1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start 2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools 5. check mem limit MemPool checks the mem limit, and Arena checks at the Allocator layer. 6. Support for ASAN Arena does something extra 7. Error handling MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception. Tests that Arena can consider 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory; 2. Support clear, memory multiplexing; 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful. 4. In some cases, it may be possible to allocate backwards to find chunks t	2023-03-29 20:56:49 +08:00
lihangyu	9b7596f1c6	[Feature](Dynamic schema table) step1 support schema change expression (#17494 ) 1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility	2023-03-13 15:12:42 +08:00
lihangyu	fcd25b53bf	[Optimize](Random distribution) Improve the performance of tablet sin… (#17389 ) The current distribution model for Doris is as follows: OlapTableSink seperate the original Block into serveral subblocks of each node(BE) by tablets distribution and distributes subblocks to storage engine of backends, then the storage engine will seperate the subblock into multiple tablets channel and each delta writer will handle partial of the block. This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute the blocks according to the complete block during distribution. The advantage of doing so is to reduce memory copying and improve write locality, similar to appending the entire block to the memtable. This optimze could save 10% ~ 20% CPU cost of RANDOM distribution table load when enable load_to_single_tablet	2023-03-10 10:52:40 +08:00
Xin Liao	849b5b7b8f	[fix](sequence) fix that the result is wrong when load multiple duplicate keys (#17575 )	2023-03-09 20:59:23 +08:00
Pxl	2bc014d83a	[Enchancement](function) remove unused params on aggregate function (#16886 ) remove unused params on aggregate function	2023-02-20 11:08:45 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.	2023-02-11 13:37:50 +08:00
lihangyu	1d8265c5a3	[refactor](row-store) make row store column a hidden column in meta (#16251 ) This could simplfy storage engine logic and make code more readable, and we could analyze the hidden `__DORIS_ROW_STORE_COL__` length etc..	2023-02-02 20:56:13 +08:00
zhannngchen	69f34cd1c3	[fix](load) sequence column do not compare correctly in memtable (#16211 )	2023-02-02 11:00:23 +08:00
yiguolei	90b12143a3	[refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237 )	2023-01-30 16:45:14 +08:00
yiguolei	4b6a4b3cf7	[refactor](remove unused code) Remove unused mempool declare or function params (#16222 ) * Remove unused mempool declare or function params --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-30 13:03:18 +08:00
yiguolei	0b5e71d3b4	[refactor](refactor field) remove unused method (#16068 )	2023-01-19 10:16:09 +08:00
Xinyi Zou	97fcad76f8	[enhancement](memtracker) Improve readability (#15716 )	2023-01-16 16:30:35 +08:00
zbtzbtzbt	fe5e5d2bf4	[refactor] separate agg and flush in memtable (#15713 )	2023-01-11 10:07:34 +08:00
Mingyu Chen	3fec5ff0f5	[refactor](scan-pool) move scan pool from env to scanner scheduler (#15604 ) The origin scan pools are in exec_env. But after enable new_load_scan_node by default, the scan pool in exec_env is no longer used. All scan task will be submitted to the scan pool in scanner_scheduler. BTW, reorganize the scan pool into 3 kinds: local scan pool For olap scan node remote scan pool For file scan node limited scan pool For query which set cpu resource limit or with small limit clause TODO: Use bthread to unify all IO task. Some trivial issues: fix bug that the memtable flush size printed in log is not right Add RuntimeProfile param in VScanner	2023-01-11 09:38:42 +08:00
zbtzbtzbt	ba54634d55	[refactor] delete non vec load from memtable (#15667 ) * [refactor] delete non vec load from memtable delete non vec load from memtable totally. remove function keys_type() in memtable. Co-authored-by: zhoubintao <1229701101@qq.com>	2023-01-09 08:41:58 +08:00
yiguolei	b23d068281	[refactor](remove-non-vec) Remove non vec load from memtable and delta writer (#15517 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-12-30 21:22:58 +08:00
yiguolei	06d0035c02	[refactor](non-vec)remove schema change related non-vec code (#15313 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-12-23 18:33:04 +08:00
Xin Liao	efdc73777a	[enhancement](load) verify the number of rows between different replicas when load data to avoid data inconsistency (#15101 ) It is very difficult to investigate the data inconsistency of multiple replicas. When loading data, the number of rows between replicas is checked to avoid some data inconsistency problems.	2022-12-21 09:50:13 +08:00
xueweizhang	c4de619110	[fix](merge-on-write) calc delete bitmap need all segments which _do_flush in one memtable (#15018 ) when some case(need modify be.conf), a memtable may flush many segments and then calc delete bitmap with new data. but now, it just only load one segment with max sgement id and this bug will not cala delte bitmap with all data of all segment of one memtable, and will get many rows with same key from merge-on-write table.	2022-12-15 20:44:49 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
Xinyi Zou	e1f0fa069c	[enhancement](memory) Refactored process memory statistics periodically refresh, and fix catch bad_alloc (#14580 )	2022-11-29 10:15:25 +08:00
Xinyi Zou	a73f4dfdc1	[fix](memtracker) Fix scanner thread ending after fragment thread causing mem tracker null pointer #14143	2022-11-10 15:42:53 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated. other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.	2022-11-08 09:52:33 +08:00
Xinyi Zou	32a029d9dc	[enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795 )	2022-11-03 09:47:12 +08:00
xy720	f329d33666	[chore](fix) Fix some spell errors in be's comments. #13452	2022-10-20 08:56:01 +08:00
Xin Liao	9e42804298	[feature-wip](unique-key-merge-on-write) unique key with merge on write table support schema change (#12886 )	2022-10-09 11:31:53 +08:00
Xinyi Zou	c55d08fa2f	[fix](memtracker) Refactor load channel mem tracker to improve accuracy (#12791 ) The mem hook record tracker cannot guarantee that the final consumption is 0, nor can it guarantee that the memory alloc and free are recorded in a one-to-one correspondence. In the life cycle of a memtable from insert to flush, the memory free of hook is more than that of alloc, resulting in tracker consumption less than 0. In order to avoid the cumulative error of the upper load channel tracker, the memtable tracker consumption is reset to zero on destructor.	2022-09-21 20:16:19 +08:00
Xin Liao	bac58a4774	[feature-wip](unique-key-merge-on-write) fix calculate delete bitmap when flush memtable (#12668 )	2022-09-17 17:04:03 +08:00
Xin Liao	554ba40b13	[feature-wip](unique-key-merge-on-write) update delete bitmap when increamental clone (#12364 )	2022-09-09 17:03:27 +08:00
yixiutt	60fddd56e7	[feature-wip](unique-key-merge-on-write) opt lock and only save valid delete_bitmap (#11953 ) 1. use rlock in most logic instead of wrlock 2. filter stale rowset's delete bitmap in save meta 3. add a delete_bitmap lock to handle compaction and publish_txn confict Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-23 14:43:40 +08:00
yixiutt	0a5fd99d02	[feature-wip](unique-key-merge-on-write) speed up publish_txn (#11557 ) In our origin design, we calc delete bitmap in publish txn, and this operation will cost too much time as it will load segment data and lookup row key in pre rowset and segments.And publish version task should run in order, so it'll lead to timeout in publish_txn. In this pr, we seperate delete_bitmap calculation to tow part, one of it will be done in flush mem table, so this work can run parallel. And we calc final delete_bitmap in publish_txn, get a rowset_id set that should be included and remove rowsets that has been compacted, the rowset difference between memtable_flush and publish_txn is really small so publish_txn become very fast.In our test, publish_txn cost about 10ms. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-08 18:57:55 +08:00
Xinyi Zou	18864ab7fe	weak relationship between MemTracker and MemTrackerLimiter (#11347 )	2022-07-30 18:33:54 +08:00
zhannngchen	70c7e3d7aa	[feature-wip](unique-key-merge-on-write) remove AggType on unique table with MoW, enable preAggreation, DSIP-018[5/2] (#11205 ) remove AggType on unique table with MoW, enable preAggreation	2022-07-28 17:03:05 +08:00
Xinyi Zou	b6bdb3bdbc	[fix] (mem tracker) Fix MemTracker accuracy (#11190 )	2022-07-27 18:59:24 +08:00
HappenLee	8551ceaa1b	[Bug][Vectorized] Fix use-after-free bug of memtable shrink (#11197 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-07-26 16:10:44 +08:00

1 2 3

108 Commits