doris

Author	SHA1	Message	Date
Kaijie Chen	38885d4b00	[fix](load) fix memtable agg functions (#38017 ) (#38021 ) backport #38017	2024-07-17 23:04:57 +08:00
Kaijie Chen	359e50fc58	[fix](load) change tablet schema pointer to shared_ptr in memtable (#37927 ) (#37939 ) backport #37927	2024-07-16 22:32:03 +08:00
Kaijie Chen	005304953e	[performance](load) do not copy input_block in memtable (#36939 ) (#37407 ) cherry-pick #36939	2024-07-09 15:59:44 +08:00
yiguolei	f38ecd349c	[enhancement](memory) return error if allocate memory failed during add rows method (#35085 ) * return error when add rows failed * f --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2024-05-22 00:53:34 +08:00
abmdocrt	42425808a1	[Cherry-Pick](branch-2.1) Pick "Fix multiple replica partial update auto inc data inconsistency problem #34788 " (#35056 ) * [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788) * Problem: For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas. Cause: Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE. Solution: Before distributing blocks, determine if the update involves partial columns of a table with an auto-increment column. If so, add the auto-increment column to the last column of the block. After distributing to each BE, each BE will check if the data key for the partial column update exists. If it exists, the previous auto-increment column value is used; if not, the auto-increment column value from the last column of the block is used. This ensures that the auto-increment column values are consistent across different BEs. * 2 * [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)	2024-05-20 15:43:46 +08:00
lihangyu	0a79c547ff	[Refactor](Sink) Remove is_append mode in table sink (#34684 ) Remove the is_append mode from the sink component for the following reasons: 1. The performance improvement from this mode is relatively minor, approximately 10%, as demonstrated in previous benchmarks. 2. The mode complicates maintenance. It requires a separate data writing path to avoid copying, which increases complexity and poses a risk of potential data loss. I have already tested compatibility with the previous version	2024-05-11 11:20:10 +08:00
Pxl	8fd6d4c41b	[Chore](build) add -Wconversion and remove some unused code (#33127 ) add -Wconversion and remove some unused code	2024-04-10 15:26:08 +08:00
Xinyi Zou	cf7595d423	[opt](memory) Optimize mem tracker accuracy (#32039 ) (#33140 )	2024-04-10 11:42:19 +08:00
Pxl	e96b3db6f8	[Improvement](memory) clear arena when finalize one row #30788	2024-02-16 10:12:24 +08:00
Kaijie Chen	5153137b83	[fix](metrics) fix bvar memtable_input_block_allocated_size (#28725 )	2023-12-21 21:16:14 +08:00
lihangyu	b2d16856b4	[Fix](memtable) fix `shrink_memtable_by_agg` without duplicated keys (#28660 ) remove duplicated logic: ``` vectorized::Block in_block = _input_mutable_block.to_block(); _put_into_output(in_block); ``` `_input_mutable_block.to_block()` will move `_input_mutable_block`, and lead to `flush` with empty block	2023-12-19 20:45:16 +08:00
Kaijie Chen	9434ee5710	[fix](load) fix memtracking orphan too large (#28600 )	2023-12-19 12:41:19 +08:00
lihangyu	d11365da9c	[Fix](memtable) fix `shrink_memtable_by_agg` should also update `_row_in_blocks` (#28536 ) Otherwise using the stale `_row_in_blocks` will result in heap-buffer-overflow ``` ==2695213==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62900122e210 at pc 0x56524744aecf bp 0x7f62c595ef70 sp 0x7f62c595ef68 READ of size 8 at 0x62900122e210 thread T1627 (MemTableFlushTh) #0 0x56524744aece in doris::vectorized::ColumnVector<long>::insert_indices_from(doris::vectorized::IColumn const&, unsigned int const, unsigned int const) /mnt/disk2/lihangyu/doris/be/src/vec/columns/column_vector.cpp:378:33 #1 0x5652472a7538 in doris::vectorized::ColumnNullable::insert_indices_from(doris::vectorized::IColumn const&, unsigned int const, unsigned int const) /mnt/disk2/lihangyu/doris/be/src/vec/columns/column_nullable.cpp:310:25 #2 0x56524782a62a in doris::vectorized::MutableBlock::add_rows(doris::vectorized::Block const, unsigned int const, unsigned int const) /mnt/disk2/lihangyu/doris/be/src/vec/core/block.cpp:961:14 #3 0x565233f187ae in doris::MemTable::_put_into_output(doris::vectorized::Block&) /mnt/disk2/lihangyu/doris/be/src/olap/memtable.cpp:248:27 #4 0x565233f1db66 in doris::MemTable::to_block() /mnt/disk2/lihangyu/doris/be/src/olap/memtable.cpp:496:13 #5 0x565233efae60 in doris::FlushToken::_do_flush_memtable(doris::MemTable, int, long) /mnt/disk2/lihangyu/doris/be/src/olap/memtable_flush_executor.cpp:121:62 #6 0x565233efc8d6 in doris::FlushToken::_flush_memtable(doris::MemTable, int, long) /mnt/disk2/lihangyu/doris/be/src/olap/memtable_flush_executor.cpp:150:16 #7 0x565233f0c5eb in doris::MemtableFlushTask::run() /mnt/disk2/lihangyu/doris/be/src/olap/memtable_flush_executor.cpp:58:23 ```	2023-12-18 10:31:16 +08:00
Kaijie Chen	d4f9e12ec7	[fix](load) fix memtable mem_tracker too large (#25205 )	2023-12-06 21:04:21 +08:00
Pxl	e3d2425d47	[Improvement](join) remove insert_indices_from_join and special judge for -1 (#27779 ) remove insert_indices_from_join and special judge for -1	2023-12-04 11:03:22 +08:00
meiyi	553e4a8903	[feature-wip](merge-on-write) MOW table support different primary keys and sort keys (#24788 )	2023-11-24 16:37:30 +08:00
bobhan1	1514f78b87	[refactor](partial-update) Split partial update infos from tablet schema (#25147 )	2023-10-17 14:21:40 +08:00
Adonis Ling	08f305dd79	[chore](build) Fix compilation errors reported by GCC-13 (#25439 ) 1. Fix lots of compilation errors reported by GCC-13. 2. Fix the workflow BE UT (macOS).	2023-10-15 07:57:36 -05:00
lihangyu	527293aa41	[refactor](dynamic table) remove dynamic table (#23298 )	2023-08-23 14:15:14 +08:00
zhengyu	24c1953e91	[fix](debug) add bvar counter for memtable & loadchannel (#22578 ) * [fix](debug) add bvar counter for memtable & loadchannel Signed-off-by: freemandealer <freeman.zhang1992@gmail.com> * format code Signed-off-by: freemandealer <freeman.zhang1992@gmail.com> --------- Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-08-04 13:58:28 +08:00
HHoflittlefish777	ee754307bb	[refactor](load) refactor memtable flush actively (#21634 )	2023-07-30 21:31:54 +08:00
zhannngchen	50c8563f35	[fix](partial update) fix some bugs of sequence column (#21896 )	2023-07-22 15:26:48 +08:00
bobhan1	2a721be4f7	[fix](partial update) correct col_nums when init agg state in memtable (#21592 )	2023-07-07 14:03:33 +08:00
Kaijie Chen	dac2b638c6	[refactor](load) move memtable flush logic to flush token and rowset writer (#21547 )	2023-07-06 17:04:30 +08:00
Pxl	f90e8fcb26	[Chore](storage) add debug info for TabletColumn::get_aggregate_function (#21408 )	2023-07-03 10:02:44 +08:00
Kaijie Chen	1c961f2272	[refactor](load) move generate_delete_bitmap from memtable to beta rowset writer (#21329 )	2023-07-01 17:22:45 +08:00
Xin Liao	48065fce19	[bugfix](merge-on-write) optimize rowset tree and tablet header lock (#20911 )	2023-06-18 19:26:02 +08:00
zhannngchen	ce9a20a375	[enhancement](merge-on-write) format logs about MoW and add more stats for publish (#20853 )	2023-06-17 23:14:28 +08:00
Kaijie Chen	5f4ccb1f2e	[fix](load) fix generate delete bitmap in memtable flush (#20446 ) 1. Generate delete bitmap for one segment at a time. 2. Generate delete bitmap before segment compaction. Fix #20445	2023-06-06 09:48:30 +08:00
Kaijie Chen	b0bbff0fd1	[performance](load) improve memtable sort performance (#20392 )	2023-06-04 20:33:15 +08:00
Kaijie Chen	a869056567	[performance](load) support parallel memtable flush for unique key tables (#20308 )	2023-06-02 13:49:53 +08:00
lihangyu	9e21318834	[refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594 ) 1. make ColumnObject exception safe 2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema 3. add more test cases	2023-06-01 10:25:04 +08:00
yangshijie	1aefc26ca0	[Bug](memtable) fix a bug that occurred when inserting data into a duplicate table without keys (#20233 )	2023-05-31 18:21:36 +08:00
Pxl	bbb3af6ce6	[Feature](agg_state) support agg_state combinators (#19969 ) support agg_state combinators state/merge/union	2023-05-29 13:07:29 +08:00
Yongqiang YANG	e0d9f7f955	[enhancement](load) add some profile items for load (#20141 )	2023-05-29 09:54:03 +08:00
Xinyi Zou	56360ba04a	[fix](memory) Load flush memtable no check memory exceed #20036	2023-05-26 09:57:00 +08:00
sinemora	e3929820d9	[performance](load) use vector instead of skiplist when insert agg keys (#19099 )	2023-05-23 20:11:50 +08:00
lihangyu	fd4fa5c64e	[Optimize](row store) optimize serialization and deserialization (#19691 ) 1. Get DataTypeSerde in advance to avoid creating a temporary DataTypeSerde while iterating over each column 2. Iterating over the original row once is enough for deserialization, by introducing a map that records the index of each column's unique id	2023-05-18 16:22:38 +08:00
yangshijie	ed8a4b4120	[feature-wip](duplicate_no_keys) skip sort function if the table is duplicate without keys (#19483 )	2023-05-11 14:44:16 +08:00
Pxl	dfad7b6b38	[Feature](generic-aggregation) some pre-work of generic aggregation (#19343 ) some pre-work of generic aggregation	2023-05-09 21:42:21 +08:00
DeadlineFen	e08de52ee7	[chore](compile) using PCH for compilation acceleration under clang (#19303 )	2023-05-08 19:51:06 +08:00
yixiutt	aef9355cd3	[feature-wip](partial update) PART1: support basic partial write (#17542 )	2023-04-28 17:17:57 +08:00
Xinyi Zou	f23c93b3c6	[fix](memory) Fix AggFunc memory leak due to incorrect destroy (#19126 )	2023-04-27 14:58:32 +08:00
huanghaibin	9756be6bf0	[improvement](stream-load) use vector instead of skiplist when insert dup keys (#18686 )	2023-04-23 09:40:09 +08:00
lihangyu	8cc0af150a	[Fix](dynamic table) fix dynamic table with insert into and column al… (#18808 ) 1. The num_rows should be correctly set 2. insert into has no dynamic column	2023-04-21 11:19:00 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
lihangyu	3c3364ba27	[chore](row store) ignore serialize block to row column if no row store column (#18601 )	2023-04-13 10:02:33 +08:00
Xinyi Zou	dd78001cc1	[fix](memory) Fix memtable flush mem tracker #18330	2023-04-03 20:37:14 +08:00
Xinyi Zou	d9fe5f7b67	[enhancement](memory) Remove MemPool and replace it with Arena (#17820 ) Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not. Some comparisons between MemPool and Arena: 1. Expansion Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression; MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K. 2. Alignment MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment; Arena has no default alignment; 3. Memory reuse Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time. MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation 4. Realloc Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is: 1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start 2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools 5. check mem limit MemPool checks the mem limit, and Arena checks at the Allocator layer. 6. Support for ASAN Arena does something extra 7. Error handling MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception. Tests that Arena can consider 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory; 2. Support clear, memory multiplexing; 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful. 4. In some cases, it may be possible to allocate backwards to find chunks t	2023-03-29 20:56:49 +08:00
lihangyu	9b7596f1c6	[Feature](Dynamic schema table) step1 support schema change expression (#17494 ) 1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility	2023-03-13 15:12:42 +08:00
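
Note on commit 42425808a1 (#35056): its message describes keeping auto-increment values consistent across replicas during a partial-column update. The rule it states (reuse the stored value when the key already exists, otherwise take the value carried in the last column of the block) can be sketched roughly as below. This is only an illustration under stated assumptions, not Doris code: `ExistingRows`, `IncomingRow`, `resolve_auto_inc`, and `apply_partial_update` are hypothetical names, and a plain hash map stands in for a replica's stored rows.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one replica's stored rows: key -> auto-increment value.
using ExistingRows = std::unordered_map<std::string, int64_t>;

// Hypothetical row of a distributed block. The auto-increment value is assigned
// once, before distribution, and carried along (the "last column" in the commit text).
struct IncomingRow {
    std::string key;
    int64_t pre_assigned_auto_inc;
};

// If the key already exists on this replica, keep its previous auto-increment
// value; otherwise use the value carried in the block.
int64_t resolve_auto_inc(const ExistingRows& existing, const IncomingRow& row) {
    auto it = existing.find(row.key);
    return it != existing.end() ? it->second : row.pre_assigned_auto_inc;
}

// Apply a block to one replica and return the resolved value for each row.
std::vector<int64_t> apply_partial_update(ExistingRows& replica,
                                          const std::vector<IncomingRow>& block) {
    std::vector<int64_t> resolved;
    resolved.reserve(block.size());
    for (const auto& row : block) {
        const int64_t value = resolve_auto_inc(replica, row);
        replica[row.key] = value;  // no-op for existing keys, insert for new ones
        resolved.push_back(value);
    }
    return resolved;
}
```

Because every replica receives the same pre-assigned values and applies the same rule, two replicas that start from the same state resolve identical auto-increment values, which is the property the commit message aims for.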


141 Commits