Commit Graph

8 Commits

SHA1 Message Date
40c53931e5 [fix](vec) VMergeIterator add key same label for agg table (#14722) 2023-01-02 22:54:21 +08:00
e640f49b6d [refactor](non-vec) remove non vectorized predicate and row_block (#15348)
remove non vectorized predicate and row_block
2022-12-25 21:45:00 +08:00
83a99a0f8b [refactor](non-vec) Remove non vec code from be (#15278)
* [refactor](removecode) remove some non-vectorized code
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-12-22 23:28:30 +08:00
a7180c5ad8 [fix](segcompaction) fix segcompaction failed for newly created segment (#15022) (#15023)
Currently, a newly created segment can be chosen as a compaction candidate, which is prone to bugs and segment file open failures. We should skip the last (possibly still active) segment when doing segcompaction.
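
A minimal sketch of this skip rule, assuming a hypothetical SegmentInfo type and candidate list; the real BE selection logic differs:

```cpp
// Sketch only: exclude the last segment, since it may still be receiving writes.
#include <cstdint>
#include <vector>

struct SegmentInfo {
    uint32_t id;
    uint64_t num_rows;
};

// Return all segments except the last (possibly still active) one.
std::vector<SegmentInfo> pick_candidates(const std::vector<SegmentInfo>& segments) {
    if (segments.size() < 2) return {};            // nothing safe to compact yet
    return {segments.begin(), segments.end() - 1}; // skip the last segment
}
```
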
2022-12-19 14:17:58 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
94a6ffb906 [feature](compaction) support vertical_compaction & ordered_data_compaction (#14524) 2022-12-01 22:15:41 +08:00
24b51b9035 [fix](compaction) segcompaction coredump if the rowset starts with a big segment (#14174) (#14176)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-11-14 09:54:08 +08:00
554f566217 [enhancement](compaction) introduce segment compaction (#12609) (#12866)
## Design

### Trigger

Every time a rowset writer produces more than N (e.g. 10) new segments, we trigger segment compaction. Note that only one segment compaction job runs for a single rowset at a time, to avoid recursive triggering and unbounded queuing.
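
A minimal sketch of this trigger, using hypothetical names (RowsetWriterSketch, kSegCompactionThreshold, submit_seg_compaction_task) rather than the real BE classes:

```cpp
#include <atomic>
#include <cstdint>

class RowsetWriterSketch {
public:
    void on_segment_flushed() {
        ++_segments_since_last_compaction;
        // Trigger only when enough new segments piled up and no job is in flight.
        if (_segments_since_last_compaction > kSegCompactionThreshold &&
            !_seg_compaction_running.exchange(true)) {
            submit_seg_compaction_task();          // hand the group to the thread pool
            _segments_since_last_compaction = 0;
        }
    }

private:
    static constexpr uint32_t kSegCompactionThreshold = 10;  // "N" in the description
    uint32_t _segments_since_last_compaction = 0;
    std::atomic<bool> _seg_compaction_running{false};        // cleared when the job finishes

    void submit_seg_compaction_task() { /* enqueue work into the compaction pool */ }
};
```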

### Target Selection

We collect candidate segments on every trigger. Big segments whose row count exceeds M (e.g. 10000) are skipped, because compacting them yields little benefit relative to the effort. Hence, we only pick the "longest consecutive small" segment group for actual compaction.
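
A minimal sketch of this selection, assuming a hypothetical SegmentInfo carrying the row count; kBigSegmentRows plays the role of M:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct SegmentInfo {
    uint32_t id;
    uint64_t num_rows;
};

constexpr uint64_t kBigSegmentRows = 10000;  // "M" in the description

// Return [begin, end) indices of the longest run of consecutive small segments.
std::pair<size_t, size_t> longest_small_run(const std::vector<SegmentInfo>& segs) {
    size_t best_begin = 0, best_len = 0, cur_begin = 0, cur_len = 0;
    for (size_t i = 0; i < segs.size(); ++i) {
        if (segs[i].num_rows <= kBigSegmentRows) {
            if (cur_len == 0) cur_begin = i;
            if (++cur_len > best_len) { best_len = cur_len; best_begin = cur_begin; }
        } else {
            cur_len = 0;  // a big segment breaks the run
        }
    }
    return {best_begin, best_begin + best_len};
}
```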

### Compaction Process

A new thread pool is introduced to do the work. We submit the above-mentioned "longest consecutive small" segment group to the pool, and the worker thread then does the following (see the sketch after this list):

- build a MergeIterator from the target segments
- create a new segment writer
- for each block read from the MergeIterator, append it with the writer
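
A minimal sketch of that worker loop, with stub MergeIterator/SegmentWriter interfaces standing in for the real BE classes:

```cpp
// Sketch only: the stubs below model the interfaces, not the actual implementations.
struct Block {};                                        // placeholder for a column block
struct Status { bool ok() const { return _ok; } bool _ok = true; };

struct MergeIterator {                                  // merges rows from the target segments
    bool next_block(Block* out) { return false; }       // stub: false means "exhausted"
};

struct SegmentWriter {                                  // writes the compacted segment
    Status append_block(const Block& b) { return {}; }  // stub
    Status finalize() { return {}; }                    // stub: flush footer/index
};

Status do_seg_compaction(MergeIterator& it, SegmentWriter& writer) {
    Block block;
    while (it.next_block(&block)) {                     // read merged blocks one by one
        Status st = writer.append_block(block);         // append each block to the new segment
        if (!st.ok()) return st;
    }
    return writer.finalize();
}
```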

### SegID handling

Segment IDs (SegIDs) must remain consecutive after segment compaction.

If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4 (see the sketch after this list):

- we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3
- delete seg_0, seg_1, seg_2 and seg_3
- rename seg_0-3 to seg_0
- rename seg_4 to seg_1
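
A minimal sketch of this ID rebuild for the example above, using a hypothetical segment_path() naming helper rather than the real BE file layout:

```cpp
#include <cstdio>
#include <string>

// Hypothetical path helper: "<rowset>_<seg_id>.dat".
static std::string segment_path(const std::string& rowset, int seg_id) {
    return rowset + "_" + std::to_string(seg_id) + ".dat";
}

// Compacted data for segments [begin, end] was written to tmp_path;
// the remaining (big) segments follow after `end`.
void rebuild_segment_ids(const std::string& rowset, const std::string& tmp_path,
                         int begin, int end, int total_segments) {
    for (int i = begin; i <= end; ++i) {
        std::remove(segment_path(rowset, i).c_str());                   // delete seg_0..seg_3
    }
    std::rename(tmp_path.c_str(), segment_path(rowset, begin).c_str()); // seg_0-3 -> seg_0
    int next_id = begin + 1;
    for (int i = end + 1; i < total_segments; ++i) {
        std::rename(segment_path(rowset, i).c_str(),                    // seg_4 -> seg_1, ...
                    segment_path(rowset, next_id++).c_str());
    }
}
```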

It is worth noting that we must wait for in-flight segment compaction tasks to finish before building the rowset meta and committing this txn.
2022-11-04 14:12:51 +08:00