[Storage]
This PR implements thread-safe `Rowset::load()` for both AlphaRowset and BetaRowset. The main changes are
1. Introduce `DorisCallOnce<ReturnType>` to be the replacement for `DorisInitOnce` . It works for both Status and OLAPStatus.
2. `segment_v2::ColumnReader::init()` is now implemented by DorisCallOnce.
3. `segment_v2::Segment` is now created by a factory open() method. This guarantees all Segment instances are in opened state.
4. `segment_v2::Segment::_load_index()` is now implemented by DorisCallOnce.
5. Implement thread-safe load() for AlphaRowset and BetaRowset
After replacing Arena with MemPool, we can achieve one copy for string
value read from segment v2. We can exchange MemPool's chunk between
RowBlockV2 and RowBlock. This change only replace Arena, this work will
be done in other change list.
major change
change data format of binary dict page, appending (dict page data) and (dict page offset) to binary dict page;
add new decoding method for new binary dict page format
add ut for segment test
set the elements of initial array to 0 ,when calling arena.AllocateNewBlock
hard code way to choose dict coding for string
0919 commit major change
change dict file format:when saving binary dict page, separate dict page from dict page,one dict page may have multi data pages;when reading a binary dict page,one ColumnReader keeps one dict page
loading dict when calling column_reader._read_page
3.rollback BinaryDictPage
no longer using memset(0) to inital column_zonemap.max_value
0926 17 commit major change
init column_zone_map min value column_zone_map slice's data array;
set char/varchar column_zone_map'max value size to 0
add ut for char column zone map query hit/miss
0929 10 commit major change
allocate mem for column_zone_map 's max and min value
direct copy content to column_zone_map's max and min value
I add ChunkAllocator in this CL to put unused memory chunk to a chunk
pool other than return it to system allocator. Now we only change
MemPool's chunk allocation and free to this.
And two configuration are introduduced too. 'chunk_reserved_bytes_limit'
is the limit of how many bytes this chunk pool can reserve in total and
its default value is 2147483648(2GB). 'use_mmap_allocate_chunk': if
chunk is allocated via mmap and default value is false.
And in my test case with default configuration a simple like
"select * from table limit 10", this can improve throughput from 280 QPS
to to 650 QPS. And when I config 'chunk_reserved_bytes_limit' to 0,
which means this is disabled, the throughput is the same with origin's.
support multiple key ranges in RowwiseIterator and StorageReadOptions
remove unused fields and member functions in RowBlock and ColumnData
read num_rows_per_block from short key index footer
BetaRowsetWriter is used to write rowset in V2 segment format.
This PR contains several interface changes
1. Rowset.make_snapshot() is renamed to `link_files_to` because hard links are also useful in copy task, linked schema change, etc
2. Rowset.copy_files_to_path() is renamed to `copy_files_to` to be consistent with other names
3. RowsetWriter.mem_pool() is removed because not all rowset writers use MemPool
4. RowsetWriter.garbage_collection() is removed because it's not used by clients
5. SegmentGroup's make_snapshot() is removed because link_segments_to_path() provides similar functionality
We create a new segment format for BetaRowset. New format merge
data file and index file into one file. And we create a new format
for short key index. In origin code index is stored in format like
RowCusor which is not efficient to compare. Now we encode multiple
column into binary, and we assure that this binary is sorted same
with the key columns.