Commit Graph

16 Commits

SHA1 Message Date
9c2d149c36 add profile for segment v2 (#2015) 2019-10-22 09:43:16 +08:00
8aa2cbe12d Load Rowset only once in a thread-safe manner (#2022)
[Storage]
This PR implements thread-safe `Rowset::load()` for both AlphaRowset and BetaRowset. The main changes are:

1. Introduce `DorisCallOnce<ReturnType>` as a replacement for `DorisInitOnce`. It works for both Status and OLAPStatus (see the sketch after this entry).
2. `segment_v2::ColumnReader::init()` is now implemented with DorisCallOnce.
3. `segment_v2::Segment` is now created through a factory open() method, which guarantees that all Segment instances are in an opened state.
4. `segment_v2::Segment::_load_index()` is now implemented with DorisCallOnce.
5. Implement thread-safe load() for AlphaRowset and BetaRowset.
2019-10-21 16:05:12 +08:00
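A minimal sketch of what a call-once helper in the spirit of `DorisCallOnce<ReturnType>` could look like, built on std::call_once. The class and member names, and the simplified Status type, are assumptions for illustration, not the actual Doris implementation.

```cpp
#include <mutex>

// Hypothetical simplified status type standing in for Doris' Status/OLAPStatus.
struct Status {
    static Status OK() { return Status(); }
};

// Sketch of a call-once wrapper: the wrapped function runs at most once even
// with concurrent callers, and every caller sees the same cached result.
template <typename ReturnType>
class CallOnceSketch {
public:
    template <typename Fn>
    ReturnType call(Fn&& fn) {
        std::call_once(_flag, [&]() { _result = fn(); });
        return _result;
    }

private:
    std::once_flag _flag;
    ReturnType _result;
};

// Hypothetical use inside a rowset's load(): the expensive work runs once;
// later callers simply receive the cached status.
class RowsetLoadSketch {
public:
    Status load() {
        return _load_once.call([this]() { return _do_load(); });
    }

private:
    Status _do_load() { /* open segments, load indexes ... */ return Status::OK(); }
    CallOnceSketch<Status> _load_once;
};
```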
05643dc403 Replace Arena with MemPool (#2012)
After replacing Arena with MemPool, string values read from segment v2 need only a single copy, because MemPool chunks can be exchanged between RowBlockV2 and RowBlock. This change only replaces Arena; the chunk exchange itself will be done in a separate change list (a rough sketch of the idea follows below).
2019-10-19 15:53:24 +08:00
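The zero-extra-copy idea hinges on moving chunk ownership between pools rather than copying bytes. A rough sketch of that ownership transfer, using invented names (`PoolSketch`, `acquire_data_from`) rather than the real MemPool API:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Sketch of pools that own memory chunks; all names are illustrative.
struct ChunkSketch {
    std::unique_ptr<uint8_t[]> data;
    size_t size = 0;
};

class PoolSketch {
public:
    uint8_t* allocate(size_t size) {
        _chunks.push_back({std::make_unique<uint8_t[]>(size), size});
        return _chunks.back().data.get();
    }

    // Transfer ownership of all chunks from `src` into this pool. String
    // slices pointing into those chunks stay valid, so no byte is copied.
    void acquire_data_from(PoolSketch* src) {
        for (auto& chunk : src->_chunks) {
            _chunks.push_back(std::move(chunk));
        }
        src->_chunks.clear();
    }

private:
    std::vector<ChunkSketch> _chunks;
};
```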
d68b1b287c Support segment-level zone map (#1931) 2019-10-13 22:06:09 +08:00
80e9b21fb0 Make Segment v2 use string's real length(#1943) (#1944) 2019-10-13 13:23:43 +08:00
8aa8e08f27 v2 segment supports string encoding (#1766) (#1816)
Major changes:

1. Change the data format of the binary dict page, appending the dict page data and the dict page offset to the binary dict page.
2. Add a new decoding method for the new binary dict page format.
3. Add UT for the segment test.
4. Set the elements of the initial array to 0 when calling arena.AllocateNewBlock.
5. Hard-code the choice of dict encoding for string columns.

Major changes in the 09-19 commit:

1. Change the dict file format: when saving a binary dict page, separate the dict page from the data pages; one dict page may have multiple data pages. When reading a binary dict page, one ColumnReader keeps one dict page.
2. Load the dict when calling column_reader._read_page.
3. Roll back BinaryDictPage.
4. No longer use memset(0) to initialize column_zonemap.max_value.

Major changes in the 09-26 commit:

1. Initialize the data array of column_zone_map's min value slice.
2. Set the size of the char/varchar column_zone_map's max value to 0.
3. Add UT for char column zone map query hit/miss.

Major changes in the 09-29 commit:

1. Allocate memory for column_zone_map's max and min values.
2. Copy content directly into column_zone_map's max and min values.
2019-09-30 16:25:31 +08:00
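A rough sketch of the dictionary-decoding idea behind the binary dict page described above: data pages store small integer codes, and the dict page the ColumnReader keeps maps codes back to strings. The types and function here are illustrative, not the actual Doris page structures.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative only: a shared dictionary page plus one data page of codes.
struct DictPageSketch {
    std::vector<std::string> values;  // code -> string value
};

struct DataPageSketch {
    std::vector<uint32_t> codes;      // one code per row
};

// Decode a data page by looking each code up in the dictionary the reader
// holds; many data pages can share the same dict page.
std::vector<std::string> decode_page(const DictPageSketch& dict,
                                     const DataPageSketch& page) {
    std::vector<std::string> out;
    out.reserve(page.codes.size());
    for (uint32_t code : page.codes) {
        out.push_back(dict.values.at(code));
    }
    return out;
}
```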
8d0fee7e64 Add default value column iterator #1834 (#1835) 2019-09-24 14:39:10 +08:00
fe27969978 add delete predicate filter (#1636) (#1745)
Delete predicates can be used to prune data by zone map.
2019-09-24 14:38:19 +08:00
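One way zone-map pruning with a delete predicate can work, sketched with invented types: if a page's [min, max] range is entirely covered by the deleted range, every row in the page is deleted and the page can be skipped. This is an illustration of the idea, not the Doris implementation.

```cpp
#include <cstdint>

// Illustrative zone map for one page and a simple
// "delete where value between lo and hi" predicate.
struct ZoneMapSketch {
    int64_t min_value;
    int64_t max_value;
};

struct DeleteRangeSketch {
    int64_t lo;
    int64_t hi;
};

// If every value the zone map can contain falls inside the delete range,
// the whole page can be pruned without reading it.
bool can_prune(const ZoneMapSketch& zm, const DeleteRangeSketch& del) {
    return del.lo <= zm.min_value && zm.max_value <= del.hi;
}
```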
11eafe524f Add ChunkAllocator to accelerate chunk allocation (#1792)
I add ChunkAllocator in this CL to put unused memory chunks into a chunk pool instead of returning them to the system allocator. For now, only MemPool's chunk allocation and free go through it.

Two configuration options are introduced as well. 'chunk_reserved_bytes_limit' limits how many bytes the chunk pool can reserve in total; its default value is 2147483648 (2GB). 'use_mmap_allocate_chunk' controls whether chunks are allocated via mmap; its default value is false.

In my test case with the default configuration and a simple query like
"select * from table limit 10", this improves throughput from 280 QPS
to 650 QPS. When I set 'chunk_reserved_bytes_limit' to 0, which disables
the pool, throughput is the same as the original.
2019-09-13 08:27:24 +08:00
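A minimal sketch of the chunk-pool idea from the commit above: freed chunks go into per-size free lists up to a byte limit and are handed back out on later allocations instead of hitting the system allocator. The class name, single global lock, and malloc/free fallback are simplifications, not the real ChunkAllocator.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>
#include <vector>

class ChunkPoolSketch {
public:
    explicit ChunkPoolSketch(size_t reserved_bytes_limit)
            : _reserved_bytes_limit(reserved_bytes_limit) {}

    void* allocate(size_t size) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            auto& free_list = _free_chunks[size];
            if (!free_list.empty()) {
                void* chunk = free_list.back();
                free_list.pop_back();
                _reserved_bytes -= size;
                return chunk;  // reuse a pooled chunk, no system allocation
            }
        }
        return malloc(size);   // pool miss: fall back to the system allocator
    }

    void release(void* chunk, size_t size) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_reserved_bytes + size <= _reserved_bytes_limit) {
            _free_chunks[size].push_back(chunk);  // keep it for reuse
            _reserved_bytes += size;
        } else {
            free(chunk);  // pool is full, return it to the system
        }
    }

private:
    std::mutex _mutex;
    size_t _reserved_bytes_limit;
    size_t _reserved_bytes = 0;
    std::unordered_map<size_t, std::vector<void*>> _free_chunks;
};
```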
dad4def708 Support estimate size for v2 segment writer (#1787) 2019-09-12 15:15:39 +08:00
5653822298 Write magic number in footer instead of header (#1771) 2019-09-10 09:54:13 +08:00
f76dad289e Basic implementation for BetaRowsetReader (#1718) 2019-09-03 13:52:16 +08:00
ae22d5e682 Support multiple key ranges in RowwiseIterator and StorageReadOptions (#1704)
support multiple key ranges in RowwiseIterator and StorageReadOptions
remove unused fields and member functions in RowBlock and ColumnData
read num_rows_per_block from short key index footer
2019-08-27 17:57:42 +08:00
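A sketch of how read options could carry several key ranges rather than a single lower/upper pair, as the commit above describes. The type names and fields are invented for illustration, not the actual StorageReadOptions layout.

```cpp
#include <string>
#include <vector>

// Illustrative key range: encoded short keys plus inclusiveness flags.
struct KeyRangeSketch {
    std::string lower_key;
    bool include_lower = true;
    std::string upper_key;
    bool include_upper = true;
};

// Illustrative read options holding multiple key ranges.
struct ReadOptionsSketch {
    std::vector<KeyRangeSketch> key_ranges;
};

// A rowwise iterator can then walk the ranges in order, seeking to each
// lower key and stopping at the matching upper key:
// for (const auto& range : opts.key_ranges) { seek_to(range.lower_key); /* ... */ }
```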
6d040a33af Add zone map page(#1390) (#1633) 2019-08-24 00:57:30 +08:00
af8256be2a Implement BetaRowsetWriter (#1590)
BetaRowsetWriter is used to write rowset in V2 segment format.

This PR contains several interface changes:
1. Rowset.make_snapshot() is renamed to `link_files_to` because hard links are also useful for copy tasks, linked schema change, etc.
2. Rowset.copy_files_to_path() is renamed to `copy_files_to` to be consistent with other names
3. RowsetWriter.mem_pool() is removed because not all rowset writers use MemPool
4. RowsetWriter.garbage_collection() is removed because it's not used by clients
5. SegmentGroup's make_snapshot() is removed because link_segments_to_path() provides similar functionality
2019-08-12 16:41:47 +08:00
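The renamed file-management surface might look roughly like the sketch below; the parameter lists and the simplified Status type are guesses for illustration only, not the real Doris signatures.

```cpp
#include <cstdint>
#include <string>

// Hypothetical simplified status type for the sketch.
struct Status {
    static Status OK() { return Status(); }
};

// Sketch of the renamed Rowset file-management interface described above.
class RowsetInterfaceSketch {
public:
    virtual ~RowsetInterfaceSketch() = default;

    // Formerly make_snapshot(): hard-link this rowset's files into `dir`,
    // which is also useful for copy tasks, linked schema change, etc.
    virtual Status link_files_to(const std::string& dir, int64_t new_rowset_id) = 0;

    // Formerly copy_files_to_path(): physically copy the files into `dir`.
    virtual Status copy_files_to(const std::string& dir) = 0;
};
```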
b2e678dfc1 Support Segment for BetaRowset (#1577)
We create a new segment format for BetaRowset. The new format merges the
data file and the index file into one file, and we also create a new format
for the short key index. In the original code, the index is stored in a
RowCursor-like format, which is not efficient to compare. Now we encode
multiple key columns into a binary key and ensure that this binary sorts
the same way as the key columns.
2019-08-06 17:15:11 +08:00
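The key point is that the concatenated binary encoding must sort the same way as the original key columns. A common trick for integer columns, sketched here with an invented helper (not the actual Doris key coder), is to flip the sign bit and write the bytes big-endian so that byte-wise comparison matches signed order:

```cpp
#include <cstdint>
#include <string>

// Append an int32 key value so that unsigned byte-wise comparison of the
// resulting string matches signed integer order: flip the sign bit and
// write the bytes big-endian. (Illustrative; variable-length columns need
// additional care such as separators or length prefixes.)
void append_int32_memcomparable(int32_t value, std::string* out) {
    uint32_t bits = static_cast<uint32_t>(value) ^ 0x80000000u;  // sign flip
    for (int shift = 24; shift >= 0; shift -= 8) {
        out->push_back(static_cast<char>((bits >> shift) & 0xFF));
    }
}

// Encoding a multi-column short key is then just appending the columns in
// key order:
// std::string short_key;
// append_int32_memcomparable(k1, &short_key);
// append_int32_memcomparable(k2, &short_key);
```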