doris

Author	SHA1	Message	Date
kangpinghuang	9c2d149c36	add profile for segment v2 (#2015)	2019-10-22 09:43:16 +08:00
Dayue Gao	8aa2cbe12d	Load Rowset only once in a thread-safe manner (#2022) [Storage] This PR implements thread-safe `Rowset::load()` for both AlphaRowset and BetaRowset. The main changes are: 1. Introduce `DorisCallOnce<ReturnType>` as the replacement for `DorisInitOnce`. It works for both Status and OLAPStatus. 2. `segment_v2::ColumnReader::init()` is now implemented by DorisCallOnce. 3. `segment_v2::Segment` is now created by a factory open() method. This guarantees all Segment instances are in opened state. 4. `segment_v2::Segment::_load_index()` is now implemented by DorisCallOnce. 5. Implement thread-safe load() for AlphaRowset and BetaRowset.	2019-10-21 16:05:12 +08:00
ZHAO Chun	05643dc403	Replace Arena with MemPool (#2012) After replacing Arena with MemPool, we can achieve one copy for string values read from segment v2. We can exchange MemPool's chunks between RowBlockV2 and RowBlock. This change only replaces Arena; that work will be done in another change list.	2019-10-19 15:53:24 +08:00
Lijia Liu	d68b1b287c	Support segment-level zone map (#1931)	2019-10-13 22:06:09 +08:00
wangbo	80e9b21fb0	Make Segment v2 use string's real length (#1943) (#1944)	2019-10-13 13:23:43 +08:00
wangbo	8aa8e08f27	v2 segment support string encode (#1766) (#1816) Major changes: change the data format of the binary dict page, appending (dict page data) and (dict page offset) to the binary dict page; add a new decoding method for the new binary dict page format; add a UT for the segment test; set the elements of the initial array to 0 when calling arena.AllocateNewBlock; hard-code the way to choose dict coding for strings. 0919 commit, major changes: change the dict file format: when saving a binary dict page, separate the data pages from the dict page; one dict page may have multiple data pages; when reading a binary dict page, one ColumnReader keeps one dict page, loading the dict when calling column_reader._read_page; 3. rollback BinaryDictPage no longer using memset(0) to initialize column_zonemap.max_value. 0926 17 commit, major changes: init column_zone_map min value column_zone_map slice's data array; set char/varchar column_zone_map's max value size to 0; add a UT for char column zone map query hit/miss. 0929 10 commit, major changes: allocate memory for column_zone_map's max and min values; copy content directly to column_zone_map's max and min values.	2019-09-30 16:25:31 +08:00
kangpinghuang	8d0fee7e64	Add default value column iterator #1834 (#1835)	2019-09-24 14:39:10 +08:00
kangpinghuang	fe27969978	add delete predicate filter (#1636) (#1745) Delete predicates can be used to prune data by zone map.	2019-09-24 14:38:19 +08:00
ZHAO Chun	11eafe524f	Add ChunkAllocator to accelerate chunk allocation (#1792) This CL adds ChunkAllocator, which puts unused memory chunks into a chunk pool rather than returning them to the system allocator. For now, only MemPool's chunk allocation and free use it. Two configurations are also introduced. 'chunk_reserved_bytes_limit' is the limit on how many bytes the chunk pool can reserve in total; its default value is 2147483648 (2GB). 'use_mmap_allocate_chunk' controls whether chunks are allocated via mmap; its default value is false. In my test case with the default configuration, a simple query like "select * from table limit 10" improves from 280 QPS to 650 QPS. When 'chunk_reserved_bytes_limit' is set to 0, which disables the pool, the throughput is the same as the original's.	2019-09-13 08:27:24 +08:00
wubiao	dad4def708	Support estimate size for v2 segment writer (#1787)	2019-09-12 15:15:39 +08:00
Dayue Gao	5653822298	Write magic number in footer instead of header (#1771)	2019-09-10 09:54:13 +08:00
Dayue Gao	f76dad289e	Basic implementation for BetaRowsetReader (#1718)	2019-09-03 13:52:16 +08:00
Dayue Gao	ae22d5e682	Support multiple key ranges in RowwiseIterator and StorageReadOptions (#1704) Support multiple key ranges in RowwiseIterator and StorageReadOptions; remove unused fields and member functions in RowBlock and ColumnData; read num_rows_per_block from the short key index footer.	2019-08-27 17:57:42 +08:00
kangpinghuang	6d040a33af	Add zone map page (#1390) (#1633)	2019-08-24 00:57:30 +08:00
Dayue Gao	af8256be2a	Implement BetaRowsetWriter (#1590) BetaRowsetWriter is used to write rowsets in the V2 segment format. This PR contains several interface changes: 1. Rowset.make_snapshot() is renamed to `link_files_to` because hard links are also useful in copy tasks, linked schema change, etc. 2. Rowset.copy_files_to_path() is renamed to `copy_files_to` to be consistent with other names. 3. RowsetWriter.mem_pool() is removed because not all rowset writers use MemPool. 4. RowsetWriter.garbage_collection() is removed because it's not used by clients. 5. SegmentGroup's make_snapshot() is removed because link_segments_to_path() provides similar functionality.	2019-08-12 16:41:47 +08:00
ZHAO Chun	b2e678dfc1	Support Segment for BetaRowset (#1577) We create a new segment format for BetaRowset. The new format merges the data file and index file into one file. We also create a new format for the short key index. In the original code, the index is stored in a RowCursor-like format, which is not efficient to compare. Now we encode multiple columns into a binary string and ensure that this binary sorts in the same order as the key columns.	2019-08-06 17:15:11 +08:00
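
The short key index commit (#1577) describes encoding several key columns into one binary string that sorts in the same order as the columns. The sketch below illustrates that idea for two signed 32-bit columns only. It is not the Doris encoder: `append_ordered_int32`, `encode_key`, and `key_less` are hypothetical names, and the fixed-width big-endian layout with a flipped sign bit is an assumption chosen for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a Doris API): append a signed 32-bit value as
// 4 big-endian bytes with the sign bit flipped, so that byte-wise order
// matches numeric order (negative values sort before positive ones).
void append_ordered_int32(std::string* out, int32_t value) {
    uint32_t bits = static_cast<uint32_t>(value) ^ 0x80000000u;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out->push_back(static_cast<char>((bits >> shift) & 0xFFu));
    }
}

// Hypothetical two-column key: concatenating the fixed-width encodings
// keeps the tuple order (first column, then second column).
std::string encode_key(int32_t first, int32_t second) {
    std::string key;
    append_ordered_int32(&key, first);
    append_ordered_int32(&key, second);
    return key;
}

// Compare two encoded keys lexicographically; std::string compares its
// bytes as unsigned values, which is what the encoding relies on.
bool key_less(const std::string& lhs, const std::string& rhs) {
    return lhs.compare(rhs) < 0;
}
```

With this layout a single byte-wise comparison replaces a column-by-column comparison, for example `key_less(encode_key(-1, 5), encode_key(0, 0))` holds. Variable-length columns such as strings would need a terminator or escaping scheme, which this sketch leaves out.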

16 Commits
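
The thread-safe `Rowset::load()` commit (#2022) introduces `DorisCallOnce<ReturnType>` so that initialization runs once and every caller sees the same result. The following is a minimal sketch of that pattern built on `std::call_once`. It is not the Doris implementation: `CallOnceSketch` and `count_loads` are hypothetical names, and an `int` stands in for Status or OLAPStatus.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the DorisCallOnce<ReturnType> named in the log.
template <typename ReturnType>
class CallOnceSketch {
public:
    // Runs fn on the first call only; every call returns the cached result.
    template <typename Fn>
    ReturnType call(Fn fn) {
        std::call_once(_flag, [&] { _result = fn(); });
        return _result;
    }

private:
    std::once_flag _flag;
    ReturnType _result{};
};

// Starts num_threads threads that all "load" through the same guard and
// returns how many times the loader body actually ran.
int count_loads(int num_threads) {
    CallOnceSketch<int> load_once;
    std::atomic<int> runs{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&] {
            // 0 stands in for an OK status in this sketch.
            load_once.call([&] { ++runs; return 0; });
        });
    }
    for (auto& t : threads) t.join();
    return runs.load();
}
```

Caching the return value means a failed first attempt is also reported to later callers rather than retried, which is one possible design choice; whether Doris behaves this way is not stated in the log.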