doris

Author	SHA1	Message	Date
thinker	2a2f12ca51	[refactor & fix](exce & olap) refactor reader: rename Reader to TabletReader (#7544 ) 1. Consider the responsibility of Reader, Rename Reader to TabletReader, I think the new name TabletReader can represent its function exactly, it is more suitable and meaningful 2. add virtual keyword for the destructor of OlapScanner, because VOlapScanner is derived from it 3. refactor struct ReaderParams and KeysParam as TabletReader's inner struct，guard by TabletReader name scope, it's also more reasonable 4. reduce OlapScanner's member data amount, just use _parent->member_data is simpler 5. bugfix: TupleReader has the same memeber data _collect_iter to its parent class Reader, this usage is dangerous, the writer may make some mistake, so i delete TupleReader::_collect_iter to fix it. 6. call set_tablet_reader() in OlapScanner::prepare() to setup _tablet_reader, VOlapScanner should override set_tablet_reader to new BlockReader instead, use this way to avoid new Reader twice by reset unique_ptr _tablet_reader 7. if the member data is a inseparable part of a class, i suggest using normal variable while not pointer variable, because pointer bring a indirect lay and must handle coping and destructing carefully, it's not necessary 8. some other small changes for readability or design	2022-01-06 00:00:32 +08:00
HappenLee	4afdcdb939	[performance](reader) Opt the unique reader to reduce unnecessary compare and function call (#7348 )	2021-12-16 10:36:43 +08:00
HappenLee	91a3150910	[fix](reader) Fix the bug that reader call _capture_rs_readers function twice (#7224 )	2021-11-26 10:17:33 +08:00
HappenLee	8b557c0e70	[Refactor] Refact code of sequence column (#7007 )	2021-11-15 11:10:45 +08:00
HappenLee	c3b133bdb3	[Refactor] Refactor the reader code (#6866 ) 1. Removed useless redundant code logic 2. Change reader to interface, add tuple reader to simplify the structure of reader	2021-10-30 18:15:28 +08:00
Zhengguo Yang	ca3eb6490e	push down conditions on unique table value columns to base rowset (#6457 )	2021-08-26 09:14:49 +08:00
Zhengguo Yang	8738ce380b	Add long text type STRING, with a maximum length of 2GB. Usage is similar to varchar, and there is no guarantee for the performance of storing extremely long data (#6391 )	2021-08-18 09:05:40 +08:00
caiconghui	d1007afe80	Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient (#6361 ) * [Optimize] optimize the speed of converting integer to string * Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-04 10:55:19 +08:00
huangmengbin	2d78c31d49	[Enhance] improve performance of init_scan_key by sharing the schema (#6099 ) Co-authored-by: huangmengbin <huangmengbin@bytedance.com>	2021-07-21 10:50:31 +08:00
Mingyu Chen	68f988b78a	[Optimize] Use flat_hash_set to replace unorderd_set in InPredicate (#6216 ) Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2021-07-15 11:15:11 +08:00
stdpain	290a844e04	[optimize] Optimize bloomfilter performance (#6180 ) refactor runtime filter bloomfilter and eliminate some virtual function calls which obtained a performance improvement of about 5% import block bloom filter, for avx version obtained 40% performance improvement before: bloomfilter size:default, about 2000W item cost about 1s400ms after: bloomfilter size:524288, about 2000W item cost about 400ms	2021-07-10 10:12:12 +08:00
stdpain	149def9e42	[Feature] Support RuntimeFilter in Doris (BE Implement) (#6077 ) 1. support in/bloomfilter/minmax 2. support broadcast/shuffle/bucket shuffle/colocate join 3. opt memory use and cpu cache miss while build runtime filter 4. opt memory use in left semi join (works well on tpcds-95)	2021-07-04 20:59:05 +08:00
Yingchun Lai	6d6c3d9703	[Enhancement] Reduce memory consumption by releasing readers earier (#5811 ) We created multiple rowset readers to read data of one tablet, after one rowset reader has reached EOF, it can be released to reduce resource (typically memory) consumption. As the same, we can release segment reader when it reach EOF.	2021-06-16 09:37:50 +08:00
Mingyu Chen	9c7d8d2e98	[Bug] Fix bug that isPreAggregation is incorrectly set (#5608 ) 1. The MaterializedViewSelector should be reset for each scan node 2. On the BE side, columns with delete conditions must be added to the return column.	2021-04-09 14:13:06 +08:00
stdpain	bfeb717abe	[Refactor] fix some warning in gcc higher than 7 make decimal12_t as a POD type (#5547 )	2021-03-23 09:37:10 +08:00
HappenLee	462efeaf39	[Performance Optimization and Refactor] (#5358 ) (#5364 ) 1. Add BlockColumnPredicate support OR and AND column predicate in RowBlockV2 2. Support evaluate vectorization delete predicate in storage engine not in Reader in SegmentV2	2021-02-07 22:41:33 +08:00
Mingyu Chen	ea7f61e1c7	[Bug] Duplicate results when reading aggregation table (#5307 ) Previously, we introduced an optimization logic for the aggr table, that is, in the case of only one rowset and nonoverlapping, the data can be read directly without merging. But this logic has bugs.	2021-02-04 09:21:35 +08:00
Zhengguo Yang	4ffc61be32	fix apply condition to unique table value columns incorrectly (#5302 )	2021-01-29 10:34:47 +08:00
stdpain	8ee4c48f13	[Compile] fix compile error in gcc10 (#5294 )	2021-01-26 09:13:11 +08:00
HappenLee	a5298d617d	[Performance Improve] Push Down _conjunctf of 'not in' and '!=' to Storage Engine. (#5207 )	2021-01-23 21:07:01 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
Yingchun Lai	49f7eb69bf	[Refactor] Refactor DeleteHandler and Cond module (2nd) (#5030 ) * [Refactor] Refactor DeleteHandler and Cond module (#4925) This patch mainly do the following refactors: - Use int64_t instead of int32_t for 'version' in DeleteHandler - Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments - Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid - Use range loop to simplify code - Reduce some compare operations in Cond::del_eval - Improve some branch predictions in Reader - Fix and improve some unit tests	2020-12-08 10:01:18 +08:00
Zhengguo Yang	b9dabc3b5b	[Enhance] Push down predicate on value column of unique table to base rowset (#5022 )	2020-12-06 08:50:37 +08:00
Mingyu Chen	c440aa07d1	Revert "[Refactor] Refactor DeleteHandler and Cond module (#4925 )" (#5028 ) This reverts commit 9c9992e0aa28ee85364eebf86a6675f1073e08fb. Co-authored-by: morningman <chenmingyu@baidu.com>	2020-12-05 21:39:49 +08:00
Yingchun Lai	9c9992e0aa	[Refactor] Refactor DeleteHandler and Cond module (#4925 ) This patch mainly do the following refactors: - Use int64_t instead of int32_t for 'version' in DeleteHandler - Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments - Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid - Use range loop to simplify code - Reduce some compare operations in Cond::del_eval - Improve some branch predictions in Reader - Fix and improve some unit tests	2020-12-04 12:13:30 +08:00
Zhengguo Yang	1f236a5339	[BUG] Fix core when schema change (#5018 )	2020-12-04 09:53:19 +08:00
Zhengguo Yang	df1f06e60b	Optimized the read performance of the table when have multi versions (#4958 ) * Optimized the read performance of the table when have multi versions, changed the merge method of the unique table, merged the cumulative version data first, and then merged with the base version. For the data with only one base version, read directly without merging	2020-12-01 12:25:11 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Yingchun Lai	d1c2b3ed0d	[Optimize] Add an unordered_map for TabletSchema to speed up column name lookup (#4779 ) Reduce column name lookup for TabletSchema and Tablet from O(N) to O(1).	2020-11-03 19:53:44 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
Youngwb	068707484d	Support sequence column for UNIQUE_KEYS Table (#4256 ) * add sequence col Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>	2020-09-04 10:10:17 +08:00
ZhangYu0123	c201cf6e4f	Support batch delete[part 2] (#4425 ) support batch delete for read compaction	2020-08-25 14:05:04 +08:00
lichaoyong	75ebe2b363	[Bug] Compaction row number cannot be matched between input rowsets and output rowsets. (#4139 ) Unique Key table will load duplicate rows for different loads. If exists duplicate row between loads. Compaction will merge this rows. The statistics should take this merged number into consideration. Now, We missed the merged number. So it will encounter error when compaction.	2020-07-23 10:28:56 +08:00
yangzhg	d3d835844f	[Performance] Improve performance of unique table read (#3974 ) Implements #3971 the test table as list: ``` mysql> desc test; +------------+---------+------+-------+---------+---------+ \| Field \| Type \| Null \| Key \| Default \| Extra \| +------------+---------+------+-------+---------+---------+ \| rid \| BIGINT \| No \| true \| 0 \| \| \| qid \| BIGINT \| No \| true \| 0 \| \| \| qidDeleted \| TINYINT \| No \| false \| 0 \| REPLACE \| \| type \| TINYINT \| No \| false \| 0 \| REPLACE \| \| uid \| BIGINT \| No \| false \| 0 \| REPLACE \| \| toUid \| BIGINT \| No \| false \| 0 \| REPLACE \| \| status \| INT \| No \| false \| 0 \| REPLACE \| \| createTime \| INT \| No \| false \| 0 \| REPLACE \| \| source \| INT \| No \| false \| 0 \| REPLACE \| \| misFlag \| INT \| No \| false \| 0 \| REPLACE \| \| anonymous \| TINYINT \| No \| false \| 0 \| REPLACE \| \| uv \| TINYINT \| No \| false \| 1 \| REPLACE \| +------------+---------+------+-------+---------+---------+ 12 rows in set (0.00 sec) mysql> select count() from test; +----------+ \| count() \| +----------+ \| 1093760 \| +----------+ 1 row in set (1.00 sec) ``` There is 29 versions at present ![image](https://user-images.githubusercontent.com/9098473/85992244-2aa26c80-ba27-11ea-918a-04701a58dbdf.png) I run the query `select sum(uv) from test` for 10 times, the average ScanTime reduced from `9s277ms` to `8s206ms`	2020-07-02 13:56:08 +08:00
Yingchun Lai	73c3de4313	[refactor] Simple refactor on class Reader (#3691 ) This is a simple refactor patch on class Reader without any functional changes. Main refactor points: - Remove some useless return value - Use range loop - Use empty() instead of size() for some STL containers size judgement - Use in-class initialization instead of initialize in constructor function - Some other small refactor	2020-06-03 19:55:53 +08:00
kangkaisen	6c33f80544	Add disable_storage_page_cache config (#2890 ) 1. when read column data page: for compaction, schema_change, check_sum: we don't use page cache for query and config::disable_storage_page_cache is false, we use page cache 2. when read column index page if config::disable_storage_page_cache is false, we use page cache	2020-02-16 19:13:30 +08:00
Lijia Liu	99ad56d1bf	Support bitmap index for more type (#2630 ) For #2589 1. date(uint24_t)/datetime(int64_t)/largeint(int128_t) use frame of reference code as dict. 2. decimal(decimal12_t) also uses frame of reference code as dict. 3. float/double use bitshuffle code as dict.	2020-01-31 21:09:29 +08:00
Mingyu Chen	13e5fdd512	[AlphaRowset] set num_segments field in rowset meta if missing (#2658 ) the num segments should be read from rowset meta pb. But the previous code error caused this value not to be set in some cases. So when init the rowset meta and find that the num_segments is 0(not set), we will try to calculate the num segments from AlphaRowsetExtraMetaPB, and then set the num_segments field. This should only happen in some rowsets converted from old version. and for all newly created rowsets, the num_segments field must be set.	2020-01-07 21:46:02 +08:00
kangpinghuang	f14cdacfd1	Fix single column read bug (#2122 )	2019-11-07 10:24:02 +08:00
kangkaisen	4e8d728e75	Remove unused code and unnecessary check (#1918 )	2019-09-30 18:35:30 +08:00
kangkaisen	cafb9f1e62	Replace Arena with MemPool first step (#1899 )	2019-09-28 01:12:22 +08:00
kangkaisen	b246d93128	Avoid SerDe for aggregation query with object pool (#1854 )	2019-09-26 13:51:13 +08:00
ZHAO Chun	a349409838	Move compare from RowCursor to row (#1764 )	2019-09-09 14:51:13 +08:00
kangkaisen	1e4dd77d2a	Add bitmap agg type and udaf (#1610 )	2019-08-26 14:24:42 +08:00
Dayue Gao	da8b9aad9a	Remove preaggregation and index stream cache stuff out of RowsetReaderContext (#1698 )	2019-08-26 14:19:03 +08:00
ZHAO Chun	c5edf9dae0	Unify Field and ColumnSchema in Storage (#1561 ) Currently, we have Field and ColumnSchema to access column data in a row. These two classes are mostly the same. So we should unify these to one class. Now, Field has offset information, which is an row attribute, so we remove offset in Field. RowCursor now has some logic which belong to Schema, so in this patch I add Schema attribute to RowCursor to make RowCursor simple. After this change, only Schema will handle Field/ColumnSchema. I extract some logic from RowCursor to be/src/olap/row.h, then we can use same logic to handle different types of row. Each type of row has same function that to get Cell of this row. A cell represent a column content with a null indicator.	2019-07-30 14:01:57 +08:00
lichaoyong	0d48a3961c	Refactor Storage Engine (#1478 ) NOTE: This patch would modify all Backend's data. And this will cause a very long time to restart be. So if you want to interferer your product environment, you should upgrade backend one by one. 1. Refactoring be is to clarify the structure the codes. 2. Use unique id to indicate a rowset. Nameing rowset with tablet_id and version will lead to many conflicts among compaction, clone, restore. 3. Extract an rowset interface to encapsulate rowsets with different format.	2019-07-15 21:18:22 +08:00
lichaoyong	8d87e36ff8	Place _init_seek_columns() in right place (#1302 )	2019-06-13 20:54:45 +08:00
ZHAO Chun	6ce8087916	Fix bug that RowCusor do NOT match with RowBlock's layout (#1249 )	2019-06-04 22:20:10 +08:00
李超勇	ff95f23615	Remove OLAP_LOG_DEBUG AND OLAP_LOG_TRACE log format (#378 ) Use VLOG(3) and VLOG(10) instead	2018-12-03 10:08:21 +08:00

1 2

61 Commits