Commit Graph

61 Commits

Author SHA1 Message Date
2a2f12ca51 [refactor & fix](exce & olap) refactor reader: rename Reader to TabletReader (#7544)
1. Considering the responsibility of Reader, rename Reader to TabletReader; the new name describes its function more precisely.
2. Add the virtual keyword to the destructor of OlapScanner, because VOlapScanner is derived from it.
3. Refactor struct ReaderParams and KeysParam into inner structs of TabletReader, guarded by the TabletReader name scope, which is more reasonable.
4. Reduce the number of OlapScanner data members; using _parent->member_data directly is simpler.
5. Bugfix: TupleReader declares the same data member _collect_iter as its parent class Reader. This shadowing is dangerous and invites mistakes, so TupleReader::_collect_iter is deleted.
6. Call set_tablet_reader() in OlapScanner::prepare() to set up _tablet_reader; VOlapScanner should override set_tablet_reader() to create a BlockReader instead. This avoids constructing a Reader twice by resetting the unique_ptr _tablet_reader.
7. If a data member is an inseparable part of a class, prefer a plain member variable over a pointer: a pointer adds a layer of indirection and requires careful handling of copying and destruction, which is unnecessary here.
8. Some other small changes for readability or design.
2022-01-06 00:00:32 +08:00
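Points 2 and 5 of the commit above can be illustrated with a minimal sketch. `BaseScanner` and `DerivedScanner` are hypothetical stand-ins, not the Doris classes; the sketch only shows why a base class needs a virtual destructor and why a derived class should not redeclare a base member.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for a scanner base class and its derived class.
struct BaseScanner {
    // Virtual destructor: deleting through a BaseScanner* also runs the derived destructor.
    virtual ~BaseScanner() = default;
    int collect_iter = 1; // single copy of the member, owned by the base class
};

struct DerivedScanner : BaseScanner {
    explicit DerivedScanner(bool* destroyed) : _destroyed(destroyed) {}
    ~DerivedScanner() override { *_destroyed = true; }
    // No second `collect_iter` here: redeclaring it would shadow the base member.
    bool* _destroyed;
};

// Destroys a derived object through a base pointer and reports whether
// the derived destructor actually ran.
bool derived_destructor_runs() {
    bool destroyed = false;
    {
        std::unique_ptr<BaseScanner> scanner = std::make_unique<DerivedScanner>(&destroyed);
        assert(scanner->collect_iter == 1);
    }
    return destroyed;
}
```

Without `virtual` on the base destructor, destroying the object through the base pointer would be undefined behavior.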
4afdcdb939 [performance](reader) Opt the unique reader to reduce unnecessary compare and function call (#7348) 2021-12-16 10:36:43 +08:00
91a3150910 [fix](reader) Fix the bug that reader call _capture_rs_readers function twice (#7224) 2021-11-26 10:17:33 +08:00
8b557c0e70 [Refactor] Refactor code of sequence column (#7007) 2021-11-15 11:10:45 +08:00
c3b133bdb3 [Refactor] Refactor the reader code (#6866)
1. Remove redundant code logic
2. Change Reader to an interface and add TupleReader to simplify the reader structure
2021-10-30 18:15:28 +08:00
ca3eb6490e push down conditions on unique table value columns to base rowset (#6457) 2021-08-26 09:14:49 +08:00
8738ce380b Add long text type STRING, with a maximum length of 2GB. Usage is similar to varchar, and there is no guarantee for the performance of storing extremely long data (#6391) 2021-08-18 09:05:40 +08:00
d1007afe80 Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient (#6361)
* [Optimize] optimize the speed of converting integer to string

* Use fmt and std::from_chars to make integer-to-string and string-to-integer conversion more efficient

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-04 10:55:19 +08:00
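A minimal sketch of the string-to-integer side using `std::from_chars` (C++17). The helper names `parse_int` and `int_to_string` are hypothetical. To stay within the standard library, the integer-to-string side uses `std::to_chars` here in place of the fmt library named in the commit.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Parses the whole string as a base-10 integer; returns nullopt on any error
// (empty input, trailing characters, overflow).
std::optional<long long> parse_int(std::string_view text) {
    long long value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) return std::nullopt;
    return value;
}

// Formats an integer without locale handling or heap-allocated streams.
std::string int_to_string(long long value) {
    char buf[24]; // enough for the 20 characters of the longest 64-bit value
    auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), value);
    assert(ec == std::errc());
    return std::string(buf, ptr);
}
```

Both functions avoid locale lookups and exceptions, which is the usual reason they outperform `std::stoll` and `std::to_string`.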
2d78c31d49 [Enhance] improve performance of init_scan_key by sharing the schema (#6099)
Co-authored-by: huangmengbin <huangmengbin@bytedance.com>
2021-07-21 10:50:31 +08:00
68f988b78a [Optimize] Use flat_hash_set to replace unordered_set in InPredicate (#6216)
Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2021-07-15 11:15:11 +08:00
290a844e04 [optimize] Optimize bloomfilter performance (#6180)
Refactor the runtime filter bloom filter and eliminate some virtual function calls, for a performance improvement of about 5%.
Import the block bloom filter; the AVX version yields about a 40% performance improvement.
Before: bloom filter size default, about 20 million items take about 1s400ms
After: bloom filter size 524288, about 20 million items take about 400ms
2021-07-10 10:12:12 +08:00
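The idea behind a block bloom filter can be sketched in scalar C++ as follows. This is not the Doris implementation and not the AVX version mentioned above; the class name, block size, and the odd multiplier constants are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar sketch of a block bloom filter; names and sizes are illustrative.
class BlockBloomFilter {
public:
    explicit BlockBloomFilter(std::size_t num_blocks)
            : _num_blocks(num_blocks), _words(num_blocks * kWordsPerBlock, 0) {
        assert(num_blocks > 0);
    }

    void insert(uint64_t hash) {
        uint32_t* block = &_words[block_index(hash) * kWordsPerBlock];
        for (std::size_t i = 0; i < kWordsPerBlock; ++i) block[i] |= mask(hash, i);
    }

    // May return false positives, never false negatives.
    bool may_contain(uint64_t hash) const {
        const uint32_t* block = &_words[block_index(hash) * kWordsPerBlock];
        for (std::size_t i = 0; i < kWordsPerBlock; ++i) {
            if ((block[i] & mask(hash, i)) == 0) return false;
        }
        return true;
    }

private:
    static constexpr std::size_t kWordsPerBlock = 8; // 8 x 32 bits = one 256-bit block

    // High hash bits choose the block, so each lookup touches a single block.
    std::size_t block_index(uint64_t hash) const {
        return static_cast<std::size_t>(hash >> 32) % _num_blocks;
    }

    // Low hash bits, multiplied by an odd per-word constant, choose one bit per word.
    static uint32_t mask(uint64_t hash, std::size_t word) {
        static constexpr uint32_t kSalt[kWordsPerBlock] = {
                0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
                0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};
        uint32_t bit = (static_cast<uint32_t>(hash) * kSalt[word]) >> 27; // 0..31
        return uint32_t{1} << bit;
    }

    std::size_t _num_blocks;
    std::vector<uint32_t> _words;
};
```

Because all bits for one key live in the same small block, the eight word updates map naturally onto a single SIMD register, which is where a vectorized version would get its speedup.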
149def9e42 [Feature] Support RuntimeFilter in Doris (BE Implement) (#6077)
1. Support in/bloomfilter/minmax filters
2. Support broadcast/shuffle/bucket shuffle/colocate join
3. Optimize memory use and CPU cache misses while building the runtime filter
4. Optimize memory use in left semi join (works well on tpcds-95)
2021-07-04 20:59:05 +08:00
6d6c3d9703 [Enhancement] Reduce memory consumption by releasing readers earlier (#5811)
We create multiple rowset readers to read the data of one tablet.
Once a rowset reader has reached EOF, it can be released to
reduce resource (typically memory) consumption.
Likewise, a segment reader can be released when it reaches EOF.
2021-06-16 09:37:50 +08:00
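The early-release idea can be sketched as below. `FakeRowsetReader` and `read_all` are hypothetical names, not Doris code; the reader simply iterates over an in-memory vector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical reader over an in-memory "rowset".
class FakeRowsetReader {
public:
    explicit FakeRowsetReader(std::vector<int> rows) : _rows(std::move(rows)) {}

    // Returns false once all rows have been consumed (EOF).
    bool next(int* out) {
        if (_pos >= _rows.size()) return false;
        *out = _rows[_pos++];
        return true;
    }

private:
    std::vector<int> _rows;
    std::size_t _pos = 0;
};

// Drains the readers one after another and drops each reader as soon as it
// hits EOF, instead of keeping all of them alive until the scan finishes.
long long read_all(std::vector<std::unique_ptr<FakeRowsetReader>>& readers) {
    long long sum = 0;
    for (auto& reader : readers) {
        assert(reader != nullptr);
        int row = 0;
        while (reader->next(&row)) sum += row;
        reader.reset(); // EOF reached: free this reader's buffers right away
    }
    return sum;
}
```

After `read_all` returns, every slot in `readers` is null, so peak memory is bounded by the readers that have not yet finished rather than by the total number of rowsets.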
9c7d8d2e98 [Bug] Fix bug that isPreAggregation is incorrectly set (#5608)
1. The MaterializedViewSelector should be reset for each scan node
2. On the BE side, columns with delete conditions must be added to the return column.
2021-04-09 14:13:06 +08:00
bfeb717abe [Refactor] Fix some warnings in gcc higher than 7; make decimal12_t a POD type (#5547) 2021-03-23 09:37:10 +08:00
462efeaf39 [Performance Optimization and Refactor] (#5358) (#5364)
1. Add BlockColumnPredicate to support OR and AND column predicates in RowBlockV2
2. Support vectorized evaluation of delete predicates in the storage engine (SegmentV2) rather than in Reader
2021-02-07 22:41:33 +08:00
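A minimal sketch of composing AND and OR predicates and evaluating them over a column block. The types and function names are hypothetical and simplified to a single `int` column; they are not the BlockColumnPredicate API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using ColumnPredicate = std::function<bool(int)>; // hypothetical: predicate on one int cell

// True only if every child predicate accepts the value.
ColumnPredicate and_of(std::vector<ColumnPredicate> children) {
    return [children = std::move(children)](int value) {
        for (const auto& child : children) {
            if (!child(value)) return false;
        }
        return true;
    };
}

// True if at least one child predicate accepts the value.
ColumnPredicate or_of(std::vector<ColumnPredicate> children) {
    return [children = std::move(children)](int value) {
        for (const auto& child : children) {
            if (child(value)) return true;
        }
        return false;
    };
}

// Evaluates the predicate over a whole column block and returns the selected row indexes.
std::vector<std::size_t> evaluate(const std::vector<int>& column, const ColumnPredicate& pred) {
    std::vector<std::size_t> selected;
    for (std::size_t row = 0; row < column.size(); ++row) {
        if (pred(column[row])) selected.push_back(row);
    }
    return selected;
}
```

Evaluating per block rather than per row lets the storage layer filter rows before they reach the reader.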
ea7f61e1c7 [Bug] Duplicate results when reading aggregation table (#5307)
Previously, we introduced an optimization for aggregation tables:
when there is only one rowset and it is non-overlapping,
the data can be read directly without merging.
But this logic has bugs.
2021-02-04 09:21:35 +08:00
4ffc61be32 fix apply condition to unique table value columns incorrectly (#5302) 2021-01-29 10:34:47 +08:00
8ee4c48f13 [Compile] fix compile error in gcc10 (#5294) 2021-01-26 09:13:11 +08:00
a5298d617d [Performance Improve] Push Down _conjunctf of 'not in' and '!=' to Storage Engine. (#5207) 2021-01-23 21:07:01 +08:00
93a4c7efc1 [LOG] Standardize the use of VLOG in code (#5264)
At present, the use of VLOG in the code is inconsistent.
Some call sites use the VLOG_XX format inherited from Impala, and others use the VLOG(number) format.
The VLOG(number) format has no unified specification, so this PR standardizes the use of VLOG.
2021-01-21 12:09:09 +08:00
49f7eb69bf [Refactor] Refactor DeleteHandler and Cond module (2nd) (#5030)
* [Refactor] Refactor DeleteHandler and Cond module (#4925)

This patch mainly does the following refactors:
- Use int64_t instead of int32_t for 'version' in DeleteHandler
- Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments
- Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid
- Use range loop to simplify code
- Reduce some compare operations in Cond::del_eval
- Improve some branch predictions in Reader
- Fix and improve some unit tests
2020-12-08 10:01:18 +08:00
b9dabc3b5b [Enhance] Push down predicate on value column of unique table to base rowset (#5022) 2020-12-06 08:50:37 +08:00
c440aa07d1 Revert "[Refactor] Refactor DeleteHandler and Cond module (#4925)" (#5028)
This reverts commit 9c9992e0aa28ee85364eebf86a6675f1073e08fb.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-12-05 21:39:49 +08:00
9c9992e0aa [Refactor] Refactor DeleteHandler and Cond module (#4925)
This patch mainly does the following refactors:
- Use int64_t instead of int32_t for 'version' in DeleteHandler
- Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments
- Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid
- Use range loop to simplify code
- Reduce some compare operations in Cond::del_eval
- Improve some branch predictions in Reader
- Fix and improve some unit tests
2020-12-04 12:13:30 +08:00
1f236a5339 [BUG] Fix core when schema change (#5018) 2020-12-04 09:53:19 +08:00
df1f06e60b Optimized the read performance of the table when have multi versions (#4958)
* Optimize the read performance of tables that have multiple versions:
change the merge method of the unique table to
merge the cumulative version data first, and then merge the result with the base version.
Data with only one base version is read directly without merging.
2020-12-01 12:25:11 +08:00
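The merge order described above can be sketched as follows, under the simplifying assumption that each version is a key-to-value map and that a newer version overrides an older one. `Rowset` and `read_unique` are hypothetical names, not the Doris reader.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Rowset = std::map<int, int>; // key -> value, a hypothetical simplification

// versions[0] is the base version; versions[1..] are cumulative versions, oldest first.
Rowset read_unique(const std::vector<Rowset>& versions) {
    assert(!versions.empty());
    if (versions.size() == 1) return versions[0]; // only a base version: no merge needed

    // Step 1: merge the (usually small) cumulative versions among themselves.
    Rowset cumulative;
    for (std::size_t i = 1; i < versions.size(); ++i) {
        for (const auto& [key, value] : versions[i]) cumulative[key] = value; // newer wins
    }

    // Step 2: overlay the merged cumulative data on the (usually large) base version.
    Rowset result = versions[0];
    for (const auto& [key, value] : cumulative) result[key] = value;
    return result;
}
```

Merging the small cumulative versions first means the large base version takes part in only one merge pass.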
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
d1c2b3ed0d [Optimize] Add an unordered_map for TabletSchema to speed up column name lookup (#4779)
Reduce column name lookup for TabletSchema and Tablet from O(N) to O(1).
2020-11-03 19:53:44 +08:00
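A minimal sketch of the lookup structure: an `unordered_map` built once from the column list, so a name lookup becomes an average O(1) hash probe instead of a linear scan. `MiniSchema` and `field_index` are hypothetical names, not the TabletSchema API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical miniature of a tablet schema.
class MiniSchema {
public:
    explicit MiniSchema(std::vector<std::string> columns) : _columns(std::move(columns)) {
        // Build the name -> index map once, at schema construction time.
        for (std::size_t i = 0; i < _columns.size(); ++i) {
            _name_to_index.emplace(_columns[i], static_cast<int>(i));
        }
    }

    // Average O(1) hash lookup; returns -1 when the column does not exist.
    int field_index(const std::string& name) const {
        auto it = _name_to_index.find(name);
        return it == _name_to_index.end() ? -1 : it->second;
    }

private:
    std::vector<std::string> _columns;
    std::unordered_map<std::string, int> _name_to_index;
};
```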
09f97f8a05 [Refactor] Fixes some be typo part 2 (#4747) 2020-10-20 09:28:57 +08:00
068707484d Support sequence column for UNIQUE_KEYS Table (#4256)
* add sequence col

Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>
2020-09-04 10:10:17 +08:00
c201cf6e4f Support batch delete[part 2] (#4425)
support batch delete for read compaction
2020-08-25 14:05:04 +08:00
75ebe2b363 [Bug] Compaction row number cannot be matched between input rowsets and output rowsets. (#4139)
A Unique Key table can load duplicate rows in different loads.
If duplicate rows exist between loads, compaction merges these rows.
The statistics should take the number of merged rows into consideration.
Currently the merged count is missed, so compaction fails with an error.
2020-07-23 10:28:56 +08:00
d3d835844f [Performance] Improve performance of unique table read (#3974)
Implements #3971 
The test table is as follows:
```
mysql> desc test;
+------------+---------+------+-------+---------+---------+
| Field      | Type    | Null | Key   | Default | Extra   |
+------------+---------+------+-------+---------+---------+
| rid        | BIGINT  | No   | true  | 0       |         |
| qid        | BIGINT  | No   | true  | 0       |         |
| qidDeleted | TINYINT | No   | false | 0       | REPLACE |
| type       | TINYINT | No   | false | 0       | REPLACE |
| uid        | BIGINT  | No   | false | 0       | REPLACE |
| toUid      | BIGINT  | No   | false | 0       | REPLACE |
| status     | INT     | No   | false | 0       | REPLACE |
| createTime | INT     | No   | false | 0       | REPLACE |
| source     | INT     | No   | false | 0       | REPLACE |
| misFlag    | INT     | No   | false | 0       | REPLACE |
| anonymous  | TINYINT | No   | false | 0       | REPLACE |
| uv         | TINYINT | No   | false | 1       | REPLACE |
+------------+---------+------+-------+---------+---------+
12 rows in set (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|  1093760 |
+----------+
1 row in set (1.00 sec)
```
There are 29 versions at present
![image](https://user-images.githubusercontent.com/9098473/85992244-2aa26c80-ba27-11ea-918a-04701a58dbdf.png)
I ran the query `select sum(uv) from test` 10 times;
the average ScanTime dropped from `9s277ms` to `8s206ms`
2020-07-02 13:56:08 +08:00
73c3de4313 [refactor] Simple refactor on class Reader (#3691)
This is a simple refactor patch on class Reader without any functional changes.
Main refactor points:
- Remove some useless return values
- Use range loops
- Use empty() instead of size() to check whether STL containers are empty
- Use in-class member initialization instead of initializing in the constructor
- Some other small refactors
2020-06-03 19:55:53 +08:00
6c33f80544 Add disable_storage_page_cache config (#2890)
1. When reading a column data page:
    for compaction, schema_change, and check_sum, the page cache is not used;
    for queries, the page cache is used if config::disable_storage_page_cache is false.
2. When reading a column index page:
    the page cache is used if config::disable_storage_page_cache is false.
2020-02-16 19:13:30 +08:00
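The two rules above amount to the following decision logic. The enum and function names are hypothetical; only the `disable_storage_page_cache` flag comes from the commit message.

```cpp
#include <cassert>

// Hypothetical reader types, mirroring the cases listed in the commit message.
enum class ReaderType { Query, Compaction, SchemaChange, Checksum };

// Data pages: cached only for queries, and only when the cache is not disabled.
bool use_page_cache_for_data(ReaderType type, bool disable_storage_page_cache) {
    return type == ReaderType::Query && !disable_storage_page_cache;
}

// Index pages: cached whenever the cache is not disabled, regardless of reader type.
bool use_page_cache_for_index(bool disable_storage_page_cache) {
    return !disable_storage_page_cache;
}
```

Keeping compaction, schema change, and checksum reads out of the data page cache avoids evicting pages that queries are likely to reuse.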
99ad56d1bf Support bitmap index for more type (#2630)
For #2589

1. date(uint24_t)/datetime(int64_t)/largeint(int128_t) use frame-of-reference encoding for the dict.
2. decimal(decimal12_t) also uses frame-of-reference encoding for the dict.
3. float/double use bitshuffle encoding for the dict.
2020-01-31 21:09:29 +08:00
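Frame-of-reference encoding stores one base value plus small per-value offsets. The sketch below shows only that core idea, with hypothetical names; a production encoder would also bit-pack the deltas, and this is not the Doris codec.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct ForBlock {
    int64_t base = 0;             // the "frame of reference": the minimum value
    std::vector<uint32_t> deltas; // value - base, small when values are clustered
};

ForBlock for_encode(const std::vector<int64_t>& values) {
    ForBlock block;
    if (values.empty()) return block;
    block.base = *std::min_element(values.begin(), values.end());
    for (int64_t v : values) {
        // Unsigned subtraction avoids signed overflow for widely spread inputs.
        uint64_t delta = static_cast<uint64_t>(v) - static_cast<uint64_t>(block.base);
        assert(delta <= UINT32_MAX); // this sketch only handles ranges that fit in 32 bits
        block.deltas.push_back(static_cast<uint32_t>(delta));
    }
    return block;
}

std::vector<int64_t> for_decode(const ForBlock& block) {
    std::vector<int64_t> values;
    for (uint32_t d : block.deltas) values.push_back(block.base + static_cast<int64_t>(d));
    return values;
}
```

Dates and timestamps tend to cluster in a narrow range, so their deltas need far fewer bits than the raw values.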
13e5fdd512 [AlphaRowset] set num_segments field in rowset meta if missing (#2658)
num_segments should be read from the rowset meta pb,
but a bug in the previous code left this value unset in some cases.
So when initializing the rowset meta, if num_segments is 0 (not set),
we try to calculate the number of segments from AlphaRowsetExtraMetaPB
and then set the num_segments field.
This should only happen for some rowsets converted from an old version;
for all newly created rowsets, the num_segments field must be set.
2020-01-07 21:46:02 +08:00
f14cdacfd1 Fix single column read bug (#2122) 2019-11-07 10:24:02 +08:00
4e8d728e75 Remove unused code and unnecessary check (#1918) 2019-09-30 18:35:30 +08:00
cafb9f1e62 Replace Arena with MemPool first step (#1899) 2019-09-28 01:12:22 +08:00
b246d93128 Avoid SerDe for aggregation query with object pool (#1854) 2019-09-26 13:51:13 +08:00
a349409838 Move compare from RowCursor to row (#1764) 2019-09-09 14:51:13 +08:00
1e4dd77d2a Add bitmap agg type and udaf (#1610) 2019-08-26 14:24:42 +08:00
da8b9aad9a Remove preaggregation and index stream cache stuff out of RowsetReaderContext (#1698) 2019-08-26 14:19:03 +08:00
c5edf9dae0 Unify Field and ColumnSchema in Storage (#1561)
Currently, we have Field and ColumnSchema to access column data in a
row. These two classes are mostly the same, so we should unify them into
one class. Field currently holds offset information, which is a row attribute,
so we remove the offset from Field.

RowCursor currently contains some logic that belongs to Schema, so in this patch I
add a Schema attribute to RowCursor to make RowCursor simpler. After this
change, only Schema handles Field/ColumnSchema.

I extract some logic from RowCursor into be/src/olap/row.h, so that we can
use the same logic to handle different types of row. Each type of row has
the same function to get a Cell of the row. A cell represents a column's
content with a null indicator.
2019-07-30 14:01:57 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies all of the Backend's data,
which makes restarting the BE take a very long time.
To avoid disrupting your production environment,
you should upgrade backends one by one.

1. The BE refactoring clarifies the structure of the code.
2. Use a unique id to identify a rowset.
   Naming a rowset by tablet_id and version leads to
   many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
   with different formats.
2019-07-15 21:18:22 +08:00
8d87e36ff8 Place _init_seek_columns() in right place (#1302) 2019-06-13 20:54:45 +08:00
6ce8087916 Fix bug that RowCursor does NOT match RowBlock's layout (#1249) 2019-06-04 22:20:10 +08:00
ff95f23615 Remove OLAP_LOG_DEBUG AND OLAP_LOG_TRACE log format (#378)
Use VLOG(3) and VLOG(10) instead
2018-12-03 10:08:21 +08:00