In our origin design, we calc delete bitmap in publish txn, and this operation
will cost too much time as it will load segment data and lookup row key in pre
rowset and segments.And publish version task should run in order, so it'll lead
to timeout in publish_txn.
In this pr, we seperate delete_bitmap calculation to tow part, one of it will be
done in flush mem table, so this work can run parallel. And we calc final
delete_bitmap in publish_txn, get a rowset_id set that should be included and
remove rowsets that has been compacted, the rowset difference between memtable_flush
and publish_txn is really small so publish_txn become very fast.In our test,
publish_txn cost about 10ms.
Co-authored-by: yixiutt <yixiu@selectdb.com>
1. Fix a bug that query large column table may cause infinite loop
2. Optimize the query logic with limit, for the case where the limit value is relatively small, reduce the parallelism of the scanner, reduce unnecessary resource consumption, and increase the number of similar queries that the system can carry at the same time, and increase the query speed by more than 60%
* fix infinite loop when reading wide table
When a wide table is read, the 1st batch would be exceed raw_bytes_threshold,
so Scanner should read at least 1 row.
Actually, we should adjust batch size automatically to reduce memoery usage.
* support like/not like conjuncts push down to storage engine
* vectorized engine support like/not like conjuncts push down to storage engine
* support both evaluate and evaluate_vec method in like predicate
* reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts
* change #ifndef to pragma once as per comments
* change enable_function_pushdown default to false
Co-authored-by: heguangnan <heguangnan@bytedance.com>
* [Schema Change] support fast add/drop column (#49)
* [feature](schema-change) support fast schema change. coauthor: yixiutt
* [schema change] Using columns desc from fe to read data. coauthor: Lchangliang
* [feature](schema change) schema change optimize for add/drop columns.
1.add uniqueId field for class column.
2.schema change for add/drop columns directly update schema meta
Co-authored-by: yixiutt <yixiu@selectdb.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>
[Feature](schema change) fix write and add regression test (#69)
Co-authored-by: yixiutt <yixiu@selectdb.com>
[schema change] be ssupport that delete use newest schema
add delete regression test
fix regression case (#107)
tmp
[feature](schema change) light schema change exclude rollup and agg/uniq/dup key type.
[feature](schema change) fe olapTable maxUniqueId write in disk.
[feature](schema change) add rpc iface for sc add column.
[feature](schema change) add columnsDesc to TPushReq for ligtht sc.
resolve the deadlock when schema change (#124)
fix columns from fe don't has bitmap_index flag (#134)
add update/delete case
construct MATERIALIZED schema from origin schema when insert
fix not vectorized compaction coredump
use segment cache
choose newest schema by schema version when compaction (#182)
[bugfix](schema change) fix ligth schema change problem.
[feature](schema change) light schema change add alter job. (#1)
fix be ut
[bug] (schema change) unique drop key column should not light schema
change
[feature](schema change) add schema change regression-test.
fix regression test
[bugfix](schema change) fix multi alter clauses for light schema change. (#2)
[bugfix](schema change) fix multi clauses calculate column unique id (#3)
modify PushTask process (#217)
[Bugfix](schema change) fix jobId replay cause bdbje exception.
[bug](schema change) fix max col unique id repeatitive. (#232)
[optimize](schema change) modify pendingMaxColUniqueId generate rule.
fix compaction error
* fix be ut
* fix snapshot load core
fix unique_id error (#278)
[refact](fe) remove redundant code for light schema change. (#4)
[refact](fe) remove redundant code for light schema change. (#4)
format fe core
format be core
fix be ut
modify fe meta version
fix rebase error
flush schema into rowset_meta in old table
[refactor](schema change) refact fe light schema change. (#5)
delete the change of schemahash and support get max version schema
* modify for review
* fix be ut
* fix schema change test
SEQ_COL is used on tables with unique key to order data in one transaction(rowset),
when there is only one rowset and the rowset is compacted, rows in the rowset is sorted
and rows with same keys are resolved by compaction, so a scanner sets direct_mode to
optimize read iterator to avoid sorting and aggregating, and iterators does not need SEQ_COL.
However, init_return_columns adds SEQ_COL to return_columns, which is passed to SegmentIterator.
Then segment Iterator would be called via get_next with a block without SEQ_COL, segment iterator
creates columns included in return_columns but not in the block. SEQ_COL is nullable, segment Iterator
does not handle it, so a core dump happen.
Actually, in the above case, segment iterator does not need to read SEQ_COL.
When SEQ_COL is really needed, iterators creates SEQ_COL column in block,
so segment Iterator does not need do create SEQ_COL at all.
Now column `Array<T>` contains column `offsets` and `data`, and type of column `offsets` is UInt32 now.
If we call array_union to merge arrays repeatedly, the size of array may overflow.
So we need to extend it before `Array Data Type` release.
1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track.
2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability.
3. More powerful MemTracker debugging capabilities.
4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%.
5. Fix some other details.
1. fix track bthread
- Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS).
- This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker.
Ref: 731730da85/docs/en/server.md (bthread-local)
2. fix track vectorized query
- Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine.
- Refactored ThreadContext to avoid dependency conflicts and make it easier to debug.
- Fix some bugs.
Currently, there are 2 status code in BE, one is common/Status.h,
and the other is olap/olap_define.h called OLAPStatus.
OLAPStatus is just an enum type, it is very simple and could not save many informations,
I will unify these code to common/Status.
Add a new column-type to speed up the approximation of quantiles.
1. The new column-type is named `quantile_state` with fixed aggregation function `quantile_union`, which stores the intermediate results of pre-aggregated approximation calculations for quantiles.
2. support pre-aggregation of new column-type and quantile_state related functions.
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G
Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
Due to unlimited queue in OlapScanNode and NodeChannel, memory usage can be
very large for reading and writing large table, e.g 'insert into tableB select * from tableA'.
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;
Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
# Proposed changes
Issue Number: close#6238
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
Co-authored-by: wangbo <506340561@qq.com>
Co-authored-by: emmymiao87 <522274284@qq.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: thinker <zchw100@qq.com>
Co-authored-by: Zeno Yang <1521564989@qq.com>
Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
Co-authored-by: xinghuayu007 <1450306854@qq.com>
Co-authored-by: weizuo93 <weizuo@apache.org>
Co-authored-by: yiguolei <guoleiyi@tencent.com>
Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
Co-authored-by: awakeljw <993007281@qq.com>
Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>
## Problem Summary:
### 1. Some code from clickhouse
**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**
The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris
### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node
You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.
### 3. Data Model
Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.
### 4. How to use
1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)
### 5. Some diff from origin exec engine
https://github.com/doris-vectorized/doris-vectorized/issues/294
## Checklist(Required)
1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
1. Consider the responsibility of Reader, Rename Reader to TabletReader, I think the new name TabletReader can represent its function exactly, it is more suitable and meaningful
2. add virtual keyword for the destructor of OlapScanner, because VOlapScanner is derived from it
3. refactor struct ReaderParams and KeysParam as TabletReader's inner struct,guard by TabletReader name scope, it's also more reasonable
4. reduce OlapScanner's member data amount, just use _parent->member_data is simpler
5. bugfix: TupleReader has the same memeber data _collect_iter to its parent class Reader, this usage is dangerous, the writer may make some mistake, so i delete TupleReader::_collect_iter to fix it.
6. call set_tablet_reader() in OlapScanner::prepare() to setup _tablet_reader, VOlapScanner should override set_tablet_reader to new BlockReader instead, use this way to avoid new Reader twice by reset unique_ptr _tablet_reader
7. if the member data is a inseparable part of a class, i suggest using normal variable while not pointer variable, because pointer bring a indirect lay and must handle coping and destructing carefully, it's not necessary
8. some other small changes for readability or design
1. Delete useless variables
2. Add const modifier for read-only function
3. Delete the empty destructor, the compiler will automatically generate it, refer to the 3/5/0 rule:
[https://en.cppreference.com/w/cpp/language/rule_of_three]
4. It is recommended to add the override keyword (instead of the virtual keyword) to the subclass virtual function.
Override will let the compiler help check and improve security. This is also the reason why C++11 introduces override
In RuntimeFilter BloomFilter, decimal column will got a wrong hash value because violating aliasing rules
decimal12_t decimal = { 12, 12 };
murmurhash3(decimal) in bloom filter: 2167721464
expect: 4203026776