Commit Graph

114 Commits

Author SHA1 Message Date
f1b9185830 [feature](cooldown) Implement cold data compaction (#16681) 2023-02-14 15:21:54 +08:00
f3ab55d27d [Optimization](index) Optimization for no need to read raw data for index column that only in where clause (#16569) 2023-02-14 00:12:45 +08:00
171ae2892f [improvement](batch size) pass batch size of exec engine to storage engine (#16614)
Currently batch_size is not passed on to SegmentIterator, the SegmentIterator uses the hard coded value 4096 - 32 as the max row count of a block.


* fix bug
2023-02-11 09:01:44 +08:00
737c73dcf0 [Improvement](topn) order by key topn query optimization (#15663) 2023-02-06 15:36:05 +08:00
0b5e71d3b4 [refactor](refactor field) remove unused method (#16068) 2023-01-19 10:16:09 +08:00
b1caa68706 [Feature-WIP](inverted index) inverted index reader's implementation, and add mysql_fulltext regression case to test fulltext query (#15823)
Issue Number: Step2 of DSIP-023: Add inverted index for full text search
implementation of inverted index reader

dependency pr: #14211 #15807 #15821
2023-01-17 09:13:56 +08:00
9d1f02c580 [Improvement](topn) runtime prune for topn query (#15558) 2023-01-05 20:10:12 +08:00
305dd15fea [improvement](index) Support bitmap index can be applied with compound predicate when enable vectorized engine query (#13035)
Current bitmap index only can apply pushed down predicates which in AND conditions. When predicates in OR conditions and other complex compound conditions, it will not be pushed down to the storage layer, this leads to read more data.

Based on that situation, this pr will do:

1. this pr in order to support bitmap index apply compound predicates, query sql like:
select * from tb where a > 'hello' or b < 100;
select * from tb where a > 'hello' or b < 100 or c > 'ok';
select * from tb where (a > 'hello' or b <100) and (a < 'world' or b > 200);
select * from tb where (not a> 'hello') or b < 100;
...
above sql,column a and b and c has created bitmap_index.

2. this optimization can reduce reading data by index
3. set config enable_index_apply_compound_predicates to use this optimization
2022-12-28 20:08:57 +08:00
75aa00d3d0 [Feature](NGram BloomFilter Index) add new ngram bloom filter index to speed up like query (#11579)
This PR implement  the new bloom filter index: NGram bloom filter index, which was proposed in  #10733.
The new index can improve the like query performance greatly, from our some test case , can  get order of magnitude  improve.
For how to use it you can check the docs in this PR, and the index based on the ```enable_function_pushdown```,
you need set it to ```true```, to make the index work for like query.
2022-12-28 18:01:50 +08:00
bd52fa1966 [enhancement](checksum) use vertorized engine in checksum (#15260) 2022-12-26 10:28:15 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
94a6ffb906 [feature](compaction) support vertical_compaction & ordered_data_compaction (#14524) 2022-12-01 22:15:41 +08:00
Pxl
d712c4efe1 [Enhancement](predicate) move create column predicate to create_predicate_function (#14588)
move create column predicate to create_predicate_function
use same macro to create_column_predicate and create_predicate_function
2022-11-28 14:13:40 +08:00
4728e75079 [feature](bitmap) Support in bitmap syntax and bitmap runtime filter (#14340)
1.Support in bitmap syntax, like 'where k1 in (select bitmap_column from tbl)';
2.Support bitmap runtime filter. Generate a bitmap filter using the right table bitmap and push it down to the left table storage layer for filtering.
2022-11-25 15:22:44 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
enlarge runtime filter in predicate threshold
2022-11-10 15:48:46 +08:00
Pxl
2fab0c45c7 [Feature](runtime-filter) add runtime filter breaking change adapt (#13246)
add runtime filter breaking change adapt
2022-10-28 10:59:28 +08:00
8d729f9386 [fix](error-code) fix misuse fo OLAP_ERR_WRITE_PROTOBUF_ERROR (#13347) 2022-10-14 09:57:07 +08:00
Pxl
245490d6b7 [Enhancement](runtime filter) optimize for runtime filter (#12856)
optimize for runtime filter
2022-10-09 14:11:03 +08:00
e01986b8b9 [feature](light-schema-change) fix light-schema-change and add more cases (#12160)
Fix _delete_sign_idx and _seq_col_idx when append_column or build_schema when load.
Tablet schema cache support recycle when schema sptr use count equals 1.
Add a http interface for flink-connector to sync ddl.
Improve tablet->tablet_schema() by max_version_schema.
2022-09-17 11:29:36 +08:00
2f192019d3 [bugfix](delete hanlder) delete predicate is merged and could not find schema cause core dump (#12161)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-30 09:18:21 +08:00
5f7d6e8f2b [Refactor](predicate) Unify Conditions and ColumnPredicate (#11985) 2022-08-29 12:11:22 +08:00
ccff3f5711 [bugfix](light weight schema change) support delete condition in schema change (#11869)
* [bugfix](light weight schema change) support delete condition in schema change


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-26 11:45:55 +08:00
ea57bf6370 [refactor](delete predicate) Unify delete to segmentiterator (#11650)
* remove seek columns and unify delete columns in rowset reader


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-11 15:12:43 +08:00
70b39475cf [fix](scanner) delete predicates might be inconsistent with rowset readers (#11598) 2022-08-10 19:40:54 +08:00
0291f84a9e [fix](like-predicate) Add missing functions in LikeColumnPredicate (#11631) 2022-08-10 15:03:14 +08:00
f9b151744d optimize topn query if order by columns is prefix of sort keys of table (#10694)
* [feature](planner): push limit to olapscan when meet sort.

* if olap_scan_node's sort_info is set, push sort_limit, read_orderby_key
and read_orderby_key_reverse for olap scanner

* There is a common query pattern to find latest time serials data.
 eg. SELECT * from t_log WHERE t>t1 AND t<t2 ORDER BY t DESC LIMIT 100

If the ORDER BY columns is the prefix of the sort key of table, it can
be greatly optimized to read much fewer data instead of read all data
between t1 and t2.

By leveraging the same order of ORDER BY columns and sort key of table,
just read the LIMIT N rows for each related segment and merge N rows.

1. set read_orderby_key to true for read_params and _reader_context
   if olap_scan_node's sort info is set.
2. set read_orderby_key_reverse to true for read_params and _reader_context
   if is_asc_order is false.
3. rowset reader force merge read segments if read_orderby_key is true.
4. block reader and tablet reader force merge read rowsets if read_orderby_key is true.

5. for ORDER BY DESC, read and compare in reverse order
5.1 segment iterator read backward using a new BackwardBitmapRangeIterator and
    reverse the result block before return to caller.
5.2 VCollectIterator::LevelIteratorComparator, VMergeIteratorContext return
    opposite result for _is_reverse order in its compare function.

Co-authored-by: jackwener <jakevingoo@gmail.com>
2022-08-09 09:08:44 +08:00
321107cb40 [refactor](schema change) Using tablet schema shared ptr instead of raw ptr (#11475)
* Using tabletschema shared ptr instead of raw ptrs


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-05 11:04:38 +08:00
b5531c5caf [BugFix](BE) fix condition index doesn't match (#11474)
* [BugFix](Be) fix condition index doesn't match
2022-08-05 07:57:18 +08:00
d4fb27125a [feature-wip](unique-key-merge-on-write) row id conversion for compaction (#11149) 2022-07-27 16:32:13 +08:00
Pxl
461a31b1f6 [enhancement][Storage] refactor create predicate (#11017) 2022-07-27 16:12:23 +08:00
93b0e002d1 [feature-wip](unique-key-merge-on-write) add delete bitmap in read path, DSIP-018[2/3] (#11136) 2022-07-27 14:18:21 +08:00
babab5d535 [feature-wip] support datetimev2 (#11085) 2022-07-23 16:07:59 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
dc2b709f6f [Bug](compaction) fix uniq key compaction bug that does not count merged rows right(#10971)
When a rowset includes multiple segments, segments rows will be merged in generic_iterator but merged_rows is not maintained. Compaction will failed in check_correctness.
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-07-20 12:07:45 +08:00
f6cb7a838b [Optimize] Improve performance like/not like filter through pushdown function to storage engine (#10355)
* support like/not like conjuncts push down to storage engine
* vectorized engine support like/not like conjuncts push down to storage engine
* support both evaluate and evaluate_vec method in like predicate
* reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts
* change #ifndef to pragma once as per comments
* change enable_function_pushdown default to false
Co-authored-by: heguangnan <heguangnan@bytedance.com>
2022-07-19 08:33:04 +08:00
3b46242483 [feature-wip] Optimize Decimal type (#10794)
* [feature-wip](decimalv3) support decimalv3

* [feature-wip] Optimize Decimal type

Co-authored-by: liaoxin <liaoxinbit@126.com>
2022-07-14 10:50:50 +08:00
486cf0ebd4 [Feature] Lightweight schema change of add/drop column (#10136)
* [Schema Change] support fast add/drop column  (#49)

* [feature](schema-change) support fast schema change. coauthor: yixiutt

* [schema change] Using columns desc from fe to read data. coauthor: Lchangliang

* [feature](schema change) schema change optimize for add/drop columns.

1.add uniqueId field for class column.
2.schema change for add/drop columns directly update schema meta

Co-authored-by: yixiutt <yixiu@selectdb.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>

[Feature](schema change) fix write and add regression test (#69)

Co-authored-by: yixiutt <yixiu@selectdb.com>

[schema change] be ssupport that delete use newest schema

add delete regression test

fix regression case (#107)

tmp

[feature](schema change) light schema change exclude rollup and agg/uniq/dup key type.

[feature](schema change) fe olapTable maxUniqueId write in disk.

[feature](schema change) add rpc iface for sc add column.

[feature](schema change) add columnsDesc to TPushReq for ligtht sc.

resolve the deadlock when schema change (#124)

fix columns from fe don't has bitmap_index flag (#134)

add update/delete case

construct MATERIALIZED schema from origin schema when insert

fix not vectorized compaction coredump

use segment cache

choose newest schema by schema version when compaction (#182)

[bugfix](schema change) fix ligth schema change problem.

[feature](schema change) light schema change add alter job. (#1)

fix be ut

[bug] (schema change) unique drop key column should not light schema
change

[feature](schema change) add schema change regression-test.

fix regression test

[bugfix](schema change) fix multi alter clauses for light schema change. (#2)

[bugfix](schema change) fix multi clauses calculate column unique id (#3)

modify PushTask process (#217)

[Bugfix](schema change) fix jobId replay cause bdbje exception.

[bug](schema change) fix max col unique id repeatitive. (#232)

[optimize](schema change) modify pendingMaxColUniqueId generate rule.

fix compaction error
* fix be ut

* fix snapshot load core

fix unique_id error (#278)

[refact](fe) remove redundant code for light schema change. (#4)

[refact](fe) remove redundant code for light schema change. (#4)

format fe core

format be core

fix be ut

modify fe meta version

fix rebase error

flush schema into rowset_meta in old table

[refactor](schema change) refact fe light schema change. (#5)

delete the change of schemahash and support get max version schema

* modify for review

* fix be ut

* fix schema change test
2022-07-12 19:41:06 +08:00
ca94867b4e [Feature-wip] add date v2 type (#9916) 2022-06-26 16:07:56 +08:00
Pxl
5805f8077f [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003) 2022-06-16 10:50:08 +08:00
d247d06180 [Improvement] refine codes in TabletReader (#10042)
fast merge: just remove some code have no affect to other components
2022-06-10 09:12:33 +08:00
0cba6b7d95 [Bug][Fix] One Rowset have same key output in unique table (#9858)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-31 12:29:16 +08:00
b34ed43ec9 [feature-wip] (memory tracker) (step6, End) Fix some details (#9301)
1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track.
2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability.
3. More powerful MemTracker debugging capabilities.
4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%.
5. Fix some other details.
2022-05-10 18:17:09 +08:00
51db78d375 [refactor] modify all OLAP_LOG_WARNING to LOG(WARNING) (#9473)
Co-authored-by: BePPPower <fangtiewei@selectdb.com>
2022-05-10 09:25:25 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
d330bc3806 [Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709) (#9280)
Implement vectorized stream load.
Added fe configuration option `enable_vectorized_load` to enable vectorized stream load.

    Co-authored-by: tengjp@outlook.com
    Co-authored-by: mrhhsg@gmail.com
    Co-authored-by: minghong.zhou@163.com
    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
2022-04-29 09:50:51 +08:00
e5e0dc421d [refactor] Change ALL OLAPStatus to Status (#8855)
Currently, there are 2 status code in BE, one is common/Status.h,
and the other is olap/olap_define.h called OLAPStatus.
OLAPStatus is just an enum type, it is very simple and could not save many informations,
I will unify these code to common/Status.
2022-04-14 11:43:49 +08:00
290366787c [refactor] refactor code, replace some file with stl libs (#8759)
1. replace ConditionVariables with std::condition_variable
2. repalace Mutex with std::mutex
3. repalce MonoTime with std::chrono
2022-04-13 09:55:29 +08:00
eeae516e37 [Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476)
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G

Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
2022-03-20 23:06:54 +08:00
41a15ccd45 [fix](vectorized) Agg/Unique not null column outer join coredump (#8461) 2022-03-14 10:52:17 +08:00
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00