Commit Graph

53 Commits

Author SHA1 Message Date
e1090d6a63 [Fix](column predicate) seperate CHAR primitive type for column predicate (#23581) 2023-09-01 09:41:53 +08:00
26e78ab418 [fix](compaction)none vertical compaction should also use _unique_key_next_block function to read block (#22614) 2023-08-05 00:24:57 +08:00
43d783ae21 [fix](vertical compaction) compaction block reader should return error when reading next block failed (#22431) 2023-08-01 14:09:18 +08:00
103c473b96 [Bug](pipeline) fix pipeline shared scan + topn optimization (#21940) 2023-07-25 12:48:27 +08:00
Pxl
ca71048f7f [Chore](status) avoid empty error msg on status (#21454)
avoid empty error msg on status
2023-07-11 13:48:16 +08:00
Pxl
bbb3af6ce6 [Feature](agg_state) support agg_state combinators (#19969)
support agg_state combinators state/merge/union
2023-05-29 13:07:29 +08:00
Pxl
15a7420661 [Chore](ub) fix some undefined behaviors (#19986)
/home/zcp/repo_center/doris_master/doris/be/src/olap/rowset/segment_v2/column_reader.cpp:895:21: runtime error: load of value 423208544, which is not a valid value for type 'doris::ReaderType'

/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_decimal.cpp:260:33: runtime error: load of misaligned address 0x7fa3348b301c for type 'int64_t' (aka 'long'), which requires 8 byte alignment

/home/zcp/repo_center/doris_master/doris/be/src/olap/block_column_predicate.cpp:82:24: runtime error: variable length array bound evaluates to non-positive value 0

/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:225:26: runtime error: null pointer passed as argument 2, which is declared to never be null
2023-05-26 14:08:40 +08:00
92bf485abd [Bug] Fix doris pipeline shared scan and top n opt (#19599) 2023-05-15 10:00:44 +08:00
Pxl
dfad7b6b38 [Feature](generic-aggregation) some prowork of generic aggregation (#19343)
some prowork of generic aggregation
2023-05-09 21:42:21 +08:00
e08de52ee7 [chore](compile) using PCH for compilation acceleration under clang (#19303) 2023-05-08 19:51:06 +08:00
f23c93b3c6 [fix](memory) Fix AggFunc memory leak due to incorrect destroy (#19126) 2023-04-27 14:58:32 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
c29582bd57 [pipeline](split by segment)support segment split by scanner (#17738)
* support segment split by scanner

* change code by cr
2023-03-16 15:25:52 +08:00
4692d6764c [refactor](remove string val) remove string val structure, it is same with string ref (#17461)
remove stringval, decimalv2val, bigintval
2023-03-08 10:42:20 +08:00
Pxl
d8f0ca7108 [Chore](schema change) remove some unused code in schema change (#17459)
remove some unused code in schema change.
remove some row-based config and code.
2023-03-07 09:18:34 +08:00
9b8c91e18c [improvement](rowset reader) fix possible memleak (#16680)
* [improvement](rowset reader) fix possible memleak

* fix be UT
2023-02-15 11:13:31 +08:00
171ae2892f [improvement](batch size) pass batch size of exec engine to storage engine (#16614)
Currently batch_size is not passed on to SegmentIterator, the SegmentIterator uses the hard coded value 4096 - 32 as the max row count of a block.


* fix bug
2023-02-11 09:01:44 +08:00
737c73dcf0 [Improvement](topn) order by key topn query optimization (#15663) 2023-02-06 15:36:05 +08:00
4b6a4b3cf7 [refactor](remove unused code) Remove unused mempool declare or function params (#16222)
* Remove unused mempool declare or function params

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-30 13:03:18 +08:00
0057243f54 [improvement](reader) use union merge when rowset are noneoverlapping (#15749) 2023-01-16 21:53:18 +08:00
40c53931e5 [fix](vec) VMergeIterator add key same label for agg table (#14722) 2023-01-02 22:54:21 +08:00
1cc79510c9 [enhancement](compaction) add delete_sign_index check before filter delete (#15190) 2022-12-22 09:26:37 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
37e4a1769d [fix](sequence) fix that update table core dump with sequence column (#13847)
* [fix](sequence) fix that update table core dump with sequence column

* update
2022-11-03 09:02:21 +08:00
f329d33666 [chore](fix) Fix some spell errors in be's comments. #13452 2022-10-20 08:56:01 +08:00
f7e3ca29b5 [Opt](Vectorized) Support push down no grouping agg (#12803)
Support push down no grouping agg
2022-09-23 18:29:54 +08:00
e01986b8b9 [feature](light-schema-change) fix light-schema-change and add more cases (#12160)
Fix _delete_sign_idx and _seq_col_idx when append_column or build_schema when load.
Tablet schema cache support recycle when schema sptr use count equals 1.
Add a http interface for flink-connector to sync ddl.
Improve tablet->tablet_schema() by max_version_schema.
2022-09-17 11:29:36 +08:00
ccff3f5711 [bugfix](light weight schema change) support delete condition in schema change (#11869)
* [bugfix](light weight schema change) support delete condition in schema change


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-26 11:45:55 +08:00
1056a6d8c7 [bug](compaction) fix bug of coredump of filter delete chose wrong filter column (#12002)
* [bug](compaction) fix bug of coredump of filter delete chose wrong filter column

* clang format
2022-08-23 21:52:11 +08:00
05da3d947f [feature-wip](new-scan) add scanner scheduling framework (#11582)
There are currently many types of ScanNodes in Doris. And most of the logic of these ScanNodes is the same, including:

Runtime filter
Predicate pushdown
Scanner generation and scheduling
So I intend to unify the common logic of all ScanNodes.
Different data sources only need to implement different Scanners for data access.
So that the future optimization for scan can be applied to the scan of all data sources,
while also reducing the code duplication.

This PR mainly adds 4 new class:

VScanner
All Scanners' parent class. The subclasses can inherit this class to implement specific data access methods.

VScanNode
The unified ScanNode, and is responsible for common logic including RuntimeFilter, predicate pushdown, Scanner generation and scheduling.

ScannerContext
ScannerContext is responsible for recording the execution status
of a group of Scanners corresponding to a ScanNode.
Including how many scanners are being scheduled, and maintaining
a producer-consumer blocks queue between scanners and scan nodes.

ScannerContext is also the scheduling unit of ScannerScheduler.
ScannerScheduler schedules a ScannerContext at a time,
and submits the Scanners to the scanner thread pool for data scanning.

ScannerScheduler
Unified responsible for all Scanner scheduling tasks

Test:
This work is still in progress and default is disabled.
I tested it with jmeter with 50 concurrency, but currently the scanner is just return without data.
The QPS can reach about 9000.
I can't compare it to origin implement because no data is read for now. I will test it when new olap scanner is ready.
Co-authored-by: morningman <morningman@apache.org>
2022-08-23 08:45:18 +08:00
b55195bd80 [FixAssist](compaction) add DCHECK in BlockReader::_unique_key_next_block to reason problem (#11951) 2022-08-22 22:33:31 +08:00
01bd7f224b [bugifx](compaction) fix filter_delete if schema has sequence column (#11909)
introduced in #11721. Use last column as delete sign, but if sequence column
exist, it's wrong.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-19 14:56:06 +08:00
d505d1a5ae [Vectorized](compaction) filter delete data in base compaction (#11721)
* [Vectorized](compaction) filter delete data in base compaction


Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-08-18 14:22:59 +08:00
0291f84a9e [fix](like-predicate) Add missing functions in LikeColumnPredicate (#11631) 2022-08-10 15:03:14 +08:00
f9b151744d optimize topn query if order by columns is prefix of sort keys of table (#10694)
* [feature](planner): push limit to olapscan when meet sort.

* if olap_scan_node's sort_info is set, push sort_limit, read_orderby_key
and read_orderby_key_reverse for olap scanner

* There is a common query pattern to find latest time serials data.
 eg. SELECT * from t_log WHERE t>t1 AND t<t2 ORDER BY t DESC LIMIT 100

If the ORDER BY columns is the prefix of the sort key of table, it can
be greatly optimized to read much fewer data instead of read all data
between t1 and t2.

By leveraging the same order of ORDER BY columns and sort key of table,
just read the LIMIT N rows for each related segment and merge N rows.

1. set read_orderby_key to true for read_params and _reader_context
   if olap_scan_node's sort info is set.
2. set read_orderby_key_reverse to true for read_params and _reader_context
   if is_asc_order is false.
3. rowset reader force merge read segments if read_orderby_key is true.
4. block reader and tablet reader force merge read rowsets if read_orderby_key is true.

5. for ORDER BY DESC, read and compare in reverse order
5.1 segment iterator read backward using a new BackwardBitmapRangeIterator and
    reverse the result block before return to caller.
5.2 VCollectIterator::LevelIteratorComparator, VMergeIteratorContext return
    opposite result for _is_reverse order in its compare function.

Co-authored-by: jackwener <jakevingoo@gmail.com>
2022-08-09 09:08:44 +08:00
1e6a3610a7 [feature-wip](unique-key-merge-on-write) optimize rowid conversion and add ut (#11541) 2022-08-08 10:41:44 +08:00
d4fb27125a [feature-wip](unique-key-merge-on-write) row id conversion for compaction (#11149) 2022-07-27 16:32:13 +08:00
93b0e002d1 [feature-wip](unique-key-merge-on-write) add delete bitmap in read path, DSIP-018[2/3] (#11136) 2022-07-27 14:18:21 +08:00
06ecf8bdc5 [Bug](vec compaction) fix compaction core with sequence column (#10845)
Block reader ignores sequence column but rowset writer should write this column, will core in set_source_column row_num DCHECK. 

Sequence column works across rowsets, so compaction can not discard it and should keeps it altime. 

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-07-15 20:39:50 +08:00
486cf0ebd4 [Feature] Lightweight schema change of add/drop column (#10136)
* [Schema Change] support fast add/drop column  (#49)

* [feature](schema-change) support fast schema change. coauthor: yixiutt

* [schema change] Using columns desc from fe to read data. coauthor: Lchangliang

* [feature](schema change) schema change optimize for add/drop columns.

1.add uniqueId field for class column.
2.schema change for add/drop columns directly update schema meta

Co-authored-by: yixiutt <yixiu@selectdb.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>

[Feature](schema change) fix write and add regression test (#69)

Co-authored-by: yixiutt <yixiu@selectdb.com>

[schema change] be ssupport that delete use newest schema

add delete regression test

fix regression case (#107)

tmp

[feature](schema change) light schema change exclude rollup and agg/uniq/dup key type.

[feature](schema change) fe olapTable maxUniqueId write in disk.

[feature](schema change) add rpc iface for sc add column.

[feature](schema change) add columnsDesc to TPushReq for ligtht sc.

resolve the deadlock when schema change (#124)

fix columns from fe don't has bitmap_index flag (#134)

add update/delete case

construct MATERIALIZED schema from origin schema when insert

fix not vectorized compaction coredump

use segment cache

choose newest schema by schema version when compaction (#182)

[bugfix](schema change) fix ligth schema change problem.

[feature](schema change) light schema change add alter job. (#1)

fix be ut

[bug] (schema change) unique drop key column should not light schema
change

[feature](schema change) add schema change regression-test.

fix regression test

[bugfix](schema change) fix multi alter clauses for light schema change. (#2)

[bugfix](schema change) fix multi clauses calculate column unique id (#3)

modify PushTask process (#217)

[Bugfix](schema change) fix jobId replay cause bdbje exception.

[bug](schema change) fix max col unique id repeatitive. (#232)

[optimize](schema change) modify pendingMaxColUniqueId generate rule.

fix compaction error
* fix be ut

* fix snapshot load core

fix unique_id error (#278)

[refact](fe) remove redundant code for light schema change. (#4)

[refact](fe) remove redundant code for light schema change. (#4)

format fe core

format be core

fix be ut

modify fe meta version

fix rebase error

flush schema into rowset_meta in old table

[refactor](schema change) refact fe light schema change. (#5)

delete the change of schemahash and support get max version schema

* modify for review

* fix be ut

* fix schema change test
2022-07-12 19:41:06 +08:00
7a85e8d525 [bug](be) fix be block_reader.cc::_update_agg_value() mem leak.(#10216) (#10218) 2022-06-17 21:25:52 +08:00
Pxl
5805f8077f [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003) 2022-06-16 10:50:08 +08:00
Pxl
f2aa5f32b8 [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change (#9811)
Some pre-refactorings or interface additions for schema change
2022-06-07 15:04:57 +08:00
7c2db79b73 [BUG] fix bug for vectorized compaction and some storage vectorization bug (#9610) 2022-05-19 16:35:15 +08:00
4cd579b155 [refactor] Check status precise_code instead of construct OLAPInternalError (#9514)
* check status precise_code instead of construct OLAPInternalError
* move is_io_error to Status
2022-05-12 15:39:29 +08:00
49a0cd1925 [fix](compaction) fix bug for vectorized compaction (#9344)
1. add a BE config to switch vectorized compaction
2. Fix vectorized compaction bug that row statistic is not right.
2022-05-03 17:31:40 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
d330bc3806 [Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709) (#9280)
Implement vectorized stream load.
Added fe configuration option `enable_vectorized_load` to enable vectorized stream load.

    Co-authored-by: tengjp@outlook.com
    Co-authored-by: mrhhsg@gmail.com
    Co-authored-by: minghong.zhou@163.com
    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
2022-04-29 09:50:51 +08:00
e5e0dc421d [refactor] Change ALL OLAPStatus to Status (#8855)
Currently, there are 2 status code in BE, one is common/Status.h,
and the other is olap/olap_define.h called OLAPStatus.
OLAPStatus is just an enum type, it is very simple and could not save many informations,
I will unify these code to common/Status.
2022-04-14 11:43:49 +08:00
46e1b05490 [refactor] Fix some code comments typo and cleanup unused include (#8684) 2022-03-30 09:51:48 +08:00