3597 Commits

Author SHA1 Message Date
3894de49d2 [Enhancement](topn) support two phase read for topn query (#15642)
This PR optimizes TopN queries such as `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

A TopN query is composed of a SortNode and a ScanNode. When the user table is wide (for example, 100+ columns), the ORDER BY clause usually involves only a few columns, but the ScanNode still has to scan all the data from the storage engine even if the limit is very small. This can cause significant read amplification. This PR therefore divides a TopN query into two phases:
1. In the first phase, only `columnA`'s data is read from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase runs in the ExchangeNode, because it is the central node for TopN nodes in the cluster. The ExchangeNode issues an RPC to the other nodes using the RowIds from the first phase (already sorted and limited by the SortNode) and reads the matching rows one by one from the storage engine.

After the second-phase read, the Block contains all the data needed for the query.
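
The two phases described above can be illustrated with a minimal in-memory sketch. This is not the Doris implementation: `Row`, `KeyAndRowId`, `phase_one`, and `phase_two` are hypothetical names, the "storage engine" is a plain vector, and the RPC between nodes is replaced by a direct lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical wide table row: one sort key plus the remaining columns.
struct Row {
    int64_t key;                     // plays the role of columnA
    std::vector<std::string> others; // stands in for the other 100+ columns
};

// What phase one materializes: the sort key and a row id only.
struct KeyAndRowId {
    int64_t key;
    std::size_t row_id; // stand-in for __DORIS_ROWID_COL__
};

// Phase 1: read only the key column and row ids, then sort ascending and limit.
std::vector<KeyAndRowId> phase_one(const std::vector<Row>& table, std::size_t limit) {
    std::vector<KeyAndRowId> keys;
    keys.reserve(table.size());
    for (std::size_t i = 0; i < table.size(); ++i) {
        keys.push_back({table[i].key, i});
    }
    const std::size_t n = std::min(limit, keys.size());
    std::partial_sort(keys.begin(), keys.begin() + static_cast<std::ptrdiff_t>(n), keys.end(),
                      [](const KeyAndRowId& a, const KeyAndRowId& b) { return a.key < b.key; });
    keys.resize(n);
    return keys;
}

// Phase 2: fetch the full rows for the surviving row ids, one by one.
std::vector<Row> phase_two(const std::vector<Row>& table, const std::vector<KeyAndRowId>& top) {
    std::vector<Row> result;
    result.reserve(top.size());
    for (const KeyAndRowId& k : top) {
        assert(k.row_id < table.size());
        result.push_back(table[k.row_id]);
    }
    return result;
}
```

Calling `phase_two(table, phase_one(table, N))` touches the wide columns of only N rows, which is the read-amplification saving the PR describes.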
2023-01-19 10:01:33 +08:00
c43edbdfea [bug](cooldown)fix bug for single cooldown (#16040)
* fix bug for single cooldown
2023-01-19 08:03:32 +08:00
ee76b9796c [Bug](regresstest) BE Crash in DEBUG mode run regress test (#16042) 2023-01-18 17:58:16 +08:00
95c91fab2e [refactor](vec) delete non-vec runtime filter (#16016)
* [refactor](vec) delete non-vec runtime filter

* update
2023-01-18 17:49:20 +08:00
bac2adfc74 [refractor](schema) refractor schema::get_predicate_column_ptr (#16043)
* refactor Schema::get_predicate_column_ptr

* update code format

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-01-18 17:47:37 +08:00
d257059e6b [refactor](remove hadoop dpp) remove hadoop dpp code since it is not used (#16009) 2023-01-18 15:01:04 +08:00
42b5d17fa1 [refactor](remove non vec) remove column block and column view (#16022)
* [refactor](remove non vec) remove column block and column view and column vectorized batch

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-18 12:40:53 +08:00
b2fe385742 [refractor](schema) refractor function Schema::get_column_by_field to make it simple #16027
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-01-18 11:11:16 +08:00
e579530c99 [Feature-WIP](inverted index) support use inverted index searcher cache (#16003)
use inverted index searcher cache to improve query performance

dependency pr: #14211 #15807 #15823
2023-01-18 09:30:55 +08:00
3bff5ebf9a [fix](DOE) only return first batch data in ES 8.x (#16025)
Do not use terminate_after and size together in scroll request of ES 8.x.
2023-01-18 09:28:34 +08:00
31cc99964c [Feature-WIP](inverted index)(bkd) bkd index reader implementation, used by the inverted index for numeric types (#15994)
Step3 of DSIP-023: Add inverted index for full text search
Implementation of the bkd index reader, which the inverted index uses for numeric types.
dependency pr: #14211 #15807 #15823
2023-01-18 09:24:19 +08:00
e6a5d3375e [Feature-WIP](inverted index) add chinese analyzer for inverted index reader (#15998)
add chinese analyzer for inverted index reader
dependency pr: #14211 #15807 #15823
2023-01-17 20:20:40 +08:00
6be0cc252a [fix](BrokerFileReader) fix Compile error #16018 2023-01-17 19:53:06 +08:00
95397ff05d [refactor](array) remove dependency of ColumnBlock, ColumnBlockView (#16002)
change to vectorized::MutableColumnPtr
2023-01-17 19:16:16 +08:00
d5a3e8df3a [Exec](opt) Opt the vexplode_split function performance (#15945) 2023-01-17 19:02:57 +08:00
bbdf40b6bd [Enhencement](Push Handle) use VParquetScanner in PushHandle (#15980)
* use VParquetScanner in PushHandle

* delete ParquetScanner
2023-01-17 16:21:04 +08:00
151ae71761 [fix](be)fix bug of VSetOperationNode::release_resource (#15997)
A child class that overrides the parent's method should still call "ExecNode::release_resource(state)".
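
A minimal sketch of the pattern, assuming simplified stand-ins for the real classes: `RuntimeState` here only records which cleanup steps ran, and `SetOperationNode` is a hypothetical derived node, not the actual `VSetOperationNode`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical state object that records which cleanup steps have run.
struct RuntimeState {
    std::vector<std::string> released;
};

class ExecNode {
public:
    virtual ~ExecNode() = default;
    // Base class cleanup that every node is expected to run.
    virtual void release_resource(RuntimeState* state) {
        assert(state != nullptr);
        state->released.push_back("base");
    }
};

class SetOperationNode : public ExecNode {
public:
    void release_resource(RuntimeState* state) override {
        assert(state != nullptr);
        state->released.push_back("derived");
        // Without this explicit call, the base class cleanup is silently skipped.
        ExecNode::release_resource(state);
    }
};
```

Releasing the derived resources first and the base resources last mirrors destructor order; that ordering is a choice made for this sketch, not something the commit states.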
2023-01-17 16:16:25 +08:00
d062ca2944 [refactor](vectorized) remove unnecessary vectorization check (#15984) 2023-01-17 12:21:46 +08:00
7d34512501 [Bug](pipeline) Fix DCHECK failure (#15928) 2023-01-17 12:01:20 +08:00
9f106161a7 [Bug](join) Fix null aware anti join error in fuzzy mode (#15987) 2023-01-17 11:32:16 +08:00
9755358787 [fix](brokerload) fix be core dump caused by broker load (#15874) 2023-01-17 11:21:13 +08:00
0ab0479633 [Compile](lzo) fix lzo decompressor compiler error (#15956) 2023-01-17 09:56:07 +08:00
b1caa68706 [Feature-WIP](inverted index) inverted index reader's implementation, and add mysql_fulltext regression case to test fulltext query (#15823)
Issue Number: Step2 of DSIP-023: Add inverted index for full text search
implementation of inverted index reader

dependency pr: #14211 #15807 #15821
2023-01-17 09:13:56 +08:00
0057243f54 [improvement](reader) use union merge when rowset are noneoverlapping (#15749) 2023-01-16 21:53:18 +08:00
65a4c8b163 [refactor] refactor segment writer (#15705)
Co-authored-by: zhoubintao <1229701101@qq.com>
2023-01-16 21:50:21 +08:00
5521c7a236 [fix](load) fix that tablet channel doesn't set received rows for verify the number of rows (#15961) 2023-01-16 19:46:59 +08:00
bdec4d5ac2 [enhancement](profile) add read columns to scanner profile (#15902) 2023-01-16 19:32:46 +08:00
97fcad76f8 [enhancement](memtracker) Improve readability (#15716) 2023-01-16 16:30:35 +08:00
b7f43441e3 [enhancement](load) change the publish version log to VLOG_CRITICAL (#15673) 2023-01-16 16:22:33 +08:00
63d48564ed [fix](datetimev2) fix datetimev2 error with T (#15915)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-01-16 15:30:48 +08:00
Pxl
81bab55d43 [Bug](function) catch function calculation error on aggregate node to avoid core dump (#15903) 2023-01-16 11:21:28 +08:00
151fdc224e [Fix](inverted index) fix compilation error for inverted index compound directory (#15946)
fix compilation error for inverted index compound directory

```
be/src/olap/rowset/segment_v2/inverted_index_compound_directory.cpp:249:32: error: comparison of unsigned expression in '< 0' is always false [-Werror=type-limits]
  249 |         if (h->_reader->size() < 0) {
      |             ~~~~~~~~~~~~~~~~~~~^~~
```
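
For context, the warning fires because an unsigned value can never be negative, so the branch is dead code. The sketch below shows two ways such a check could be rewritten; `Reader`, `is_empty`, and `try_get_size` are hypothetical names, and this is not necessarily how the commit fixed the file.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical reader whose size() returns an unsigned type.
struct Reader {
    std::size_t size_bytes;
    std::size_t size() const { return size_bytes; }
};

// size() is unsigned, so `size() < 0` is always false and -Wtype-limits reports it.
static_assert(std::is_unsigned<decltype(std::declval<Reader>().size())>::value,
              "Reader::size() must be unsigned for this example");

// Option 1: test the condition that can actually occur, such as an empty reader.
bool is_empty(const Reader& r) {
    return r.size() == 0;
}

// Option 2: report failure through the return value instead of a negative size.
bool try_get_size(const Reader* r, std::size_t* out) {
    assert(out != nullptr);
    if (r == nullptr) {
        return false; // no reader available; *out is left untouched
    }
    *out = r->size();
    return true;
}
```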
2023-01-16 08:59:55 +08:00
Pxl
b727033906 [Chore](build) enable -Wextra and remove some -Wno (#15760)
enable -Wextra and remove some -Wno
2023-01-15 10:40:35 +08:00
5af7bcaa55 [Bug](decimalv3) Fix missing precision and scale in predicates (#15930) 2023-01-15 00:01:48 +08:00
58c520dbfd [Feature](remote) Cooldown cold data to object storage only one replica (#15832) 2023-01-14 23:58:00 +08:00
0206e0bc57 [Feature](inverted index) implementation of inverted index writer for numeric types, using bkd index (#15918)
Step3 of DSIP-023: Add inverted index for full text search
implementation of inverted index writer for numeric types, using bkd index
dependency pr: #14207 #15807 #15821
2023-01-14 21:06:51 +08:00
98c74f9ab8 [improvement](signal) add tid during core dump,the tid is equal to tid in be.INFO (#15893)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-14 18:40:02 +08:00
84d6938a73 [Bug](pipeline) Fix BE crash caused by pipeline (#15890)
* [Bug](pipeline) Fix BE crash caused by pipeline

* update
2023-01-14 18:37:19 +08:00
c4475a8dbc [Enhencement](jdbc scanner) add profile for jdbc scanner (#15914) 2023-01-14 10:28:59 +08:00
313e14d220 [Bugfix] (ROLLUP) fix the coredump when add rollup by link schema change (#15654)
Because the rollup has the same keys in the same order, BE performs a linked schema change, and the base tablet's segments are linked to the new rollup tablet. However, column unique ids start from 0 in both the base tablet and the rollup tablet. In this case, unique id 4 is column 'city' in the base table but 'cost' in the rollup tablet, so BE decodes a varchar page as a bigint page and core dumps. Such a rollup needs to be rejected.

I think that if a rollup is added by linked schema change, the rollup is redundant: it brings no additional benefit and wastes storage space. So it needs to be rejected.
2023-01-14 10:20:07 +08:00
d8990522fb [conf](compaction) enable vertical_compaction ordered_data_compaction (#14945) 2023-01-13 23:12:42 +08:00
ecb5aea182 [Feature-WIP](inverted index) inverted index writer's implementation (#15821) 2023-01-13 21:30:44 +08:00
514de605b6 [Bug](predicate) add double predicate creator (#15762)
Add a double predicate creator, mirroring the integer predicate creator.
2023-01-13 18:34:09 +08:00
049f8ad2f9 [Bug](sort)fix merge sorter might div zero when block bytes less than block rows (#15859)
If a block's byte size is smaller than its row count, avg_size_per_row is zero, which leads to a division by zero in the subsequent logic.
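
The arithmetic behind this bug is integer truncation: when the byte count is smaller than the row count (the case named in the commit title), `bytes / rows` evaluates to 0. A minimal sketch follows, assuming a hypothetical helper `rows_per_batch` and a clamp to 1 as the guard; the actual fix in the merge sorter may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many rows fit into a byte budget,
// based on the average row size of a sample block.
std::size_t rows_per_batch(std::size_t block_bytes, std::size_t block_rows,
                           std::size_t budget_bytes) {
    assert(block_rows > 0);
    // Integer division truncates: 3 bytes / 10 rows == 0.
    std::size_t avg_size_per_row = block_bytes / block_rows;
    // Clamp to at least 1 so the division below can never divide by zero.
    avg_size_per_row = std::max<std::size_t>(avg_size_per_row, 1);
    return budget_bytes / avg_size_per_row;
}
```

With the clamp, `rows_per_batch(3, 10, 100)` treats each row as one byte instead of dividing by zero.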
2023-01-13 18:33:40 +08:00
1489e3cfbf [Fix](file system) Make the constructor of XxxFileSystem a private method (#15889)
Since FileSystem inherits from std::enable_shared_from_this, it is dangerous to create a raw (native) pointer to a FileSystem.
To prevent this, the constructor of XxxFileSystem is made private, and a new FileSystem object is obtained through the static method create(...).
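
A minimal sketch of this pattern, assuming a hypothetical `LocalFileSystem` class with a single `root` parameter; the real XxxFileSystem classes and their `create(...)` signatures are not shown in this log.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class LocalFileSystem : public std::enable_shared_from_this<LocalFileSystem> {
public:
    // The only way to obtain an instance: it is always owned by a shared_ptr,
    // so shared_from_this() is safe to call on it.
    static std::shared_ptr<LocalFileSystem> create(std::string root) {
        // std::make_shared cannot reach the private constructor, so use new here.
        return std::shared_ptr<LocalFileSystem>(new LocalFileSystem(std::move(root)));
    }

    std::shared_ptr<LocalFileSystem> self() { return shared_from_this(); }
    const std::string& root() const { return _root; }

private:
    // Private: callers cannot create stack objects or raw `new` instances.
    explicit LocalFileSystem(std::string root) : _root(std::move(root)) {}

    std::string _root;
};
```

With the constructor private, code such as `LocalFileSystem fs("/tmp");` no longer compiles, which removes the case where `shared_from_this()` is called on an object that no `shared_ptr` owns.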
2023-01-13 15:32:16 +08:00
34bb9cd5d3 [fix](parquet-reader) fix coredump when load datatime data to doris from parquet (#15794)
`date_time_v2` checks the scale when a datetimev2 value is constructed:
```
LOG(FATAL) << fmt::format("Scale {} is out of bounds", scale);
```

This [PR](https://github.com/apache/doris/pull/15510) fixed this issue, but the parquet reader does not use the constructor to create `TypeDescriptor`, which leaves `scale = -1` when reading datetimev2 data.
2023-01-13 11:51:11 +08:00
b1fb1277dd [fix](bitmap) fix bitmap iterator comparison error (#15779)
Fix the bug that bitmap.begin() == bitmap.end() is always true when the bitmap contains a single value.
2023-01-13 11:37:07 +08:00
9468711f9f [Bug](join) fix bug null aware left anti join not correct result (#15841) 2023-01-13 10:18:05 +08:00
688a0bb96a [feature](multi-catalog) support clickhouse jdbc catalog (#15780) 2023-01-13 10:07:22 +08:00
16862d9b43 [refactor](remove unused code) remove buffer pool and disk io mgr (#15853)
* [refactor](remove buffer pool and disk io mgr) remove unused code


Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-13 09:42:58 +08:00