[Enhancement](topn) support two phase read for topn query (#15642)

This PR optimizes TopN queries such as `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

A TopN query is composed of a SortNode and a ScanNode. When the user table is wide (for example, 100+ columns), the ORDER BY clause usually references only a few of them, yet the ScanNode still has to scan all columns from the storage engine even if the limit is very small. This can lead to a lot of read amplification. This PR therefore divides a TopN query into two phases:
1. In the first phase, only `columnA`'s data is read from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase runs in the ExchangeNode, because it is the central node for the TopN nodes in the cluster. The ExchangeNode sends an RPC to the other nodes using the RowIds from the first phase (already sorted and limited by the SortNode) and reads the matching rows one by one from the storage engine.

After the second-phase read, the Block contains all the data needed for the query.
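
The two phases described above can be illustrated with a minimal in-memory sketch. This is not the code from this PR: `Row`, `KeyAndRowId`, `top_n_keys`, `fetch_rows`, and `two_phase_top_n` are hypothetical names, a `std::vector` stands in for the storage engine, and a plain index lookup stands in for the RPC issued by the ExchangeNode. It only shows why sorting narrow (key, RowId) pairs first and fetching full rows afterwards limits the wide reads to N rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical wide row: one sort key plus a payload standing in for 100+ columns.
struct Row {
    int64_t column_a;     // the ORDER BY column
    std::string payload;  // all the other columns
};

// Phase-one output: the sort key plus the row position, which plays the
// role of the extra RowId column (__DORIS_ROWID_COL__) in this sketch.
struct KeyAndRowId {
    int64_t column_a;
    std::size_t row_id;
};

// Phase 1: read only the sort key and the row id, then keep the `limit`
// smallest keys in ascending order (ORDER BY column_a ASC LIMIT limit).
std::vector<KeyAndRowId> top_n_keys(const std::vector<Row>& table, std::size_t limit) {
    std::vector<KeyAndRowId> keys;
    keys.reserve(table.size());
    for (std::size_t i = 0; i < table.size(); ++i) {
        keys.push_back({table[i].column_a, i});
    }
    const std::size_t n = std::min(limit, keys.size());
    std::partial_sort(keys.begin(), keys.begin() + static_cast<std::ptrdiff_t>(n), keys.end(),
                      [](const KeyAndRowId& l, const KeyAndRowId& r) {
                          return l.column_a < r.column_a;
                      });
    keys.resize(n);
    return keys;
}

// Phase 2: fetch the full rows one by one using the surviving row ids.
// In the real system this would be a remote read; here it is a vector lookup.
std::vector<Row> fetch_rows(const std::vector<Row>& table,
                            const std::vector<KeyAndRowId>& keys) {
    std::vector<Row> result;
    result.reserve(keys.size());
    for (const auto& key : keys) {
        assert(key.row_id < table.size());
        result.push_back(table[key.row_id]);
    }
    return result;
}

// Both phases chained together.
std::vector<Row> two_phase_top_n(const std::vector<Row>& table, std::size_t limit) {
    return fetch_rows(table, top_n_keys(table, limit));
}
```

Calling `two_phase_top_n(table, 2)` touches the payload of only two rows, no matter how many rows the table holds; all other rows contribute just their key and position.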
lihangyu
2023-01-19 10:01:33 +08:00
committed by GitHub
parent c7a72436e6
commit 3894de49d2
53 changed files with 829 additions and 33 deletions

@@ -54,8 +54,12 @@ Block::Block(const ColumnsWithTypeAndName& data_) : data {data_} {
     initialize_index_by_name();
 }
 
-Block::Block(const std::vector<SlotDescriptor*>& slots, size_t block_size) {
+Block::Block(const std::vector<SlotDescriptor*>& slots, size_t block_size,
+             bool ignore_trivial_slot) {
     for (const auto slot_desc : slots) {
+        if (ignore_trivial_slot && !slot_desc->need_materialize()) {
+            continue;
+        }
         auto column_ptr = slot_desc->get_empty_mutable_column();
         column_ptr->reserve(block_size);
         insert(ColumnWithTypeAndName(std::move(column_ptr), slot_desc->get_data_type_ptr(),
@@ -919,9 +923,13 @@ void Block::deep_copy_slot(void* dst, MemPool* pool, const doris::TypeDescriptor
     }
 }
 
-MutableBlock::MutableBlock(const std::vector<TupleDescriptor*>& tuple_descs, int reserve_size) {
+MutableBlock::MutableBlock(const std::vector<TupleDescriptor*>& tuple_descs, int reserve_size,
+                           bool ignore_trivial_slot) {
     for (auto tuple_desc : tuple_descs) {
         for (auto slot_desc : tuple_desc->slots()) {
+            if (ignore_trivial_slot && !slot_desc->need_materialize()) {
+                continue;
+            }
             _data_types.emplace_back(slot_desc->get_data_type_ptr());
             _columns.emplace_back(_data_types.back()->create_column());
             if (reserve_size != 0) {