[improvement](vectorized) Make bloom filter predicate run short-circuit logic (#8484)

The BloomFilter predicate currently goes through vectorized predicate evaluation, but its `evaluate_vec` interface is not implemented, so the RuntimeFilter has no effect after it is pushed down to the storage layer.
BF predicate computation also cannot be auto-vectorized, so this change makes the BloomFilter predicate run through the short-circuit logic instead.
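
As a rough illustration of what short-circuit evaluation means here, the sketch below probes a Bloom filter row by row and compacts a selection vector in place, in the spirit of the `evaluate(column, sel, size)` interface in the diff. `ToyBloomFilter` and `evaluate_bloom` are hypothetical names invented for this example; they are not Doris's `IBloomFilterFuncBase` or its real predicate code.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical toy Bloom filter over 64-bit keys (two hash functions).
// It only illustrates the probe; it is not the filter used by Doris.
class ToyBloomFilter {
public:
    explicit ToyBloomFilter(std::size_t bits) : _bits(bits, false) {}

    void insert(int64_t v) {
        _bits[h1(v)] = true;
        _bits[h2(v)] = true;
    }

    // May return true for a key that was never inserted (false positive),
    // but never returns false for a key that was inserted.
    bool may_contain(int64_t v) const { return _bits[h1(v)] && _bits[h2(v)]; }

private:
    std::size_t h1(int64_t v) const { return std::hash<int64_t>{}(v) % _bits.size(); }
    std::size_t h2(int64_t v) const {
        return ((static_cast<uint64_t>(v) * 0x9E3779B97F4A7C15ULL) >> 17) % _bits.size();
    }

    std::vector<bool> _bits;
};

// Short-circuit style evaluation: visit only the rows still listed in `sel`,
// keep those that may match, and return the new number of selected rows.
uint16_t evaluate_bloom(const std::vector<int64_t>& column, const ToyBloomFilter& bf,
                        uint16_t* sel, uint16_t size) {
    uint16_t new_size = 0;
    for (uint16_t i = 0; i < size; ++i) {
        uint16_t row = sel[i];
        if (bf.may_contain(column[row])) {
            sel[new_size++] = row;  // compact surviving row ids to the front
        }
    }
    return new_size;
}
```

Because each predicate only sees the rows that survived the previous one, a selective runtime filter shrinks the work for everything evaluated after it.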

For SSB Q2.1 with `enable_storage_vectorization = true;`:
```
test before impl:
- Total: 36s164ms
- RowsVectorPredFiltered: 0
- RealRuntimeFilterType: bloomfilter
- HasPushDownToEngine: true

test after impl:
- Total: 2s345ms
- RowsVectorPredFiltered: 595.247102M (595247102)
- RealRuntimeFilterType: bloomfilter
- HasPushDownToEngine: true
```
ZenoYang
2022-03-17 10:07:30 +08:00
committed by GitHub
parent 30d8089b2f
commit b537e06ecd
3 changed files with 7 additions and 1 deletions

@@ -65,6 +65,8 @@ public:
 void evaluate(vectorized::IColumn& column, uint16_t* sel, uint16_t* size) const override;
+bool is_bloom_filter_predicate() override { return true; }
 private:
 std::shared_ptr<IBloomFilterFuncBase> _filter;
 SpecificFilter* _specific_filter; // owned by _filter

@@ -71,6 +71,8 @@ public:
 virtual bool is_in_predicate() { return false; }
+virtual bool is_bloom_filter_predicate() { return false; }
 protected:
 uint32_t _column_id;
 bool _opposite;

@@ -613,7 +613,9 @@ void SegmentIterator::_vec_init_lazy_materialization() {
 _is_pred_column[cid] = true;
 pred_column_ids.insert(cid);
-if (type == OLAP_FIELD_TYPE_VARCHAR || type == OLAP_FIELD_TYPE_CHAR || type == OLAP_FIELD_TYPE_STRING || predicate->is_in_predicate()) {
+if (type == OLAP_FIELD_TYPE_VARCHAR || type == OLAP_FIELD_TYPE_CHAR
+    || type == OLAP_FIELD_TYPE_STRING || predicate->is_in_predicate()
+    || predicate->is_bloom_filter_predicate()) {
 short_cir_pred_col_id_set.insert(cid);
 _short_cir_eval_predicate.push_back(predicate);
 _is_all_column_basic_type = false;
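
The routing rule in the hunk above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions: `FieldType`, `PredicateInfo`, `PredicatePlan`, `needs_short_circuit`, and `classify` are hypothetical names made up for the example, and only the condition itself (string-like column type, IN predicate, or Bloom filter predicate) is taken from the diff.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, reduced stand-ins for the storage-layer types.
enum class FieldType { Int, Varchar, Char, String };

struct PredicateInfo {
    uint32_t column_id;
    FieldType type;
    bool is_in_predicate;
    bool is_bloom_filter_predicate;
};

struct PredicatePlan {
    std::vector<uint32_t> vectorized_cols;     // evaluated by the vectorized path
    std::vector<uint32_t> short_circuit_cols;  // evaluated row by row on a selection vector
};

// Mirrors the condition after this commit: string-like columns, IN predicates,
// and Bloom filter predicates all go to the short-circuit path.
bool needs_short_circuit(const PredicateInfo& p) {
    return p.type == FieldType::Varchar || p.type == FieldType::Char ||
           p.type == FieldType::String || p.is_in_predicate ||
           p.is_bloom_filter_predicate;
}

PredicatePlan classify(const std::vector<PredicateInfo>& predicates) {
    PredicatePlan plan;
    for (const PredicateInfo& p : predicates) {
        if (needs_short_circuit(p)) {
            plan.short_circuit_cols.push_back(p.column_id);
        } else {
            plan.vectorized_cols.push_back(p.column_id);
        }
    }
    return plan;
}
```

Under this sketch, an integer column carrying only a Bloom filter predicate lands in `short_circuit_cols`; with the pre-commit condition it would have been routed to the vectorized path, where the missing `evaluate_vec` meant no rows were filtered.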