[improvement](hll) Optimize Hyperloglog (#8829)

At Meituan, PR #6625 was reverted because of an OOM problem.
We are now reworking the old HyperLogLog implementation, building on PR #8555.
In our tests, the result performs better than both the old HLL and the apache:master HLL.

Changes summary:

- use SIMD max to speed up the heavy function _merge_registers
- use phmap::flat_hash_set rather than std::set
- replace std::max
- other small changes
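
The first item above merges two register arrays by taking the element-wise maximum. A minimal sketch of that idea follows, assuming 8-bit registers and an x86 target with SSE2; the helper name `merge_registers_max` is hypothetical and is not the code from this commit. On other targets the sketch falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics: _mm_loadu_si128, _mm_max_epu8, _mm_storeu_si128
#endif

// Hypothetical helper (name is an assumption, not taken from the commit):
// writes max(dst[i], src[i]) into dst[i] for each of the n registers.
inline void merge_registers_max(std::uint8_t* dst, const std::uint8_t* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    std::size_t i = 0;
#if defined(__SSE2__)
    // Process 16 registers per iteration with an unsigned byte-wise max.
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_max_epu8(a, b));
    }
#endif
    // Scalar tail (and full fallback when SSE2 is unavailable).
    for (; i < n; ++i) {
        dst[i] = dst[i] < src[i] ? src[i] : dst[i];
    }
}
```

Calling `merge_registers_max(a.data(), b.data(), a.size())` on two equally sized `std::vector<std::uint8_t>` buffers leaves the merged registers in `a`. Unaligned loads are used so the sketch makes no assumption about buffer alignment.
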
zbtzbtzbt
2022-04-08 09:06:08 +08:00
committed by GitHub
parent b88bf73ca7
commit 0b98d78664
3 changed files with 75 additions and 192 deletions


@@ -1108,11 +1108,9 @@ void AggregateFunctions::hll_merge(FunctionContext* ctx, const StringVal& src, S
     DCHECK(!src.is_null);
     DCHECK_EQ(dst->len, std::pow(2, HLL_COLUMN_PRECISION));
     DCHECK_EQ(src.len, std::pow(2, HLL_COLUMN_PRECISION));
-    auto dp = dst->ptr;
-    auto sp = src.ptr;
     for (int i = 0; i < src.len; ++i) {
-        dp[i] = (dp[i] < sp[i] ? sp[i] : dp[i]);
+        dst->ptr[i] = (dst->ptr[i] < src.ptr[i] ? src.ptr[i] : dst->ptr[i]);
     }
 }