[Improve](hash-fun)improve nested hash with range #21699

Issue Number: close #xxx

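1. When calculating an array hash, the element size does not need to be fed into the hash, i.e. the following call is not needed in general:
hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size),
                               sizeof(elem_size), hash);
However, we must still distinguish [[], [1]] from [[1], []]: when an array contains nested arrays and a nested array is empty, we still feed a seed into the hash so that the two values hash differently.
2. Hash a range of rows per call instead of a single value per call, which avoids a virtual function call inside the loop.
This doubles the performance; I measured it in a unit test with the following setup: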
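To illustrate point 1, here is a minimal sketch of why an empty nested array still has to contribute to the hash. It is not the code from this PR: `mix` is a toy polynomial step standing in for `HashUtil::zlib_crc_hash`, and `kEmptyMarker`, `hash_nested` and `demo` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Nested = std::vector<std::vector<std::int64_t>>;

// Toy mixing step (h * 31 + v); a stand-in for the CRC used in the PR.
inline std::uint64_t mix(std::uint64_t h, std::uint64_t v) {
    return h * 31u + v;
}

// Hypothetical marker mixed in whenever a nested array is empty.
constexpr std::uint64_t kEmptyMarker = 0x9E3779B97F4A7C15ULL;

// Hashes an array of arrays. With seed_empty == false an empty nested
// array leaves the running hash untouched; with seed_empty == true it
// mixes in kEmptyMarker so its position stays visible in the result.
std::uint64_t hash_nested(const Nested& outer, bool seed_empty) {
    std::uint64_t h = 0;
    for (const auto& inner : outer) {
        if (inner.empty()) {
            if (seed_empty) h = mix(h, kEmptyMarker);
            continue;
        }
        for (std::int64_t v : inner) h = mix(h, static_cast<std::uint64_t>(v));
    }
    return h;
}

// Usage: [[], [1]] and [[1], []] collide unless empty arrays are seeded.
inline void demo() {
    const Nested a = {{}, {1}};
    const Nested b = {{1}, {}};
    assert(hash_nested(a, false) == hash_nested(b, false));
    assert(hash_nested(a, true) != hash_nested(b, true));
}
```

Without the marker both inputs reduce to hashing the single element `1`, so the position of the empty array is lost.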

column: array[int64]
50 rows, with each array holding 100,000 (10w) elements
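The range-based interface from point 2 can be sketched as follows. This is a simplified model under stated assumptions, not the Doris classes: `Column`, `Int64Column`, `hash_per_row` and `hash_range` are hypothetical names, the mixing step is a toy `h * 31 + v`, and treating a non-zero `null_data[i]` as "skip this row" is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column interface with a range-based hash method.
struct Column {
    virtual ~Column() = default;
    // Folds rows [start, end) into `hash`. If null_data is non-null,
    // rows with a non-zero flag are skipped (assumption for this sketch).
    virtual void update_hash(std::size_t start, std::size_t end, std::uint64_t& hash,
                             const std::uint8_t* null_data) const = 0;
};

struct Int64Column final : Column {
    std::vector<std::int64_t> data;

    void update_hash(std::size_t start, std::size_t end, std::uint64_t& hash,
                     const std::uint8_t* null_data) const override {
        assert(start <= end && end <= data.size());
        // Tight non-virtual loop over the whole range.
        for (std::size_t i = start; i < end; ++i) {
            if (null_data != nullptr && null_data[i] != 0) continue;
            hash = hash * 31u + static_cast<std::uint64_t>(data[i]);
        }
    }
};

// Old style: one virtual call per row.
std::uint64_t hash_per_row(const Column& col, std::size_t rows) {
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < rows; ++i) col.update_hash(i, i + 1, h, nullptr);
    return h;
}

// New style: a single virtual call covering all rows.
std::uint64_t hash_range(const Column& col, std::size_t rows) {
    std::uint64_t h = 0;
    col.update_hash(0, rows, h, nullptr);
    return h;
}
```

Both helpers produce the same hash for the same data; the range version only changes how many times the virtual dispatch happens, which is the effect the commit message describes.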
amory
2023-07-11 14:40:40 +08:00
committed by GitHub
parent cb69349873
commit d0eb4d7da3
15 changed files with 371 additions and 97 deletions


@@ -196,15 +196,17 @@ void ColumnStruct::update_hashes_with_value(std::vector<SipHash>& hashes,
     SIP_HASHES_FUNCTION_COLUMN_IMPL();
 }
-void ColumnStruct::update_xxHash_with_value(size_t n, uint64_t& hash) const {
+void ColumnStruct::update_xxHash_with_value(size_t start, size_t end, uint64_t& hash,
+                                            const uint8_t* __restrict null_data) const {
     for (const auto& column : columns) {
-        column->update_xxHash_with_value(n, hash);
+        column->update_xxHash_with_value(start, end, hash, nullptr);
     }
 }
-void ColumnStruct::update_crc_with_value(size_t n, uint64_t& crc) const {
+void ColumnStruct::update_crc_with_value(size_t start, size_t end, uint64_t& hash,
+                                         const uint8_t* __restrict null_data) const {
     for (const auto& column : columns) {
-        column->update_crc_with_value(n, crc);
+        column->update_crc_with_value(start, end, hash, nullptr);
     }
 }