when enable pipeline to true, and set instances > 1
because all scan nodes share the scanners, maybe get the profile of scan node is all empty
now show all the scan nodes and remove some infos those that _num_scanners->value() == 0
In #17976, we introduced small fix container to optimize the in expr. This PR will change small fix container size of In set to 8, which has better performance when size > 8 by the perf test.
No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
PODArray/hash table/arena memory allocation will use Allocator.
Optimize mem limit exceeded log printing
Optimize compilation time
Use SIMD stringsearcher and SIMD memcmp optimze split_by_string and substring_index function.
split_by_string function has 32%~540% up
substring_index function has 22%~46% up
Performance difference depends on the needle size and whether the needle is constant param. And the longer the needle, the more performance improvement
When the be_exec_version is less than 2, murmurhash will still be used, otherwise crc32 will be used. When the be_exec_version is upgraded to 2, please remove.
1. If we set hadoop user property along with kerberos info, the authentication will fail.
2. fix some minor issue of local fs, follow up #18397
3. Add KW_HOSTNAME to keywords region, follow up #17329
4. Fix tvf not working with pipeline engine, follow up #18376
Sometimes, `show load profile` will only show part of the insert opertion's profile.
This is because we assume that for all load operation(including insert), there is only one fragment in the plan.
But actually, there will be more than 1 fragment in plan. eg:
`insert into tbl1 select * from tbl1 limit 1` will have 2 fragments.
This PR mainly changes:
1. modify the `show load profile`
Before: `show load profile "/queryid/taskid/instanceid";`
After: `show load profile "/queryid/taskid/fragmentid/instanceid";`
2. Modify the display of `ReadColumns` in OlapScanNode
Because for wide table, the line of `ReadColumns` may be too long for show in profile.
So I wrap it and each line contains at most 10 columns names.
3. Fix tvf not working with pipeline engine, follow up #18376
Optimize concat function 29% up by memcpy_small_allow_read_write_overflow15.
Optimize string functions list: concat, convert_to, mask, initcap, lower, upper.
concat function has 29% up:
Optimize constant empty string compare:
(1) When the constant empy string '' (size is 0), we can compare offsets in SIMD directly.
q10: SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;
q11: SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;
q12: SELECT SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
q13: SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC LIMIT 10;
q14: SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
Issue Number: close #xxx