Files
doris/be
ZhangYu0123 b627088e8c [Optimization](String) Optimize q20 q21 q22 q23 LIKE_SUBSTRING (like '%xxx%') (#18309)
Optimize q20, q21, q22, q23 LIKE_SUBSTRING (like '%xxxx%'). Idea is from clickhouse stringsearcher:

Stringsearcher is about 10%~20% faster than volnitsky algorithm when needle size is less than 10 using two chars at beginning search in SIMD .
Stringsearcher is faster than volnitsky algorithm, when needle size is less than 21.
The changes are as follows:

Using first two chars of needle at beginning search. We can compare two chars of needle and [n:n+17) chars in haystack in SIMD in one loop. Filter efficiency will be higher.
When env support SIMD, we use stringsearcher.
Test result in clickbench:

q20 is about 15% up.
q20: SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%';
q21, q22 is about 1%~5% up.
q21: SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
q22: SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, COUNT(DISTINCT UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
q23 is about 30%~40% up and not stable.
q23: SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;
2023-04-03 18:09:15 +08:00
..