Commit Graph

4260 Commits

Author SHA1 Message Date
042cf2a1bf [enhancement](ut) add ut for buffered reader (#18667) 2023-04-16 18:08:22 +08:00
69ae14f228 [Bug](pipeline) regression heap use after free (#18701) 2023-04-16 16:22:41 +08:00
bcff3710ca [fix] set execution timeout for brokerload and use query timeout when… (#18694)
We should use query timeout if execution timeout is not set to upgrade.
2023-04-15 20:41:04 +08:00
cc4778a271 [Fix](orc-reader) Check hasNulls() firstly when use notNull data in ColumnVectorBatch. #18674 2023-04-15 19:48:31 +08:00
Pxl
975b373896 [Chore](thrift) add some check on client cache && remove some unused code && catch st… #18683 2023-04-15 17:47:51 +08:00
98b8bef05b [bugfix](inverted index) fix inverted index to support NULL value filter (#18302) 2023-04-15 13:20:26 +08:00
d4928c60c8 [vectorized](profile) fix pipeline profile can't get result under more instances (#18525)
when enable pipeline to true, and set instances > 1
because all scan nodes share the scanners, maybe get the profile of scan node is all empty
now show all the scan nodes and remove some infos those that _num_scanners->value() == 0
2023-04-14 18:20:19 +08:00
4cde3d4f21 [Enhancement](Expr) Change small fix container size of In set to 8. (#18492)
In #17976, we introduced small fix container to optimize the in expr. This PR will change small fix container size of In set to 8, which has better performance when size > 8 by the perf test.
2023-04-14 18:19:45 +08:00
81799d614e [feature-wip](resource-group) support resource group interface in be. (#18588) 2023-04-14 14:00:49 +08:00
c704351273 [enhancement](memory) Refactor memory limit exceeded behavior (#18590)
No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
PODArray/hash table/arena memory allocation will use Allocator.

Optimize mem limit exceeded log printing

Optimize compilation time
2023-04-14 10:42:35 +08:00
8751f08d5a [bugfix](GEO)fix precision problem (#18642) 2023-04-14 10:39:19 +08:00
56d84739c1 [Opt](pipeline) opt the scanner ctx schedule in pipeline engine (#18545) 2023-04-14 09:59:03 +08:00
2294fb46a5 [refactor](minor) update scan concurrency for pipeline (#18650) 2023-04-14 09:45:12 +08:00
6c0af24e9d [Improve](simdjson reader) support UTF-8 unicode (with BOM) (#18585) 2023-04-13 21:58:44 +08:00
281ceee3cc [feature-wip](resource-group) Support resource group tvf (#18519)
related: #18098
2023-04-13 20:11:20 +08:00
2519931a04 [vectorized](function) support time_to_sec function (#18354)
support time_to_sec function
2023-04-13 19:31:12 +08:00
40a352959d [Pipeline](exec) Support shared scan in colo agg (#18457) 2023-04-13 17:25:41 +08:00
2f64a8b387 [feature](GEO)Support read/write WKB/EWKB to gis types (#18526)
Support mutual conversion from wkb and gis types.also compatible with EWKB format
https://cwiki.apache.org/confluence/display/DORIS/DSIP-033%3A+More+GEO+functions
2023-04-13 16:25:18 +08:00
2ae0bb7f13 [minor](test) remove unused function to improve test coverage (#18598) 2023-04-13 15:30:53 +08:00
726402b53b [bugfix](topn) fix topn runtime predicate crash in short circuit evaluate for types like string decimal (#18409) 2023-04-13 11:10:59 +08:00
4335c9998f [chore](ARM) Add some vectorization compatibility code on aarch64 (#18553)
update sse2noen to support more sse code on arm cpus
2023-04-13 10:15:33 +08:00
6d91635c5b [fix](json_reader) Do not increase the value of read_rows for empty line (#18611)
If read an empty row the row num++, the row num will be larger than actual column size, it will core.
2023-04-13 10:08:11 +08:00
3c3364ba27 [chore](row store) ignore serialize block to row column if no row store column (#18601) 2023-04-13 10:02:33 +08:00
d57371da13 [feature](struct-type) support basic struct constructor function (#18190)
This commit will support struct and named_struct function.
2023-04-13 09:18:00 +08:00
34c946bb99 [Bug](date) fix regression test test_date_function (#18564) 2023-04-12 14:16:30 +08:00
ecb22ad35e [chore](proto) modify the order of store_row_column and is_dynamic_schema to be compatible with branch-1.2-lts (#18232) 2023-04-12 11:59:56 +08:00
43392918cd [Optimization](functions)Optimize function call for const columns. (#18310) 2023-04-12 11:11:01 +08:00
49a9956986 [Enhencement](Profile) add profile info for jdbc scanner #18569 2023-04-12 10:47:21 +08:00
2209b714d1 [chore](orc) Update orc lib to third party lib(1.8.3) using git submodule. (#18531) 2023-04-12 10:37:50 +08:00
cbe2e138c3 [Enhancement](HttpServer) Support https in be (#17034)
* [Enhancement](HttpServer) Support https in be
2023-04-12 10:27:07 +08:00
1238f6de97 [bug](array) fix be core in array_with_constant/array_repeat function when the first argument is nullable (#18404)
fix be core in array_with_constant/array_repeat function when the first argument is nullable
2023-04-11 19:46:41 +08:00
5aac346ca4 [minor](refactor) delete unused codes (#18540) 2023-04-11 17:24:50 +08:00
0c5e3df4a3 [optimize](string) optimize split_by_string and substring_index function (#18496)
Use SIMD stringsearcher and SIMD memcmp optimze split_by_string and substring_index function.

split_by_string function has 32%~540% up
substring_index function has 22%~46% up
Performance difference depends on the needle size and whether the needle is constant param. And the longer the needle, the more performance improvement
2023-04-11 15:49:03 +08:00
e562017801 [feature](table-metadata) support altering the property "light_schema_change" for the tables which created before 1.2 (#17704) 2023-04-11 11:09:43 +08:00
101737023c [Bug](round) fix wrong scale for round-like function (#18507) 2023-04-11 09:36:59 +08:00
1c0698e2d7 [bug](be) fix accept null predicate mem leak (#18510) 2023-04-11 09:08:06 +08:00
Pxl
297764b37d [Chore](build) fix some compile fail on gnu20 && remove some unused compatibility codes (#18467) 2023-04-10 18:05:52 +08:00
a8315b86ca [refactor](planner) using crchash replace murmurhash in the runtime filter (#18472)
When the be_exec_version is less than 2, murmurhash will still be used, otherwise crc32 will be used. When the be_exec_version is upgraded to 2, please remove.
2023-04-10 14:12:39 +08:00
012a261f69 [FIX](complex-type) fixed complex type with create_column_const_with_default_value #18463 2023-04-10 14:11:15 +08:00
5efafefeda [refactor](string) remove volnitsky search algorithm (#18474) 2023-04-10 10:56:07 +08:00
ea47a6ae59 [fix](hdfs) not setting hadoop username when kerberos enabled (#18485)
1. If we set hadoop user property along with kerberos info, the authentication will fail.
2. fix some minor issue of local fs, follow up #18397
3. Add KW_HOSTNAME to keywords region, follow up #17329
4. Fix tvf not working with pipeline engine, follow up #18376
2023-04-10 09:32:27 +08:00
Pxl
c9b4eaea76 [Chore](storage) change FieldType to enum class #18500 2023-04-10 08:53:44 +08:00
f38e00b4c0 [refactor](typesystem) using typeindex to create column instead of type name because type name is not stable (#18328)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-09 18:08:31 +08:00
3d28de6e54 [Enhencement](like) fallback to re2 if hyperscan failed (#18350) 2023-04-09 09:18:13 +08:00
60c0bbe272 [fix](profile) fix show load query profile (#18487)
Sometimes, `show load profile` will only show part of the insert opertion's profile.
This is because we assume that for all load operation(including insert), there is only one fragment in the plan.
But actually, there will be more than 1 fragment in plan. eg:

`insert into tbl1 select * from tbl1 limit 1` will have 2 fragments.

This PR mainly changes:

1. modify the `show load profile`
   Before:  `show load profile "/queryid/taskid/instanceid";`
   After: `show load profile "/queryid/taskid/fragmentid/instanceid";`

2. Modify the display of `ReadColumns` in OlapScanNode
    Because for wide table, the line of `ReadColumns` may be too long for show in profile.
    So I wrap it and each line contains at most 10 columns names.

3. Fix tvf not working with pipeline engine, follow up #18376
2023-04-09 08:41:18 +08:00
fb50626075 [optimize](string) optimize concat function by SIMD memcpy (#18458)
Optimize concat function 29% up by memcpy_small_allow_read_write_overflow15.
Optimize string functions list: concat, convert_to, mask, initcap, lower, upper.

concat function has 29% up:
2023-04-08 17:05:34 +08:00
58bbd46c65 [Optimization](string) optimize constant empty string compare ( column='', column!='') (#18321)
Optimize constant empty string compare:
(1) When the constant empy string '' (size is 0), we can compare offsets in SIMD directly.

q10: SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;
q11: SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;
q12: SELECT SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
q13: SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC LIMIT 10;
q14: SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
Issue Number: close #xxx
2023-04-08 16:04:10 +08:00
0517616242 [vectorized](function) support array_repeat function to be compatible with hive syntax (#18028)
---------

Co-authored-by: zhangyu209 <zhangyu209@meituan.com>
2023-04-08 15:50:28 +08:00
0b8bc51b72 [fix](inverted index) Fix key column match query failed (#18436)
* [fix](inverted index) Fix key column match query failed

* [chore](regression case) add regression case

* [fix] fix regression case no order by
2023-04-08 15:45:08 +08:00
161678380c [bug](GC)the issue of incorrect disk usage (#18397) 2023-04-08 09:32:36 +08:00