doris

Author	SHA1	Message	Date
ZhangYu0123	0c5e3df4a3	[optimize](string) optimize split_by_string and substring_index function (#18496) Use the SIMD string searcher and SIMD memcmp to optimize the split_by_string and substring_index functions. split_by_string improves by 32%~540%; substring_index improves by 22%~46%. The performance difference depends on the needle size and on whether the needle is a constant parameter; the longer the needle, the greater the improvement.	2023-04-11 15:49:03 +08:00
奕冷	e562017801	[feature](table-metadata) support altering the property "light_schema_change" for tables created before 1.2 (#17704)	2023-04-11 11:09:43 +08:00
Gabriel	101737023c	[Bug](round) fix wrong scale for round-like function (#18507)	2023-04-11 09:36:59 +08:00
AlexYue	1c0698e2d7	[bug](be) fix accept null predicate mem leak (#18510)	2023-04-11 09:08:06 +08:00
Pxl	297764b37d	[Chore](build) fix some compile failures on gnu20 && remove some unused compatibility code (#18467)	2023-04-10 18:05:52 +08:00
Mryange	a8315b86ca	[refactor](planner) use crchash to replace murmurhash in the runtime filter (#18472) When be_exec_version is less than 2, murmurhash is still used; otherwise crc32 is used. Once be_exec_version is upgraded to 2, please remove the murmurhash fallback.	2023-04-10 14:12:39 +08:00
amory	012a261f69	[FIX](complex-type) fixed complex type with create_column_const_with_default_value #18463	2023-04-10 14:11:15 +08:00
ZhangYu0123	5efafefeda	[refactor](string) remove volnitsky search algorithm (#18474)	2023-04-10 10:56:07 +08:00
Mingyu Chen	ea47a6ae59	[fix](hdfs) do not set hadoop username when kerberos is enabled (#18485) 1. If we set the hadoop user property along with kerberos info, authentication will fail. 2. Fix some minor issues of local fs, follow up #18397 3. Add KW_HOSTNAME to keywords region, follow up #17329 4. Fix tvf not working with pipeline engine, follow up #18376	2023-04-10 09:32:27 +08:00
Pxl	c9b4eaea76	[Chore](storage) change FieldType to enum class #18500	2023-04-10 08:53:44 +08:00
yiguolei	f38e00b4c0	[refactor](typesystem) use typeindex to create column instead of type name because type name is not stable (#18328) --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-09 18:08:31 +08:00
Kang	3d28de6e54	[Enhancement](like) fall back to re2 if hyperscan fails (#18350)	2023-04-09 09:18:13 +08:00
Mingyu Chen	60c0bbe272	[fix](profile) fix show load query profile (#18487) Sometimes, `show load profile` shows only part of the insert operation's profile. This is because we assumed that every load operation (including insert) has only one fragment in its plan, but a plan can actually have more than one fragment. E.g., `insert into tbl1 select * from tbl1 limit 1` has 2 fragments. This PR mainly changes: 1. Modify `show load profile`. Before: `show load profile "/queryid/taskid/instanceid";` After: `show load profile "/queryid/taskid/fragmentid/instanceid";` 2. Modify the display of `ReadColumns` in OlapScanNode. For wide tables, the `ReadColumns` line may be too long to show in the profile, so it is wrapped and each line contains at most 10 column names. 3. Fix tvf not working with pipeline engine, follow up #18376	2023-04-09 08:41:18 +08:00
ZhangYu0123	fb50626075	[optimize](string) optimize concat function by SIMD memcpy (#18458) Optimize the concat function by 29% using memcpy_small_allow_read_write_overflow15. Optimized string functions: concat, convert_to, mask, initcap, lower, upper.	2023-04-08 17:05:34 +08:00
ZhangYu0123	58bbd46c65	[Optimization](string) optimize constant empty string compare (column='', column!='') (#18321) Optimize constant empty string compare: (1) When the constant is the empty string '' (size 0), we can compare offsets directly with SIMD. q10: SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10; q11: SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10; q12: SELECT SearchPhrase, COUNT() AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10; q13: SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC LIMIT 10; q14: SELECT SearchEngineID, SearchPhrase, COUNT() AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10; Issue Number: close #xxx	2023-04-08 16:04:10 +08:00
ZhangYu0123	0517616242	[vectorized](function) support array_repeat function to be compatible with hive syntax (#18028) --------- Co-authored-by: zhangyu209 <zhangyu209@meituan.com>	2023-04-08 15:50:28 +08:00
YueW	0b8bc51b72	[fix](inverted index) Fix key column match query failed (#18436) * [fix](inverted index) Fix key column match query failed * [chore](regression case) add regression case * [fix] fix regression case no order by	2023-04-08 15:45:08 +08:00
chenlinzhong	161678380c	[bug](GC) fix the issue of incorrect disk usage (#18397)	2023-04-08 09:32:36 +08:00
Gabriel	d881d71cd1	[Bug](cast) Fix bug for cast function between datetimev2 and string (#18442)	2023-04-07 22:02:15 +08:00
amory	30f2abe5d3	[FIX](Map) fix map offset calculation in olap convertor (#18295) Fix BE core dump when loading larger kv data in one row for map.	2023-04-07 17:04:08 +08:00
zxealous	e3ff2e3d21	[fix](file cache) Fix BE core dump when using block/whole/sub file cache (#18440) BE core dumps when using the whole/sub file cache. Calls to CachedRemoteFileReader/WholeFileCache/SubFileCache::read_at_impl() did not pass IOContext when reading the segment footer.	2023-04-07 16:39:59 +08:00
Gabriel	f6f4dac1d0	[Improvement](DECIMAL) Improve decimal operation (#18437)	2023-04-07 15:58:28 +08:00
Xinyi Zou	308ff9a16f	[enhancement](memory) track lru cache memory and page memory not in cache (#18361) Report lru cache memory in metrics. Track page memory not in cache in the mem tracker.	2023-04-07 14:22:44 +08:00
Jerry Hu	d36e9bd523	[chore](scan) Disable low cardinality optimization for compaction (#18424)	2023-04-07 14:19:11 +08:00
HappenLee	c32adba1cf	[Refactor](Pipeline) Refactor pipeline code to improve coverage (#18376)	2023-04-07 13:09:44 +08:00
airborne12	2b662ac26b	[Fix](segment iterator) fix filter block size and filter size mismatch problem (#18395) Add the result column id to _column_filter in _output_index_result_column.	2023-04-07 09:43:33 +08:00
TengJianPing	4e1cdb9ce7	[fix](agg_sort) fix bug of agg sort group concat with order by (#18447)	2023-04-07 08:42:36 +08:00
Tiewei Fang	759f1da32e	[Enhancement](Backends) add `HostName` field in backends table and delete backends table in information_schema (#18156) 1. Add `HostName` field for `show backends` statement and `backends()` tvf. 2. Delete the `backends` table in `information_schema` database.	2023-04-07 08:30:42 +08:00
Mingyu Chen	e848e456be	[config] modify tablet_shard to 4 and add some log (#18416) Modify the default value of BE config tablet_map_shard_size to 4 to reduce lock contention. Add a log when writing the disk test file fails, for debugging.	2023-04-06 17:18:16 +08:00
amory	82248ab392	[FIX](complex-type) get_default to return real nested default value (#18413) Return the real default value for nested types within complex types.	2023-04-06 15:24:32 +08:00
YueW	591f76a6a4	[fix](alter inverted index) Temporarily handle adding or dropping inverted index via direct schema change (#18378) In the current implementation of dynamically adding and dropping inverted indexes, the inverted index information of historical data becomes out of date after compaction on the base tablet. Future PRs will solve this problem. For now, inverted indexes are temporarily added or dropped through the direct schema change logic.	2023-04-06 15:07:37 +08:00
Gabriel	550c8aa648	[Bug](DECIMALV3) fix wrong decimal scale returned by function `round` (#18375)	2023-04-06 14:44:21 +08:00
Pxl	76d76f672c	[Chore](build) enhancement for backend build time usage (#18344)	2023-04-06 11:13:21 +08:00
TengJianPing	4ca0c0face	[fix](join) fix wrong result of right join (#18365) When processing data in the hash table for right join and full outer join, if the output rows of one hash bucket exceed the batch size, the logic for continuing to process this bucket is wrong; it should differentiate between join types.	2023-04-06 10:55:58 +08:00
Gabriel	a01d824256	[Improvement](bloom filter) inline function call (#18396)	2023-04-06 10:21:48 +08:00
Ashin Gau	f28c75bd80	[fix](file_reader) bad_typeid when reading csv&json files (#18400) Resolving the conflict between PR (#18340) and PR (#18301) changed the file_reader to create, resulting in the error: [E-123] std::bad_typeid exception.	2023-04-06 10:00:29 +08:00
Jerry Hu	66a0c090b8	[fix](column) Add unimplemented replicate function in ColumnStruct (#18368)	2023-04-06 09:50:27 +08:00
Ashin Gau	47aa8a6d8a	[fix](file_cache) turn on file cache by FE session variable (#18340) Fix two bugs: 1. Enabling file caching requires both the `FE session` and `BE` configurations (enable_file_cache=true) to be enabled. 2. `ParquetReader` did not use `IOContext` previously, but `CachedRemoteFileReader::read_at` needs `IOContext` after PR (#17586).	2023-04-05 15:51:47 +08:00
gitccl	7f8d92656e	[fix](streamload) fix stream load failure when profile is enabled (#18364) #18015 enabled the stream load profile log; however, BE encounters an rpc failure when loading tpch data (see #18291). This is because when `is_report_success` is true, BE calls reportExecStatus to FE, but FE cannot find the QueryInfo in `coordinatorMap`, so it returns an error to BE.	2023-04-05 01:01:46 +08:00
morrySnow	e29fc3b46b	[fix](chore) fix compile failure in JdbcExecutor and revert #18306 since BE crashes randomly (#18371) Fix 2 problems: 1. PR #18187 uses the api resizeColumn in JNINativeMethod, which was removed by #17960. 2. Revert PR #18306 to fix pipeline core dump during load.	2023-04-04 20:04:28 +08:00
Ashin Gau	66bfd18601	[opt](file_reader) add prefetch buffer to read csv&json file (#18301) Co-authored-by: ByteYue <yj976240184@gmail.com> This PR is an optimization for https://github.com/apache/doris/pull/17478: 1. Change the buffer size of `LineReader` to 4MB to align with the size of the prefetch buffer. 2. Lazily prefetch data in the first read to prevent wasted reading. 3. S3 block size is only 32MB, which is too small for a file split. Set 128MB as the default file split size. 4. Add `_end_offset` for the prefetch buffer to prevent wasted reading. The query performance of reading data on object storage is improved by more than 3x.	2023-04-04 19:05:22 +08:00
zhannngchen	175e5d405c	[improvement](merge-on-write) remove CHECK if lookup_row_key return unexpected status (#18326)	2023-04-04 12:42:07 +08:00
yixiutt	0cada3f81d	[Enhancement](compaction) return error instead of core when ctx not valid (#18363)	2023-04-04 12:27:13 +08:00
zhangstar333	54dbb4af67	[vectorized](jdbc) refactor jdbc table read array type (#18187) When jdbc reads an array type, the result from Doris is a string, from PG a java.sql.array, and from CK a java.lang.object. This makes the code difficult to maintain and read, so all databases' array results are changed to string, and a cast function from string to the Doris array type is added.	2023-04-04 11:57:04 +08:00
Xin Liao	418ea0a24e	[fix](merge-on-write) fix failure to capture_consistent_rowsets during full clone (#18346) During a full clone, if the max version of the local table is less than or equal to the max version of the clone table, there is no need to calculate the delete bitmap again.	2023-04-04 10:39:28 +08:00
zhangstar333	50e6c4216a	[vectorized](function) support week mode in date_trunc function (#18334) date_trunc can now truncate to week, e.g.: select date_trunc('2023-4-3 19:28:30', 'week');	2023-04-04 10:24:26 +08:00
Gabriel	a724443eb9	[Improvement](predicate) optimize short-circuit predicates (#18278) For a scan node with no vectorized predicate, the input column for the first short-circuit predicate is dense and we don't need to access the selector column. This PR improves performance by ~30% on TPCH Q3.	2023-04-04 10:21:41 +08:00
Lightman	af80e65094	[Improve](FileCache) Support the file cache profile in olap scan node and update the profile (#17710) We want to use the file cache for caching cold data in S3. When reading it, we want to know where the data comes from and how long the read takes, so the olap scan node now supports these metrics. To make the information clearer, the metric fields are also updated.	2023-04-04 10:18:30 +08:00
ZhangYu0123	8b85c55117	[vectorized](function) Support array_shuffle and shuffle function. (#18116) --------- Co-authored-by: zhangyu209 <zhangyu209@meituan.com>	2023-04-04 08:53:13 +08:00
Qi Chen	eb0fd0017e	[Fix](orc-reader) Fix incorrect scale of decimal columns when querying orc tables. (#18324)	2023-04-04 08:50:47 +08:00

4228 Commits