Commit Graph

5194 Commits

Author SHA1 Message Date
bc697ca9d6 [fix](time) fix error in time_to_sec 2023-08-07 17:33:24 +08:00
f036cdfde6 [feature](compaction) support delete in cumulative compaction (#19609) 2023-08-07 15:22:21 +08:00
Pxl
591aee528d [Bug](exchange) change BlockSerializer from unique_ptr to object (#22653)
change BlockSerializer from unique_ptr to object
2023-08-07 14:47:21 +08:00
0ca0c162b1 [fix][load] fix memtable reset causing nullptr (#22577) 2023-08-07 10:45:09 +08:00
af8774c2e6 [Test](function) do not unpack when the else column is const null in the IF function (#22419) 2023-08-07 09:34:48 +08:00
1847e440b2 [fix](memory) enable Jemalloc arena dirty pages (#22639)
If there is a core dump here, it may cover up the real stack. If the stack trace indicates heap corruption
(which leads to invalid jemalloc metadata), such as a double free or use-after-free in the application,
try sanitizers such as ASAN, or build jemalloc with --enable-debug to investigate further.
2023-08-06 19:18:44 +08:00
1a8a1e5b16 [Feature](count_by_enum) support count_by_enum function (#22071)
count_by_enum(expr1, expr2, ... , exprN);

Treats the data in a column as an enumeration and counts the occurrences of each enumerated value. For each column, returns the counts of the enumerated values along with the numbers of non-null and null values.
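
For illustration, a minimal invocation sketch (the table and column names are hypothetical):

```
-- Hypothetical table `user_profile`: count the occurrences of each
-- distinct value in `gender`, plus its non-null and null counts.
SELECT count_by_enum(gender) FROM user_profile;
```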
2023-08-06 16:05:14 +08:00
c2c01825c1 [opt](stacktrace) Optimize stacktrace output #22467 2023-08-06 15:53:53 +08:00
d628baba0a [improvement](hdfs) support hedged read (#22634)
In some cases, high load on HDFS may make reads take a long time,
thereby slowing down overall query efficiency. The HDFS client provides Hedged Read:
when a read request has not returned within a certain threshold, another read thread
can be started to read the same data, and whichever read returns first supplies the result.

e.g.:

```
create catalog regression properties (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
    'dfs.client.hedged.read.threadpool.size' = '128',
    'dfs.client.hedged.read.threshold.millis' = '500'
);
```
2023-08-06 14:51:48 +08:00
ab3fc1df5e [chore](profile) Fix 'BlocksProduced' in plan_fragment_executor (#22637) 2023-08-06 12:42:39 +08:00
96f42ca20a [fix](memory) Independent count exec node memory profile (#22598)
Count exec node memory in an independent profile; follow-up to #22582.
2023-08-06 10:56:31 +08:00
Pxl
7839a0e708 [Bug](brpc) fix brpc failures when big queries arrive concurrently (#22600)
Fix PriorityThreadPool get_info returning a wrong number.
Change the brpc pool from priority to FIFO.
Do not use the brpc pool when sending EOS.
2023-08-05 21:24:32 +08:00
55100277a1 [refactor](mysql writer) remove some unused code (#22632)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-08-05 17:59:14 +08:00
d3b50e3b2a [BUG](date_trunc) fix date_trunc function only handling lowercase unit strings (#22602)
Fix the date_trunc function to accept unit strings in any case, not only lowercase.
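
For illustration, a sketch of the kind of query affected (table and column names are hypothetical); with the fix, uppercase unit strings behave the same as lowercase ones:

```
-- Hypothetical table `t`: both forms now truncate `dt` to the start
-- of the month, instead of only the lowercase form working.
SELECT date_trunc(dt, 'month') FROM t;
SELECT date_trunc(dt, 'MONTH') FROM t;
```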
2023-08-05 12:53:13 +08:00
fe6bae2924 [fix](inverted index) supports utf8 and non-utf8 strings (#22570)
Supports UTF-8 and non-UTF-8 strings; see [fix] compatible with utf8 and invalid utf8 (doris-thirdparty#110).
2023-08-05 12:52:53 +08:00
6fe0aa492c [Chore](cmake) Remove unused be rowset CMakeLists.txt (#22627)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-05 12:51:58 +08:00
3024b82918 [fix](load)Fix wrong default value for char and varchar of reading json data (#22626)
If a column is defined as `col VARCHAR/CHAR NULL` with no default value, and we load JSON data that is missing column `col`, the queried result is incorrect:
+------+
| col |
+------+
| 1 |
+------+
But the expected result is:
+------+
| col |
+------+
| NULL |
+------+

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-08-05 12:47:27 +08:00
38f9ac99df [fix](bug) fix inconsistent be_custom.conf persistence and read paths (#22520)
The be_custom.conf persistence path is ${doris_home}/conf/be_custom.conf, but if ${custom_config_dir} is set to a different path, BE cannot read be_custom.conf from ${custom_config_dir}.

Set the be_custom.conf persistence path to ${custom_config_dir}.
2023-08-05 10:22:08 +08:00
12262a2025 [fix](compaction) filtering block row locations with delete sign should ignore the merge-on-read scenario (#22628) 2023-08-05 09:15:38 +08:00
26e78ab418 [fix](compaction) non-vertical compaction should also use the _unique_key_next_block function to read blocks (#22614) 2023-08-05 00:24:57 +08:00
Pxl
c1c38c956d [exec] fix coredump when limit<0 and limit!=-1 with 1.2 fe (#22622) 2023-08-04 22:18:45 +08:00
8bbccc59ef [refactor](load) split segment flush out of beta rowset writer (#21725) 2023-08-04 19:48:56 +08:00
b122f9b80c [fix](concat) ColumnString::chars is resized with wrong size (#22610)
FunctionStringConcat::execute_impl resized with a size that includes the string null terminator, which causes ColumnString::chars.size() to not match ColumnString::offsets.back(); this causes problems for some string functions, e.g. like and regexp.
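
For illustration, a hypothetical query of the affected shape, where a concat result feeds a string predicate:

```
-- Hypothetical table `t`: before the fix, the chars/offsets mismatch in the
-- concat result could make functions such as LIKE or REGEXP misbehave.
SELECT * FROM t WHERE concat(k, '_suffix') LIKE '%suffix';
```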
2023-08-04 19:13:35 +08:00
93593a013d [feature](load) add segment bytes limit in segcompaction (#22526) 2023-08-04 18:00:52 +08:00
7fe08c74fe [fix](inverted index) return empty result instead of error for empty match query (#22592)
Return an empty result instead of an error for empty match queries such as the following:

`SELECT * FROM t WHERE msg MATCH ''`

`SELECT * FROM t WHERE msg MATCH 'stop_word'`
2023-08-04 17:36:32 +08:00
3d758de7a2 [improvement](binlog) gc be binlog metas when tablet is dropped. (#22447) 2023-08-04 14:38:13 +08:00
24c1953e91 [fix](debug) add bvar counter for memtable & loadchannel (#22578)
* [fix](debug) add bvar counter for memtable & loadchannel

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

* format code

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

---------

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-08-04 13:58:28 +08:00
3d0d5bfd6d [chore](cmake) Split thirdparty into cmake/thirdparty.cmake (#22572)
* [chore](cmake) Split thirdparty into cmake/thirdparty.cmake

* Add Apache License into thirdparty.cmake

Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>

---------

Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-04 13:21:22 +08:00
ed6bb1fc9d [fix](memory) remove memory tracker profile refresh thread #22582
MemTrackers are usually bound to operators in queries/loads. If a large number of queries/loads are stuck, the number of MemTrackers becomes very large, and the memory tracker profile refresh thread gets stuck on the lock.

This PR is for branch-2.0; I will rewrite the memory profile in the next PR.
2023-08-04 11:51:19 +08:00
868e65d618 [fix](compaction) rowid_conversion should ignore deleted row (#22579) 2023-08-04 11:41:17 +08:00
bad8237850 [BugFix](Es Catalog) fix bug where the ES catalog returns an error when querying partial columns (#22423)
Bug:
When the value of some ES column is empty, querying these values in doc_values mode returns an error.

Reason:
In doc_values mode, these values are empty; we need to check whether the array is empty.
2023-08-04 11:28:30 +08:00
9c0528daf6 [Opt](orc-reader) opt the performance of date conversion. (#22381)
Optimize the performance of date conversion in the ORC reader. The same query is shown before and after the optimization:

```
mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (1.28 sec)

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (0.19 sec)
```
2023-08-04 10:52:09 +08:00
0c68f7e347 [performance](load) cancel unstarted segcompaction tasks when building rowset (#22392) 2023-08-04 10:10:38 +08:00
e8d105d6ff [fix](debug) add bvar counter for memtracker #22581 2023-08-04 09:56:30 +08:00
1ed1b69485 [refactor](reader) move reader from vec/exec/scan to vec/exec/format (#22371)
These readers should be in vec/exec/format.
2023-08-04 09:47:20 +08:00
Pxl
c4cee5122b [Chore](brpc) make error messages more verbose when brpc pool offer failed (#22558) 2023-08-03 22:02:37 +08:00
86e6f5d039 [FIX](decimal) fix decimal precision (#22364)
Currently decimal parsing from strings is wrong:
if the given string's precision is bigger than the defined decimal precision, we return an overflow error. But an overflow error should be returned only when the digit part is longer than the type's digit length, which we should check while traversing the given string into a decimal value.
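
For illustration, a hypothetical cast of this shape: the string carries more digits than the target precision, but its integer digit part still fits the type, so it should parse rather than report an overflow:

```
-- Hypothetical example: '12.3456' has 6 digits in total, more than
-- DECIMAL(5,2) allows, but the digit part '12' fits the type, so
-- this should not be treated as an overflow.
SELECT CAST('12.3456' AS DECIMAL(5,2));
```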
2023-08-03 21:13:58 +08:00
e7e73a618c [exec](join) Print join type in profile (#22567) 2023-08-03 20:46:15 +08:00
Pxl
098bab7b30 [Bug](exchange) disable implicit conversion of block to bool (#22534)
disable implicit conversion of block to bool
2023-08-03 20:37:14 +08:00
ec187662be use correct bool value (#22507) 2023-08-03 20:09:57 +08:00
96a46302e8 [fix](stacktrace) Fix BE failing to run with Jemalloc profiling enabled after rewriting dl_iterate_phdr (#22549)
Jemalloc heap profiling follows libgcc's way of backtracing by default.
Rewriting dl_iterate_phdr causes Jemalloc to fail to run after enabling profiling.

TODO, two possible solutions:

- Jemalloc can specify GNU libunwind as the profiling backtrace method, but my test failed;
--enable-prof-libunwind does not work: jemalloc/jemalloc#2504

- ClickHouse/libunwind solves Jemalloc profile backtracing, but that branch
has diverged from GNU libunwind and LLVM libunwind, which leaves its fate in others' hands.
2023-08-03 19:32:36 +08:00
d02b45e847 [chore](cmake) Refactor be CMakeLists option (#22499)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-03 19:25:04 +08:00
e90f95dfda [config](merge-on-write) use separate config to control primary key index cache (#22538) 2023-08-03 17:11:19 +08:00
f7755aa538 [exec](set_operation) Support one child node in set operation (#22463)
Support one child node in set operation
2023-08-03 10:35:59 +08:00
9f0a9e6fd6 [bug](distinct-agg) fix limit value not taking effect in some cases (#22517)
Fix the limit value not taking effect in some cases.
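
For illustration, a hypothetical query of the affected shape, where a limit sits on top of a distinct aggregation:

```
-- Hypothetical table `events`: before the fix, the LIMIT could
-- fail to take effect on the distinct aggregation in some cases.
SELECT DISTINCT user_id FROM events LIMIT 10;
```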
2023-08-03 10:35:36 +08:00
c2db01037a [refactor](config) rename segcompaction_max_threads (#22468) 2023-08-02 22:35:14 +08:00
938f768aba [fix](parquet) resolve offset check failed in parquet map type (#22510)
Fix an error when reading empty map values in parquet. The `offsets.back()` does not equal the number of elements in the map's key column.

### How does this happen
A map in parquet is stored as a repeated group, and `repeated_parent_def_level` is set incorrectly when parsing the map node in the parquet schema.
```
the map definition in parquet:
 optional group <name> (MAP) {
   repeated group map (MAP_KEY_VALUE) {
     required <type> key;
     optional <type> value;
   }
}
```

### How to fix
Set the `repeated_parent_def_level` of key/value node as the definition level of map node.

`repeated_parent_def_level` is the definition level of the first ancestor node whose `repetition_type` equals `REPEATED`. Empty array/map values are not stored in the doris column, so we have to use `repeated_parent_def_level` to skip the empty or null values in the ancestor node.

For instance, considering an array of strings with 3 rows like the following:
`null, [], [a, b, c]`
We can store four elements in data column: `null, a, b, c`
and the offsets column is: `1, 1, 4`
and the null map is: `1, 0, 0`
For the `i-th` row in the array column, the range from `offsets[i - 1]` to `offsets[i]` represents the elements in that row, so we can't store empty array/map values in the doris data column. As a comparison, spark does not require `repeated_parent_def_level`, because the spark column stores empty array/map values and uses another length column to indicate empty values. For reference: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java

Furthermore, we could also avoid storing null array/map values in the doris data column. For the same three rows as above, we would store only three elements in the data column: `a, b, c`
and the offsets column would be: `0, 0, 3`
and the null map: `1, 0, 0`
2023-08-02 22:33:10 +08:00
Pxl
3d0d7a427b [Chore](brpc) display pool name when try offer failed (#22514) 2023-08-02 22:31:33 +08:00
9d3f1dcf44 [improvement](vectorized) Deserialized elements of count distinct aggregation directly inserted into target hashset (#21888)
The original logic first deserializes the ColumnString into a HashSet (inserting the deserialized elements into that HashSet), and then, during the merge phase, traverses all of its elements and inserts them into the target HashSet.
After the optimization, elements are inserted directly into the target HashSet during deserialization, eliminating the unnecessary intermediate HashSet inserts.

In one of our internal query tests, 30 HashSets were merged in the second-phase aggregation (average cardinality 1,400,000; cardinality after merging 42,000,000). After the optimization, MergeTime dropped from 5s965ms to 3s375ms.
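
For illustration, a hypothetical query shape that exercises this path, where per-instance HashSets of distinct values are merged in the second aggregation phase:

```
-- Hypothetical table `events`: each instance builds a HashSet of distinct
-- user_id values, and the sets are merged in the second-phase aggregation.
SELECT count(DISTINCT user_id) FROM events;
```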
2023-08-02 21:19:56 +08:00
781c1d5238 [log](load) add debug logs for potential duplicate tablet ids (#22485) 2023-08-02 20:38:41 +08:00