Commit Graph

1428 Commits

Author SHA1 Message Date
79a6496bb6 [branch-2.1](function) fix wrong result when convert_tz is out of bound (#37358) (#38313)
## Proposed changes

pick https://github.com/apache/doris/pull/37358

before:
```sql
mysql> select CONVERT_TZ(cast('0000-01-01 00:00:00.00001'  as DATETIMEV1), cast('Asia/Shanghai' as VARCHAR(65533)), cast('America/Los_Angeles' as VARCHAR(65533)));
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| convert_tz(cast('0000-01-01 00:00:00.00001' as DATETIME), cast('Asia/Shanghai' as VARCHAR(65533)), cast('America/Los_Angeles' as VARCHAR(65533))) |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| q535-12-31 08:01:19                                                                                                                               |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.12 sec)
```
now:
```sql
mysql> select CONVERT_TZ(cast('0000-01-01 00:00:00.00001'  as DATETIMEV1), cast('Asia/Shanghai' as VARCHAR(65533)), cast('America/Los_Angeles' as VARCHAR(65533)));
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| convert_tz(cast('0000-01-01 00:00:00.00001' as DATETIME), cast('Asia/Shanghai' as VARCHAR(65533)), cast('America/Los_Angeles' as VARCHAR(65533))) |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| NULL                                                                                                                                              |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.09 sec)
```
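The commit message does not show the fix itself; judging from the before/after output, the converted value now gets a range check and out-of-range results surface as NULL. A standalone sketch of that idea, with hypothetical types and deliberately simplified date arithmetic (not the actual Doris code):
```c++
#include <iostream>
#include <optional>

// Hypothetical minimal datetime: only the fields needed for a range check.
struct DateTime {
    int year, month, day, hour;
};

// Sketch: apply a timezone shift, then validate the result instead of letting it
// wrap around into a garbled value; out-of-range results surface to SQL as NULL.
std::optional<DateTime> convert_tz_checked(DateTime v, int offset_hours) {
    v.hour += offset_hours;
    if (v.hour < 0) {                 // crossed midnight backwards
        v.hour += 24;
        if (--v.day == 0) {           // crossed the month boundary (simplified)
            v.day = 31;
            if (--v.month == 0) {
                v.month = 12;
                --v.year;
            }
        }
    }
    if (v.year < 0 || v.year > 9999) return std::nullopt;  // -> NULL in SQL
    return v;
}

int main() {
    // '0000-01-01 00:00' shifted from Asia/Shanghai to America/Los_Angeles (-16h)
    auto r = convert_tz_checked({0, 1, 1, 0}, -16);
    std::cout << (r ? "in range" : "NULL") << "\n";  // NULL
}
```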
2024-07-25 11:32:44 +08:00
c30c1d2436 [branch-2.1] Picks "[opt](delete) Delete job should retry for failure that is not DELETE_INVALID_XXX #37834" (#38032)
## Proposed changes

picks https://github.com/apache/doris/pull/37834 and
https://github.com/apache/doris/pull/38043
2024-07-18 14:50:30 +08:00
02716598d4 [Fix](sql function) memory overflow to the left of string address when do_money_format has small negative value #36226 (#37870)
cherry pick from #36226

Co-authored-by: sparrow <38098988+biohazard4321@users.noreply.github.com>
2024-07-16 15:04:42 +08:00
Pxl
d7e84b7ee3 [Enhancement](bitmap) optimize bitmap deserialize and remove some unused code (#37623)
## Proposed changes
pick from #35789
2024-07-16 11:21:54 +08:00
967173d7d0 [cherry-pick-2.1](table-function) pick some table-function exec performance optimizations (#34090) (#37778)
## Proposed changes

pick from master:
https://github.com/apache/doris/pull/33904
https://github.com/apache/doris/pull/34090

Co-authored-by: HappenLee <happenlee@hotmail.com>
2024-07-15 17:15:56 +08:00
2759383365 [branch-2.1](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062) (#37269)
pick https://github.com/apache/doris/pull/37062

1. Revert https://github.com/apache/doris/pull/25097. We decided to rely on the OS tzdata rather than maintain an independent copy, to keep results consistent.
2. Refactor timezone loading; the rwlock was removed.

before:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (6.88 sec)
```
now:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (2.61 sec)
```
3. Timezone offset-format strings like 'UTC+8' are no longer supported, as already documented at
https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage
4. Support case-insensitive timezone parsing in Nereids (a minimal lookup sketch follows below).
5. Fixed a bug in Nereids timezone parsing: DST should be determined by the input time, but was previously determined by the current time.

doc pr: https://github.com/apache/doris-website/pull/810
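As a rough illustration of points 2 and 4 above (a timezone table built once at startup, so readers need no rwlock, plus case-insensitive name lookup), here is a minimal sketch with made-up names and fixed offsets; real offsets depend on DST, which point 5 addresses, and this is not the actual Doris implementation:
```c++
#include <algorithm>
#include <cctype>
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical timezone table: name -> UTC offset in minutes, built once at startup.
// Because it is immutable after initialization, readers need no rwlock.
static const std::unordered_map<std::string, int>& tz_table() {
    static const std::unordered_map<std::string, int> table = {
        {"asia/shanghai", 8 * 60},
        {"america/los_angeles", -8 * 60},  // standard time only; DST ignored here
        {"utc", 0},
    };
    return table;
}

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}

// Case-insensitive lookup: "Asia/Shanghai" and "ASIA/SHANGHAI" both resolve.
std::optional<int> lookup_offset_minutes(const std::string& name) {
    auto it = tz_table().find(to_lower(name));
    if (it == tz_table().end()) return std::nullopt;
    return it->second;
}

int main() {
    std::cout << lookup_offset_minutes("Asia/Shanghai").value_or(-1) << "\n";   // 480
    std::cout << lookup_offset_minutes("ASIA/SHANGHAI").value_or(-1) << "\n";   // 480
}
```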
2024-07-15 10:56:48 +08:00
8930df3b31 [Feature](iceberg-writer) Implements iceberg partition transform. (#37692)
## Proposed changes

Cherry-pick iceberg partition transform functionality. #36289 #36889

---------

Co-authored-by: kang <35803862+ghkang98@users.noreply.github.com>
Co-authored-by: lik40 <lik40@chinatelecom.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
2024-07-13 16:07:50 +08:00
cf2fb6945a [branch-2.1](memory) Refactor LRU cache policy memory tracking (#37658)
pick 
#36235
#35965
2024-07-11 21:04:01 +08:00
62e0230523 [branch-2.1](memory) Add ThreadMemTrackerMgr BE UT (#37654)
## Proposed changes

pick #35518
2024-07-11 21:03:49 +08:00
fed632bf4a [fix](move-memtable) check segment num when closing each tablet (#36753) (#37536)
cherry-pick #36753 and #37660
2024-07-11 20:33:44 +08:00
9f4e7346fb [fix](compaction) fixing the inaccurate statistics of concurrent compaction tasks (#37318) (#37496) 2024-07-10 22:23:25 +08:00
afcc6170f6 [fix](txn_manager) Add ingested rowsets to unused rowsets when removing txn (#37417)
Generally speaking, as long as a rowset has a version, it can be
considered not to be in a pending state. However, if the rowset was
created through ingesting binlogs, it will have a version but should
still be considered in a pending state because the ingesting txn has not
yet been committed.

This PR updates the condition for determining the pending state. If a
rowset is COMMITTED, the txn should be allowed to roll back even if a
version exists.

Cherry-pick #36551
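A minimal sketch of the revised rule described above; the state names and types are hypothetical stand-ins, not the actual Doris rowset and transaction code:
```c++
#include <cstdint>
#include <iostream>
#include <optional>

// Hypothetical rowset state for the sketch: COMMITTED = txn not yet published,
// VISIBLE = published with a version.
enum class RowsetState { COMMITTED, VISIBLE };

struct Rowset {
    RowsetState state;
    std::optional<int64_t> version;  // binlog-ingested rowsets carry a version early
};

// Old rule: a rowset with a version was never considered pending, so its txn could
// not be rolled back. New rule sketched here: a COMMITTED rowset is still pending
// and its txn may be rolled back even if a version is already set.
bool txn_can_rollback(const Rowset& rs) {
    return rs.state == RowsetState::COMMITTED;
}

int main() {
    Rowset ingested{RowsetState::COMMITTED, 12345};
    std::cout << std::boolalpha << txn_can_rollback(ingested) << "\n";  // true
}
```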
2024-07-10 14:25:44 +08:00
5280e277e7 [chore](be) Acquire and check MD5 digest of the file to download (#37418)
Cherry-pick #35807, #36621, #36726
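As a generic illustration of acquiring and checking an MD5 digest for a downloaded file, here is a sketch using OpenSSL's EVP API; the file name, expected-digest source, and surrounding flow are assumptions rather than the actual Doris download path:
```c++
#include <openssl/evp.h>

#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Compute the hex MD5 digest of a local file with OpenSSL's EVP interface.
static std::string md5_hex(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    EVP_MD_CTX* ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_md5(), nullptr);
    std::vector<char> buf(1 << 16);
    while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
        EVP_DigestUpdate(ctx, buf.data(), static_cast<size_t>(in.gcount()));
    }
    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int len = 0;
    EVP_DigestFinal_ex(ctx, digest, &len);
    EVP_MD_CTX_free(ctx);
    std::ostringstream hex;
    for (unsigned int i = 0; i < len; ++i) {
        hex << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(digest[i]);
    }
    return hex.str();
}

int main() {
    // The expected digest would come from the download source (e.g. an HTTP header).
    std::string expected = "d41d8cd98f00b204e9800998ecf8427e";  // MD5 of an empty file
    std::string actual = md5_hex("downloaded.dat");
    std::cout << (actual == expected ? "ok" : "digest mismatch") << "\n";
}
```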
2024-07-08 18:55:35 +08:00
ceef9ee123 [feature](serde) support presto compatible output format (#37039) (#37253)
bp #37039
2024-07-04 13:56:05 +08:00
07278e9dcb [improvement](segmentcache) limit segment cache by memory or segment num (#37026) (#37035)

pick #37026
2024-06-30 20:34:13 +08:00
f27ae8fa09 [fix](bitmap) incorrect type of BitmapValue with fastunion (#36834) (#36896) 2024-06-28 11:29:03 +08:00
0cff539810 [feature](function) support new function replace_empty (#36283) (#36656)
#36283
2024-06-21 16:46:22 +08:00
c8f2a3f952 [fix](eq_for_null) fix incorrect logic in function eq_for_null #36004 (#36124)
cherry pick from #36004
cherry pick from #36164
2024-06-21 14:31:21 +08:00
612f2ae961 [feature](api) add BE HTTP /api/load_streams (#36312) (#36338)
cherry-pick #36312
2024-06-16 22:09:04 +08:00
b75533e72b [branch-2.1](beut) fix BE UT (#36147)
only for branch-2.1
2024-06-12 08:21:38 +08:00
596a9a16d3 [chore](Compile) Fix segment cache ut's compile error due to miss cherry-pick (#36099) 2024-06-11 17:12:42 +08:00
a0f3c1cd1e [chore](Compile) Fix S3 file writer ut's compile error due to miss cherry-pick (#36037)
The S3 file writer UT can't pass UT compilation; this PR fixes it.
2024-06-08 22:21:20 +08:00
af779f5cd8 Pick "[fix](gclog) Skip tablet dir without schema hash dir in path gc (#32793)" (#35978)
## Proposed changes
Pick "[fix](gclog) Skip tablet dir without schema hash dir in path gc
(#32793)"
2024-06-06 22:24:30 +08:00
f80b856405 [enhancement](oom) return error when bloom filter allocate memory failed (#35790)
## Proposed changes


1. Return an error when the bloom filter fails to allocate memory (a minimal sketch of this pattern follows below).
2. Return an error when deserializing a block, since it may need a lot of memory.
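A minimal sketch of the first pattern, returning a status instead of crashing when a large allocation fails; the Status type and function names here are simplified stand-ins, not the actual Doris code:
```c++
#include <cstdint>
#include <iostream>
#include <memory>
#include <new>
#include <string>

// Simplified stand-in for a Status-style error type.
struct Status {
    std::string msg;
    bool ok() const { return msg.empty(); }
    static Status OK() { return {}; }
    static Status MemoryAllocFailed(const std::string& m) { return {m}; }
};

// Try to allocate the bloom filter's bit array; report failure instead of aborting.
Status init_bloom_filter(size_t num_bytes, std::unique_ptr<uint8_t[]>* out) {
    uint8_t* buf = new (std::nothrow) uint8_t[num_bytes];
    if (buf == nullptr) {
        return Status::MemoryAllocFailed(
                "failed to allocate " + std::to_string(num_bytes) + " bytes for bloom filter");
    }
    out->reset(buf);
    return Status::OK();
}

int main() {
    std::unique_ptr<uint8_t[]> bits;
    Status st = init_bloom_filter(size_t(1) << 20, &bits);
    std::cout << (st.ok() ? "allocated" : st.msg) << "\n";
}
```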

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-06-03 18:22:11 +08:00
9dd573888a [bugfix](stdcallonce) replace std callonce with a lock because it is not exception safe (#35126) 2024-06-01 08:00:42 +08:00
9c270e5cdf [fix](delete) Fix unrecognized column name delete handler (#32429) (#35742)
pick doris-master #32429
2024-05-31 20:41:22 +08:00
680be6d19f [fix](ub) fix uninitialized accesses in BE (#35370)
ubsan hints:
```c++
/root/doris/be/src/olap/hll.h:93:29: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/olap/hll.h:94:23: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/runtime/descriptors.h:439:38: runtime error: load of value 118, which is not a valid value for type 'bool'
/root/doris/be/src/vec/exec/vjdbc_connector.cpp:61:50: runtime error: load of value 35, which is not a valid value for type 'bool' 
```
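The reports above are typical of reading uninitialized members. A common fix, sketched here with hypothetical members rather than the actual Doris classes, is default member initialization:
```c++
#include <iostream>

// Hypothetical stand-ins for the flagged members: without in-class initializers,
// reading them before assignment is undefined behavior (ubsan's "load of value ...
// which is not a valid value for type ...").
enum class HllDataType { EMPTY = 0, EXPLICIT = 1, FULL = 2 };

struct HllLike {
    HllDataType _type = HllDataType::EMPTY;  // previously left uninitialized
};

struct DescriptorLike {
    bool _is_materialized = false;  // default-initialize instead of relying on callers
};

int main() {
    HllLike h;
    DescriptorLike d;
    std::cout << static_cast<int>(h._type) << " " << std::boolalpha
              << d._is_materialized << "\n";
}
```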
2024-05-29 20:31:07 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
8fb28244d6 [improvement](page builder) avoid allocating big memory in ctor (#35493)
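The title suggests deferring a large buffer allocation from the constructor to first use. A generic sketch of that pattern, with made-up names rather than the actual PageBuilder code:
```c++
#include <cstdint>
#include <iostream>
#include <vector>

// Sketch: instead of reserving a large buffer eagerly in the constructor,
// reserve lazily on the first append, so short-lived builders stay cheap.
class PageBuilderSketch {
public:
    explicit PageBuilderSketch(size_t reserve_bytes) : _reserve_bytes(reserve_bytes) {}

    void append(const uint8_t* data, size_t len) {
        if (_buffer.capacity() == 0) {
            _buffer.reserve(_reserve_bytes);  // deferred from the constructor
        }
        _buffer.insert(_buffer.end(), data, data + len);
    }

    size_t size() const { return _buffer.size(); }

private:
    size_t _reserve_bytes;
    std::vector<uint8_t> _buffer;
};

int main() {
    PageBuilderSketch builder(64 * 1024);
    uint8_t bytes[4] = {1, 2, 3, 4};
    builder.append(bytes, sizeof(bytes));
    std::cout << builder.size() << "\n";  // 4
}
```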
2024-05-29 15:03:54 +08:00
7058b31edd [fix](move-memtable) clear load streams before shutdown SegmentFileWriterThreadPool (#35217) 2024-05-28 13:12:03 +08:00
Pxl
b143f0dfe2 [Improvement](date) shortcut for str to date parse (#35288)
shortcut for str to date parse
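The commit does not show the shortcut itself; a common form of such an optimization is a fast path for fixed-format date strings before falling back to a general parser. A self-contained sketch of that idea (an assumption, not the actual Doris change):
```c++
#include <cctype>
#include <iostream>
#include <optional>
#include <string>

struct Date {
    int year, month, day;
};

// Fast path: parse strictly "YYYY-MM-DD" with direct digit arithmetic,
// avoiding a general (and slower) format-string parser for the common case.
std::optional<Date> parse_date_fast(const std::string& s) {
    if (s.size() != 10 || s[4] != '-' || s[7] != '-') return std::nullopt;
    auto digit = [&](int i) -> int {
        return std::isdigit(static_cast<unsigned char>(s[i])) ? s[i] - '0' : -1;
    };
    for (int i : {0, 1, 2, 3, 5, 6, 8, 9}) {
        if (digit(i) < 0) return std::nullopt;
    }
    Date d;
    d.year = digit(0) * 1000 + digit(1) * 100 + digit(2) * 10 + digit(3);
    d.month = digit(5) * 10 + digit(6);
    d.day = digit(8) * 10 + digit(9);
    if (d.month < 1 || d.month > 12 || d.day < 1 || d.day > 31) return std::nullopt;
    return d;
}

int main() {
    auto d = parse_date_fast("2024-05-25");
    if (d) std::cout << d->year << "-" << d->month << "-" << d->day << "\n";
}
```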
2024-05-25 17:47:20 +08:00
639c7ee7fb [fix](decimalv2) fix scale of decimalv2 to string (#35222) (#35359)
* [fix](decimalv2) fix scale of decimalv2 to string
2024-05-24 17:20:43 +08:00
309503855e [Fix](bloom filter) Fix bloom filter memory leak (#34871)
Issue: Doris occasionally encounters a situation where memory usage becomes exceptionally high and does not decrease. The leaked memory is occupied by Bloom filters held in memory.

Reason: The segment cache stores segment objects read from files into memory. It functions as an LRU cache with an eviction strategy: when the number of cached segments exceeds the maximum count, or the total memory size of cached segment objects exceeds the maximum usage, older segments are evicted. However, one code path first reads a segment object into memory (say it occupies size A), then places it into the cache, which records its size as A. It then reads the segment's Bloom filter from the file and assigns it to the segment's Bloom filter member (say the Bloom filter occupies size B), so the segment object now actually occupies A+B. The cache never updates its recorded size, so the real size of each cached segment (A+B) is larger than what the cache accounts for (A). As the number of cached segments grows, memory usage surges, but the cache does not see its eviction limit as reached and therefore evicts nothing. This is the memory leak.

Solution: Since each segment object reads its Bloom filter only once, the issue is resolved by changing the order from "read the segment, place it into the cache, then read the Bloom filter" to "read the segment, read the Bloom filter, then place it into the cache".
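A compact sketch of the ordering change described in the Solution, using toy cache and segment types rather than the actual SegmentCache code: the segment is fully loaded, Bloom filter included, before it is inserted, so the size the cache accounts for matches the real footprint.
```c++
#include <iostream>
#include <list>
#include <memory>
#include <utility>

// Toy segment: its in-memory size grows once the Bloom filter is loaded.
struct Segment {
    size_t base_bytes = 0;
    size_t bloom_filter_bytes = 0;
    size_t mem_size() const { return base_bytes + bloom_filter_bytes; }
    void load_bloom_filter(size_t bytes) { bloom_filter_bytes = bytes; }
};

// Toy size-bounded LRU cache: it only knows the size seen at insert time.
class SegmentCache {
public:
    explicit SegmentCache(size_t capacity_bytes) : _capacity(capacity_bytes) {}

    void insert(std::shared_ptr<Segment> seg) {
        _used += seg->mem_size();  // whatever is loaded later is invisible to the cache
        _lru.push_front(std::move(seg));
        while (_used > _capacity && !_lru.empty()) {  // evict based on tracked size
            _used -= _lru.back()->mem_size();
            _lru.pop_back();
        }
    }

    size_t tracked_bytes() const { return _used; }

private:
    size_t _capacity;
    size_t _used = 0;
    std::list<std::shared_ptr<Segment>> _lru;
};

int main() {
    SegmentCache cache(1024);
    auto seg = std::make_shared<Segment>();
    seg->base_bytes = 100;        // size A
    seg->load_bloom_filter(400);  // size B, loaded BEFORE insert (the fix)
    cache.insert(seg);            // cache now accounts for A + B, so eviction works
    std::cout << cache.tracked_bytes() << "\n";  // 500
}
```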
2024-05-24 16:23:58 +08:00
a6f7747d29 [feature](datatype) add BE config to allow zero date (#34961)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-05-23 19:12:39 +08:00
c23384ff07 [fix](decimal) Fix long string casting to decimalv2 (#35121) 2024-05-22 14:32:29 +08:00
98f8eb5c43 [opt](split) get file splits in batch mode (#34032) (#35107)
bp  #34032
2024-05-21 22:27:07 +08:00
b4a798240a [fix](inverted_index) donot use int32_t for index id to avoid overflow (#35062) 2024-05-21 12:58:38 +08:00
e3e5f18f26 [Fix](Json type) correct cast result for json type (#34764) 2024-05-18 18:40:17 +08:00
eb7eaee386 [fix](function) money format (#34680) 2024-05-18 18:35:29 +08:00
1a24895257 [opt](routine-load) optimize routine load task thread pool and related param(#32282) (#34896) 2024-05-15 12:42:02 +08:00
95b05928fd [fix](compaction) fix time series compaction merge empty rowsets priority #34562 (#34765) 2024-05-14 09:10:09 +08:00
0ae1b9c70a [chore](remove code) Remove dragonbox related (#34528)
* Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)"

This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019.

* run buildall

* MORE

* FIX
2024-05-13 22:16:57 +08:00
32cbd4a583 [chore](status) unify error code between thrift,pb, status.h (#34397)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-10 14:41:01 +08:00
9b712b03b4 [FIX]fix is_ip_address_in_range func with const param (#34266) 2024-05-10 14:37:20 +08:00
8fdfbcb3c4 Revert "[Opt](func) opt the percentile func performance (#34373) (#34416)"
This reverts commit 509ae425e416b4779ae94eab9c2b21f9850e03c3.
2024-05-07 07:23:48 +08:00
f7900b53ce [enhancement](function) floor/ceil/round/round_bankers can use column as scale argument (#34391) 2024-05-06 22:18:36 +08:00
509ae425e4 [Opt](func) opt the percentile func performance (#34373) (#34416) 2024-05-06 20:10:35 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient: no longer read page headers when page locations are available in the chunk's offset index.
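A simplified sketch of the idea with hypothetical structures (not the actual ColumnChunkReader): when the offset index provides page locations, skipping a page is just advancing the read position to the next page's offset, with no header decoding.
```c++
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

// From the parquet offset index: where each page starts and how long it is.
struct PageLocation {
    int64_t offset;
    int32_t compressed_page_size;
};

class ChunkReaderSketch {
public:
    explicit ChunkReaderSketch(std::vector<PageLocation> locations)
            : _locations(std::move(locations)) {}

    // Skip the current page. With page locations available we can move the file
    // position directly; without them we would have to decode the page header
    // just to learn the page size.
    void skip_page() {
        const PageLocation& loc = _locations[_page_index];
        _file_pos = loc.offset + loc.compressed_page_size;
        ++_page_index;
    }

    int64_t file_pos() const { return _file_pos; }

private:
    std::vector<PageLocation> _locations;
    size_t _page_index = 0;
    int64_t _file_pos = 0;
};

int main() {
    ChunkReaderSketch reader({{0, 4096}, {4096, 4096}});
    reader.skip_page();
    std::cout << reader.file_pos() << "\n";  // 4096, no header was parsed
}
```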
2024-04-26 15:06:16 +08:00
c631f4f8a8 [fix](schema change) resolve the use count check of source logical column (#33932)
Fix error like:
```
8# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
 9# doris::vectorized::Block::clear_column_data(int) in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
10# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:514
11# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vfile_scanner.cpp:333
12# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:132
13# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:99
```

This happens because the source logical column is the destination logical column when the logical converter is consistent. Previously, the column reference was reset only after the conversion completed; when an EOF occurred, the function returned early without resetting it, even though EOF is not a true error.
```
if (_logical_converter->is_consistent()) {
    // If logical converter is consistent, _src_logical_column is the final destination column,
    // other components will check the use count
    _src_logical_column.reset();
}
```
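A small self-contained sketch of the corrected control flow, paraphrasing the snippet above with simplified types: the reference is dropped on the early EOF return as well, since EOF is not a real error.
```c++
#include <iostream>
#include <memory>
#include <vector>

using ColumnPtr = std::shared_ptr<std::vector<int>>;

// Simplified version of the problem: when source and destination are the same
// column, the extra reference must be dropped on every return path, including
// the early EOF return, or later use-count checks will fail.
bool convert_block(ColumnPtr& src_logical_column, bool eof) {
    if (eof) {
        src_logical_column.reset();  // the missing reset on the EOF path (the fix)
        return true;                 // EOF is not a real error
    }
    // ... normal conversion would happen here ...
    src_logical_column.reset();      // original behavior after a completed conversion
    return true;
}

int main() {
    auto column = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});
    ColumnPtr src = column;           // second reference held during conversion
    convert_block(src, /*eof=*/true);
    std::cout << column.use_count() << "\n";  // 1: only the destination still holds it
}
```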
2024-04-22 12:31:46 +08:00
7e91e69eb9 [fix](compaction) fix single compaction (#33907)
* [fix](compaction) Fix single compaction to get all local versions #33849

add test and comment

* remove single replica compaction prepare input rowsets

revised
2024-04-19 23:30:25 +08:00