doris

Author	SHA1	Message	Date
Pxl	16fc3a0e22	[Chore](compile) remove some unused static on inline function to reduce compile time (#17603 )	2023-03-13 11:11:59 +08:00
abmdocrt	55c42da511	[Feature](array) Support array<decimalv3> data type (#16640 )	2023-03-13 10:48:13 +08:00
HappenLee	39b5682d59	[Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine	2023-03-13 10:33:57 +08:00
yuxuan-luo	edb2d90852	[fix](routine load) fix ROUTINE LOAD bug where the Kafka commit is short by one (#17282 ) (#17291 ) Co-authored-by: hugoluo <hugoluo@tencent.com>	2023-03-13 10:20:59 +08:00
Jerry Hu	93a865c3e8	[improvement](join) Avoid reading from left child while hash table is empty(right join) (#17655 ) When the right (build) side is empty in a right outer join, there is no need to read data from the left child.	2023-03-13 09:03:17 +08:00
Johnny_Sc	47cfc81925	[fix docs] (#17634 ) Co-authored-by: shenshoucheng <shenshoucheng@jd.com>	2023-03-13 08:06:33 +08:00
HappenLee	6386458498	[Refactor](exec) remove useless attr of slot ref (#17688 )	2023-03-12 23:45:32 +08:00
slothever	455c800405	[feature](parquet-reader) add rle bool and delta decoder to read AWS Glue (#17112 ) Support delta encoding and RLE (bool) to read Glue data: add a delta bit pack decoder, a delta length byte array decoder, a delta byte array decoder, and an RLE bool decoder. Some data types on AWS Glue are read with delta encoding, so it should be supported. For the definition of delta encoding, refer to the delta encoding in Parquet.	2023-03-12 20:09:58 +08:00
Pxl	8328ab69ad	[Chore](Materialized-View) add some mv regression test case (#17345 ) 1. add some mv regression test cases 2. rename materialized_view_p0 to mv_p0 (to avoid create database failing because of the long db name)	2023-03-11 10:55:11 +08:00
camby	6dcd791b74	[feature](struct-type) support CAST AS Struct type (#17553 ) 1. add support for `CAST AS Struct` from Struct type; 2. fix crash on `CAST('{}' AS Struct)`; 3. `CAST('' AS complex_type)` should return NULL instead of an empty object; --------- Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2023-03-10 21:21:16 +08:00
zhengyu	2739a44eaf	[fix](segcompaction) heap overflow when doing segcompaction for cancelling load(#17529 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-03-10 20:52:05 +08:00
morrySnow	365c8eed7e	[fix](function) width_bucket should get min and max from each tuple (#17466 )	2023-03-10 13:14:12 +08:00
lihangyu	a79b8ede88	[Bug](ColumnArray) Fix array column replicate `replicate_offsets` not matched (#17616 ) The input replicate_offsets should be the same size as ColumnArray's offset. ``` IColumn::Offsets replicate_offsets(get_offsets().size(), 0); // |---------------------|-------------------------|-------------------------| // [0, begin) [begin, begin + count_sz) [begin + count_sz, size()) // do not need to copy copy counts[n] times do not need to copy ``` we should	2023-03-10 11:52:22 +08:00
Pxl	1a549edac2	[Chore](third-party) upgrade thrift from 0.13 to 0.16 (#17202 ) Thrift's release notes: https://github.com/apache/thrift/blob/master/CHANGES.md	2023-03-10 11:33:16 +08:00
lihangyu	fcd25b53bf	[Optimize](Random distribution) Improve the performance of tablet sin… (#17389 ) The current distribution model for Doris is as follows: OlapTableSink separates the original Block into several subblocks, one per node (BE), according to the tablet distribution and distributes the subblocks to the storage engines of the backends; the storage engine then separates each subblock across multiple tablet channels, and each delta writer handles part of the block. This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute the complete block as a whole. The advantage of doing so is reduced memory copying and improved write locality, similar to appending the entire block to the memtable. This optimization could save 10% ~ 20% of the CPU cost of loading a RANDOM distribution table when load_to_single_tablet is enabled	2023-03-10 10:52:40 +08:00
bobhan1	e1bf9411de	[feature](array function) add support for array_enumerate_uniq (#17541 )	2023-03-10 10:20:49 +08:00
huangzhaowei	4ba93efc98	[Enhance](DOE)Support parse default es iso datetime string (#17412 )	2023-03-10 09:59:20 +08:00
WenYao	a745ab1703	[fix](schema scanner) fix query some schema table report invalid parameter (#17626 ) Example: SELECT ROUTINE_SCHEMA AS PROCEDURE_CAT, NULL AS PROCEDURE_SCHEM,ROUTINE_NAME AS PROCEDURE_NAME,NULL AS NUM_INPUT_PARAMS,NULL AS NUM_OUTPUT_PARAMS,NULL AS NUM_RESULT_SETS,ROUTINE_COMMENT AS REMARKS,IF(ROUTINE_TYPE = 'FUNCTION', 2,IF(ROUTINE_TYPE= 'PROCEDURE', 1, 0)) AS PROCEDURE_TYPE FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_SCHEMA = DATABASE(); ERROR 1105 (HY000): errCode = 2, detailMessage = invalid parameter This is wrong, and it prevents some BI tools from working correctly.	2023-03-10 08:52:09 +08:00
Jerry Hu	08f0170895	[fix](olap) The 'scan key' generated by the 'is null' expression causes incorrect query results (#17569 )	2023-03-10 08:51:06 +08:00
Xinyi Zou	f9baf9c556	[improvement](scan) Support pushdown execute expr ctx (#15917 ) In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. Previous scan process: 1. Read part of the columns first and calculate the row ids with the simple pushed-down predicates. 2. Use the row ids to read the remaining columns and pass them to the scanner, which filters by the remaining predicates. This PR also pushes the remaining predicates (functions, nested predicates...) in the scanner down to the storage layer for filtering. New scan process: 1. Read part of the columns first and use the simple pushed-down predicates to calculate the row ids (same as above). 2. Use the row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again. 3. Use the row ids to read the remaining columns and pass them to the scanner.	2023-03-10 08:35:32 +08:00
xueweizhang	0334cde2b1	[fix](merge-on-write) recalculate the delete bitmap for MoW if BE goes down during publish (#17617 ) * fix code Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-03-10 07:55:00 +08:00
zhangstar333	e80ae0367a	[improvement](be) add a name for be jvm (#17595 )	2023-03-09 23:27:15 +08:00
Xin Liao	849b5b7b8f	[fix](sequence) fix that the result is wrong when load multiple duplicate keys (#17575 )	2023-03-09 20:59:23 +08:00
Gabriel	0432ba8b33	[refactor](status) refactor status judgement (#17592 )	2023-03-09 17:40:25 +08:00
YueW	5d26a12312	[fix](inverted index) fix missing several numeric types for inverted index query (#17359 )	2023-03-09 16:34:06 +08:00
zhangstar333	4ef46159ae	[vectorized](udaf) support array type for java-udaf (#17351 )	2023-03-09 11:30:07 +08:00
amory	06dee69174	[Refactor](map) remove using column array in map to reduce offset column (#17330 ) 1. remove column array in map 2. add offsets column in map. This aims to reduce the duplicate offsets from key-array and value-array on disk	2023-03-09 11:22:26 +08:00
lihangyu	368e6a4f9c	[Bug](array filter) Fix bug due to `ColumnArray::filter_generic` invalid inplace `size_at` after `set_end_ptr` (#17554 ) We should make a new PodArray to add items instead of doing it in place	2023-03-09 10:59:29 +08:00
luozenglin	00727e8c11	[fix](in-bitmap) fix result may be wrong if the left side of the in bitmap predicate is a constant (#17570 )	2023-03-09 10:59:05 +08:00
Pxl	65b8dfc7ff	[Enhancement](function) Inline some aggregate function && remove nullable combinator (#17328 ) 1. Inline some aggregate functions 2. remove nullable combinator	2023-03-09 10:39:04 +08:00
zxealous	6923bf8d7b	[fix](file cache)fix block file cache can't be configured (#17511 )	2023-03-09 10:12:08 +08:00
Xinyi Zou	397cc011c4	[fix](function) fix AES/SM3/SM4 encrypt/decrypt algorithm initialization vector bug (#17420 ) With the ECB algorithm, block_encryption_mode does not take effect; it only takes effect when an init vector is provided. Solved: 192/256 supports calculation without an init vector. For other algorithms, an error should be reported when there is no init vector. Initialization vector: the default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector. Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt Note: This fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found	2023-03-09 09:51:41 +08:00
starocean999	2b6d971c2f	[fix](nereids)fix first_value/lead/lag window function bug in nereids (#17315 ) * add more tests * add order by to fix test case * fix test cases	2023-03-09 09:35:27 +08:00
zhannngchen	2cf90ddfc5	[fix](scanner) remove useless _src_block_mem_reuse to avoid core dump while loading (#17559 ) The _src_block_mem_reuse variable does not actually work, since _src_block is cleared each time we call get_block. But the current code may cause a core dump, see issue #17587: we insert some result columns generated by expr into the dest block, and such a column holds a pointer to some column in the original schema. When the data of _src_block is cleared, some columns' data in the dest block is also cleared. E.g. coalesce will return a result column which holds a pointer to some original column, see issue #17588	2023-03-09 09:26:32 +08:00
ElvinWei	bd5ed2b0c2	[enhancement](histogram) optimize the histogram bucketing strategy, etc (#17264 ) * fix p0 regression of histogram	2023-03-08 20:12:05 +08:00
TengJianPing	eea6d770d7	[fix](bitmap) fix wrong result of bitmap_or for null (#17456 ) Result of select bitmap_to_string(bitmap_or(to_bitmap(1), null)) should be 1 instead of null. This PR fixes the logic of bitmap_or and bitmap_or_count. Other count-related functions should also be checked and fixed; they will be fixed in another PR.	2023-03-08 16:29:01 +08:00
AlexYue	f3b50b3472	[enhance](cooldown) skip once failed follow cooldown tablet (#16810 )	2023-03-08 14:14:13 +08:00
Xin Liao	8001d65811	[fix](insert) fix memory leak for insert transaction (#17530 )	2023-03-08 14:10:59 +08:00
AlexYue	273d2100ac	[enhance](cooldown) turn write cooldown meta async (#16813 )	2023-03-08 14:06:21 +08:00
qiye	3a877857ae	[improvement](inverted index)Remove searcher bitmap timer to improve query speed (#17407 ) Timer becomes a bottleneck when the query hit volume is very high.	2023-03-08 14:03:36 +08:00
Xinyi Zou	335c1e5953	[fix](memory) Fix MacOS mem_limit parse error and GC after env Init #17528 Fix MacOS mem_limit parse result being 0. Fix GC to run after env Init; otherwise, when memory is insufficient, BE will fail to start. * Query id: 0-0 * * Aborted at 1677833773 (unix time) try "date -d @1677833773" if you are using GNU date * * Current BE git commitID: 8ee5f45 * * SIGSEGV address not mapped to object (@0x70) received by PID 24145 (TID 0x7fa53c9fd700) from PID 112; stack trace: * 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t, void) at be/src/common/signal_handler.h:420 1# os::Linux::chained_handler(int, siginfo, void) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so 2# JVM_handle_linux_signal in /usr/local/jdk/jre/lib/amd64/server/libjvm.so 3# signalHandler(int, siginfo, void) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so 4# 0x00007FA56295A400 in /lib64/libc.so.6 5# doris::MemTrackerLimiter::log_process_usage_str(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) at be/src/runtime/memory/mem_tracker_limiter.cpp:208 6# doris::MemTrackerLimiter::print_log_process_usage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) at be/src/runtime/memory/mem_tracker_limiter.cpp:226 7# doris::Daemon::memory_maintenance_thread() at be/src/common/daemon.cpp:245 8# doris::Thread::supervise_thread(void*) at be/src/util/thread.cpp:455 9# start_thread in /lib64/libpthread.so.0 10# clone in /lib64/libc.so.6	2023-03-08 14:00:57 +08:00
bobhan1	4ea0d6c5fa	[feature](array_function) add support for array_popfront (#17416 )	2023-03-08 13:57:38 +08:00
gitccl	b1d65f855d	[Feature](array-function) Support array_concat function (#17436 )	2023-03-08 13:57:16 +08:00
Pxl	e2ac06d6d6	[Chore](execution) change PipelineTaskState to enum class && remove some row-based code (#17300 ) 1. change PipelineTaskState to enum class 2. remove some row-based code on FoldConstantExecutor::_get_result 3. reduce memcpy on minmax runtime filter function(Now we can guarantee that the input data is aligned) 4. add Wunused-template check, and remove some unused function, change some static function to inline function.	2023-03-08 12:41:15 +08:00
TengJianPing	778acb3c5b	[opt](string) optimize string equal comparison (#17336 ) Optimize string equal and not-equal comparison by using memequal_small_allow_overflow15.	2023-03-08 11:30:00 +08:00
yiguolei	9213dd906a	[enhancement](exception) add exception structure and using unique ptr in VExplodeBitmapTableFunction (#17531 ) add exception class in common. using unique ptr in VExplodeBitmapTableFunction support single exception or nested exception, like this: ---SingleException [E-100] test OS_ERROR bug @ 0x55e80b93c0d9 doris::Exception::Exception<>() @ 0x55e80b938df1 doris::ExceptionTest_NestedError_Test::TestBody() @ 0x55e82e16bafb testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x55e82e15ab3a testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x55e82e1361e3 testing::Test::Run() @ 0x55e82e136f29 testing::TestInfo::Run() @ 0x55e82e1376e4 testing::TestSuite::Run() @ 0x55e82e148042 testing::internal::UnitTestImpl::RunAllTests() @ 0x55e82e16dcab testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x55e82e15ce4a testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x55e82e147bab testing::UnitTest::Run() @ 0x55e80c4b39e3 RUN_ALL_TESTS() @ 0x55e80c4a99b5 main @ 0x7f0a619d0493 __libc_start_main @ 0x55e80b84602a _start @ (nil) (unknown)	2023-03-08 10:44:14 +08:00
yiguolei	4692d6764c	[refactor](remove string val) remove string val structure, it is the same as string ref (#17461 ) remove stringval, decimalv2val, bigintval	2023-03-08 10:42:20 +08:00
qiye	a767472c56	[fix](DOE)Fix es p0 case error (#17502 ) Fix es array parse error, introduced by #16806	2023-03-08 08:06:30 +08:00
htyoung	69c62b6c6c	[Fix](vectorization) fix the case where a column's _fixed_values exceeds the max_pushdown_conditions_per_column limit: that column does not perform predicate pushdown, but if subsequent columns need to be pushed down, their pushdown is misplaced in _scan_keys, which causes wrong query results (#17405 ) Co-authored-by: tongyang.hty <hantongyang@douyu.tv>	2023-03-08 07:23:56 +08:00
zxealous	5334a5899e	[fix](remote)fix whole file cache and sub file cache (#17468 )	2023-03-07 19:55:18 +08:00
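
The scan process described in the message of f9baf9c556 (#15917) above can be illustrated with a small sketch. This is not Doris code: the in-memory `table` dict and the names `read_rows` and `scan_with_pushdown` are hypothetical, chosen only to show the three steps (filter by the simple predicate, filter again by the remaining predicate using only the columns it needs, then read the remaining columns for the surviving row ids).

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical in-memory "storage layer": column name -> list of values.
Table = Dict[str, List[object]]


def read_rows(table: Table, columns: Sequence[str], row_ids: Sequence[int]) -> Table:
    """Read only the requested columns, and only at the given row ids."""
    return {name: [table[name][i] for i in row_ids] for name in columns}


def scan_with_pushdown(
    table: Table,
    simple_column: str,
    simple_value: object,
    remaining_columns: Sequence[str],
    remaining_predicate: Callable[[Dict[str, object]], bool],
    output_columns: Sequence[str],
) -> Table:
    # Step 1: read the column of the simple predicate (slot = const)
    # and compute the matching row ids.
    values = table[simple_column]
    row_ids = [i for i in range(len(values)) if values[i] == simple_value]

    # Step 2: read only the columns the remaining predicate needs,
    # then shrink the row id list again.
    needed = read_rows(table, remaining_columns, row_ids)
    row_ids = [
        rid
        for pos, rid in enumerate(row_ids)
        if remaining_predicate({c: needed[c][pos] for c in remaining_columns})
    ]

    # Step 3: read the output columns for the surviving row ids only.
    return read_rows(table, output_columns, row_ids)


if __name__ == "__main__":
    demo: Table = {
        "city": ["bj", "sh", "bj", "bj"],
        "name": ["alice", "bob", "carol", "al"],
        "amount": [10, 20, 30, 40],
    }
    # Simple predicate: city = 'bj'; remaining predicate: length(name) > 2.
    print(scan_with_pushdown(demo, "city", "bj", ["name"],
                             lambda row: len(row["name"]) > 2,
                             ["name", "amount"]))
```

In this sketch the output columns are read only for rows that pass both predicates, which is the saving the commit message describes; how Doris actually evaluates the pushed-down expressions is not shown here.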

4021 Commits