See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query files on local/s3/hdfs/broker file systems, with table-valued functions and catalogs.
- Backup/Restore with local/s3/hdfs/broker file system
Not tested:
- cold & hot data separation case.
There are many type definitions in BE; we should unify the type system and simplify development.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Before this PR, when null values were encountered in a column specified as `NOT NULL`, they were not filtered; this behavior did not match the original load behavior.
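For illustration, a minimal sketch of the intended filtering, assuming a simple null-map and row-filter representation (the names `NullMap`, `Filter`, and `filter_nulls_for_not_null_column` are illustrative, not the engine's actual API):
```
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative types standing in for the engine's column abstractions.
using NullMap = std::vector<uint8_t>;  // 1 = value is null at this row
using Filter  = std::vector<uint8_t>;  // 1 = keep this row

// Drop every row whose value is null in a column declared NOT NULL.
void filter_nulls_for_not_null_column(const NullMap& null_map, Filter& keep) {
    for (size_t i = 0; i < null_map.size(); ++i) {
        if (null_map[i]) keep[i] = 0;  // null in a NOT NULL column: filter the row
    }
}
```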
Second, the column alignment logic has a bug:
```
template <typename ColumnInserterFn>
void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt,
                                    ColumnInserterFn inserter) {
    CHECK(dst.is_finalized() && src.is_finalized());
    // Use rows() here instead of size(), since size() calls check_consistency(),
    // and we cannot check consistency here: num_rows is advanced even when
    // src and dst are empty, so we just increase dst's num_rows and fill in
    // that many default values when new data arrives.
    size_t num_rows = dst.rows();
```
Basic functions for the MAP datatype:
- MAP<K, V> map(K k1, V v1, ...)
- BIGINT map_size(MAP<K, V> m)
- BOOL map_contains_key(MAP<K, V> m, K k1)
- BOOL map_contains_value(MAP<K, V> m, V v1)
- ARRAY<K> map_keys(MAP<K, V> m)
- ARRAY<V> map_values(MAP<K, V> m)
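For illustration, a minimal C++ sketch of the semantics of a few of these functions over a flat key/value pair representation; `MapValue` is a stand-in, not the BE's actual map column implementation:
```
#include <cstdint>
#include <vector>

// Illustrative MAP<K, V> value: parallel key/value vectors, as in a
// flattened map column.
template <typename K, typename V>
struct MapValue {
    std::vector<K> keys;
    std::vector<V> values;
};

// BIGINT map_size(MAP<K, V> m): number of entries.
template <typename K, typename V>
int64_t map_size(const MapValue<K, V>& m) {
    return static_cast<int64_t>(m.keys.size());
}

// BOOL map_contains_key(MAP<K, V> m, K k1): linear scan over the keys.
template <typename K, typename V>
bool map_contains_key(const MapValue<K, V>& m, const K& k) {
    for (const auto& key : m.keys) {
        if (key == k) return true;
    }
    return false;
}

// ARRAY<K> map_keys(MAP<K, V> m): the keys as an array.
template <typename K, typename V>
std::vector<K> map_keys(const MapValue<K, V>& m) {
    return m.keys;
}
```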
In the compaction case, the in-memory map offsets arriving at the same olap convertor always start from 0 (i.e. they run from 0 to 0+size), but they should be continuous across different pages within one segment writer.
For example:
- last block with map offsets: [3, 6, 8, ..., 100]
- this block with map offsets: [5, 10, 15, ..., 100]
The same convertor should record the last offset so that later incoming offsets follow it. After conversion, the current offsets should be [105, 110, 115, ..., 200]; the column writer then just calls append_data() to append pages with the correct offset data.
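A minimal sketch of this rebasing, assuming the convertor carries a running last offset across blocks (`OffsetConvertor` and `convert` are illustrative names, not the BE's actual API):
```
#include <cstdint>
#include <vector>

// Illustrative convertor that rebases per-block map offsets so they stay
// continuous across blocks within one segment writer.
class OffsetConvertor {
public:
    // Each incoming block's offsets start from 0; shift them by the last
    // offset seen so far, then remember the new last offset.
    std::vector<uint64_t> convert(const std::vector<uint64_t>& block_offsets) {
        std::vector<uint64_t> rebased;
        rebased.reserve(block_offsets.size());
        for (uint64_t off : block_offsets) {
            rebased.push_back(off + _last_offset);
        }
        if (!rebased.empty()) {
            _last_offset = rebased.back();  // e.g. 100 after the last block
        }
        return rebased;  // [5, 10, ..., 100] becomes [105, 110, ..., 200]
    }

private:
    uint64_t _last_offset = 0;  // carried across blocks
};
```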
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. currently the quantile column only supports non-nullable values
2. add some regression test cases
3. set `enable_quantile_state_type` to true by default
---------
Co-authored-by: spaces-x <weixiang06@meituan.com>
1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` to perform schema changes, for extensibility
Support delta encoding and RLE (bool) to read AWS Glue data:
- add a delta bit-packed decoder
- add a delta-length byte array decoder
- add a delta byte array decoder
- add an RLE bool decoder
We found that some data on AWS Glue is written with delta encoding, so it should be supported. For the definition of delta encoding, refer to the delta encodings in the Parquet specification.
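As background, Parquet's delta encodings (DELTA_BINARY_PACKED and its byte-array variants) store block headers and deltas as ULEB128 varints, with zigzag encoding for signed values. A minimal sketch of that shared decoding step, assuming a raw byte-pointer reader (illustrative, not the decoder classes added in this PR):
```
#include <cstdint>

// Decode one unsigned LEB128 varint, advancing *data past the bytes read.
// Parquet's delta encodings use this for block headers and deltas.
uint64_t decode_uleb128(const uint8_t** data) {
    uint64_t value = 0;
    int shift = 0;
    while (true) {
        uint8_t byte = *(*data)++;
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;  // high bit clear: last byte
        shift += 7;
    }
    return value;
}

// Undo zigzag encoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
int64_t decode_zigzag(uint64_t value) {
    return static_cast<int64_t>(value >> 1) ^ -static_cast<int64_t>(value & 1);
}
```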
1. add support for `CAST AS Struct` from a Struct type;
2. fix a crash in `CAST('{}' AS Struct)`;
3. `CAST('' AS complex_type)` should return NULL instead of an empty object;
---------
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
The input replicate_offsets should be the same size as ColumnArray's offsets.
```
IColumn::Offsets replicate_offsets(get_offsets().size(), 0);
// |---------------------|-------------------------|-------------------------|
// [0, begin) [begin, begin + count_sz) [begin + count_sz, size())
// do not need to copy copy counts[n] times do not need to copy
```
so we should allocate `replicate_offsets` with `get_offsets().size()` entries, as shown in the snippet above.
The current distribution model for Doris is as follows:
OlapTableSink separates the original Block into several sub-blocks, one per node (BE), according to the tablet distribution, and distributes the sub-blocks to the storage engines of the backends; the storage engine then separates each sub-block across multiple tablet channels, and each delta writer handles part of the block.
This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute whole blocks during distribution. The advantage of doing so is reduced memory copying and improved write locality, similar to appending the entire block to the memtable.
This optimization can save 10% ~ 20% of the CPU cost of loading a RANDOM distribution table when load_to_single_tablet is enabled.
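A minimal sketch of the dispatch decision, with hypothetical `Block`, `TabletChannel`, and `distribute` types standing in for the actual sink code, and round-robin tablet selection assumed for illustration:
```
#include <cstddef>
#include <vector>

struct Block {};  // stand-in for the engine's vectorized block

struct TabletChannel {
    // Append the whole block to this tablet's memtable; no per-row split.
    void append_block(const Block& /*block*/) { /* ... */ }
};

// Hypothetical whole-block path for RANDOM distribution: instead of
// splitting the block row-by-row across tablets, pick one tablet and
// append the entire block to it.
void distribute(const Block& block, std::vector<TabletChannel>& channels,
                bool is_random_distribution, size_t& next_tablet) {
    if (is_random_distribution && !channels.empty()) {
        channels[next_tablet % channels.size()].append_block(block);
        ++next_tablet;  // rotate tablets across blocks, not across rows
        return;
    }
    // Hash distribution still needs the per-row split by tablet (omitted).
}
```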
In the past, only simple predicates (slot = const), AND, LIKE, and OR (bitmap index only) could be pushed down to the storage layer. The scan process was:
1. Read part of the columns first, and calculate the row ids with the simple pushed-down predicates.
2. Use the row ids to read the remaining columns and pass them to the scanner, where the scanner filters with the remaining predicates.
This PR also pushes the remaining predicates (functions, nested predicates, ...) from the scanner down to the storage layer for filtering. The new scan process is:
1. Read part of the columns first, and use the simple pushed-down predicates to calculate the row ids (same as above).
2. Use the row ids to read the columns needed by the remaining predicates, and use those pushed-down predicates to reduce the set of row ids again.
3. Use the row ids to read the remaining columns and pass them to the scanner.
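A minimal sketch of the new scan flow, with stubbed hooks (`eval_simple_predicates`, `read_columns_by_row_ids`, `eval_remaining_predicates_row`) standing in for the segment iterator's real interfaces:
```
#include <cstddef>
#include <cstdint>
#include <vector>

using RowIds = std::vector<uint32_t>;
struct Block {};  // stand-in for the engine's vectorized block

// Hypothetical storage-layer hooks (stubbed); the real interfaces differ.
RowIds eval_simple_predicates() { return {}; }
Block read_columns_by_row_ids(const RowIds&, const std::vector<int>&) {
    return {};
}
bool eval_remaining_predicates_row(const Block&, size_t) { return true; }

Block scan_with_pushdown(const std::vector<int>& pred_columns,
                         const std::vector<int>& rest_columns) {
    // 1. Simple predicates produce the initial row id list.
    RowIds ids = eval_simple_predicates();

    // 2. Read only the columns the remaining predicates need, evaluate those
    //    predicates inside the storage layer, and shrink the row id list.
    Block pred_block = read_columns_by_row_ids(ids, pred_columns);
    RowIds survivors;
    for (size_t i = 0; i < ids.size(); ++i) {
        if (eval_remaining_predicates_row(pred_block, i)) {
            survivors.push_back(ids[i]);
        }
    }

    // 3. Read the remaining columns only for the surviving rows and hand
    //    the result to the scanner.
    return read_columns_by_row_ids(survivors, rest_columns);
}
```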