doris

Author	SHA1	Message	Date
Qi Chen	378789ba8a	[Fix](parquet-reader) Fix dict_filter crash caused by VDirectInPredicate checking that the expr result is not nullable. (#17924 ) BE crashed in the parquet dict_filter function because VDirectInPredicate checks that the expr result is not nullable.	2023-03-20 00:02:59 +08:00
Jerry Hu	1d26b4d6c2	[improvement](predicate) Cache the dict code in ComparisonPredicate (#17684 )	2023-03-19 17:37:28 +08:00
yiguolei	dd53bc1c8d	[unify type system](remove unused type desc) remove some code (#17921 ) There are many type definitions in BE. Should unify the type system and simplify the development. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-19 14:05:02 +08:00
yiguolei	a993ac91d4	[bugfix](jsonb memory leak) there is a memory leak in jsonb field (#17922 ) * [bugfix](jsonb memory leak) there is a memory leak in jsonb field --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-19 14:04:14 +08:00
zhangstar333	e359e412e1	[vectorized](udaf) fix java udaf meet error of std::bad_alloc (#17848 ) Currently, if the user code of a Java UDAF throws an exception, nothing in the C++ code of the agg function handles it, so an std::bad_alloc error may result.	2023-03-19 11:52:15 +08:00
TengJianPing	dfa2528b5e	[fix](bitmap) fix wrong result of bitmap count functions for null values (#17849 ) The result of bitmap count functions is null when there are null values, which is not right:	2023-03-19 11:49:58 +08:00
ZhangYu0123	e7e13bc338	[optimize](array function) array_apply function vectorized compute column_filter loop (#17687 )	2023-03-19 10:18:09 +08:00
Qi Chen	d79da2f926	[Fix](parquet-reader) Fix dict filter not enabled. (#17882 )	2023-03-18 22:16:37 +08:00
TengJianPing	5c5dcfda78	Revert "[enhancement](memory) PODArray replaces MemPool in PredicateColumn (#17800 )" (#17910 ) This reverts commit 17d1c1bc7f6cc95eecd224eaa219c976b60fa17e.	2023-03-17 20:50:01 +08:00
Tiewei Fang	46d88ede02	[Refactor](Metadata tvf) Reconstruct Metadata table-value function into a more general framework. (#17590 )	2023-03-17 19:54:50 +08:00
lihangyu	043f77200f	[Bug](dynamic-table) Fix column alignment logic and support filtering null values when slot is not null (#17842 ) Before this PR, null values in columns specified as `NOT NULL` were not filtered; this behavior does not match the original load behavior. Second, the column alignment logic has a bug: ``` template <typename ColumnInserterFn> void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt, ColumnInserterFn inserter) { CHECK(dst.is_finalized() && src.is_finalized()); // Use rows() here instead of size(), since size() will check_consistency // but we could not check_consistency since num_rows will be upgraded even // if src and dst is empty, we just increase the num_rows of dst and fill // num_rows of default values when meet new data size_t num_rows = dst.rows(); ```	2023-03-17 16:53:30 +08:00
ZhaoChangle	b95cd7eca2	[Refactor](function) Reconstruct default logic for const args. (#17830 )	2023-03-17 11:13:13 +08:00
Kang	5d3de05976	[feature](map) basic functions for map datatype (#16916 ) basic functions for map datatype: - MAP<K, V> map(K k1, V v1, ...) - BIGINT map_size(MAP<K, V> m) - BOOL map_contains_key(MAP<K, V> m, K k1) - BOOL map_contains_value(MAP<K, V> m, V v1) - ARRAY< K> map_keys(MAP<K, V> m) - ARRAY< V> map_values(MAP<K, V> m)	2023-03-17 10:28:17 +08:00
Qi Chen	b4b126b817	[Feature](parquet-reader) Implements dict filter functionality parquet reader. (#17594 ) Implements dict filter functionality parquet reader to improve performance.	2023-03-16 20:29:27 +08:00
HappenLee	c29582bd57	[pipeline](split by segment)support segment split by scanner (#17738 ) * support segment split by scanner * change code by cr	2023-03-16 15:25:52 +08:00
yixiutt	ea943415a0	[bugfix](compaction) remove useless check (#17804 ) The transient size may not equal the candidate_rowset size. For example, if one rowset has many segments but its size is smaller than the promotion size, this rowset breaks the pick-rowset loop because the compaction score is sufficient, but it is then filtered out by the level_size check, so the transient size does not equal the candidate size.	2023-03-16 15:23:49 +08:00
yixiutt	caed2155f5	[test](fix) use vectorized interface in test (#17649 )	2023-03-16 15:23:07 +08:00
amory	ee7226348d	[FIX](Map) fix map compaction error (#17795 ) During compaction, the in-memory map offsets arriving at the same olap convertor run from 0 to 0+size, but they should continue across different pages within one segment writer. For example: last block with map offsets [3, 6, 8, ... 100]; this block with map offsets [5, 10, 15 ..., 100]. The same convertor should record the last offset so that later offsets follow on from it. After the convertor, the current offsets should be [105, 110, 115, ... 200]; the column writer then just calls append_data() to append the correct offset data to the pages.	2023-03-16 13:54:01 +08:00
Xinyi Zou	17d1c1bc7f	[enhancement](memory) PODArray replaces MemPool in PredicateColumn (#17800 ) MemPool is about to be removed, replaced by Arena and PODArray.	2023-03-16 09:01:28 +08:00
yongkang.zhong	921e8192b2	[fix](multi-catalog) fix hana jdbc catalog insert error (#17838 )	2023-03-16 07:25:19 +08:00
pengxiangyu	a378a6024d	[fix](cooldown)Support change be.conf: max_sub_cache_file_size (#17773 ) * delete files when sub file cache size is changed.	2023-03-15 12:19:12 +08:00
Gabriel	bbf88ecc49	[Bug](datetimev2) Fix BE crash if scale is invalid (#17763 )	2023-03-15 12:08:23 +08:00
ZhaoChangle	66f3ef568e	(functions) optimize const_column to full convert	2023-03-15 10:57:03 +08:00
zhangstar333	85080ee3c3	[vectorized](function) support array_map function (#17581 )	2023-03-15 10:51:29 +08:00
TengJianPing	64c2437be5	[fix](coalesce) support coalesce function for bitmap (#17798 )	2023-03-15 09:34:44 +08:00
lihangyu	7180cf3d9b	[Improve](row store) avoid serialize null slot into a jsonb row (#17734 ) This could save some disk space	2023-03-14 22:13:41 +08:00
zhbinbin	ff9e03e2bf	[Feature](add bitmap udaf) add the bitmap intersection and difference set for mixed calculation of udaf (#15588 ) * Add the bitmap intersection and difference set for mixed calculation of udaf Co-authored-by: zhangbinbin05 <zhangbinbin05@baidu.com>	2023-03-14 20:40:37 +08:00
Kang	f999b823fc	[feature](array) support array for apache arrow convertor (#17682 ) * support array type for arrow * fix builder.Append() for each array row * fix array child column append start offset	2023-03-14 17:53:16 +08:00
yiguolei	77ab2fac20	[refactor](functioncontext) remove function context impl class (#17715 ) * [refactor](functioncontext) remove function context impl class Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --------- Co-authored-by: yiguolei <yiguolei@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-03-14 11:21:45 +08:00
spaces-x	5b39fa9843	[Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562 ) * [Feature](vectorized)(quantile_state): support vectorized quantile state functions 1. the quantile column currently supports only not nullable 2. add some regression test cases 3. set enable_quantile_state_type = true by default --------- Co-authored-by: spaces-x <weixiang06@meituan.com>	2023-03-14 10:54:04 +08:00
airborne12	2e0af4e33c	[Enhancement](inverted-index) use read buffer when read index bytes in compound reader (#17306 ) Read IO can be a problem when reading the inverted index from disk, so a read buffer is used to reduce IO. Set the use-buffer flag to true when reading internal bytes in the compound reader for the inverted index.	2023-03-14 10:10:59 +08:00
TengJianPing	7d91114304	[fix](join) fix wrong result of null aware left anti join (#17752 )	2023-03-14 09:35:46 +08:00
lihangyu	9b7596f1c6	[Feature](Dynamic schema table) step1 support schema change expression (#17494 ) 1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` for doing schema change, for extensibility	2023-03-13 15:12:42 +08:00
gitccl	c302fa2564	[Feature](array-function) Support array_pushfront function (#17584 )	2023-03-13 14:26:02 +08:00
Pxl	16fc3a0e22	[Chore](compile) remove some unused static on inline function to reduce compile time (#17603 ) remove some unused static on inline function to reduce compile time	2023-03-13 11:11:59 +08:00
abmdocrt	55c42da511	[Feature](array) Support array<decimalv3> data type (#16640 )	2023-03-13 10:48:13 +08:00
HappenLee	39b5682d59	[Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine	2023-03-13 10:33:57 +08:00
yuxuan-luo	edb2d90852	[fix](routine load) fix ROUTINE LOAD bug,kafka commit a lack of one(#17282 ) (#17291 ) Co-authored-by: hugoluo <hugoluo@tencent.com>	2023-03-13 10:20:59 +08:00
Jerry Hu	93a865c3e8	[improvement](join) Avoid reading from left child while hash table is empty(right join) (#17655 ) When the right (build) side is empty in a right outer join, there is no need to read data from the left child.	2023-03-13 09:03:17 +08:00
Johnny_Sc	47cfc81925	[fix docs] (#17634 ) Co-authored-by: shenshoucheng <shenshoucheng@jd.com>	2023-03-13 08:06:33 +08:00
HappenLee	6386458498	[Refactor](exec) remove useless attr of slot ref (#17688 ) Remove useless attr of slot ref	2023-03-12 23:45:32 +08:00
slothever	455c800405	[feature](parquet-reader) add rle bool and delta decoder to read AWS Glue (#17112 ) Support delta encoding and rle (bool) to read Glue data: add delta bit pack decoder, delta length byte array decoder, delta byte array decoder, and rle bool decoder. We found that some data types are read with delta encoding on AWS Glue, so it should be supported. For the definition of delta encoding, refer to the delta encoding in parquet.	2023-03-12 20:09:58 +08:00
Pxl	8328ab69ad	[Chore](Materialized-View) add some mv regression test case (#17345 ) 1. add some mv regression test cases 2. rename materialized_view_p0 to mv_p0 (avoids database creation failing because of a long db name)	2023-03-11 10:55:11 +08:00
camby	6dcd791b74	[feature](struct-type) support CAST AS Struct type (#17553 ) 1. add support `CAST AS Struct` from Struct type; 2. fix crash while `CAST('{}' AS Struct)`; 3. `CAST('' AS complex_type)` should return NULL instead of empty object; --------- Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2023-03-10 21:21:16 +08:00
zhengyu	2739a44eaf	[fix](segcompaction) heap overflow when doing segcompaction for cancelling load(#17529 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-03-10 20:52:05 +08:00
morrySnow	365c8eed7e	[fix](function) width_bucket should get min and max from each tuple (#17466 )	2023-03-10 13:14:12 +08:00
lihangyu	a79b8ede88	[Bug](ColumnArray) Fix array column replicate `replicate_offsets` not matched (#17616 ) The input replicate_offsets should be the same size as ColumnArray's offset. ``` IColumn::Offsets replicate_offsets(get_offsets().size(), 0); // |---------------------|-------------------------|-------------------------| // [0, begin) [begin, begin + count_sz) [begin + count_sz, size()) // do not need to copy copy counts[n] times do not need to copy ``` we should	2023-03-10 11:52:22 +08:00
Pxl	1a549edac2	[Chore](third-party) upgrade thrift from 0.13 to 0.16 (#17202 ) upgrade thrift from 0.13 to 0.16 There is thrift's release notes https://github.com/apache/thrift/blob/master/CHANGES.md	2023-03-10 11:33:16 +08:00
lihangyu	fcd25b53bf	[Optimize](Random distribution) Improve the performance of tablet sin… (#17389 ) The current distribution model for Doris is as follows: OlapTableSink separates the original Block into several subblocks, one for each node (BE), according to the tablet distribution and distributes the subblocks to the storage engines of the backends; the storage engine then separates each subblock across multiple tablet channels, and each delta writer handles part of the block. This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute the complete block as a whole during distribution. The advantage of doing so is to reduce memory copying and improve write locality, similar to appending the entire block to the memtable. This optimization could save 10% ~ 20% of the CPU cost of loading a RANDOM distribution table when load_to_single_tablet is enabled	2023-03-10 10:52:40 +08:00
bobhan1	e1bf9411de	[feature](array function) add support for array_enumerate_uniq (#17541 ) add support for array_enumerate_uniq()	2023-03-10 10:20:49 +08:00

4055 Commits