Commit Graph

4055 Commits

Author SHA1 Message Date
378789ba8a [Fix](parquet-reader) Fix dict_filter crashed caused by VDirectInPredicate checking expr result is not nullable. (#17924)
Be crashed in parquet dict_filter function caused by VDirectInPredicate checking expr result is not nullable.
2023-03-20 00:02:59 +08:00
1d26b4d6c2 [improvement](predicate) Cache the dict code in ComparisonPredicate (#17684) 2023-03-19 17:37:28 +08:00
dd53bc1c8d [unify type system](remove unused type desc) remove some code (#17921)
There are many type definitions in BE. Should unify the type system and simplify the development.



---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-19 14:05:02 +08:00
a993ac91d4 [bugfix](jsonb memory leak) there are memory leak in jsonb field (#17922)
* [bugfix](jsonb memory leak) there are memory leak in jsonb field


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-19 14:04:14 +08:00
e359e412e1 [vectorized](udaf) fix java udaf meet error of std::bad_alloc (#17848)
Now if the user code of java udaf throws exception, because c++ code of agg function nobody could deal
with it, so maybe get error of std::bad_alloc
2023-03-19 11:52:15 +08:00
dfa2528b5e [fix](bitmap) fix wrong result of bitmap count functions for null values (#17849)
bitmap count functions result is null when there are null values, which is not right:
2023-03-19 11:49:58 +08:00
e7e13bc338 [optimize](array function) array_apply fucntion vectorized compute column_filter loop (#17687) 2023-03-19 10:18:09 +08:00
d79da2f926 [Fix](parquet-reader) Fix dict filter not enabled. (#17882) 2023-03-18 22:16:37 +08:00
5c5dcfda78 Revert "[enhancement](memory) PODArray replaces MemPool in PredicateColumn (#17800)" (#17910)
This reverts commit 17d1c1bc7f6cc95eecd224eaa219c976b60fa17e.
2023-03-17 20:50:01 +08:00
46d88ede02 [Refactor](Metadata tvf) Reconstruct Metadata table-value function into a more general framework. (#17590) 2023-03-17 19:54:50 +08:00
043f77200f [Bug](dynamic-table) Fix column alignment logic and support filtering null values when slot is not null (#17842)
Before this PR when encountering null values with some columns which is specified as `NOT NULL`, null values will not be filtered,thi behavior does not match with the original load behavior.
Second column alignment logic has bug :

```
template <typename ColumnInserterFn>
void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt,
                                    ColumnInserterFn inserter) {
    CHECK(dst.is_finalized() && src.is_finalized());
    // Use rows() here instead of size(), since size() will check_consistency
    // but we could not check_consistency since num_rows will be upgraded even
    // if src and dst is empty, we just increase the num_rows of dst and fill
    // num_rows of default values when meet new data
    size_t num_rows = dst.rows();
```
2023-03-17 16:53:30 +08:00
b95cd7eca2 [Refactor](function) Reconstruct default logic for const args. (#17830) 2023-03-17 11:13:13 +08:00
5d3de05976 [feature](map) basic functions for map datatype (#16916)
basic functions for map datatype:
- MAP<K, V> map(K k1, V v1, ...)
- BIGINT map_size(MAP<K, V> m)
- BOOL map_contains_key(MAP<K, V> m, K k1)
- BOOL map_contains_value(MAP<K, V> m, V v1)
- ARRAY< K> map_keys(MAP<K, V> m)
- ARRAY< V> map_values(MAP<K, V> m)
2023-03-17 10:28:17 +08:00
b4b126b817 [Feature](parquet-reader) Implements dict filter functionality parquet reader. (#17594)
Implements dict filter functionality parquet reader to improve performance.
2023-03-16 20:29:27 +08:00
c29582bd57 [pipeline](split by segment)support segment split by scanner (#17738)
* support segment split by scanner

* change code by cr
2023-03-16 15:25:52 +08:00
ea943415a0 [bugfix](compaction) remove useless check (#17804)
transient size may not equal to candidate_rowset size.
For example, one rowset has many segment, but size is smaller then
promotion size, this rowset will break pick rowset loop cause compaction
score is enough but will be filtered in level_size check, this will make
 transient size not equal to candidate size.
2023-03-16 15:23:49 +08:00
caed2155f5 [test](fix) use vertorized interface in test (#17649) 2023-03-16 15:23:07 +08:00
ee7226348d [FIX](Map) fix map compaction error (#17795)
When compaction case, memory map offsets coming to  same olap convertor which is from 0 to 0+size
but it should be continue in different pages when in one segment writer . 
eg : 
last block with map offset : [3, 6, 8, ... 100] 
this block with map offset : [5, 10, 15 ..., 100] 
the same convertor should record last offset to make later coming offset followed last offset.
so after convertor : 
the current offset should [105, 110, 115, ... 200], then column writer just call append_data() to make the right offset data append pages
2023-03-16 13:54:01 +08:00
17d1c1bc7f [enhancement](memory) PODArray replaces MemPool in PredicateColumn (#17800)
MemPool is about to be removed, replaced by Arena and PODArray.
2023-03-16 09:01:28 +08:00
921e8192b2 [fix](multi-catalog) fix hana jdbc catalog insert error (#17838) 2023-03-16 07:25:19 +08:00
a378a6024d [fix](cooldown)Support change be.conf: max_sub_cache_file_size (#17773)
* delete files when sub file cache size is changed.
2023-03-15 12:19:12 +08:00
bbf88ecc49 [Bug](datetimev2) Fix BE crash if scale is invalid (#17763) 2023-03-15 12:08:23 +08:00
66f3ef568e (functions) optimize const_column to full convert 2023-03-15 10:57:03 +08:00
85080ee3c3 [vectorized](function) support array_map function (#17581) 2023-03-15 10:51:29 +08:00
64c2437be5 [fix](coalesce) support coalesce function for bitmap (#17798) 2023-03-15 09:34:44 +08:00
7180cf3d9b [Improve](row store) avoid serialize null slot into a jsonb row (#17734)
This could save some disk space
2023-03-14 22:13:41 +08:00
ff9e03e2bf [Feature](add bitmap udaf) add the bitmap intersection and difference set for mixed calculation of udaf (#15588)
* Add the bitmap intersection and difference set for mixed calculation of udaf

Co-authored-by: zhangbinbin05 <zhangbinbin05@baidu.com>
2023-03-14 20:40:37 +08:00
f999b823fc [feature](array) support array for apache arrow convertor (#17682)
* support array type for arrow

* fix builder.Append() for each array row

* fix array child column append start offset
2023-03-14 17:53:16 +08:00
77ab2fac20 [refactor](functioncontext) remove function context impl class (#17715)
* [refactor](functioncontext) remove function context impl class


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-03-14 11:21:45 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. now quantile column only support not nullable
2. add up some regression test cases
3. set default enable_quantile_state_type = true
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
2e0af4e33c [Enhancement](inverted-index) use read buffer when read index bytes in compound reader (#17306)
Read IO would be a problem when reading inverted index from disk.
Using read buffer to reduce IO.
Set use buffer flag to be true when reading internal bytes in compound reader for inverted index.
2023-03-14 10:10:59 +08:00
7d91114304 [fix](join) fix wrong result of null aware left anti join (#17752) 2023-03-14 09:35:46 +08:00
9b7596f1c6 [Feature](Dynamic schema table) step1 support schema change expression (#17494)
1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
2023-03-13 15:12:42 +08:00
c302fa2564 [Feature](array-function) Support array_pushfront function (#17584) 2023-03-13 14:26:02 +08:00
Pxl
16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
remove some unused static on inline function to reduce compile time
2023-03-13 11:11:59 +08:00
55c42da511 [Feature](array) Support array<decimalv3> data type (#16640) 2023-03-13 10:48:13 +08:00
39b5682d59 [Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine 2023-03-13 10:33:57 +08:00
edb2d90852 [fix](routine load) fix ROUTINE LOAD bug,kafka commit a lack of one(#17282) (#17291)
Co-authored-by: hugoluo <hugoluo@tencent.com>
2023-03-13 10:20:59 +08:00
93a865c3e8 [improvement](join) Avoid reading from left child while hash table is empty(right join) (#17655)
When the right (build) side is empty in a right outer join, there is no need to read data from the left child.
2023-03-13 09:03:17 +08:00
47cfc81925 [fix docs] (#17634)
Co-authored-by: shenshoucheng <shenshoucheng@jd.com>
2023-03-13 08:06:33 +08:00
6386458498 [Refactor](exec) remove unless attr of slot ref (#17688)
Remove unless attr of slot ref
2023-03-12 23:45:32 +08:00
455c800405 [feature](parquet-reader) add rle bool and delta decoder to read AWS Glue (#17112)
Support delta encoding and rle(bool) to read Glue data
add delta bit pack decoder,
add delta length byte array decoder,
add delta byte array decoder.
add rle bool decoder.

We find some data type is read with delta encoding on AWS Glue, so it should be supported.
The definition of delta encoding can refer to the delta encoding in parquet.
2023-03-12 20:09:58 +08:00
Pxl
8328ab69ad [Chore](Materialized-View) add some mv regression test case (#17345)
1. add some mv regression test case
2. rename materialized_view_p0 to mv_p0 (avoid create database failed because long db name)
2023-03-11 10:55:11 +08:00
6dcd791b74 [feature](struct-type) support CAST AS Struct type (#17553)
1. add support `CAST AS Struct` from Struct type;
2. fix crash while `CAST('{}' AS Struct)`;
3. `CAST('' AS complext_type)` should return NULL instead of empty object;

---------

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-03-10 21:21:16 +08:00
2739a44eaf [fix](segcompaction) heap overflow when doing segcompaction for cancelling load(#17529)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-03-10 20:52:05 +08:00
365c8eed7e [fix](function) width_bucket should get min and max from each tuple (#17466) 2023-03-10 13:14:12 +08:00
a79b8ede88 [Bug](ColumnArray) Fix array column replicate replicate_offsets not matched (#17616)
the input replicate_offsets should be the same size as ColumnArray's offset.
```
IColumn::Offsets replicate_offsets(get_offsets().size(), 0);
// |---------------------|-------------------------|-------------------------|
// [0, begin)             [begin, begin + count_sz)  [begin + count_sz, size())
//  do not need to copy    copy counts[n] times       do not need to copy
```

we should
2023-03-10 11:52:22 +08:00
Pxl
1a549edac2 [Chore](third-party) upgrade thrift from 0.13 to 0.16 (#17202)
upgrade thrift from 0.13 to 0.16
There is thrift's release notes https://github.com/apache/thrift/blob/master/CHANGES.md
2023-03-10 11:33:16 +08:00
fcd25b53bf [Optimize](Random distribution) Improve the performance of tablet sin… (#17389)
The current distribution model for Doris is as follows:

OlapTableSink seperate the original Block into serveral subblocks of each node(BE) by tablets distribution and distributes subblocks to storage engine of backends, then the storage engine will seperate the subblock into multiple tablets channel and each delta writer will handle partial of the block.

This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute the blocks according to the complete block during distribution. The advantage of doing so is to reduce memory copying and improve write locality, similar to appending the entire block to the memtable.

This optimze could save 10% ~ 20% CPU cost of RANDOM distribution table load when enable load_to_single_tablet
2023-03-10 10:52:40 +08:00
e1bf9411de [feature](array function) add support for array_enumerate_uniq (#17541)
add support for array_enumerate_uniq()
2023-03-10 10:20:49 +08:00