doris

Author	SHA1	Message	Date
Zhengguo Yang	4335c9998f	[chore](ARM) Add some vectorization compatibility code on aarch64 (#18553 ) Update sse2neon to support more SSE code on ARM CPUs.	2023-04-13 10:15:33 +08:00
zclllyybb	43392918cd	[Optimization](functions)Optimize function call for const columns. (#18310 )	2023-04-12 11:11:01 +08:00
Pxl	c9b4eaea76	[Chore](storage) change FieldType to enum class #18500	2023-04-10 08:53:44 +08:00
yiguolei	f38e00b4c0	[refactor](typesystem) using typeindex to create column instead of type name because type name is not stable (#18328 ) --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-09 18:08:31 +08:00
amory	30f2abe5d3	[FIX](Map) fix map offset calculation in olap convertor (#18295 ) Fix BE core dump when loading large key-value data in a single map row.	2023-04-07 17:04:08 +08:00
Jerry Hu	66a0c090b8	[fix](column) Add unimplemented replicate function in ColumnStruct (#18368 )	2023-04-06 09:50:27 +08:00
zclllyybb	f800ba8f4c	[Exec](opt) Optimize function call for const columns (#18212 )	2023-03-31 11:36:21 +08:00
Xinyi Zou	d9fe5f7b67	[enhancement](memory) Remove MemPool and replace it with Arena (#17820 ) Arena can replace MemPool in most scenarios. The exception is memory reuse: MemPool can reuse previously allocated chunks after clear(), but Arena cannot. Some comparisons between MemPool and Arena: 1. Expansion: below 128M, Arena grows chunk sizes exponentially (doubling); above 128M, it allocates a chunk of 128M * n >= `size`, where n is the smallest value that satisfies the inequality. MemPool doubles chunk sizes below 512K; above 512K, it allocates a separate chunk of exactly `size` bytes. Once Arena has allocated a chunk larger than 128M, every subsequent chunk is at least 128M. Does this waste memory? MemPool behaves similarly: once it has allocated a 512K chunk, every subsequent chunk is at least 512K. 2. Alignment: MemPool aligns to 16 bytes by default, because memtable and other code that uses int128 requires 16-byte alignment; Arena has no default alignment. 3. Memory reuse: Arena only supports `rollback`, which reuses memory in the current chunk, usually the most recent allocation. MemPool supports clear(), after which all chunks can be reused, or ReturnPartialAllocation() to roll back the most recent allocation; if the last chunk has no free memory, it searches for the chunk with the most free space. 4. Realloc: Arena supports reallocating contiguous memory, and also supports reallocating contiguous memory from any position within the last allocation. The differences between `alloc_continue` and `realloc` are: 1. `alloc_continue` does not need the old size; it defaults to old size = head->pos - range_start. 2. `alloc_continue` supports expanding from range_start when additional_bytes is between head and pos, which is equivalent to reusing part of the memory, while realloc always allocates new memory. MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools. 5. Mem limit checks: MemPool checks the mem limit itself, while Arena checks at the Allocator layer. 6. ASAN support: Arena does some extra work for ASAN. 7. Error handling: MemPool returns allocation failures directly through `Status`, while Arena throws an Exception. Improvements Arena could consider: 1. After a chunk larger than 128M has been allocated, the minimum chunk size is 128M, which seems to waste memory. 2. Support clear() and memory reuse. 3. Add a large-allocation list: allocate memory larger than 128M with exactly `size` bytes, so the current chunk is not left partly unused. 4. In some cases, it may be possible to allocate backwards to find chunks t	2023-03-29 20:56:49 +08:00
Qi Chen	6b6682cd96	[Enhancement](Expr) Opt In Set by small size fixed container to improve performance. (#17976 )	2023-03-28 23:10:39 +08:00
TengJianPing	78abb40fdc	[improvement](string) throw exception instead of log fatal if string column exceeds total size limit (#17989 ) Throw an exception instead of logging fatal if a string column exceeds the total size limit, so that we can catch it and fail the query instead of causing the BE to exit.	2023-03-27 08:55:26 +08:00
Pxl	a8753faeb1	[Bug](function) fix column complex not resized after filter (#18043 )	2023-03-25 21:48:13 +08:00
yiguolei	7ae51c856e	[refactor](unify exception) unify exception definition and error code (#18006 ) * [refactor](unify exception) unify exception definition and error code --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-25 12:41:07 +08:00
Gabriel	e8b9587fe6	[Improvement](dict) compute hash only if needed (#18058 )	2023-03-24 11:45:58 +08:00
hqx871	1999cccde9	[feature](array-type) Unique table support array value (#17024 ) Unique table support array value --------- Co-authored-by: huangqixiang.871 <huangqixiang.871@bytedance.com>	2023-03-24 10:18:59 +08:00
Pxl	40ca250678	[Feature](materialized-view) support where clause on create materialized view (#17534 ) support where clause on create materialized view	2023-03-22 11:25:13 +08:00
Mellorsssss	4193884a32	[feature](array_zip) Support array_zip function (#17696 )	2023-03-21 18:44:30 +08:00
Gabriel	bd8e3e6405	[refactor](date) unify DateTimeValue and VecDateTimeValue (#17670 )	2023-03-20 16:27:08 +08:00
yiguolei	dd53bc1c8d	[unify type system](remove unused type desc) remove some code (#17921 ) There are many type definitions in BE. We should unify the type system and simplify development. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-19 14:05:02 +08:00
TengJianPing	5c5dcfda78	Revert "[enhancement](memory) PODArray replaces MemPool in PredicateColumn (#17800 )" (#17910 ) This reverts commit 17d1c1bc7f6cc95eecd224eaa219c976b60fa17e.	2023-03-17 20:50:01 +08:00
lihangyu	043f77200f	[Bug](dynamic-table) Fix column alignment logic and support filtering null values when slot is not null (#17842 ) Before this PR, null values in columns specified as `NOT NULL` were not filtered, which does not match the original load behavior. Second, the column alignment logic has a bug: ``` template <typename ColumnInserterFn> void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt, ColumnInserterFn inserter) { CHECK(dst.is_finalized() && src.is_finalized()); // Use rows() here instead of size(), since size() will check_consistency // but we could not check_consistency since num_rows will be upgraded even // if src and dst is empty, we just increase the num_rows of dst and fill // num_rows of default values when meet new data size_t num_rows = dst.rows(); ```	2023-03-17 16:53:30 +08:00
Kang	5d3de05976	[feature](map) basic functions for map datatype (#16916 ) basic functions for map datatype: - MAP<K, V> map(K k1, V v1, ...) - BIGINT map_size(MAP<K, V> m) - BOOL map_contains_key(MAP<K, V> m, K k1) - BOOL map_contains_value(MAP<K, V> m, V v1) - ARRAY<K> map_keys(MAP<K, V> m) - ARRAY<V> map_values(MAP<K, V> m)	2023-03-17 10:28:17 +08:00
Xinyi Zou	17d1c1bc7f	[enhancement](memory) PODArray replaces MemPool in PredicateColumn (#17800 ) MemPool is about to be removed, replaced by Arena and PODArray.	2023-03-16 09:01:28 +08:00
spaces-x	5b39fa9843	[Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562 ) * [Feature](vectorized)(quantile_state): support vectorized quantile state functions 1. quantile columns currently only support NOT NULL 2. add some regression test cases 3. set enable_quantile_state_type = true by default --------- Co-authored-by: spaces-x <weixiang06@meituan.com>	2023-03-14 10:54:04 +08:00
lihangyu	9b7596f1c6	[Feature](Dynamic schema table) step1 support schema change expression (#17494 ) 1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` that performs schema change, for extensibility	2023-03-13 15:12:42 +08:00
lihangyu	a79b8ede88	[Bug](ColumnArray) Fix array column replicate `replicate_offsets` not matched (#17616 ) The input replicate_offsets should be the same size as ColumnArray's offsets. ``` IColumn::Offsets replicate_offsets(get_offsets().size(), 0); // |---------------------|-------------------------|-------------------------| // [0, begin) [begin, begin + count_sz) [begin + count_sz, size()) // do not need to copy copy counts[n] times do not need to copy ``` we should	2023-03-10 11:52:22 +08:00
amory	06dee69174	[Refactor](map) remove using column array in map to reduce offset column (#17330 ) 1. remove column array in map 2. add offsets column in map. The aim is to avoid storing duplicate offsets for the key array and the value array on disk.	2023-03-09 11:22:26 +08:00
lihangyu	368e6a4f9c	[Bug](array filter) Fix bug due to `ColumnArray::filter_generic` invalid inplace `size_at` after `set_end_ptr` (#17554 ) We should build a new PodArray to add items instead of doing it in place.	2023-03-09 10:59:29 +08:00
Pxl	e2ac06d6d6	[Chore](execution) change PipelineTaskState to enum class && remove some row-based code (#17300 ) 1. change PipelineTaskState to enum class 2. remove some row-based code on FoldConstantExecutor::_get_result 3. reduce memcpy on minmax runtime filter function(Now we can guarantee that the input data is aligned) 4. add Wunused-template check, and remove some unused function, change some static function to inline function.	2023-03-08 12:41:15 +08:00
ZhangYu0123	8ccc805cd0	[Fix](Lightweight schema Change) query error caused by array default type is unsupported (#17331 ) We have supported the array default value [], but when using lightweight schema change to add an array column, the query failed as follows: Fix "array default type is unsupported" error. Fix the default value filling assignment digit problem.	2023-03-07 16:30:41 +08:00
Jerry Hu	caacee253d	[fix](olap)Crashing caused by IS NULL expression (#17463 ) Issue Number: close #17462	2023-03-07 15:32:52 +08:00
ZhaoChangle	e82b827bc8	[optimize](vectorization)Optimize to_string's performance. (#17076 )	2023-03-03 10:35:59 +08:00
xy720	48ef61780d	[refactor](struct-type) refactor and clean unused code for struct type (#17257 ) remove unused code for struct type	2023-03-01 15:49:31 +08:00
Jerry Hu	a1db5c6f52	[fix](vec) crash caused by not-implemented function in ColumnFixedLengthObject (#17215 )	2023-02-28 15:27:06 +08:00
xy720	91fc9fae8e	[Bug](complex-type) Fix is null predicate in delete stmt for array/struct/map type (#17018 )	2023-02-23 15:06:49 +08:00
Jerry Hu	08adf914f9	[improvement](vec) avoid creating a new column while filtering mutable columns (#16850 ) Currently, filtering a column creates a new column to store the result, which costs some performance. ssb-flat without pushdown expr improved from 19s to 15s.	2023-02-21 09:47:21 +08:00
Qi Chen	ef2fdb79bb	[Improvement](parquet-reader) Optimize and refactor parquet reader to improve performance. (#16818 ) Optimize and refactor parquet reader to improve performance. - Improve performance 2x for small dictionary strings by using aligned copying. - Refactor code to reduce conditional (if) checks. - Don't call skip(0). - Don't read the page index if there is no condition. ssb-flat-100 (single machine, single thread), query: before opt / after opt: SELECT count(lo_revenue) FROM lineorder_flat: 9.23 / 9.12; SELECT count(lo_linenumber) FROM lineorder_flat: 4.50 / 4.36; SELECT count(c_name) FROM lineorder_flat: 18.22 / 17.88; SELECT count(lo_shipmode) FROM lineorder_flat: 10.09 / 6.15	2023-02-20 11:42:29 +08:00
HappenLee	f08c1222cc	[Opt](exec) Refactor the code and logical functions to SIMD the code (#16785 )	2023-02-16 16:55:12 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 Dynamic schema table is a special type of table whose schema changes during loading. We currently implement this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema information from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.	2023-02-11 13:37:50 +08:00
xy720	1b3902baa2	[Feature](Complex-type) Add struct and map type to Doris (#16444 ) This commit supports: 1. insert + select for struct/map types 2. JSON stream load for struct type 3. m[key] function for map type. How to use: set the FE config to create tables with struct and map types: 1. admin set frontend config("enable_struct_type" = "true"); 2. admin set frontend config("enable_map_type" = "true"); #16547 Co-authored-by: xy720 <xuyang25@baidu.com> Co-authored-by: amory <wangqiannan@selectdb.com> Co-authored-by: cambyzju <zhuxiaoli01@baidu.com> Co-authored-by: hucheng01 <hucheng01@baidu.com>	2023-02-10 11:00:33 +08:00
Pxl	266bb971a6	[Enhancement](function) display number of elements on check_chars_length #16570	2023-02-10 08:52:41 +08:00
Pxl	5e4bb98900	[Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290 ) enable -Wpedantic and update lowest gcc version to 11.1	2023-02-03 11:28:48 +08:00
Jerry Hu	9aa0d86fec	[fix](olap) Incorrect reserving size for PredicateColumn converted from ColumnDictionary (#16249 )	2023-01-30 20:28:22 +08:00
Pxl	2b5f95f08a	[Bug](function) remove datev2 signature of hour_ceil/hour_floor #16168	2023-01-29 11:27:56 +08:00
ZhaoChangle	199d7d3be8	[Refactor]Merged string_value into string_ref (#15925 )	2023-01-22 16:39:23 +08:00
Jerry Hu	bae29157aa	[fix](olap) dictionary cannot be sorted after inserting some null values (#15829 )	2023-01-13 09:28:55 +08:00
Gabriel	699bf972e2	[Bug](bitmap) Fix bitmap_from_string for null constant (#15698 )	2023-01-09 10:21:08 +08:00
Ashin Gau	2c8de30cce	[optimize](multi-catalog) use dictionary encode&filter to process delete files (#15441 ) Optimize: PR #14470 used `Expr` to filter delete rows matching the current data file, but the rows in a delete file are [sorted by file_path then position](https://iceberg.apache.org/spec/#position-delete-files) to optimize filtering rows while scanning, so this PR removes `Expr` and uses binary search to filter delete rows. In addition, delete files are likely to be dictionary-encoded, and decoding the `file_path` column into `ColumnString` is time-consuming, so this PR uses `ColumnDictionary` to read the `file_path` column. In testing, the performance of Iceberg v2 MOR improved by 30%+. Fix bug: the lazy-read block may not have the filter column if the whole group is filtered by `Expr` and batch_eof is generated from the next batch.	2022-12-30 08:57:55 +08:00
TengJianPing	f7988fad03	[improvement](string) set bigger limit for ColumnString chars length (#15426 )	2022-12-28 15:41:01 +08:00
chenlinzhong	524208ab3a	[Feature](bitmap/hll)Support return bitmap/hll data in select statement in vectorization (#15224 ) Support returning bitmap data in select statements in vectorized mode. When Bitmap is used for user segmentation ("circling people"), users need to return the Bitmap results to the upper layer, which parses the Bitmap contents to handle high-QPS query scenarios.	2022-12-27 14:49:24 +08:00
TengJianPing	301640d3c0	[fix](string) fix offsets overflow for extremely large String column (#15360 ) * [fix](string) fix offsets overflow for extremely large String column * fix	2022-12-26 21:23:58 +08:00
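
The Arena growth policy described in d9fe5f7b67 (#17820) can be illustrated with a small bump allocator. This is a minimal sketch, not Doris's `Arena`: the class `SimpleArena`, its methods, and the configurable threshold (128M in the commit message, made a constructor parameter here so it is easy to exercise) are hypothetical, and page rounding, alignment, `realloc`/`alloc_continue` and memory-limit checks are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical bump allocator sketching the Arena growth policy from #17820.
// Not Doris's Arena: no alignment, page rounding, realloc or mem-limit checks.
class SimpleArena {
public:
    SimpleArena(size_t initial_size, size_t linear_threshold)
            : linear_threshold_(linear_threshold) {
        assert(initial_size > 0 && linear_threshold > 0);
        add_chunk(initial_size);
    }

    // Hands out `size` bytes from the current chunk, opening a new chunk if they do not fit.
    char* alloc(size_t size) {
        if (pos_ + size > chunks_.back().size) {
            add_chunk(next_chunk_size(size));
        }
        char* res = chunks_.back().data.get() + pos_;
        pos_ += size;
        return res;
    }

    // Gives back the last `size` bytes; only memory in the current chunk is reused.
    void rollback(size_t size) {
        assert(size <= pos_);
        pos_ -= size;
    }

    size_t chunk_count() const { return chunks_.size(); }
    size_t current_chunk_size() const { return chunks_.back().size; }

private:
    struct Chunk {
        std::unique_ptr<char[]> data;
        size_t size;
    };

    // Below the threshold, chunk sizes double (but always fit the request).
    // At or above it, use the smallest multiple of the threshold that fits the request.
    size_t next_chunk_size(size_t min_size) const {
        size_t doubled = chunks_.back().size * 2;
        if (doubled < linear_threshold_) {
            return doubled > min_size ? doubled : min_size;
        }
        return (min_size + linear_threshold_ - 1) / linear_threshold_ * linear_threshold_;
    }

    void add_chunk(size_t size) {
        chunks_.push_back(Chunk {std::make_unique<char[]>(size), size});
        pos_ = 0;
    }

    size_t linear_threshold_;
    std::vector<Chunk> chunks_;
    size_t pos_ = 0;
};
```

With a threshold of 64 and a 16-byte first chunk, allocating 10, 10, 30 and 100 bytes opens chunks of 32, 64 and 128 bytes; the last is 128 because two 64-byte units are needed to fit 100 bytes. Once the threshold is reached, even a 1-byte request that does not fit opens a full threshold-sized chunk, which is the waste the commit message asks about.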

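Commit a79b8ede88 (#17616) requires `replicate_offsets` to have one entry per row of the array column. The sketch below shows why, using a hypothetical flattened array column (`SimpleArrayColumn`, not Doris's `ColumnArray`) and assuming the ClickHouse-style convention that replicate offsets are cumulative, so row `i` is copied `offsets[i] - offsets[i - 1]` times.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flattened array column: row i holds data[offsets[i - 1], offsets[i]),
// with offsets[-1] taken as 0.
struct SimpleArrayColumn {
    std::vector<int> data;
    std::vector<uint64_t> offsets;  // cumulative end offset of each row

    size_t size() const { return offsets.size(); }
};

// Copies row i (replicate_offsets[i] - replicate_offsets[i - 1]) times.
// One replicate offset is needed per row; a shorter array would be read out of bounds.
SimpleArrayColumn replicate(const SimpleArrayColumn& src,
                            const std::vector<uint64_t>& replicate_offsets) {
    assert(replicate_offsets.size() == src.size());
    SimpleArrayColumn res;
    uint64_t prev = 0;
    for (size_t row = 0; row < src.size(); ++row) {
        uint64_t row_begin = row == 0 ? 0 : src.offsets[row - 1];
        auto begin = src.data.begin() + static_cast<std::ptrdiff_t>(row_begin);
        auto end = src.data.begin() + static_cast<std::ptrdiff_t>(src.offsets[row]);
        for (uint64_t k = prev; k < replicate_offsets[row]; ++k) {
            res.data.insert(res.data.end(), begin, end);
            res.offsets.push_back(res.data.size());
        }
        prev = replicate_offsets[row];
    }
    return res;
}
```

For example, with rows `[1, 2]` and `[3]`, replicate offsets `{0, 2}` drop the first row and copy the second twice, producing data `{3, 3}` with offsets `{1, 2}`.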
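
Commit 2c8de30cce (#15441) relies on Iceberg position-delete rows being sorted by `file_path` and then by position, which lets the reader locate the deletes for one data file with binary search instead of evaluating an `Expr`. A minimal sketch of that lookup follows; `PositionDelete` and `deleted_positions` are hypothetical names, and the sketch compares plain `std::string` paths rather than reading them through `ColumnDictionary` as the commit does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical position-delete row; the input vector is assumed sorted by (file_path, pos).
struct PositionDelete {
    std::string file_path;
    int64_t pos;
};

// Returns the deleted positions of `data_file` that fall in [first_row, last_row).
std::vector<int64_t> deleted_positions(const std::vector<PositionDelete>& deletes,
                                       const std::string& data_file, int64_t first_row,
                                       int64_t last_row) {
    // Binary search for the block of rows belonging to this data file.
    auto file_lo = std::lower_bound(
            deletes.begin(), deletes.end(), data_file,
            [](const PositionDelete& d, const std::string& f) { return d.file_path < f; });
    auto file_hi = std::upper_bound(
            file_lo, deletes.end(), data_file,
            [](const std::string& f, const PositionDelete& d) { return f < d.file_path; });
    // Within that block, positions are sorted, so search again for the batch start.
    auto it = std::lower_bound(file_lo, file_hi, first_row,
                               [](const PositionDelete& d, int64_t p) { return d.pos < p; });
    std::vector<int64_t> result;
    for (; it != file_hi && it->pos < last_row; ++it) {
        result.push_back(it->pos);
    }
    return result;
}
```

Each lookup costs O(log n) plus the number of positions returned, independent of how many deletes belong to other data files.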

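Commits 301640d3c0 (#15360), f7988fad03 (#15426) and 78abb40fdc (#17989) deal with string columns whose total size can overflow their offsets. The sketch below assumes a column that stores 32-bit end offsets into a single chars buffer (the offset width and the `SimpleStringColumn` class are assumptions for illustration, not Doris's `ColumnString`) and, following #17989, throws an exception the caller can catch instead of aborting the process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical string column: offsets_[i] is the end of row i in chars_,
// stored as uint32_t, so the total size must stay representable.
class SimpleStringColumn {
public:
    explicit SimpleStringColumn(size_t max_chars = std::numeric_limits<uint32_t>::max())
            : max_chars_(max_chars) {
        assert(max_chars <= std::numeric_limits<uint32_t>::max());
    }

    // Validates the new total before touching any state, then appends.
    void insert(const std::string& s) {
        size_t new_size = chars_.size() + s.size();
        if (new_size > max_chars_) {
            // Throw instead of aborting, so the query can fail and the process survives.
            throw std::length_error("string column size " + std::to_string(new_size) +
                                    " exceeds limit " + std::to_string(max_chars_));
        }
        chars_.insert(chars_.end(), s.begin(), s.end());
        offsets_.push_back(static_cast<uint32_t>(new_size));
    }

    std::string get(size_t row) const {
        size_t begin = row == 0 ? 0 : offsets_[row - 1];
        return std::string(chars_.begin() + static_cast<std::ptrdiff_t>(begin),
                           chars_.begin() + static_cast<std::ptrdiff_t>(offsets_[row]));
    }

    size_t size() const { return offsets_.size(); }

private:
    size_t max_chars_;
    std::vector<char> chars_;
    std::vector<uint32_t> offsets_;
};
```

Checking the new total before appending leaves the column unchanged when an insert is rejected, so a caller that catches the exception still sees a consistent column.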
186 Commits