Commit Graph

155 Commits

Author SHA1 Message Date
48ef61780d [refactor](struct-type) refactor and clean unused code for struct type (#17257)
remove unused code for struct type
2023-03-01 15:49:31 +08:00
a1db5c6f52 [fix](vec) crash caused by not-implemented function in ColumnFixedLengthObject (#17215) 2023-02-28 15:27:06 +08:00
91fc9fae8e [Bug](complex-type) Fix is null predicate in delete stmt for array/struct/map type (#17018) 2023-02-23 15:06:49 +08:00
08adf914f9 [improvement](vec) avoid creating a new column while filtering mutable columns (#16850)
Currently, when filtering a column, a new column will be created to store the filtering result, which will cause some performance loss。 ssb-flat without pushdown expr from 19s to 15s.
2023-02-21 09:47:21 +08:00
ef2fdb79bb [Improvement](parquet-reader) Optimize and refactor parquet reader to improve performance. (#16818)
Optimize and refactor parquet reader to improve performance.
- Improve 2x performance for small dict string by aligned copying.
- Refactor code to decrease condition(if) checking.
- Don't call skip(0).
- Don't read page index if no condition.

**ssb-flat-100**: (single-machine, single-thread)
| Query        | before opt           | after opt  |
| ------------- |:-------------:| ---------:|
| SELECT count(lo_revenue) FROM lineorder_flat       | 9.23   | 9.12 |
| SELECT count(lo_linenumber) FROM lineorder_flat | 4.50    | 4.36 |
| SELECT count(c_name) FROM lineorder_flat             | 18.22 | 17.88| 
| **SELECT count(lo_shipmode) FROM lineorder_flat**     |**10.09** | **6.15**|
2023-02-20 11:42:29 +08:00
f08c1222cc [Opt](exec) Refactor the code and logical functions to SIMD the code (#16785) 2023-02-16 16:55:12 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00
1b3902baa2 [Feature](Complex-type) Add struct and map type to Doris (#16444)
This commit support:
1、Insert + select for struct/map type
2、Json stream load for struct type
3、m[key] function for map type

How to use:
Set the fe config to create table for struct and map type
1、admin set frontend config("enable_struct_type" = "true");
2、admin set frontend config("enable_map_type" = "true");

#16547

Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2023-02-10 11:00:33 +08:00
Pxl
266bb971a6 [Enchancement](function) display elements number on check_chars_length #16570 2023-02-10 08:52:41 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
9aa0d86fec [fix](olap) Incorrect reserving size for PredicateColumn converted from ColumnDictionary (#16249) 2023-01-30 20:28:22 +08:00
Pxl
2b5f95f08a [Bug](function) remove datev2 signature of hour_ceil/hour_floor #16168 2023-01-29 11:27:56 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
bae29157aa [fix](olap) dictionary cannot be sorted after inserting some null values (#15829) 2023-01-13 09:28:55 +08:00
699bf972e2 [Bug](bitmap) Fix bitmap_from_string for null constant (#15698) 2023-01-09 10:21:08 +08:00
2c8de30cce [optimize](multi-catalog) use dictionary encode&filter to process delete files (#15441)
**Optimize**
PR #14470 has used `Expr` to filter delete rows to match current data file,
but the rows in the delete file are [sorted by file_path then position](https://iceberg.apache.org/spec/#position-delete-files)
to optimize filtering rows while scanning, so this PR remove `Expr` and use binary search to filter delete rows.

In addition, delete files are likely to be encoded in dictionary, it's time-consuming to decode `file_path`
columns into `ColumnString`, so this PR use `ColumnDictionary` to read `file_path` column.

After testing, the performance of iceberg v2's MOR is improved by 30%+.

**Fix Bug**
Lazy-read-block may not have the filter column, if the whole group is filtered by `Expr`
and the batch_eof is generated from next batch.
2022-12-30 08:57:55 +08:00
f7988fad03 [improvement](string) set bigger limit for ColumnString chars length (#15426) 2022-12-28 15:41:01 +08:00
524208ab3a [Feature](bitmap/hll)Support return bitmap/hll data in select statement in vectorization (#15224)
Support return bitmap data in select statement in vectorization mode

In the scenario of using Bitmap to circle people, users need to return the Bitmap results to the upper layer, which is parsing the contents of the Bitmap to deal with high QPS query scenarios
2022-12-27 14:49:24 +08:00
301640d3c0 [fix](string) fix offsets over flow for extreme large String column (#15360)
* [fix](string) fix offsets over flow for extreme large String column

* fix
2022-12-26 21:23:58 +08:00
40141a9c9c [opt](vectorized) opt the null map _has_null logic (#15181)
opt the null map _has_null logic
2022-12-20 10:01:54 +08:00
8c406c5e59 [Bug](DictoryColumn) reverse the _codes.size() replace _reserve_size (#14984) 2022-12-11 18:25:18 +08:00
e279c90965 [fix](ColumnVector) ColumnVector::insert_date_column crashed #14839
ColumnVector::insert_date_column make BE crashed with large data(>512 rows).


Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-12-06 09:06:57 +08:00
82579126cf [fix](Dictionary-codec) heap overflow with in-predicate on nullable columns (#14319) (#14641)
Losing segmentid info will mess up the _segment_id_to_value_in_dict_flags map
in InListPredicate, causing two distinct segments to collide and crash the BE
at last.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-11-29 21:22:18 +08:00
daeabcf053 [improvement](vec) optimize the logic for _has_null in ColumnNullable (#14633) 2022-11-29 08:53:30 +08:00
70a424d6e3 [Bug](regression) Fail regression test in test_grouping_sets in fuzzy mode (#14601) 2022-11-26 12:17:31 +08:00
064b8d2aa6 [fix](multi-catalog) fix coredump when querying partitioned hive table with text format (#14604)
BE will crash when querying partitioned hive table with text format
and put partition column at first of select items.

1. FE should use file slots to set the column mapping index of csv file.
2. BE should use `get_by_name` of block to get right column in a block in csv reader.
2022-11-26 11:42:40 +08:00
52c6ba051e [feature](jsonb type)refactor JSONB type using column and add testcase (#13778)
1. Refactor JSONB type using ColumnString instead making a copy.
2. Add regression testcase for JSONB load and functions.
2022-11-26 10:06:15 +08:00
f68fa442cd [Bug](regression-test) Fix regression aggregate failed muti distinct (#14563)
Fix regression aggregate failed muti distinct
2022-11-25 10:58:10 +08:00
70ea07bc4b [fix](nullable) Fix nullable cache to avoid function returning wrong value (#14463) 2022-11-24 09:35:08 +08:00
1ec7f45fb6 [Bug](avg) Fix avg for bigint (#14433) 2022-11-22 10:29:59 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
enlarge runtime filter in predicate threshold
2022-11-10 15:48:46 +08:00
aec214b4b0 [bug](ColumnDecimal)call set_decimalv2_type when cloning ColumnDecimal (#14061)
* call set_decimalv2_type when cloning ColumnDecimal

* clang format
2022-11-09 11:23:43 +08:00
Pxl
9d8b4bc176 [Enhancement](Dictionary-codec) update dict once on same segment (#13936)
update dict once on same segment
2022-11-08 10:59:35 +08:00
fbc8b7311f [Opt](function) opt the function of ndv (#13887) 2022-11-02 22:21:20 +08:00
62f765b7f5 [improvement](scan) speed up inserting strings into ColumnString (#13397) 2022-11-02 22:19:02 +08:00
277025b046 [fix](join)ColumnNullable need handle const column with nullable const value (#13866) 2022-11-02 08:52:49 +08:00
1f7829e099 [Fix](array-type) bugfix for array column with delete condition (#13361)
Fix for SQL with array column:
delete from tbl where c_array is null;

more info please refer to #13360

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-21 09:29:02 +08:00
f329d33666 [chore](fix) Fix some spell errors in be's comments. #13452 2022-10-20 08:56:01 +08:00
50e2d0fd3e [opt](storage) opt the read by column decimal (#13488)
do the opt:
TPCH Q18 36s->33s
Q20 18s->17s
2022-10-20 08:53:23 +08:00
9ac4cfc9bb [bugfix](array-type) ColumnDate lost is_date_type after cloned (#13420)
Problem:
IColumn::is_date property will lost after ColumnDate::clone called.

Fix:
After ColumnDate created, also set IColumn::is_date.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-19 21:29:36 +08:00
cd3450bd9d [Improvement](join) optimize join probing phase (#13357) 2022-10-18 12:37:17 +08:00
6746434770 [improvement](schema change) avoid using column ptr swap (#13273) 2022-10-14 15:19:08 +08:00
830183984a [fix](hash)update_hashes_with_value method should handle if input value is null (#13332)
* [fix](hash)update_hashes_with_value method should handle if input value is null

* remove unnessasery xxHash64NullWithSeed
2022-10-13 14:36:01 +08:00
9b590ac4cb [improvement](olap) cache value of has_null in ColumnNullable (#13289) 2022-10-13 09:12:02 +08:00
4e4f8afa28 [fix](array-type) fix get_data_at for zero element array #13225
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-11 15:41:34 +08:00
1cd4e5cec6 refractor insert_xxx functions (#13088)
As mentioned in #13074, there will be some problem in ColumnVector<int>::insert_many_in_copy_way.
Column::insert_xxx functions will append some data, they should reserve or resize before append data.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-10 11:54:27 +08:00
Pxl
245490d6b7 [Enhancement](runtime filter) optimize for runtime filter (#12856)
optimize for runtime filter
2022-10-09 14:11:03 +08:00
34b14a71c8 [Improvement](string) Optimize scanning for string #12911
~0.2X performance boost for queries containing string predicates
2022-09-29 15:11:16 +08:00