doris

Author	SHA1	Message	Date
Jerry Hu	9aa0d86fec	[fix](olap) Incorrect reserving size for PredicateColumn converted from ColumnDictionary (#16249 )	2023-01-30 20:28:22 +08:00
Pxl	2b5f95f08a	[Bug](function) remove datev2 signature of hour_ceil/hour_floor #16168	2023-01-29 11:27:56 +08:00
ZhaoChangle	199d7d3be8	[Refactor]Merged string_value into string_ref (#15925 )	2023-01-22 16:39:23 +08:00
Jerry Hu	bae29157aa	[fix](olap) dictionary cannot be sorted after inserting some null values (#15829 )	2023-01-13 09:28:55 +08:00
Gabriel	699bf972e2	[Bug](bitmap) Fix bitmap_from_string for null constant (#15698 )	2023-01-09 10:21:08 +08:00
Ashin Gau	2c8de30cce	[optimize](multi-catalog) use dictionary encode&filter to process delete files (#15441 ) Optimize PR #14470 has used `Expr` to filter delete rows to match current data file, but the rows in the delete file are [sorted by file_path then position](https://iceberg.apache.org/spec/#position-delete-files) to optimize filtering rows while scanning, so this PR remove `Expr` and use binary search to filter delete rows. In addition, delete files are likely to be encoded in dictionary, it's time-consuming to decode `file_path` columns into `ColumnString`, so this PR use `ColumnDictionary` to read `file_path` column. After testing, the performance of iceberg v2's MOR is improved by 30%+. Fix Bug Lazy-read-block may not have the filter column, if the whole group is filtered by `Expr` and the batch_eof is generated from next batch.	2022-12-30 08:57:55 +08:00
TengJianPing	f7988fad03	[improvement](string) set bigger limit for ColumnString chars length (#15426 )	2022-12-28 15:41:01 +08:00
chenlinzhong	524208ab3a	[Feature](bitmap/hll)Support return bitmap/hll data in select statement in vectorization (#15224 ) Support return bitmap data in select statement in vectorization mode In the scenario of using Bitmap to circle people, users need to return the Bitmap results to the upper layer, which is parsing the contents of the Bitmap to deal with high QPS query scenarios	2022-12-27 14:49:24 +08:00
TengJianPing	301640d3c0	[fix](string) fix offsets over flow for extreme large String column (#15360 ) * [fix](string) fix offsets over flow for extreme large String column * fix	2022-12-26 21:23:58 +08:00
HappenLee	40141a9c9c	[opt](vectorized) opt the null map _has_null logic (#15181 ) opt the null map _has_null logic	2022-12-20 10:01:54 +08:00
HappenLee	8c406c5e59	[Bug](DictoryColumn) reverse the _codes.size() replace _reserve_size (#14984 )	2022-12-11 18:25:18 +08:00
camby	e279c90965	[fix](ColumnVector) ColumnVector::insert_date_column crashed #14839 ColumnVector::insert_date_column make BE crashed with large data(>512 rows). Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-12-06 09:06:57 +08:00
zhengyu	82579126cf	[fix](Dictionary-codec) heap overflow with in-predicate on nullable columns (#14319 ) (#14641 ) Losing segmentid info will mess up the _segment_id_to_value_in_dict_flags map in InListPredicate, causing two distinct segments to collide and crash the BE at last. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-11-29 21:22:18 +08:00
Jerry Hu	daeabcf053	[improvement](vec) optimize the logic for _has_null in ColumnNullable (#14633 )	2022-11-29 08:53:30 +08:00
HappenLee	70a424d6e3	[Bug](regression) Fail regression test in test_grouping_sets in fuzzy mode (#14601 )	2022-11-26 12:17:31 +08:00
Mingyu Chen	064b8d2aa6	[fix](multi-catalog) fix coredump when querying partitioned hive table with text format (#14604 ) BE will crash when querying partitioned hive table with text format and put partition column at first of select items. 1. FE should use file slots to set the column mapping index of csv file. 2. BE should use `get_by_name` of block to get right column in a block in csv reader.	2022-11-26 11:42:40 +08:00
Kang	52c6ba051e	[feature](jsonb type)refactor JSONB type using column and add testcase (#13778 ) 1. Refactor JSONB type using ColumnString instead making a copy. 2. Add regression testcase for JSONB load and functions.	2022-11-26 10:06:15 +08:00
HappenLee	f68fa442cd	[Bug](regression-test) Fix regression aggregate failed muti distinct (#14563 ) Fix regression aggregate failed muti distinct	2022-11-25 10:58:10 +08:00
abmdocrt	70ea07bc4b	[fix](nullable) Fix nullable cache to avoid function returning wrong value (#14463 )	2022-11-24 09:35:08 +08:00
Gabriel	1ec7f45fb6	[Bug](avg) Fix `avg` for bigint (#14433 )	2022-11-22 10:29:59 +08:00
Gabriel	2c42f0a905	[refactor](decimalv3) Refine code for DecimalV3 (#14394 )	2022-11-19 16:57:17 +08:00
xy720	035657c5a1	[typo](comment) Fix a lot of spell errors in be comments (#14208 ) fix typos.	2022-11-12 16:06:15 +08:00
Pxl	0e26f28bf2	[Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581 ) enlarge runtime filter in predicate threshold	2022-11-10 15:48:46 +08:00
Kang	aec214b4b0	[bug](ColumnDecimal)call set_decimalv2_type when cloning ColumnDecimal (#14061 ) * call set_decimalv2_type when cloning ColumnDecimal * clang format	2022-11-09 11:23:43 +08:00
Pxl	9d8b4bc176	[Enhancement](Dictionary-codec) update dict once on same segment (#13936 ) update dict once on same segment	2022-11-08 10:59:35 +08:00
HappenLee	fbc8b7311f	[Opt](function) opt the function of ndv (#13887 )	2022-11-02 22:21:20 +08:00
Jerry Hu	62f765b7f5	[improvement](scan) speed up inserting strings into ColumnString (#13397 )	2022-11-02 22:19:02 +08:00
starocean999	277025b046	[fix](join)ColumnNullable need handle const column with nullable const value (#13866 )	2022-11-02 08:52:49 +08:00
camby	1f7829e099	[Fix](array-type) bugfix for array column with delete condition (#13361 ) Fix for SQL with array column: delete from tbl where c_array is null; more info please refer to #13360 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-10-21 09:29:02 +08:00
xy720	f329d33666	[chore](fix) Fix some spell errors in be's comments. #13452	2022-10-20 08:56:01 +08:00
HappenLee	50e2d0fd3e	[opt](storage) opt the read by column decimal (#13488 ) do the opt: TPCH Q18 36s->33s Q20 18s->17s	2022-10-20 08:53:23 +08:00
camby	9ac4cfc9bb	[bugfix](array-type) ColumnDate lost is_date_type after cloned (#13420 ) Problem: IColumn::is_date property will lost after ColumnDate::clone called. Fix: After ColumnDate created, also set IColumn::is_date. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-10-19 21:29:36 +08:00
Gabriel	cd3450bd9d	[Improvement](join) optimize join probing phase (#13357 )	2022-10-18 12:37:17 +08:00
TengJianPing	6746434770	[improvement](schema change) avoid using column ptr swap (#13273 )	2022-10-14 15:19:08 +08:00
starocean999	830183984a	[fix](hash)update_hashes_with_value method should handle if input value is null (#13332 ) * [fix](hash)update_hashes_with_value method should handle if input value is null * remove unnessasery xxHash64NullWithSeed	2022-10-13 14:36:01 +08:00
Jerry Hu	9b590ac4cb	[improvement](olap) cache value of has_null in ColumnNullable (#13289 )	2022-10-13 09:12:02 +08:00
camby	4e4f8afa28	[fix](array-type) fix get_data_at for zero element array #13225 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-10-11 15:41:34 +08:00
camby	1cd4e5cec6	refractor insert_xxx functions (#13088 ) As mentioned in #13074, there will be some problem in ColumnVector<int>::insert_many_in_copy_way. Column::insert_xxx functions will append some data, they should reserve or resize before append data. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-10-10 11:54:27 +08:00
Pxl	245490d6b7	[Enhancement](runtime filter) optimize for runtime filter (#12856 ) optimize for runtime filter	2022-10-09 14:11:03 +08:00
Gabriel	34b14a71c8	[Improvement](string) Optimize scanning for string #12911 ~0.2X performance boost for queries containing string predicates	2022-09-29 15:11:16 +08:00
HappenLee	36bf8ad3eb	[Opt](Vec) Support const column check nullable and remove nullable (#13020 )	2022-09-29 08:39:19 +08:00
Mingyu Chen	d80b7b9689	[feature-wip](new-scan) support more load situation (#12953 )	2022-09-27 21:48:32 +08:00
Pxl	64988cb3d4	[Enhancement](optimize) optimize for insert_indices_from (#12807 )	2022-09-27 15:49:15 +08:00
starocean999	c4341d3d43	[fix](like)prevent null pointer by unimplemented like_vec functions (#12910 ) * [fix](like)prevent null pointer by unimplemented like_vec functions * fix pushed like predicate on dict encoded column bug	2022-09-27 10:02:10 +08:00
Shane	35076431ab	[fix](column)fix get_shrinked_column misspell (#12961 ) Fix misspell	2022-09-26 17:32:03 +08:00
Gabriel	f879a51ce9	[Improvement](dict) optimize dictionary column (#12852 )	2022-09-25 18:29:10 +08:00
Gabriel	d8e8bc0e69	[Improvement](predicate) Replace for-loop by memcpy (#12867 )	2022-09-25 18:27:59 +08:00
Shane	59699a4321	[feature](JSON datatype)Support JSON datatype (#10322 ) Add `JSON` datatype, following features are implemented by this PR: 1. `CREATE` tables with `JSON` type columns 2. `INSERT` values containing `JSON` type value stored in `String`, which is represented as binary format(AKA `JSONB`) at BE 3. `SELECT` JSON columns Detail design refers [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type) * add JSONB data storage format type * fix JsonLiteral resolve bug * add DataTypeJson case in data_type_factory * add JSON syntax check in FE * add operators for jsonb_document, currently not support comparison between any JSON type value * add ColumnJson and DataTypeJson * add JsonField to store JsonValue * add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral * add push_json for MysqlResultWriter * JSON column need no zone_map_index * Revert "JSON column need no zone_map_index" This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79. * add JSON writer and reader, ignore zone-map for JSON column * add json_to_string for DataTypeJson * add olap_data_convertor for JSON type * add some enum * add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions * fix column_json offsets overflow bug, format code * remove useless TODOs, add CmpType cases for JSON type * add license header * format license * format be codes * resolve rebase master conflicts * fix bugs for CREATE and meta related code * refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes * modification be codes along code review advice * fix rebase conflicts with master * add unit test for json_value and column_json * fix rebase error * rename json to jsonb * fix some data convert bugs, set Mysql type to JSON	2022-09-25 14:06:49 +08:00
yiguolei	32551a7263	[bugfix](predicate column) data maybe wrong if not a single page (#12796 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-09-22 09:55:31 +08:00
Gabriel	3cfaae0031	[Improvement](sort) Use heap sort to optimize sort node (#12700 )	2022-09-21 10:01:52 +08:00

145 Commits