doris

Author	SHA1	Message	Date
Qi Chen	ef2fdb79bb	[Improvement](parquet-reader) Optimize and refactor parquet reader to improve performance. (#16818 ) Optimize and refactor parquet reader to improve performance. - Improve 2x performance for small dict string by aligned copying. - Refactor code to decrease condition(if) checking. - Don't call skip(0). - Don't read page index if no condition. ssb-flat-100: (single-machine, single-thread) \| Query \| before opt \| after opt \| \| ------------- \|:-------------:\| ---------:\| \| SELECT count(lo_revenue) FROM lineorder_flat \| 9.23 \| 9.12 \| \| SELECT count(lo_linenumber) FROM lineorder_flat \| 4.50 \| 4.36 \| \| SELECT count(c_name) FROM lineorder_flat \| 18.22 \| 17.88\| \| SELECT count(lo_shipmode) FROM lineorder_flat \|10.09 \| 6.15\|	2023-02-20 11:42:29 +08:00
Pxl	2bc014d83a	[Enhancement](function) remove unused params on aggregate function (#16886 )	2023-02-20 11:08:45 +08:00
ZhaoChangle	e958b13747	[Exec] Add conjunction for union_node. (#16777 )	2023-02-20 10:48:58 +08:00
zhangstar333	5291f14aff	[vectorized](udf) java udf support array type (#16841 )	2023-02-20 10:00:25 +08:00
Kang	58c51086ca	[bugfix](topn) fix topn read_orderby_key_columns nullptr (#16896 ) The SQL `SELECT nationkey FROM regression_test_query_p0_limit.tpch_tiny_nation ORDER BY nationkey DESC LIMIT 5` may core dump because `VCollectIterator::_topn_next` dereferences a null `read_orderby_key_columns`; this was triggered by skipping the `_colname_to_value_range` init in #16818 . This PR makes two changes: 1. avoid a null `read_orderby_key_columns` in `TabletReader::_init_orderby_keys_param` 2. return an error if `read_orderby_key_columns` is unexpectedly null in `VCollectIterator::_topn_next`, to avoid the core dump	2023-02-19 23:28:33 +08:00
amory	8b70bfdc31	[Feature](map-type) Support stream load and fix some bugs for map type (#16776 ) 1. support stream load with json and csv formats for map 2. fix the olap convertor during compaction on a map column that contains nulls 3. support select outToFile for map 4. add some regression tests	2023-02-19 15:11:54 +08:00
zhengshengjun	e2e6a0dd83	[Feature](load) Support mutable property for partition (#16036 ) The background is described in this issue: #15723, where users previously used Apache Druid to satisfy such lambda requirements. We will not make Doris automatically drop data that does not belong to the current time window, as Druid does, because that is not flexible. We need the ability to support mutable/immutable partitions. The PR works this way: 1. Support a mutable property for a partition. 2. The mutable property of a partition is passed from FE to BE in a load procedure. 3. If a record's partition is immutable, we mark the row as "un selected", so it is not included in the computation of 'max_filter_ratio'; data written to an immutable partition is therefore ignored and does not cause a load failure. Usage example: 1. Add an immutable partition or modify a partition to be immutable: - alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true'); - alter table test_tbl modify partition xx set ('mutable' = 'false'); 2. Write 5 records into the table, two of which belong to an immutable partition	2023-02-18 23:09:34 +08:00
ZhaoChangle	d6a841409f	[Enhancement](func)Introduce non_nullable extraction function. #16621 Introduced a new function non_nullable to BE, which can extract concrete data column from a nullable column. If the input argument is already not a nullable column, raise an error.	2023-02-18 20:44:07 +08:00
TengJianPing	ef2130de57	[improvement](memory) fix possible memory leak of vcollect iterator (#16822 ) The logic in function VCollectIterator::build_heap is not robust and may cause a memory leak: Level1Iterator* cumu_iter = new Level1Iterator( cumu_children, _reader, cumu_children.size() > 1, _is_reverse, _skip_same); RETURN_IF_NOT_EOF_AND_OK(cumu_iter->init()); std::list<LevelIterator> children; children.push_back(base_reader_child); children.push_back(cumu_iter); _inner_iter.reset( new Level1Iterator(children, _reader, _merge, _is_reverse, _skip_same)); cumu_iter will be leaked if cumu_iter->init() does not succeed.	2023-02-17 14:40:15 +08:00
HappenLee	24ef60b491	[Opt](exec) opt aggregate function performance in nullable column	2023-02-16 22:26:12 +08:00
HappenLee	f08c1222cc	[Opt](exec) Refactor the code and logical functions to SIMD the code (#16785 )	2023-02-16 16:55:12 +08:00
HappenLee	de1337511c	[Bug](Datetime) Fix date time function mem use after free (#16814 )	2023-02-16 16:15:58 +08:00
Jibing-Li	292926e5aa	[Fix](multi catalog)Fix partition case bug (#16763 ) Set column names from path to lower case in case-insensitive case. This is for Iceberg columns from path. Iceberg columns are case sensitive, which may cause error for table with partitions.	2023-02-16 15:47:23 +08:00
Jibing-Li	de8d884ec3	[Fix](multi catalog)Fix iceberg parquet file doesn't have iceberg.schema meta problem (#16764 ) To support schema evolution, Iceberg adds schema information to Parquet file metadata. Early Iceberg versions, however, do not write any schema information to the Parquet file. This PR adds support for reading Parquet files without schema information.	2023-02-16 00:08:59 +08:00
Gabriel	dd06cc7609	[pipeline](shuffle) Improve broadcast shuffle (#16779 ) Now we reuse buffer pool for broadcast shuffle on pipeline engine. This PR ensures that a pipeline with a broadcast shuffle sink will not be scheduled if there are no available buffer in the buffer pool	2023-02-15 22:03:27 +08:00
Pxl	f50edff59d	[Chore](build) enable fallthrough check and fix some fallthrough bug (#16748 ) * enable fallthrough check and fix some fallthrough bug * fix * fix	2023-02-15 15:58:43 +08:00
TengJianPing	9b8c91e18c	[improvement](rowset reader) fix possible memleak (#16680 ) * [improvement](rowset reader) fix possible memleak * fix be UT	2023-02-15 11:13:31 +08:00
zhengshengjun	d013d529c8	[Feature](ipv6)Support IPV6 (#14063 ) Support IPV6 in Apache Doris, the main changes are: 1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string 2. BRPC and HTTP support binding to IPV6 address 3. BRPC and HTTP support visiting IPV6 Services	2023-02-14 21:43:10 +08:00
Gabriel	784c27deeb	[Bug](shuffle) fix mem leak in data stream sender (#16685 )	2023-02-14 16:40:13 +08:00
Pxl	ea78184551	[Feature](Materialized-View) support multiple slot on one column in materialized view (#16378 )	2023-02-14 16:10:50 +08:00
plat1ko	f1b9185830	[feature](cooldown) Implement cold data compaction (#16681 )	2023-02-14 15:21:54 +08:00
TengJianPing	fb0d08ff4c	[fix](mark join) fix bug of mark join with other conjuncts (#16655 ) Fix bug that probe_index is not increased for mark hash join with other conjuncts.	2023-02-14 14:47:15 +08:00
Jack Drogon	e1ef03b9d3	[Improvement](static variable) Fix exprs/MathFunctions static variable (#16687 ) Use static constexpr variable in impl file to avoid multi-addressing. Remove unused my_double_round in vec/functions/math.cpp	2023-02-14 14:46:29 +08:00
Jibing-Li	0d9714b179	[Fix](multi catalog)Support read hive1.x orc file. (#16677 ) Hive 1.x may write orc files with internal column names (_col0, _col1, _col2...). This causes query results to be NULL because the column names in the orc file don't match the column names in the Doris table schema. This PR adds support for querying Hive orc files with internal column names. For now, we haven't seen any problem with Parquet files; we will send a new PR to fix Parquet if any problem shows up in the future.	2023-02-14 14:32:27 +08:00
yiguolei	1b83829cff	[improvement](block exception safe) make block queue exception safe (#16657 ) This is part of exception safe: #16366. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-14 10:50:21 +08:00
HappenLee	a8a5cbb403	[Opt](Hash) Reduce virtual function call is_null_at in single nullable column (#16650 )	2023-02-14 08:44:12 +08:00
YueW	b642491555	[fix](regression) fix add drop inverted index case (#16673 )	2023-02-14 00:24:42 +08:00
YueW	f3ab55d27d	[Optimization](index) Optimization for no need to read raw data for index column that only in where clause (#16569 )	2023-02-14 00:12:45 +08:00
YueW	ed3420000e	[fix](bthread) fix bthread hang (#16594 )	2023-02-14 00:08:57 +08:00
lihangyu	36955a6769	[regression-test](dynamic-table) add regression test for dynamic table (#16656 )	2023-02-14 00:03:19 +08:00
HappenLee	a34cc6ed23	[Refactor](exchange) Remove useless variable and change block mem count way (#16668 )	2023-02-13 19:14:01 +08:00
yiguolei	be9385d40a	[improvement](lock raii) use raii to lock and unlock (#16652 ) This is part of exception safe: #16366. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-13 14:06:36 +08:00
奕冷	cf739e7496	[Enhancement](Stmt) Set insert_into timeout session variable separately (#16343 )	2023-02-12 16:56:10 +08:00
HappenLee	09b7c22f6b	[Opt](exec) remove useless null key when no split in convert key range (#16624 )	2023-02-11 15:44:35 +08:00
Kang	aba843bb2b	[Improvement](inverted index) inverted index query match bitmap cache (#16578 ) Add cache for inverted index query match bitmap to accelerate common query keyword, especially for keyword matching many rows. Tests result: - large result: matching 99% out of 247 million rows shows 8x speed up. - small result: matching 0.1% out of 247 million rows shows 2x speed up.	2023-02-11 13:38:58 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 A dynamic schema table is a special type of table whose schema changes during the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema info from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.	2023-02-11 13:37:50 +08:00
TengJianPing	171ae2892f	[improvement](batch size) pass batch size of exec engine to storage engine (#16614 ) Currently batch_size is not passed on to SegmentIterator, the SegmentIterator uses the hard coded value 4096 - 32 as the max row count of a block. * fix bug	2023-02-11 09:01:44 +08:00
yiguolei	75847f7f6a	[bugfix](exchange node) should not depend on eos to judge the ending of stream receiver (#16600 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-10 20:35:49 +08:00
YueW	43eca4f209	[Feature-WIP](inverted index) Implementation for alter inverted index. (#16371 ) implementation for add/drop inverted index.	2023-02-10 17:56:17 +08:00
Jerry Hu	861f31205a	[fix](window function) invalid order_by_start in VAnalyticEvalNode (#16589 )	2023-02-10 17:40:40 +08:00
zhangstar333	b99e2dc727	[bug](jdbc) fix jdbc can't get object of PGobject (#16496 ) When a pg table has unsupported column types such as point, polygon, jsonb, etc., the jdbc catalog converts them to string type in Doris, but the object obtained from the result set in Java is org.postgresql.util.PGobject. Some tests need this PR: #16442	2023-02-10 16:19:02 +08:00
xueweizhang	379bef598d	[fix-core](block) clear block row_same_bit when block reuse (#16172 )	2023-02-10 12:21:27 +08:00
xy720	1b3902baa2	[Feature](Complex-type) Add struct and map type to Doris (#16444 ) This commit supports: 1. Insert + select for struct/map type 2. Json stream load for struct type 3. m[key] function for map type How to use: Set the fe config to create tables with struct and map types 1. admin set frontend config("enable_struct_type" = "true"); 2. admin set frontend config("enable_map_type" = "true"); #16547 Co-authored-by: xy720 <xuyang25@baidu.com> Co-authored-by: amory <wangqiannan@selectdb.com> Co-authored-by: cambyzju <zhuxiaoli01@baidu.com> Co-authored-by: hucheng01 <hucheng01@baidu.com>	2023-02-10 11:00:33 +08:00
Pxl	266bb971a6	[Enhancement](function) display elements number on check_chars_length #16570	2023-02-10 08:52:41 +08:00
Xinyi Zou	c1a1275870	[fix](memory) Fix parquet load stack overflow (#16537 )	2023-02-10 08:48:12 +08:00
Gabriel	a038fdaec6	[Bug](pipeline) Fix bug in non-local exchange on pipeline engine (#16463 ) Currently, for broadcast shuffle, we serialize a block once and then send it by RPC through multiple channels. After this, we serialize the next block into the same memory so that the memory can be reused. However, since the RPC is asynchronous, the next block's serialization may happen before the previous block has been sent. So, in this PR, I use a ref count to identify whether the serialized block can be reused in broadcast shuffle.	2023-02-09 19:22:40 +08:00
HappenLee	7d035486ad	[Opt](vec) opt the fast execute logic to remove useless function call (#16532 )	2023-02-09 14:12:40 +08:00
yiguolei	646ba2cc88	[bugfix](scannode) 1. make rows_read correct 2. use single scanner if has limit clause (#16473 ) Make rows_read correct so that the scheduler can use it correctly. Use a single scanner if the query has a limit clause. Move it from fragment context to scannode. --------- Co-authored-by: yiguolei <yiguolei@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-02-09 14:12:18 +08:00
Xiaocc	0142ef8b95	[improvement](scanner) Supports bthread scanner (#16031 )	2023-02-09 10:24:56 +08:00
yixiutt	9f8753ffd2	[bugfix](vertical_compaction) fix base_compaction delete_sign handler (#16469 ) In vertical base compaction, identical rows are filtered in vertical_merge_iterator; we should skip these filtered rows when setting the agg flag of the delete sign. For example, schema is a,b,delete_sign, and data is 1,1,1 1,1,0 1,1,0 2,2,1 2,2 and the Block we get in VerticalBlockReader is 1,1,1 2,2,1 and we should set agg flag index 0,4 to true when handling the delete sign, so we add a function continuous_agg_count to skip the identical rows filtered in VerticalMergeIterator.	2023-02-09 10:13:41 +08:00
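
The leak described in commit ef2130de57 (a raw `new` followed by an early return when `init()` fails) can be illustrated in isolation. The following is a minimal sketch, not the code from that PR: `Status`, `ToyIterator`, `build_leaky`, and `build_safe` are hypothetical stand-ins for Doris's own types, and holding the object in a `std::unique_ptr` is only one common way to close this kind of leak, which may differ from the fix actually merged.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; this is NOT Doris's Status class.
struct Status {
    bool ok_flag;
    static Status OK() { return {true}; }
    static Status Error() { return {false}; }
    bool ok() const { return ok_flag; }
};

// Hypothetical stand-in for Level1Iterator. It counts live instances so that
// a leaked object is observable from the outside.
class ToyIterator {
public:
    explicit ToyIterator(bool init_succeeds) : _init_succeeds(init_succeeds) { ++live_count; }
    ~ToyIterator() { --live_count; }
    Status init() { return _init_succeeds ? Status::OK() : Status::Error(); }
    static int live_count;

private:
    bool _init_succeeds;
};
int ToyIterator::live_count = 0;

// Same shape as the snippet quoted in the commit message: the early return
// on a failed init() skips every path that would free the raw pointer.
Status build_leaky(bool init_succeeds, std::vector<ToyIterator*>* children) {
    assert(children != nullptr);
    ToyIterator* iter = new ToyIterator(init_succeeds);
    Status st = iter->init();
    if (!st.ok()) {
        return st; // iter is never deleted on this path
    }
    children->push_back(iter); // caller owns the pointer from here on
    return Status::OK();
}

// Ownership is held by a unique_ptr from the moment of allocation, so the
// early return destroys the iterator automatically.
Status build_safe(bool init_succeeds, std::vector<std::unique_ptr<ToyIterator>>* children) {
    assert(children != nullptr);
    auto iter = std::make_unique<ToyIterator>(init_succeeds);
    Status st = iter->init();
    if (!st.ok()) {
        return st; // iter is destroyed here
    }
    children->push_back(std::move(iter));
    return Status::OK();
}
```

Calling `build_safe(false, ...)` leaves `ToyIterator::live_count` unchanged, while `build_leaky(false, ...)` leaves it one higher than before the call.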


1259 Commits