Commit Graph

3821 Commits

Author SHA1 Message Date
ef2fdb79bb [Improvement](parquet-reader) Optimize and refactor parquet reader to improve performance. (#16818)
Optimize and refactor parquet reader to improve performance.
- Improve performance 2x for small dict strings via aligned copying (sketched below).
- Refactor code to reduce conditional (if) checks.
- Don't call skip(0).
- Don't read the page index if there are no conditions.

**ssb-flat-100**: (single-machine, single-thread)
| Query | before opt | after opt |
| ------------- | :-------------: | ---------: |
| SELECT count(lo_revenue) FROM lineorder_flat | 9.23 | 9.12 |
| SELECT count(lo_linenumber) FROM lineorder_flat | 4.50 | 4.36 |
| SELECT count(c_name) FROM lineorder_flat | 18.22 | 17.88 |
| **SELECT count(lo_shipmode) FROM lineorder_flat** | **10.09** | **6.15** |
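
A minimal sketch of two of the points above, with illustrative names rather than the actual Doris parquet reader API:

```
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// "Don't call skip(0)": a zero-length skip returns immediately instead of
// running the decoder's bookkeeping for a no-op. (Illustrative, not the
// real decoder interface.)
inline void skip_values(size_t& offset, size_t num) {
    if (num == 0) return;
    offset += num;
}

// "Aligned copying for small dict strings": if dictionary entries are padded
// to one fixed small width, every value becomes a fixed-size memcpy with no
// per-value length branch.
void decode_small_dict(const std::vector<char>& padded_dict, size_t width,
                       const uint32_t* codes, size_t n, char* out) {
    for (size_t i = 0; i < n; ++i) {
        std::memcpy(out + i * width, padded_dict.data() + codes[i] * width, width);
    }
}
```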
2023-02-20 11:42:29 +08:00
Pxl
2bc014d83a [Enhancement](function) remove unused params on aggregate function (#16886)
remove unused params on aggregate function
2023-02-20 11:08:45 +08:00
46d5cca661 [fix](merge-on-write) The delete bitmap of the currently imported rowset is not persistent (#16859) 2023-02-20 11:02:41 +08:00
b7d2bec8ea [fix](merge-on-write) add check for segment num (#14032) 2023-02-20 11:01:34 +08:00
e958b13747 [Exec] Add conjuncts for union_node. (#16777) 2023-02-20 10:48:58 +08:00
5291f14aff [vectorized](udf) java udf support array type (#16841) 2023-02-20 10:00:25 +08:00
2074b83c67 [enhancement](third-party) Upgrade JEMalloc version from 5.2.1 to 5.3.0 (#14871)
https://github.com/jemalloc/jemalloc/releases
2023-02-20 00:00:40 +08:00
58c51086ca [bugfix](topn) fix topn read_orderby_key_columns nullptr (#16896)
The SQL `SELECT nationkey FROM regression_test_query_p0_limit.tpch_tiny_nation ORDER BY nationkey DESC LIMIT 5`
makes BE core dump by dereferencing a nullptr `read_orderby_key_columns` in `VCollectIterator::_topn_next`,
triggered by skipping the `_colname_to_value_range` init in #16818.

This PR makes two changes:
1. avoid a nullptr read_orderby_key_columns in TabletReader::_init_orderby_keys_param
2. return an error if read_orderby_key_columns is unexpectedly nullptr in VCollectIterator::_topn_next, to avoid a core dump (sketched below)
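
A hedged sketch of change 2, with simplified stand-in types (the real Status and column types live in the Doris BE):

```
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for doris::Status.
struct Status {
    bool ok = true;
    std::string msg;
    static Status OK() { return {}; }
    static Status InternalError(std::string m) { return {false, std::move(m)}; }
};

using OrderByKeyColumns = std::vector<int>;  // placeholder for the real column type

// Change 2: fail with an error instead of dereferencing a nullptr and
// crashing the BE.
Status topn_next(const OrderByKeyColumns* read_orderby_key_columns) {
    if (read_orderby_key_columns == nullptr) {
        return Status::InternalError("read_orderby_key_columns is unexpectedly null");
    }
    // ... proceed with the top-n merge using *read_orderby_key_columns ...
    return Status::OK();
}
```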
2023-02-19 23:28:33 +08:00
8b70bfdc31 [Feature](map-type) Support stream load and fix some bugs for map type (#16776)
1. Support stream load with JSON and CSV formats for map.
2. Fix the OLAP convertor when compaction acts on a map column that contains nulls.
3. Support SELECT ... INTO OUTFILE for map.
4. Add some regression tests.
2023-02-19 15:11:54 +08:00
e2e6a0dd83 [Feature](load) Support mutable property for partition (#16036)
The background is described in issue #15723,
where users previously used Apache Druid to satisfy such lambda requirements.
We will not make Doris automatically drop data that does not belong to the current time window, as Druid does,
since that is not flexible. We need the ability to mark partitions mutable or immutable, and the PR works this way:

1. Support a mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE during a load procedure.
3. If a record's partition is immutable, we mark the row as "unselected", which excludes it from the computation of 'max_filter_ratio',
   so that data written to an immutable partition is neglected without causing load failure (see the sketch below).

Usage example:

1. Add an immutable partition, or modify a partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');

2. Write 5 records into the table, two of them belonging to an immutable partition.
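
A sketch of the row-marking idea from step 3, with illustrative names (not the actual FE/BE code):

```
#include <cstddef>
#include <vector>

// Rows that target an immutable partition are counted as "unselected"
// rather than "filtered", so they don't count against max_filter_ratio.
struct Row {
    bool partition_is_mutable;
};

struct LoadCounters {
    size_t loaded = 0;
    size_t unselected = 0;  // excluded from the max_filter_ratio computation
};

LoadCounters classify_rows(const std::vector<Row>& rows) {
    LoadCounters c;
    for (const auto& r : rows) {
        if (!r.partition_is_mutable) {
            ++c.unselected;  // silently skipped; does not fail the load
        } else {
            ++c.loaded;
        }
    }
    return c;
}
```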
2023-02-18 23:09:34 +08:00
d6a841409f [Enhancement](func) Introduce non_nullable extraction function. #16621
Introduce a new function non_nullable to the BE, which extracts the concrete data column from a nullable column. If the input argument is not a nullable column, an error is raised.
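
A sketch of the described semantics, using simplified column types in the spirit of Doris' vectorized engine (not the actual BE signatures):

```
#include <memory>
#include <stdexcept>

struct IColumn {
    virtual ~IColumn() = default;
};

// Simplified nullable wrapper: a nested data column plus an (elided) null map.
struct ColumnNullable : IColumn {
    std::shared_ptr<IColumn> nested;
};

// non_nullable: extract the concrete data column from a nullable column;
// raise an error if the input is not nullable, as described above.
std::shared_ptr<IColumn> non_nullable(const std::shared_ptr<IColumn>& col) {
    auto* nullable = dynamic_cast<ColumnNullable*>(col.get());
    if (nullable == nullptr) {
        throw std::runtime_error("non_nullable: argument is not a nullable column");
    }
    return nullable->nested;
}
```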
2023-02-18 20:44:07 +08:00
fda4afecf5 [RegressionTest](Pipeline) Fix pipeline failed in regression test (#16880)
regression-test/suites/inverted_index_p0/test_add_drop_index_with_data.groovy
2023-02-17 20:49:17 +08:00
6a1e3d3435 [fix](cooldown)Fix bug for single cooldown compaction, add remote meta (#16812)
* fix bug, add remote meta for compaction
2023-02-17 15:13:06 +08:00
Pxl
da147f1d1c [Chore](build) remove memory_copy and remove some -Wno build checks (#16831)
* remove memory_copy and remove some -Wno build checks
2023-02-17 14:43:24 +08:00
ef2130de57 [improvement](memory) fix possible memory leak of vcollect iterator (#16822)
The logic in VCollectIterator::build_heap is not robust and may cause a memory leak:

```
Level1Iterator* cumu_iter = new Level1Iterator(
        cumu_children, _reader, cumu_children.size() > 1, _is_reverse, _skip_same);
RETURN_IF_NOT_EOF_AND_OK(cumu_iter->init());
std::list<LevelIterator*> children;
children.push_back(*base_reader_child);
children.push_back(cumu_iter);
_inner_iter.reset(
        new Level1Iterator(children, _reader, _merge, _is_reverse, _skip_same));
```

cumu_iter is leaked if cumu_iter->init() does not succeed.
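
One way to close the leak, sketched against the quoted fragment (the surrounding Doris types are assumed from context; this is not the actual patch): keep cumu_iter in a std::unique_ptr until ownership is really handed off.

```
std::unique_ptr<Level1Iterator> cumu_iter(new Level1Iterator(
        cumu_children, _reader, cumu_children.size() > 1, _is_reverse, _skip_same));
// If init() fails, the early return destroys cumu_iter instead of leaking it.
RETURN_IF_NOT_EOF_AND_OK(cumu_iter->init());
std::list<LevelIterator*> children;
children.push_back(*base_reader_child);
children.push_back(cumu_iter.release());  // ownership moves into children
_inner_iter.reset(
        new Level1Iterator(children, _reader, _merge, _is_reverse, _skip_same));
```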
2023-02-17 14:40:15 +08:00
30dafd6a44 [improve](inverted index) Add element count limit for inverted index searcher cache (#16758)
Each element in InvertedIndexSearcherCache is an inverted index searcher, which holds a file descriptor for an inverted index file, so InvertedIndexSearcherCache is effectively a cache of open file descriptors for inverted index files.

If the open file descriptor limit of the Linux system is set too small and the config inverted_index_searcher_cache_limit is too big, a high-pressure load may cause "Too many open files".

So, when inserting an inverted index searcher into InvertedIndexSearcherCache, we also need to check whether the file descriptor limit for inverted index files has been reached (sketched below).
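
A hypothetical shape for that check (the function name and headroom fraction are illustrative, not the Doris implementation):

```
#include <sys/resource.h>

#include <cstddef>

// Before caching one more searcher (i.e. one more open file descriptor),
// verify the cache stays within a budget derived from the process fd limit.
bool can_cache_one_more_searcher(size_t currently_cached) {
    rlimit rl{};
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return false;  // be conservative if the limit can't be read
    }
    if (rl.rlim_cur == RLIM_INFINITY) return true;  // no effective limit
    // Leave 20% headroom for non-cache descriptors; the fraction is illustrative.
    const size_t budget = static_cast<size_t>(rl.rlim_cur * 0.8);
    return currently_cached + 1 <= budget;
}
```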
2023-02-17 11:53:07 +08:00
1a9eefebd4 [Fix](inverted index) fix array inverted index error match result when doing schema change add index (#16839)
There is a bug in inverted_index_writer when writing the index for array values that span multiple rows.
It can produce wrong match results when a schema change adds an index.
2023-02-17 11:50:39 +08:00
5dfd6d2390 [improve](dynamic table) refine SegmentWriter columns writer generate (#16816)
* [improve](dynamic table) refine SegmentWriter columns writer generate

```
Dynamic Block consists of two parts, dynamic part of columns and static part of columns
static   dynamic
| ----- | ------- |
the static ones are the original _tablet_schema columns
the dynamic ones are auto-generated and extended from the file scan.
```
**We should only consider using Block info to generate columns when it is a dynamic table load procedure.**
And separate the static ones from the dynamic ones.

* test
2023-02-17 10:24:33 +08:00
2426d8e6e8 [chore](be-config) set disable_storage_row_cache default to true, disabling row cache by default (#16827) 2023-02-17 10:21:28 +08:00
3d6077efe0 [pipeline](profile) Support real-time profile report in pipeline (#16772) 2023-02-17 10:01:34 +08:00
24ef60b491 [Opt](exec) opt aggregate function performance on nullable columns 2023-02-16 22:26:12 +08:00
2a9e748073 [enhancement](merge-on-write) do compaction with merge on read (#16799)
To avoid unrecoverable data loss due to delete bitmap calculation errors, do compaction with merge-on-read. This way, even if the delete bitmap calculation is wrong, the data can be recovered by full compaction.
2023-02-16 19:20:15 +08:00
f08c1222cc [Opt](exec) Refactor the code and logical functions to SIMD the code (#16785) 2023-02-16 16:55:12 +08:00
de1337511c [Bug](Datetime) Fix date time function mem use after free (#16814) 2023-02-16 16:15:58 +08:00
e2245cbdd3 [improvement](filecache) split file cache into sharding directories (#16767)
Save cached file segments under paths like `cache_path / hash(filepath).substr(0, 3) / hash(filepath) / offset`
to prevent too many directories directly under `cache_path` (see the sketch below).
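
A sketch of the path construction (std::hash stands in for whatever hash function Doris actually uses):

```
#include <cstddef>
#include <filesystem>
#include <functional>
#include <string>

// Build: cache_path / hash(filepath).substr(0, 3) / hash(filepath) / offset
std::filesystem::path cache_segment_path(const std::filesystem::path& cache_path,
                                         const std::string& filepath,
                                         std::size_t offset) {
    const std::string h = std::to_string(std::hash<std::string>{}(filepath));
    return cache_path / h.substr(0, 3) / h / std::to_string(offset);
}
```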
2023-02-16 16:04:29 +08:00
292926e5aa [Fix](multi catalog)Fix partition case bug (#16763)
Set column names parsed from the path to lower case for case-insensitive matching.
This is for Iceberg columns from the path: Iceberg columns are case sensitive,
which may cause errors for tables with partitions.
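
The normalization itself is a one-liner; a minimal sketch:

```
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case a partition column name parsed from the file path before
// matching it against the table schema (case-insensitive mode).
std::string normalize_partition_column(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}
```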
2023-02-16 15:47:23 +08:00
0bb6005143 [Improvement](thrift) optimize thrift messages (#16383)
Previously we sent one thrift message per fragment instance, but many of those messages are identical across the instances of a fragment. This PR extracts the shared part so the thrift message is sent only once per fragment.
2023-02-16 11:07:46 +08:00
70d234ca6d [bugfix](reader) make segment_overlapping meta correct (#16793) 2023-02-16 08:41:52 +08:00
de8d884ec3 [Fix](multi catalog)Fix iceberg parquet file doesn't have iceberg.schema meta problem (#16764)
To support schema evolution, Iceberg adds schema information to Parquet file metadata.
But early Iceberg versions didn't write any schema information to Parquet files.
This PR supports reading Parquet files without schema information.
2023-02-16 00:08:59 +08:00
dd06cc7609 [pipeline](shuffle) Improve broadcast shuffle (#16779)
We now reuse the buffer pool for broadcast shuffle on the pipeline engine. This PR ensures that a pipeline with a broadcast shuffle sink will not be scheduled if no buffers are available in the buffer pool.
2023-02-15 22:03:27 +08:00
fe9b2fb803 fix bug, rename thread (#16780) 2023-02-15 18:51:22 +08:00
Pxl
f50edff59d [Chore](build) enable fallthrough check and fix some fallthrough bugs (#16748)
* enable fallthrough check and fix some fallthrough bugs

* fix
2023-02-15 15:58:43 +08:00
9b8c91e18c [improvement](rowset reader) fix possible memleak (#16680)
* [improvement](rowset reader) fix possible memleak

* fix be UT
2023-02-15 11:13:31 +08:00
92417cedec MOD: Reduce clang version (#16755) 2023-02-15 08:58:17 +08:00
d013d529c8 [Feature](ipv6)Support IPV6 (#14063)
Support IPv6 in Apache Doris. The main changes are:
1. Enable binding to an IPv6 address if the network priority in the config file contains an IPv6 CIDR string.
2. BRPC and HTTP support binding to IPv6 addresses.
3. BRPC and HTTP support visiting IPv6 services.
2023-02-14 21:43:10 +08:00
7482b6bad2 [fix](cooldown) Add cold_compaction_lock to serialize any operations which may delete the input rowsets of cold data compaction (#16742)
Add cold_compaction_lock to serialize tablet clone, cold data compaction, and following cooldowned data.
2023-02-14 21:38:33 +08:00
784c27deeb [Bug](shuffle) fix mem leak in data stream sender (#16685) 2023-02-14 16:40:13 +08:00
Pxl
ea78184551 [Feature](Materialized-View) support multiple slot on one column in materialized view (#16378) 2023-02-14 16:10:50 +08:00
f1b9185830 [feature](cooldown) Implement cold data compaction (#16681) 2023-02-14 15:21:54 +08:00
fb0d08ff4c [fix](mark join) fix bug of mark join with other conjuncts (#16655)
Fix a bug where probe_index is not increased for mark hash join with other conjuncts.
2023-02-14 14:47:15 +08:00
af1329936e [Improvement](ES) Support datev2 and datetimev2 for ES query (#16633)
* Support datev2 and datetimev2 for ES query
2023-02-14 14:47:00 +08:00
e1ef03b9d3 [Improvement](static variable) Fix exprs/MathFunctions static variable (#16687)
Use a static constexpr variable in the impl file to avoid the variable being addressed in multiple translation units.
Remove the unused my_double_round in vec/functions/math.cpp.
2023-02-14 14:46:29 +08:00
0d9714b179 [Fix](multi catalog)Support read hive1.x orc file. (#16677)
Hive 1.x may write ORC files with internal column names (_col0, _col1, _col2, ...).
This causes query results to be NULL because the column names in the ORC file don't match
the column names in the Doris table schema. This PR supports querying Hive ORC files with internal column names (see the sketch below).

For now, we haven't seen any problem with Parquet files; a new PR will fix Parquet if any problem shows up in the future.
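
A sketch of the positional mapping this implies (illustrative, not the actual reader code): a file column named like an internal Hive column is resolved to the table schema column at the same ordinal.

```
#include <string>
#include <vector>

// Map ORC file column names to Doris schema names. Internal names
// (_col0, _col1, ...) are replaced by the schema name at the same position;
// ordinary names are kept as-is.
std::vector<std::string> map_internal_orc_names(
        const std::vector<std::string>& file_cols,
        const std::vector<std::string>& table_cols) {
    std::vector<std::string> resolved;
    resolved.reserve(file_cols.size());
    for (size_t i = 0; i < file_cols.size(); ++i) {
        const bool internal =
                file_cols[i].rfind("_col", 0) == 0 && i < table_cols.size();
        resolved.push_back(internal ? table_cols[i] : file_cols[i]);
    }
    return resolved;
}
```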
2023-02-14 14:32:27 +08:00
Pxl
b1347f4c38 [Chore](build) make compile options work on C objects && some refactoring of CMakeLists (#16451)
make compile options work on C objects && some refactoring of CMakeLists
2023-02-14 13:35:20 +08:00
1b83829cff [improvement](block exception safe) make block queue exception safe (#16657)
* [improvement](block exception safe) make block queue exception safe

This is part of exception safe: #16366.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-14 10:50:21 +08:00
a8a5cbb403 [Opt](Hash) Deduce virtual function call is_null_at in single nullable column (#16650) 2023-02-14 08:44:12 +08:00
b642491555 [fix](regression) fix add drop inverted index case (#16673) 2023-02-14 00:24:42 +08:00
f3ab55d27d [Optimization](index) Optimization for no need to read raw data for index column that only in where clause (#16569) 2023-02-14 00:12:45 +08:00
ed3420000e [fix](bthread) fix bthread hang (#16594) 2023-02-14 00:08:57 +08:00
de725d5d44 [bugfix](column_reader) index_page should not be pre-decoded (#16605)
In the current logic, index pages are also pre-decoded, but this usually returns OK:
an index page uses BinaryPlainPageBuilder, and the first 4 bytes of the page are an offset,
so with high probability they do not equal EncodingTypePB::DICT_ENCODING (which is 5).
Code in bitshuffle_page_pre_decode.h:
```
if constexpr (USED_IN_DICT_ENCODING) {
    auto type = decode_fixed32_le((const uint8_t*)&data.data[0]);
    if (static_cast<EncodingTypePB>(type) != EncodingTypePB::DICT_ENCODING) {
        return Status::OK();
    }
    size_of_dict_header = BINARY_DICT_PAGE_HEADER_SIZE;
    data.remove_prefix(4);
}
```
But if that value happens to equal EncodingTypePB::DICT_ENCODING, BitShuffle decoding
will be applied to a BinaryPlainPage, which leads to a fatal error.
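
A plausible shape of the fix, reduced to its essence (illustrative; the actual patch lives in the column reader): gate pre-decoding on the page type instead of sniffing bytes that may collide.

```
enum class PageType { DATA_PAGE, INDEX_PAGE };

// Only data pages can legitimately carry a dict-encoded BitShuffle body;
// an index page is returned untouched no matter what its first 4 bytes are.
bool should_pre_decode(PageType type) {
    return type == PageType::DATA_PAGE;
}
```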
2023-02-14 00:06:14 +08:00