doris

Author	SHA1	Message	Date
TengJianPing	883f575cfe	[fix](string function) fix wrong usage of iconv_open (#17048 ) * [fix](string function) fix wrong usage of iconv_open Also add test case for function convert * fix test case	2023-02-24 09:13:10 +08:00
amory	7229751bd9	[Improve](map-type) Add contains_null for map (#16948 ) Add contains_null for map type.	2023-02-23 20:47:26 +08:00
lihangyu	526a66e9fb	[Function](array-type) support array_apply (#17020 ) Filter array to match specific binary condition ``` mysql> select array_apply([1000000, 1000001, 1000002], '=', 1000002); +-------------------------------------------------------------+ \| array_apply(ARRAY(1000000, 1000001, 1000002), '=', 1000002) \| +-------------------------------------------------------------+ \| [1000002] \| +-------------------------------------------------------------+ ```	2023-02-23 17:38:16 +08:00
camby	52a731a2df	fix compile error while use gcc12 (#17016 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2023-02-23 15:07:31 +08:00
xy720	91fc9fae8e	[Bug](complex-type) Fix is null predicate in delete stmt for array/struct/map type (#17018 )	2023-02-23 15:06:49 +08:00
Ashin Gau	3ea6478ba8	[feature](multi-catalog) parquet reader support nested array column (#16961 ) Support to decode nested array column in parquet reader: 1. FE should generate the right nested column type. FE doesn't check the nesting depth and legality, like map\<array\<int\>, int\>. 2. `ParquetColumnReader` has removed the filtering of page index to support nested array type. It's too difficult to skip values in nested complex types. Maybe we should support the filtering of page index and lazy read in later PR. 3. `ExternalFileScanNode` has a bug in creating default value expression. 4. Maybe it's slow to read repetition levels in a while loop. I'll optimize this in next PR. 5. Array column has temporary `SchemaElement` in its thrift definition, we have removed them and keep its parent in former implementation. The remaining parent should inherit the repetition and definition level of its child.	2023-02-23 14:54:58 +08:00
Qi Chen	61826e3a77	[Improvement](parquet-reader) Improve performance of parquet reader filter calculation. (#16934 ) Improve performance of parquet reader filter calculation. - Use `filter_data` instead of `(*filter_ptr)` to merge filter to improve performance. - Use mutable column filter func instead of original new column filter func which introduced by #16850. - Avoid column ref-count increasing which caused unnecessary copying by passing column pointer ref.	2023-02-23 14:41:30 +08:00
Lijia Liu	8eeb435963	[improvement](meta) Enhance Doris's fault tolerance to disk error (#16472 ) Sense io error. Retry query when io error. Greylist: When finds one disk is completely broken, or the diff of tablet number in BE and FE meta is too large,reduce the query priority of the BE.	2023-02-23 08:40:45 +08:00
Xinyi Zou	a1c0054b4c	[fix](memory) fix memory GC details and join probe catch bad_alloc (#16989 ) Fix Redhat 4.x OS /proc/meminfo has no MemAvailable, disable MemAvailable to control memory. vm_rss_str and mem_available_str recorded when gc is triggered, to avoid memory changes during gc and cause inaccurate logs. join probe catch bad_alloc, this may alloc 64G memory at a time, avoid OOM. Modify document doris_be_all_segments_num and doris_be_all_rowsets_num names.	2023-02-23 08:33:30 +08:00
DuRipeng	e65a061256	[Enhancement](datetimev2-enhance) support 'microseconds_add' function for datetimev2 (#16970 ) support 'microseconds_add' function for datetimev2	2023-02-22 17:49:41 +08:00
zxealous	29c46d6926	[fix](struct-type) fix be core when load array orc file (#16978 ) * fix be core when load array orc file	2023-02-22 10:15:39 +08:00
Adonis Ling	4cb97b6fb7	[chore](macOS) Fix linkage errors for the release build (#17002 ) Issue Number: close #17003 ## Problem summary The linker couldn't find some symbols because the implementation of a template member function doris::vectorized::Decoder::init_decimal_converter is missing in the header file in which the corresponding declaration is placed.	2023-02-22 10:01:51 +08:00
HappenLee	f37da6e789	[Function](vec) use const column to opt function current_time() (#16953 )	2023-02-21 16:26:35 +08:00
TengJianPing	6f94e84da7	[improvement](memory) fix possible double free in vcollect iterator (#16875 ) This code in VCollectIterator::build_heap is possible to cause double free if cumu_iter->init() fails and returns early, becuase some LevelIterator* exists both in VCollectIterator::_children and cumu_iter::_children.	2023-02-21 14:18:04 +08:00
TengJianPing	5ec8c51366	[fix](union iterator) fix bug that result data order of VUnionIterator is different (#16938 ) Fix bug of #16680, data order of VUnionIterator outout block is changed, which will impact compaction.	2023-02-21 14:17:21 +08:00
Mingyu Chen	491d269412	[fix](tvf) fix bug that failed to get schema of tvf when file is empty (#16928 ) In previous implementation, when querying tvf, FE will get schema from BE. And BE will try to open the first file to get its schema info, but for orc or parquet format, if the file is empty, it will return error. But even for an empty file, we can still get schema info from file's footer. So we should handle the empty file to get schema info correctly. Also modify the catalog doc to add some FAQ.	2023-02-21 14:14:32 +08:00
Mingyu Chen	c0bb2e33a8	[improvement](scan) separate scanner into local and remote scanner pool (#16891 ) There are 2 kinds for scanner thread pool, local and remote. Local is for local file read, specially for olap scanner. Remote is for other external data source, such as file scanner, jdbc scanner. This PR mainly changes: For olap scanner, use cold or hot rowset to decide whether to use local or remote pool. For other scanner, user remote pool by default. Add a new BE config doris_max_remote_scanner_thread_pool_thread_num, default is 512, indicate the max thread number of the remote scanner thread pool This will alleviate the problem of interaction between olap queries with load job and external queries.	2023-02-21 14:13:09 +08:00
lihangyu	113023fb86	(Enhancement)[load-json] support simdjson in new json reader (#16903 ) be config: enable_simdjson_reader=true related PR #11665	2023-02-21 11:31:00 +08:00
Jerry Hu	08adf914f9	[improvement](vec) avoid creating a new column while filtering mutable columns (#16850 ) Currently, when filtering a column, a new column will be created to store the filtering result, which will cause some performance loss。 ssb-flat without pushdown expr from 19s to 15s.	2023-02-21 09:47:21 +08:00
yiguolei	e04c13b7a6	[enhancement](exception safe) make function state exception safe (#16771 )	2023-02-20 23:01:45 +08:00
Qi Chen	a46941c684	[Fix](multi-catalog) Fix switch-case fall-through issue in multi-catalog module. (#16931 ) Fix switch-case fall-through issue in multi-catalog module.	2023-02-20 21:35:41 +08:00
ElvinWei	f32cd2c123	[fix](statistics) fix a problem with histogram statistics collection parameters (#16918 ) 1. Fixed a problem with histogram statistics collection parameters. 2. Solved the problem that it takes a long time to collect histogram statistics. TODO: Optimize histogram statistics sampling method and make the sampling parameters effective. The problem is that the histogram function works as expected in the single-node test, but doesn't work in the multi-node test. In addition, the performance of the current support sampling to collect histogram is low, resulting in a large time consumption when collecting histogram information. Fixed the parameter issue and temporarily removed support for sampling to speed up the collection of histogram statistics. Will next support sampling to collect histogram information.	2023-02-20 16:33:18 +08:00
Qi Chen	ef2fdb79bb	[Improvement](parquet-reader) Optimize and refactor parquet reader to improve performance. (#16818 ) Optimize and refactor parquet reader to improve performance. - Improve 2x performance for small dict string by aligned copying. - Refactor code to decrease condition(if) checking. - Don't call skip(0). - Don't read page index if no condition. ssb-flat-100: (single-machine, single-thread) \| Query \| before opt \| after opt \| \| ------------- \|:-------------:\| ---------:\| \| SELECT count(lo_revenue) FROM lineorder_flat \| 9.23 \| 9.12 \| \| SELECT count(lo_linenumber) FROM lineorder_flat \| 4.50 \| 4.36 \| \| SELECT count(c_name) FROM lineorder_flat \| 18.22 \| 17.88\| \| SELECT count(lo_shipmode) FROM lineorder_flat \|10.09 \| 6.15\|	2023-02-20 11:42:29 +08:00
Pxl	2bc014d83a	[Enchancement](function) remove unused params on aggregate function (#16886 ) remove unused params on aggregate function	2023-02-20 11:08:45 +08:00
ZhaoChangle	e958b13747	[Exec] Add conjection for union_node. (#16777 )	2023-02-20 10:48:58 +08:00
zhangstar333	5291f14aff	[vectorized](udf) java udf support array type (#16841 )	2023-02-20 10:00:25 +08:00
Kang	58c51086ca	[bugfix](topn) fix topn read_orderby_key_columns nullptr (#16896 ) The SQL `SELECT nationkey FROM regression_test_query_p0_limit.tpch_tiny_nation ORDER BY nationkey DESC LIMIT 5` make be core dump since dereference a nullptr `read_orderby_key_columns in VCollectIterator::_topn_next`, triggered by skipping _colname_to_value_range init in #16818 . This PR makes two changes: 1. avoid read_orderby_key_columns nullptr in TabletReader::_init_orderby_keys_param 2. return error if read_orderby_key_columns is nullptr unexpected in VCollectIterator::_topn_next to avoid core dump	2023-02-19 23:28:33 +08:00
amory	8b70bfdc31	[Feature](map-type) Support stream load and fix some bugs for map type (#16776 ) 1、support stream load with json, csv format for map 2、fix olap convertor when compaction action in map column which has null 3、support select outToFile for map 4、add some regression-test	2023-02-19 15:11:54 +08:00
zhengshengjun	e2e6a0dd83	[Feature](load) Support mutable property for partition (#16036 ) The background is described in this issue: #15723, where users used Apache Druid to satisfy such lambada requirements before. We will not make Doris dropping data not belonged to current time window automatically like Druid, which is not flexible. We demand a ability to support mutable/immutable partition, the PR works this way: 1. Support mutable property for a partition. 2. The mutable property of a partition is passed from FE to BE in a load procedure 3. If a record's partition is immutable, we mark this row as "un selected" which will not be included in computation of 'max_filter_ratio', so that data write to immutable partition will be neglected and not cause load failure. Use Example: 1. Add immutable partition or modify an partition to be immutable: - alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true'); - alter table test_tbl modify partition xx set ('mutable' = 'false'); 2. Write 5 records into table, two of then belongs to immutable partition	2023-02-18 23:09:34 +08:00
ZhaoChangle	d6a841409f	[Enhancement](func)Introduce non_nullable extraction function. #16621 Introduced a new function non_nullable to BE, which can extract concrete data column from a nullable column. If the input argument is already not a nullable column, raise an error.	2023-02-18 20:44:07 +08:00
TengJianPing	ef2130de57	[improvement](memory) fix possible memory leak of vcollect iterator (#16822 ) Logic in function VCollectIterator::build_heap is not robust, which may cause memory leak: Level1Iterator* cumu_iter = new Level1Iterator( cumu_children, _reader, cumu_children.size() > 1, _is_reverse, _skip_same); RETURN_IF_NOT_EOF_AND_OK(cumu_iter->init()); std::list<LevelIterator> children; children.push_back(base_reader_child); children.push_back(cumu_iter); _inner_iter.reset( new Level1Iterator(children, _reader, _merge, _is_reverse, _skip_same)); cumu_iter will be leaked if cumu_iter->init()); is not success.	2023-02-17 14:40:15 +08:00
HappenLee	24ef60b491	[Opt](exec) opt aggreate function performance in nullable column	2023-02-16 22:26:12 +08:00
HappenLee	f08c1222cc	[Opt](exec) Refactor the code and logical functions to SIMD the code (#16785 )	2023-02-16 16:55:12 +08:00
HappenLee	de1337511c	[Bug](Datetime) Fix date time function mem use after free (#16814 )	2023-02-16 16:15:58 +08:00
Jibing-Li	292926e5aa	[Fix](multi catalog)Fix partition case bug (#16763 ) Set column names from path to lower case in case-insensitive case. This is for Iceberg columns from path. Iceberg columns are case sensitive, which may cause error for table with partitions.	2023-02-16 15:47:23 +08:00
Jibing-Li	de8d884ec3	[Fix](multi catalog)Fix iceberg parquet file doesn't have iceberg.schema meta problem (#16764 ) To support schema evolution, Iceberg add schema information to Parquet file metadata. But for early iceberg version, it doesn't write any schema information to Parquet file. This PR is to support read parquet without schema information.	2023-02-16 00:08:59 +08:00
Gabriel	dd06cc7609	[pipeline](shuffle) Improve broadcast shuffle (#16779 ) Now we reuse buffer pool for broadcast shuffle on pipeline engine. This PR ensures that a pipeline with a broadcast shuffle sink will not be scheduled if there are no available buffer in the buffer pool	2023-02-15 22:03:27 +08:00
Pxl	f50edff59d	[Chore](build) enable fallthrough check annd fix some fallthrough bug (#16748 ) * enable fallthrough check annd fix some fallthrough bug * fix * fix	2023-02-15 15:58:43 +08:00
TengJianPing	9b8c91e18c	[improvement](rowset reader) fix possible memleak (#16680 ) * [improvement](rowset reader) fix possible memleak * fix be UT	2023-02-15 11:13:31 +08:00
zhengshengjun	d013d529c8	[Feature](ipv6)Support IPV6 (#14063 ) Support IPV6 in Apache Doris, the main changes are: 1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string 2. BRPC and HTTP support binding to IPV6 address 3. BRPC and HTTP support visiting IPV6 Services	2023-02-14 21:43:10 +08:00
Gabriel	784c27deeb	[Bug](shuffle) fix mem leak in data stream sender (#16685 )	2023-02-14 16:40:13 +08:00
Pxl	ea78184551	[Feature](Materialized-View) support multiple slot on one column in materialized view (#16378 )	2023-02-14 16:10:50 +08:00
plat1ko	f1b9185830	[feature](cooldown) Implement cold data compaction (#16681 )	2023-02-14 15:21:54 +08:00
TengJianPing	fb0d08ff4c	[fix](mark join) fix bug of mark join with other conjuncts (#16655 ) Fix bug that probe_index is not increased for mark hash join with other conjuncts.	2023-02-14 14:47:15 +08:00
Jack Drogon	e1ef03b9d3	[Improvement](static variable) Fix exprs/MathFunctions static variable (#16687 ) Use static constexpr variable in impl file to avoid multi-addressing Remove unused my_double_round in vec/functions/math.cpp	2023-02-14 14:46:29 +08:00
Jibing-Li	0d9714b179	[Fix](multi catalog)Support read hive1.x orc file. (#16677 ) Hive 1.x may write orc file with internal column name (_col0, _col1, _col2...). This will cause query result be NULL because column name in orc file doesn't match with column name in Doris table schema. This pr is to support query Hive orc files with internal column names. For now, we haven't see any problem in Parquet file, will send new pr to fix parquet if any problem show up in the future.	2023-02-14 14:32:27 +08:00
yiguolei	1b83829cff	[improvement](block exception safe) make block queue exception safe (#16657 ) * [improvement](block exception safe) make block queue exception safe This is part of exception safe: #16366. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-14 10:50:21 +08:00
HappenLee	a8a5cbb403	[Opt](Hash) Deduce virtual function call is null at in single nullable column (#16650 )	2023-02-14 08:44:12 +08:00
YueW	b642491555	[fix](regression) fix add drop inverted index case (#16673 )	2023-02-14 00:24:42 +08:00
YueW	f3ab55d27d	[Optimization](index) Optimization for no need to read raw data for index column that only in where clause (#16569 )	2023-02-14 00:12:45 +08:00

1 2 3 4 5 ...

1281 Commits