Commit Graph

2422 Commits

SHA1 Message Date
3744321f01 [feature-wip](array-type) add function array_union/array_except/array_intersect (#10781)
Add the array_union/array_except/array_intersect functions.
2022-07-22 13:50:13 +08:00
9d21b2154d [Fix](Array) correct the offset when using get_data_at from _item_convertor (#11094)
get_data_at should use offset - offsets[start_index], since start_index may change
after OlapColumnDataConvertorArray::set_source_column.
Using the absolute offset alone may access memory outside _item_convertor's data range.
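A minimal C++ sketch of the rebase, with member names taken from the commit text rather than the actual class:
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hedged sketch of the fix: rebase an absolute offset against the first row
// handled by set_source_column, so lookups stay inside the item convertor's
// buffer. Names mirror the commit description, not the real Doris code.
struct ArrayConvertorSketch {
    std::vector<uint64_t> offsets;  // absolute item offsets per row
    size_t start_index = 0;         // may be > 0 after set_source_column()

    // Buggy: the absolute offset indexes past the convertor's data range
    // whenever start_index > 0.
    size_t item_pos_buggy(uint64_t offset) const { return offset; }

    // Fixed: subtract the starting row's offset to get a relative index.
    size_t item_pos_fixed(uint64_t offset) const {
        return offset - offsets[start_index];
    }
};
```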
2022-07-22 11:25:17 +08:00
40c8853c5d [Fix] Fix "Lost connection to MySQL server during query" error when selecting from an external table 2022-07-22 11:24:09 +08:00
7e3fc0d321 [enhancement](vec) Support outer join for vectorized exec engine (#11068)
The hash join node adds three new attributes.
The following SQL illustrates the meaning of these three attributes:

```
select t1.a from t1 left join t2 on t1.a=t2.b;
```
1. vOutputTupleDesc: Tuple2(a'')

2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>)

3. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')>

Each slot in the intermediate tuple maps one-to-one to a slot in the output tuple through the left-child expr calculation in vSrcToOutputSMap.

This code mainly merges the contents of two PRs:
1. [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323)
2. [Fix](Join) Fix the bug of outer join function under vectorization (#9954)

The following is the specific description of the first PR.
In a vectorized scenario, the query plan generates a new tuple for the join node.
This tuple describes the output schema of the join node.
Adding this tuple solves the problem that the join node's input schema differs from its output schema.
For example:
1. Null-side columns are converted to nullable by an outer join.
2. The projection of the outer tuple.

The following is the specific description of the second PR.
This PR mainly fixes the following problems:
1. Queries that combine an inline view with an outer join: after adding a tuple to the join operator, the position of the `TupleIsNull` function was inconsistent with the row-based storage engine. The vectorized `TupleIsNull` is now calculated in the HashJoinNode.computeOutputTuple() function.
2. Incorrect column nullable property: now, once an outer join occurs, columns on the null side are planned as nullable in the semantic parsing stage.

For example:
```
select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1
```
At this time, the nullable property of column k1 in the `tmp` inline view should be true.

In the vectorized code, the virtual `tableRef` of tmp is used when constructing the output tuple of HashJoinNode (specifically, in the function HashJoinNode.computeOutputTuple()). So the **correctness** of this tableRef's column nullable property is very important.
In the above case, since tmp is the null side of a right join with table b, the column attributes involved in the tmp table must be changed to nullable.

In non-vectorized code, the virtual tableRef tmp is not used at all; instead, the `TupleIsNull` function in `outputSmap` ensures data correctness.
That is to say, column a of the original table test remains non-nullable, and this does not affect the correctness of the result.

The vectorized engine's nullable attribute requirements are very strict.
An outer join changes the nullable attribute of the join column, which in turn changes the nullable attribute of columns in upper operators layer by layer.
Since FE has no mechanism to modify the nullable attribute in upper operator tuples layer by layer after the analyzer runs, at present we can only preset the attributes below the join as nullable in the analyzer stage, so as to avoid the problem.
(We also wrote some workaround code to deal with the resulting null to non-null cases.)

Co-authored-by: EmmyMiao87 <522274284@qq.com>
Co-authored-by: HappenLee
Co-authored-by: morrySnow
2022-07-21 23:39:25 +08:00
7147a7c290 [feature-wip](multi-catalog) Support s3 storage for file scan node (#10977)
This is an example of an s3-backed hms_catalog:
```sql
CREATE CATALOG hms_catalog PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://localhost:9083",
    "AWS_ACCESS_KEY" = "your access key",
    "AWS_SECRET_KEY" = "your secret key",
    "AWS_ENDPOINT" = "s3 endpoint",
    "AWS_REGION" = "s3-region",
    "fs.s3a.paging.maximum" = "1000"
);
```
All of these properties are required.
2022-07-21 17:38:53 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
a1758bd139 [feature-wip](unique-key-merge-on-write) Add agg cache for delete bitmap DSIP-018 (#10921)
Use global LRU for delete bitmap cache
2022-07-21 12:48:44 +08:00
b115b362fb [Bug] fix bug for function unix_timestamp (#11041)
2022-07-20 20:17:41 +08:00
d9b6e07e9d [Vectorized] Support ODBC sink for vec exec engine (#11045)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-07-20 19:09:41 +08:00
c037066163 [fix](cache) fix that ShardedLRUCache may coredump when destructor was called (#10995) 2022-07-20 19:07:04 +08:00
2df1822269 [bugfix]fix DCHECK failure in remove_all_remote_rowsets (#10994) 2022-07-20 19:06:21 +08:00
e5663f9872 [Bug](array-type) Fix the core dump caused by unaligned __int128 (#11020)
Fix the core dump caused by unaligned __int128 and change DEFAULT_ALIGNMENT
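For illustration, a minimal standalone snippet (GCC/Clang, C++17) of why alignment matters for `__int128` and how an over-aligned allocation avoids the crash; this is a hedged sketch, not the Doris buffer code:
```cpp
#include <cstddef>
#include <new>

// __int128 requires 16-byte alignment; loading it through an insufficiently
// aligned pointer is undefined behavior and can core dump. Over-aligning the
// allocation keeps every element boundary safe.
int main() {
    constexpr std::size_t kAlign = alignof(__int128);  // 16 on x86-64
    void* buf = ::operator new(1024, std::align_val_t{kAlign});
    auto* vals = static_cast<__int128*>(buf);
    vals[0] = 42;  // aligned access: well-defined
    ::operator delete(buf, std::align_val_t{kAlign});
    return 0;
}
```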
2022-07-20 16:37:27 +08:00
a71822a74d [refactor]remove col_unique_id (#11025) 2022-07-20 16:35:14 +08:00
a1c1cfce47 Add some comments for the feature mow (#11028) 2022-07-20 15:35:41 +08:00
ec5471f048 [feature-wip](unique-key-merge-on-write) Implement tablet lookup interface, using rowset-tree, DSIP-018[3/5] (#10938) 2022-07-20 14:52:14 +08:00
56e036e68b [feature-wip](multi-catalog) Support runtime filter for file scan node (#11000)
Co-authored-by: morningman <morningman@apache.org>
2022-07-20 12:36:57 +08:00
dc2b709f6f [Bug](compaction) fix uniq key compaction bug that does not count merged rows right (#10971)
When a rowset includes multiple segments, segment rows are merged in generic_iterator but merged_rows is not maintained, so compaction fails in check_correctness.
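A hedged sketch of the row-accounting invariant that check_correctness enforces; names are illustrative, not the actual Doris code:
```cpp
#include <cstdint>

// Rows read in must equal rows written out plus rows merged away plus rows
// filtered. If the merge path forgets to bump merged_rows, this check fails
// and the compaction is aborted.
bool check_correctness(int64_t input_rows, int64_t output_rows,
                       int64_t merged_rows, int64_t filtered_rows) {
    return input_rows == output_rows + merged_rows + filtered_rows;
}
```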
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-07-20 12:07:45 +08:00
989e6d1cf9 [chore]fix clang compile error (#11021) 2022-07-20 08:28:47 +08:00
fd2c374426 [fix]Empty string key in aggregation was output as NULL (#11011) 2022-07-19 23:25:28 +08:00
371c7be235 [feature-wip](unique-key-merge-on-write) add segment lookup interface implementation, DSIP-018 (#10922) 2022-07-19 21:14:32 +08:00
d5fa66d9a3 [Enhancement] [Memory] Limit memory usage use process actual physical memory (#10924) 2022-07-19 11:08:39 +08:00
f6cb7a838b [Optimize] Improve like/not like filter performance by pushing the function down to the storage engine (#10355)
* support like/not like conjuncts push down to storage engine
* vectorized engine support like/not like conjuncts push down to storage engine
* support both evaluate and evaluate_vec method in like predicate
* reuse remove_pushed_conjuncts and prevent logic errors when moving function conjuncts
* change #ifndef to pragma once as per comments
* change enable_function_pushdown default to false
Co-authored-by: heguangnan <heguangnan@bytedance.com>
2022-07-19 08:33:04 +08:00
842ff2b1e2 [refactor] Refactor time LUT (#10982) 2022-07-19 08:23:29 +08:00
8a366c9ba2 [feature](multi-catalog) read parquet file by start/offset (#10843)
To avoid reading the same row group repeatedly, we should align the offsets.
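A minimal sketch of one common alignment rule, assuming each split claims a row group only when the group's starting byte offset falls inside the split's range, so no two splits read the same group; illustrative only, not the actual reader code:
```cpp
#include <cstdint>

// A scan range [split_start, split_start + split_len) owns a row group iff
// the group's starting offset lands inside the range.
bool split_owns_row_group(int64_t rg_start_offset,
                          int64_t split_start, int64_t split_len) {
    return rg_start_offset >= split_start &&
           rg_start_offset < split_start + split_len;
}
```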
2022-07-18 20:51:08 +08:00
60dd322aba [feature-wip](multi-catalog) Optimize threads and thrift interface of FileScanNode (#10942)
FileScanNode in BE launches as many threads as the number of splits.
The thrift interface of FileScanNode is excessively redundant.
2022-07-18 20:50:34 +08:00
afc1d0c05c [Chore][Compile] fix compile fail on clang (#10837)
fix compile failure on clang caused by outputting int128
2022-07-18 19:21:01 +08:00
899acb6564 [improvement][agg]import sub hashmap (#10937) 2022-07-18 18:36:45 +08:00
a2ed4b5c78 [improvement] improvement for light weight schema change (#10860)
* improvements for dynamic schema:
No longer use the schema as the LRU cache key.
Load segments with the rowset's original schema rather than the current read schema.
Generate column readers and column iterators using the original schema, and use the read schema only for new columns.
Use the column unique id as the key instead of column ordinals, as sketched below.
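A minimal sketch of such a cache key, assuming a (rowset id, column unique id) pair; field names are illustrative, not the actual Doris code:
```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Keying column caches by the column's unique id instead of its ordinal
// keeps entries valid across light weight schema changes (add/drop/reorder
// column), since the unique id is stable while ordinals shift.
struct ColumnCacheKey {
    int64_t rowset_id;
    int32_t col_unique_id;  // stable across schema changes
    bool operator==(const ColumnCacheKey& o) const {
        return rowset_id == o.rowset_id && col_unique_id == o.col_unique_id;
    }
};

struct ColumnCacheKeyHash {
    std::size_t operator()(const ColumnCacheKey& k) const {
        return std::hash<int64_t>()(k.rowset_id) ^
               (std::hash<int32_t>()(k.col_unique_id) << 1);
    }
};
```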
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-07-18 17:53:31 +08:00
890fd70620 [improvement] dynamically calculate max rows to read in a batch to avoid oom (#10972) 2022-07-18 17:43:53 +08:00
6736e06679 [feature](udf) Vectorization support remote udaf #10683 (#10685) 2022-07-18 17:15:34 +08:00
d9095922d9 [Enhancement] [Memory] add strict memory usage compile option STRICT_MEMORY_USE (#10936)
In the strict memory usage mode (STRICT_MEMORY_USE=ON), when the capacity of the vectorized hash table is greater than 2G, it starts to grow only once 75% of the capacity is filled, so the memory usage of the vectorized join becomes 50% of the previous value.

`STRICT_MEMORY_USE=ON` expects BE to use less memory, and gives priority to ensuring stability when cluster memory is limited.
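A hedged sketch of the growth policy as described, assuming the default fill threshold is 50% (the commit does not state it):
```cpp
#include <cstddef>

// Above 2G capacity, postpone growth until the table is 75% full instead of
// the assumed 50%, roughly halving the join build side's peak memory.
bool should_grow(std::size_t size, std::size_t capacity) {
    const std::size_t k2G = std::size_t(2) << 30;
    const double fill_factor = (capacity > k2G) ? 0.75 : 0.5;
    return size >= static_cast<std::size_t>(capacity * fill_factor);
}
```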
2022-07-18 16:16:43 +08:00
cc7c31b080 [Bug](be) fix be coredump when receiving a signal (#10903). (#10953) 2022-07-18 15:23:51 +08:00
238395e282 [Bug] fix decimal arithmetic calculations (#10963) 2022-07-18 14:35:07 +08:00
8c544b6e13 fix show storage policy null pointer and redundant log (#10906)
2022-07-18 14:08:54 +08:00
77ef19dbcd [BugFix](Array)Fix using Array aggregate function caused be coredump (#10649) 2022-07-18 13:47:17 +08:00
2d5aca18fb [feature-wip](array) add the array_sort function (#10598)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-07-18 10:52:42 +08:00
ec5996f1f8 [improvement]do not acquire mutex in metric hook (#10941) 2022-07-18 08:52:24 +08:00
523d395527 [refactor] Remove alpha rowset meta (#10933)
* remove alpha_rowset_meta
* remove alpha rowset related codes in compaction
* remove alpha rowset related codes in RowsetMeta
* fix BE UTs that use alpha RowsetMeta
2022-07-18 08:45:46 +08:00
c3e1b73d15 revert cast_to_string (#10940)
It caused the schema change p0 tests to fail.
2022-07-17 18:34:39 +08:00
09d19e3f0f [feature-wip](array-type) explode support more sub types (#10673)
1. explode supports more sub types;
2. explode supports nullable elements.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-17 18:08:30 +08:00
5793cb11d0 [feature-wip] (array-type) function concat_ws support array (#10749)
Issue #10052
function concat_ws now supports array
2022-07-17 17:50:39 +08:00
3bc6655069 [refactor] remove BlockManager (#10913)
* remove BlockManager
* remove deprecated field in tablet meta
2022-07-17 14:10:06 +08:00
eec142ae90 [Enhancement] Use shared file reader when read a segment (#10896)
* readers under a segment use a shared FileReader

* no need to cache fd in LocalFileReader
2022-07-17 07:54:58 +08:00
c45a98d4c0 [Bug] Fix invalid nullmap (#10925) 2022-07-17 07:53:11 +08:00
0381cdc989 [Bug] fix core for min/max runtime filter (#10899) 2022-07-16 22:22:51 +08:00
f78db1d773 release memory allocated in agg function in vec stream load (#10739)
When a load is cancelled, memory allocated by agg functions should be freed.
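A minimal sketch of the cleanup idea, with illustrative types rather than the Doris aggregate-function API:
```cpp
#include <vector>

// Aggregate function states can own heap memory internally (e.g. strings in
// min/max), so a cancelled load must call destroy() on every created state
// before releasing the arena, or that memory leaks.
struct AggFunctionSketch {
    virtual void destroy(char* state) const = 0;  // frees state-owned memory
    virtual ~AggFunctionSketch() = default;
};

void cleanup_on_cancel(const std::vector<AggFunctionSketch*>& aggs,
                       const std::vector<char*>& states) {
    for (auto* agg : aggs)
        for (char* state : states) agg->destroy(state);
    // ...then the arena holding the states themselves can be freed.
}
```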
2022-07-16 15:32:53 +08:00
75ca21dafa [Bug] handle null map right in vectorized load (#10883) 2022-07-16 14:18:38 +08:00
00c9455f16 [fix](array-type) fix arrow column to doris array column (#10855)
* support merging array columns when converting from an arrow column to a doris array column

* fix typo

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-16 11:49:42 +08:00
ba1c527a23 [improvement](arrow) Avoid parse timezone for each datetime value (#10869)
Converting an arrow batch to a doris block was too slow when there were datetime values, because we called `TimezoneUtils::find_cctz_time_zone` for each value.

After this modification, tpch-100 q1 with an external table went from 40s to 9s.
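A minimal sketch of the idea using cctz directly: resolve the time zone once per batch and reuse it for every value; illustrative, not the actual conversion code:
```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <vector>
#include <cctz/time_zone.h>

// Look up the time zone once, then convert every datetime value with the
// cached handle instead of re-parsing the zone name per value.
void convert_batch(const std::vector<int64_t>& epoch_seconds,
                   const std::string& tz_name) {
    cctz::time_zone tz;
    if (!cctz::load_time_zone(tz_name, &tz)) return;  // resolved once
    for (int64_t sec : epoch_seconds) {
        auto tp = std::chrono::system_clock::from_time_t(sec);
        cctz::civil_second cs = cctz::convert(tp, tz);  // reuses cached tz
        (void)cs;  // format/store cs as needed
    }
}
```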

Co-authored-by: morningman <morningman@apache.org>
2022-07-15 21:19:36 +08:00
7be2ef79ed array column support read by rowids (#10886)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-15 21:19:02 +08:00