doris

Author	SHA1	Message	Date
Jerry Hu	0325fa436e	[fix](agg)Add field of 'is_first_phase' in TAggregationNode (#11321)	2022-08-01 11:49:50 +08:00
Jerry Hu	d360974dce	[improvement](agg)Use phmap::flat_hash_set in AggregateFunctionUniq (#11363) This reverts commit 688b55053dd1fc5113343a6f565ad732ddd9612a.	2022-08-01 10:36:11 +08:00
Mingyu Chen	688b55053d	Revert "[improvement]Use phmap::flat_hash_set in AggregateFunctionUniq (#11257)" (#11356) This reverts commit a7199fb98e18b925664b38460b667d04cbee8e01.	2022-07-30 23:15:36 +08:00
zhangstar333	1f30e563a7	[refactor][vectorized] refactor first/last value agg functions (#10661) * refactor first and last [refactor][vectorized] refactor first/last value agg functions * add some change * remove first/last about always nullable * remove always nullable and register it * refactor value remove bool null flag * refactor win first last to ptr and pos	2022-07-30 18:38:56 +08:00
Xinyi Zou	18864ab7fe	weak relationship between MemTracker and MemTrackerLimiter (#11347)	2022-07-30 18:33:54 +08:00
Luwei	d6f937cb01	(performance)[scanner] Isolate local and remote queries using different scanner… (#11006)	2022-07-29 19:14:46 +08:00
Ashin Gau	84ce2a1e98	[feature-wip](multi-catalog)(fix) partition value error when a block contains multiple splits (#11260) `FileArrowScanner::get_next` returns a block when it is full, so a block may contain multiple splits of small files or cross two splits of a large file. However, a block can only fill in the partition values from one file. Different splits may come from different files, which produces wrong embedded partition values.	2022-07-29 18:48:59 +08:00
Jerry Hu	a7199fb98e	[improvement]Use phmap::flat_hash_set in AggregateFunctionUniq (#11257)	2022-07-29 16:55:22 +08:00
Gabriel	3fe7b21ac8	[Improvement](vectorized) Remove row-based conjuncts on vectorized nodes (#11324)	2022-07-29 15:42:06 +08:00
slothever	e4bc3f6b6f	[feature-wip] (parquet-reader) add parquet reader impl template (#11285)	2022-07-29 14:30:31 +08:00
Pxl	1b4a2c287e	[Improvement][chore] replace from_decv2_to_packed128 to decv2.value (#11261)	2022-07-28 10:41:27 +08:00
zhannngchen	d4b4c9a9bf	[feature-wip](unique-key-merge-on-write) update counter, DSIP-018 (#11252)	2022-07-28 10:32:26 +08:00
HappenLee	0b1d06bfd6	[Vectorized] Support order by aggregate function (#11187) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-07-28 09:12:58 +08:00
Xinyi Zou	b6bdb3bdbc	[fix] (mem tracker) Fix MemTracker accuracy (#11190)	2022-07-27 18:59:24 +08:00
zhannngchen	93b0e002d1	[feature-wip](unique-key-merge-on-write) add delete bitmap in read path, DSIP-018[2/3] (#11136)	2022-07-27 14:18:21 +08:00
Jerry Hu	b74f36e009	[improvement]Use phmap for aggregation with integer keys (#11175)	2022-07-27 13:58:20 +08:00
TengJianPing	37dff975a7	[bugfix] fix ASAN error alloc-dealloc-mismatch (#11168)	2022-07-25 18:14:20 +08:00
TengJianPing	00e2944102	[bugfix] fix coredump caused by wrong type cast of OlapScanNode (#11165)	2022-07-25 17:57:53 +08:00
huangzhaowei	54f878b781	[feature-wip](multi-catalog) Support orc format file split for file scan node (#11046)	2022-07-25 11:41:46 +08:00
Gabriel	babab5d535	[feature-wip] support datetimev2 (#11085)	2022-07-23 16:07:59 +08:00
HappenLee	fdb4193e1b	[Vectorized][Refactor] Refactor the function of `tuple_is_null`, only do work in hash join node (#11109)	2022-07-23 11:50:07 +08:00
Mingyu Chen	6422a5d4f7	[improvement](arrow) add arrow block conversion time profile (#11072) * [improvement](arrow) add arrow block conversion time profile	2022-07-22 22:11:33 +08:00
Jerry Hu	b7c9007776	[improvement][agg]Process aggregated results in the vectorized way (#11084)	2022-07-22 22:04:43 +08:00
Mingyu Chen	7e3fc0d321	[enhancement](vec) Support outer join for vectorized exec engine (#11068) The hash join node adds three new attributes. The following SQL illustrates their meaning: ``` select t1.a from t1 left join t2 on t1.a = t2.b; ``` 1. vOutputTupleDesc: Tuple2(a'') 2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>) 3. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')> Each slot in the intermediate tuple maps one-to-one to a slot in the output tuple by evaluating the left-side exprs in vSrcToOutputSMap. This code mainly merges the contents of two PRs: 1. [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323) 2. [Fix](Join) Fix the bug of outer join function under vectorization #9954 Description of the first PR: in a vectorized scenario, the query plan generates a new tuple for the join node. This tuple describes the output schema of the join node, and it mainly solves the problem that the input schema of the join node differs from its output schema. For example: 1. A null-side column of an outer join must be converted to nullable. 2. The projection of the outer tuple. Description of the second PR, which mainly fixes the following problems: 1. Queries that combine an inline view with an outer join. After a tuple is added to the join operator, the position of the `tupleisnull` function is inconsistent with the row-based engine. The vectorized `tupleisnull` is now calculated in the HashJoinNode.computeOutputTuple() function. 2. Wrong column nullable property. Currently, once an outer join occurs, the null-side columns are planned as nullable in the semantic analysis stage. For example: ``` select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1 ``` Here the nullable property of column k1 in the `tmp` inline view should be true. The vectorized code uses the virtual `tableRef` of tmp to construct the output tuple of HashJoinNode (specifically, in HashJoinNode.computeOutputTuple()), so the correctness of the column nullable property of this tableRef is very important. In the above case, tmp is right joined with table b, so tmp is the null side and its columns involved in the join must be changed to nullable. The non-vectorized code does not use the virtual tableRef tmp at all; it relies on the `TupleIsNull` function in `outputsmp` to ensure data correctness. That is, column a of the original table test is still non-null, and this does not affect the correctness of the result. The vectorized engine is very strict about the nullable attribute. An outer join changes the nullable attribute of the join column, which in turn changes the nullable attribute of columns in the upper operators layer by layer. FE has no mechanism to modify the nullable attribute in the upper operator tuples layer by layer after the analyzer, so for now the attributes below the join can only be preset as nullable in advance in the analyzer stage to avoid the problem. (BE also contains some workaround code to handle the null to non-null problem.) Co-authored-by: EmmyMiao87 Co-authored-by: HappenLee Co-authored-by: morrySnow Co-authored-by: EmmyMiao87 <522274284@qq.com>	2022-07-21 23:39:25 +08:00
huangzhaowei	7147a7c290	[feature-wip](multi-catalog) Support s3 storage for file scan node (#10977) This is an example of an s3 hms_catalog: ```sql CREATE CATALOG hms_catalog properties( "type" = "hms", "hive.metastore.uris"="thrift://localhost:9083", "AWS_ACCESS_KEY" = "your access key", "AWS_SECRET_KEY"="your secret key", "AWS_ENDPOINT"="s3 endpoint", "AWS_REGION"="s3-region", "fs.s3a.paging.maximum"="1000"); ``` All of these parameters are required.	2022-07-21 17:38:53 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823)	2022-07-21 17:11:28 +08:00
Mingyu Chen	56e036e68b	[feature-wip](multi-catalog) Support runtime filter for file scan node (#11000) * [feature-wip](multi-catalog) Support runtime filter for file scan node Co-authored-by: morningman <morningman@apache.org>	2022-07-20 12:36:57 +08:00
Jet He	f6cb7a838b	[Optimize] Improve performance like/not like filter through pushdown function to storage engine (#10355) * support like/not like conjuncts push down to storage engine * vectorized engine support like/not like conjuncts push down to storage engine * support both evaluate and evaluate_vec method in like predicate * reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts * change #ifndef to pragma once as per comments * change enable_function_pushdown default to false Co-authored-by: heguangnan <heguangnan@bytedance.com>	2022-07-19 08:33:04 +08:00
slothever	8a366c9ba2	[feature](multi-catalog) read parquet file by start/offset (#10843) To avoid reading the same row group repeatedly, offsets must be aligned.	2022-07-18 20:51:08 +08:00
Ashin Gau	60dd322aba	[feature-wip](multi-catalog) Optimize threads and thrift interface of FileScanNode (#10942) FileScanNode in BE launches as many threads as there are splits, and the thrift interface of FileScanNode is excessively redundant.	2022-07-18 20:50:34 +08:00
Jerry Hu	899acb6564	[improvement][agg]import sub hashmap (#10937)	2022-07-18 18:36:45 +08:00
Xinyi Zou	d9095922d9	[Enhancement] [Memory] add strict memory usage compile option STRICT_MEMORY_USE (#10936) In the strict memory usage mode (STRICT_MEMORY_USE=ON), once the capacity of the vectorized hash table exceeds 2G, the table grows only when 75% of its capacity is filled, and the memory usage of the vectorized join becomes 50% of the previous value. `STRICT_MEMORY_USE=ON` is intended to make BE use less memory and gives priority to stability when cluster memory is limited.	2022-07-18 16:16:43 +08:00
Mingyu Chen	ba1c527a23	[improvement](arrow) Avoid parse timezone for each datetime value (#10869) * [improvement](arrow) Avoid parse timezone for each datetime value Converting an arrow batch to a doris block is too slow when there are datetime values, because `TimezoneUtils::find_cctz_time_zone` is called for each value. After this change, tpch-100 q1 on an external table drops from 40s to 9s. Co-authored-by: morningman <morningman@apache.org>	2022-07-15 21:19:36 +08:00
Jibing-Li	ca5dbb1bcc	Fix olap scan node normalize_in_and_eq_predicate infinite loop bug. (#10817)	2022-07-14 14:54:57 +08:00
Jerry Hu	d1573e1a4a	[improvement]Use phmap for aggregation with serialized key (#10821)	2022-07-14 11:26:09 +08:00
Gabriel	3b46242483	[feature-wip] Optimize Decimal type (#10794) * [feature-wip](decimalv3) support decimalv3 * [feature-wip] Optimize Decimal type Co-authored-by: liaoxin <liaoxinbit@126.com>	2022-07-14 10:50:50 +08:00
Jerry Hu	89e2678f4e	[improvement]Increase min_ht_mem of StreamingHtMinReductionEntry (#10787)	2022-07-12 22:20:02 +08:00
Lightman	486cf0ebd4	[Feature] Lightweight schema change of add/drop column (#10136) * [Schema Change] support fast add/drop column (#49) * [feature](schema-change) support fast schema change. coauthor: yixiutt * [schema change] Using columns desc from fe to read data. coauthor: Lchangliang * [feature](schema change) schema change optimize for add/drop columns. 1. add uniqueId field for class column. 2. schema change for add/drop columns directly update schema meta Co-authored-by: yixiutt <yixiu@selectdb.com> Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com> [Feature](schema change) fix write and add regression test (#69) Co-authored-by: yixiutt <yixiu@selectdb.com> [schema change] be support that delete use newest schema add delete regression test fix regression case (#107) tmp [feature](schema change) light schema change exclude rollup and agg/uniq/dup key type. [feature](schema change) fe olapTable maxUniqueId write in disk. [feature](schema change) add rpc iface for sc add column. [feature](schema change) add columnsDesc to TPushReq for light sc. resolve the deadlock when schema change (#124) fix columns from fe don't have bitmap_index flag (#134) add update/delete case construct MATERIALIZED schema from origin schema when insert fix not vectorized compaction coredump use segment cache choose newest schema by schema version when compaction (#182) [bugfix](schema change) fix light schema change problem. [feature](schema change) light schema change add alter job. (#1) fix be ut [bug] (schema change) unique drop key column should not light schema change [feature](schema change) add schema change regression-test. fix regression test [bugfix](schema change) fix multi alter clauses for light schema change. (#2) [bugfix](schema change) fix multi clauses calculate column unique id (#3) modify PushTask process (#217) [Bugfix](schema change) fix jobId replay cause bdbje exception. [bug](schema change) fix max col unique id repetitive. (#232) [optimize](schema change) modify pendingMaxColUniqueId generate rule. fix compaction error * fix be ut * fix snapshot load core fix unique_id error (#278) [refact](fe) remove redundant code for light schema change. (#4) [refact](fe) remove redundant code for light schema change. (#4) format fe core format be core fix be ut modify fe meta version fix rebase error flush schema into rowset_meta in old table [refactor](schema change) refact fe light schema change. (#5) delete the change of schemahash and support get max version schema * modify for review * fix be ut * fix schema change test	2022-07-12 19:41:06 +08:00
TengJianPing	88f466ab86	[bugfix] temporarily disable pushing RF to scanner to avoid coredump (#10776)	2022-07-11 22:48:08 +08:00
plat1ko	f21ce35059	[refactor]remove unused private field _profile (#10732)	2022-07-11 14:04:09 +08:00
Gabriel	cc279d09a1	[BUG] Wrong result when build size is beyond IN runtime filter threshold (#10735)	2022-07-11 12:19:38 +08:00
Mingyu Chen	639f1cd26c	[improvement](parquet-reader) Add some profile for parquet reader (#10740)	2022-07-11 12:19:06 +08:00
Gabriel	8472ea8324	Revert "[Enhancement] Add column prune support for VOlapScanNode (#10615)" (#10734)	2022-07-11 12:16:08 +08:00
Gabriel	a044b5dcc5	[refactor](predicate) refactor predicates in scan node (#10701) * [refactor](predicate) refactor predicates in scan node * update	2022-07-11 09:21:01 +08:00
Gabriel	7f9eeb8fc3	[BUG] runtime filter core dump (#10716)	2022-07-09 21:36:22 +08:00
huangzhaowei	24d824a783	[improvement](multi-catalog) Impl parallel for file scanner to improve the scanner performance (#10620) Add multi-thread support in FileScanNode on BE and implement the file split logic in FE.	2022-07-09 15:52:53 +08:00
luozenglin	d5ea677282	[feature](tracing) Support query tracing to improve doris observability by introducing OpenTelemetry. (#10533) The collection of query traces is implemented in fe and be, and the spans are exported to zipkin. DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-012%3A+Introduce+opentelemetry	2022-07-09 15:50:40 +08:00
Jerry Hu	e293fbd277	[improvement]pre-serialize aggregation keys (#10700)	2022-07-09 06:21:56 +08:00
slothever	c358a43f35	[feature-wip] support parquet predicate push down (#10512)	2022-07-08 23:11:25 +08:00
Kikyou1997	e37d29485f	[Enhancement] Add column prune support for VOlapScanNode (#10615)	2022-07-08 13:56:26 +08:00

155 Commits