doris

Author	SHA1	Message	Date
Gabriel	babab5d535	[feature-wip] support datetimev2 (#11085 )	2022-07-23 16:07:59 +08:00
HappenLee	fdb4193e1b	[Vectorized][Refactor] Refactor the function of `tuple_is_null`, only do work in hash join node (#11109 )	2022-07-23 11:50:07 +08:00
Mingyu Chen	6422a5d4f7	[improvement](arrow) add arrow block convertion time profile (#11072 ) * [improvement](arrow) add arrow block convertion time profile	2022-07-22 22:11:33 +08:00
Jerry Hu	b7c9007776	[improvement][agg]Process aggregated results in the vectorized way (#11084 )	2022-07-22 22:04:43 +08:00
Mingyu Chen	7e3fc0d321	[enhancement](vec) Support outer join for vectorized exec engine (#11068 ) Hash join node adds three new attributes. The following SQL illustrates the meaning of these three attributes: ``` select t1.a from t1 left join t2 on t1.a=t2.b; ``` 1. vOutputTupleDesc: Tuple2(a'') 2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>) 3. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')> Each slot in the intermediate tuple corresponds one-to-one to a slot in the output tuple through the expr calculation of the left child in vSrcToOutputSMap. This code mainly merges the contents of two PRs: 1. [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323) 2. [Fix](Join) Fix the bug of outer join function under vectorization #9954 The following is the description of the first PR: In a vectorized scenario, the query plan generates a new tuple for the join node. This tuple describes the output schema of the join node. Adding it solves the problem that the input schema of the join node differs from the output schema. For example: 1. The null-side column of an outer join is converted to nullable. 2. The projection of the outer tuple. The following is the description of the second PR: This PR mainly fixes the following problems: 1. Queries that combine an inline view with an outer join. After adding a tuple to the join operator, the position of the `tupleisnull` function is inconsistent with the row storage. Currently the vectorized `tupleisnull` is calculated in the HashJoinNode.computeOutputTuple() function. 2. Incorrect column nullable property. At present, once an outer join occurs, the columns on the null side are planned as nullable in the semantic parsing stage. For example: ``` select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1 ``` At this time, the nullable property of column k1 in the `tmp` inline view should be true. In the vectorized code, the virtual `tableRef` of tmp is used in constructing the output tuple of HashJoinNode (specifically, in the function HashJoinNode.computeOutputTuple()). So the correctness of the column nullable property of this tableRef is very important. In the above case, since the tmp table performs a right join with the b table, tmp is the null side, so the column attributes involved in the tmp table must be changed to nullable. In non-vectorized code, since the virtual tableRef tmp is not used at all, it uses the `TupleIsNull` function in `outputsmp` to ensure data correctness. That is to say, column a of the original table test is still non-null, and this does not affect the correctness of the result. The vectorized nullable attribute requirements are very strict. An outer join changes the nullable attribute of the join column, thereby changing the nullable attribute of the column in the upper operators layer by layer. FE has no mechanism to modify the nullable attribute in the upper operator tuples layer by layer after the analyzer, so at present we can only preset the attributes before the lower join as nullable in the analyzer stage, so as to avoid the problem. (At the same time, BE also includes some evasive code to deal with the problem of null to non-null.) Co-authored-by: EmmyMiao87 Co-authored-by: HappenLee Co-authored-by: morrySnow Co-authored-by: EmmyMiao87 <522274284@qq.com>	2022-07-21 23:39:25 +08:00
huangzhaowei	7147a7c290	[feature-wip](multi-catalog) Support s3 storage for file scan node (#10977 ) This is an example of s3 hms_catalog: ```sql CREATE CATALOG hms_catalog properties( "type" = "hms", "hive.metastore.uris"="thrift://localhost:9083", "AWS_ACCESS_KEY" = "your access key", "AWS_SECRET_KEY"="your secret key", "AWS_ENDPOINT"="s3 endpoint", "AWS_REGION"="s3-region", "fs.s3a.paging.maximum"="1000"); ``` All these params are necessary;	2022-07-21 17:38:53 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Mingyu Chen	56e036e68b	[feature-wip](multi-catalog) Support runtime filter for file scan node (#11000 ) * [feature-wip](multi-catalog) Support runtime filter for file scan node Co-authored-by: morningman <morningman@apache.org>	2022-07-20 12:36:57 +08:00
Jet He	f6cb7a838b	[Optimize] Improve performance like/not like filter through pushdown function to storage engine (#10355 ) * support like/not like conjuncts push down to storage engine * vectorized engine support like/not like conjuncts push down to storage engine * support both evaluate and evaluate_vec method in like predicate * reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts * change #ifndef to pragma once as per comments * change enable_function_pushdown default to false Co-authored-by: heguangnan <heguangnan@bytedance.com>	2022-07-19 08:33:04 +08:00
slothever	8a366c9ba2	[feature](multi-catalog) read parquet file by start/offset (#10843 ) To avoid reading the same row group repeatedly, we should align offsets	2022-07-18 20:51:08 +08:00
Ashin Gau	60dd322aba	[feature-wip](multi-catalog) Optimize threads and thrift interface of FileScanNode (#10942 ) FileScanNode in be will launch as many threads as the number of splits. The thrift interface of FileScanNode is excessively redundant.	2022-07-18 20:50:34 +08:00
Jerry Hu	899acb6564	[improvement][agg]import sub hashmap (#10937 )	2022-07-18 18:36:45 +08:00
Xinyi Zou	d9095922d9	[Enhancement] [Memory] add strict memory usage compile option STRICT_MEMORY_USE (#10936 ) In the strict memory usage mode (`STRICT_MEMORY_USE=ON`), when the capacity of the vectorized Hash Table is greater than 2G, it starts to grow when 75% of the capacity is filled, and the memory usage of the vectorized Join becomes 50% of the previous value. `STRICT_MEMORY_USE=ON` expects BE to use less memory, and gives priority to ensuring stability when the cluster memory is limited.	2022-07-18 16:16:43 +08:00
Mingyu Chen	ba1c527a23	[improvement](arrow) Avoid parse timezone for each datetime value (#10869 ) * [improvement](arrow) Avoid parse timezone for each datetime value Converting an arrow batch to a doris block is too slow when there are datetime values, because we call `TimezoneUtils::find_cctz_time_zone` for each value. After this change, the time for tpch-100 q1 with an external table drops from 40s to 9s Co-authored-by: morningman <morningman@apache.org>	2022-07-15 21:19:36 +08:00
Jibing-Li	ca5dbb1bcc	Fix olap scan node normalize_in_and_eq_predicate infinite loop bug. (#10817 )	2022-07-14 14:54:57 +08:00
Jerry Hu	d1573e1a4a	[improvement]Use phmap for aggregation with serialized key (#10821 )	2022-07-14 11:26:09 +08:00
Gabriel	3b46242483	[feature-wip] Optimize Decimal type (#10794 ) * [feature-wip](decimalv3) support decimalv3 * [feature-wip] Optimize Decimal type Co-authored-by: liaoxin <liaoxinbit@126.com>	2022-07-14 10:50:50 +08:00
Jerry Hu	89e2678f4e	[improvement]Increase min_ht_mem of StreamingHtMinReductionEntry (#10787 )	2022-07-12 22:20:02 +08:00
Lightman	486cf0ebd4	[Feature] Lightweight schema change of add/drop column (#10136 ) * [Schema Change] support fast add/drop column (#49) * [feature](schema-change) support fast schema change. coauthor: yixiutt * [schema change] Using columns desc from fe to read data. coauthor: Lchangliang * [feature](schema change) schema change optimize for add/drop columns. 1.add uniqueId field for class column. 2.schema change for add/drop columns directly update schema meta Co-authored-by: yixiutt <yixiu@selectdb.com> Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com> [Feature](schema change) fix write and add regression test (#69) Co-authored-by: yixiutt <yixiu@selectdb.com> [schema change] be support that delete use newest schema add delete regression test fix regression case (#107) tmp [feature](schema change) light schema change exclude rollup and agg/uniq/dup key type. [feature](schema change) fe olapTable maxUniqueId write in disk. [feature](schema change) add rpc iface for sc add column. [feature](schema change) add columnsDesc to TPushReq for light sc. resolve the deadlock when schema change (#124) fix columns from fe don't has bitmap_index flag (#134) add update/delete case construct MATERIALIZED schema from origin schema when insert fix not vectorized compaction coredump use segment cache choose newest schema by schema version when compaction (#182) [bugfix](schema change) fix light schema change problem. [feature](schema change) light schema change add alter job. (#1) fix be ut [bug] (schema change) unique drop key column should not light schema change [feature](schema change) add schema change regression-test. fix regression test [bugfix](schema change) fix multi alter clauses for light schema change. (#2) [bugfix](schema change) fix multi clauses calculate column unique id (#3) modify PushTask process (#217) [Bugfix](schema change) fix jobId replay cause bdbje exception. [bug](schema change) fix max col unique id repetitive. (#232) [optimize](schema change) modify pendingMaxColUniqueId generate rule. fix compaction error * fix be ut * fix snapshot load core fix unique_id error (#278) [refact](fe) remove redundant code for light schema change. (#4) [refact](fe) remove redundant code for light schema change. (#4) format fe core format be core fix be ut modify fe meta version fix rebase error flush schema into rowset_meta in old table [refactor](schema change) refact fe light schema change. (#5) delete the change of schemahash and support get max version schema * modify for review * fix be ut * fix schema change test	2022-07-12 19:41:06 +08:00
TengJianPing	88f466ab86	[bugfix] temporarily disable pushing RF to scanner to avoid coredump (#10776 )	2022-07-11 22:48:08 +08:00
plat1ko	f21ce35059	[refactor]remove unused private field _profile (#10732 )	2022-07-11 14:04:09 +08:00
Gabriel	cc279d09a1	[BUG] Wrong result when build size is beyond IN runtime filter threshold (#10735 )	2022-07-11 12:19:38 +08:00
Mingyu Chen	639f1cd26c	[improvement](parquet-reader) Add some profile for parquet reader (#10740 )	2022-07-11 12:19:06 +08:00
Gabriel	8472ea8324	Revert "[Enhancement] Add column prune support for VOlapScanNode (#10615 )" (#10734 )	2022-07-11 12:16:08 +08:00
Gabriel	a044b5dcc5	[refactor](predicate) refactor predicates in scan node (#10701 ) * [reafactor](predicate) refactor predicates in scan node * update	2022-07-11 09:21:01 +08:00
Gabriel	7f9eeb8fc3	[BUG] runtime filter core dump (#10716 )	2022-07-09 21:36:22 +08:00
huangzhaowei	24d824a783	[improvement](multi-catalog) Impl parallel for file scanner to improve the scanner performance (#10620 ) Add multi-thread support in FileScanNode on be and impl the file split logic in fe.	2022-07-09 15:52:53 +08:00
luozenglin	d5ea677282	[feature](tracing) Support query tracing to improve doris observability by introducing OpenTelemetry. (#10533 ) The collection of query traces is implemented in fe and be, and the spans are exported to zipkin. DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-012%3A+Introduce+opentelemetry	2022-07-09 15:50:40 +08:00
Jerry Hu	e293fbd277	[improvement]pre-serialize aggregation keys (#10700 )	2022-07-09 06:21:56 +08:00
slothever	c358a43f35	[feature-wip] support parquet predicate push down (#10512 )	2022-07-08 23:11:25 +08:00
Kikyou1997	e37d29485f	[Enhancement] Add column prune support for VOlapScanNode (#10615 )	2022-07-08 13:56:26 +08:00
Gabriel	03296aedd5	[BUG] fix core dump caused by runtime filter (#10611 )	2022-07-08 08:28:39 +08:00
carlvinhust2012	cff9ffa0e1	fix the inaccurate comments (#10617 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-07-06 17:54:43 +08:00
TengJianPing	b4c5dfc28e	[Improvement] remove redundant code of VOlapScanner (#10621 )	2022-07-06 17:54:10 +08:00
Mingyu Chen	8e364fb848	[fix](load) skip empty orc file (#10593 ) Sometimes the upstream system (e.g., hive) may create an empty orc file which only has a header and footer, without schema. If we call `_reader->createRowReader()` with selected columns, it will throw ParserError: Invalid column selected xx. So here we first check the number of rows and skip these kinds of files. This is only a fix for non-vec load; vec load uses the arrow scanner to read orc files, which does not have this problem.	2022-07-05 22:18:56 +08:00
TengJianPing	3e87960202	[bugfix] fix bug of vhash join build (#10614 ) * [bugfix] fix bug of vhash join build * format code	2022-07-05 19:14:42 +08:00
Gabriel	a2f74bf260	[Improvement] remove profile with poor readability (#10581 )	2022-07-05 11:09:23 +08:00
TengJianPing	1f1bdaa9c3	[bugfix] fix coredump of left anti join (#10591 )	2022-07-04 22:29:41 +08:00
huangzhaowei	46bff6bba0	[fix](multi-catalog) fix the core dump on hms table (#10573 ) In the function `TextConverter::write_vec_column`, it should execute the statement `nullable_column->get_null_map_data().push_back(0);` for every row. Otherwise the null map becomes incorrect and causes the core dump.	2022-07-04 15:52:05 +08:00
zy-kkk	aecf6e09a9	[fix] fix agg_memleak (#10571 ) The previous code did not call 'destroy' to release the resource after the 'create' operation, resulting in a memory leak. So I added 'destroy'.	2022-07-03 20:22:26 +08:00
Tiewei Fang	c9f86bc7e2	[refactor] Refactoring Status static methods to format message using fmt(#9533 )	2022-07-02 18:58:23 +08:00
yiguolei	97996c9275	[fix](Insert) fix 5 concurrent "insert...select..." OOM (#10501 ) * [hotfix](dev-1.0.1) 5 concurrent insert...select... OOM Co-authored-by: minghong <minghong.zhou@163.com> Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-07-01 15:29:26 +08:00
TengJianPing	d9f2da8cf0	[bugfix] temporarily disable RF code to avoid core dump caused by vexpr destruction (#10504 ) Runtime filter handling in volap_scan_node may cause a double free in VExprContext; temporarily disable it to avoid the core dump.	2022-06-30 14:54:44 +08:00
Adonis Ling	e42adbb959	Fix compilation error reported by clang (#10494 )	2022-06-29 20:38:06 +08:00
yiguolei	4ec6e3ee81	[refactor] Remove debug action since it is never used. (#10484 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-06-29 20:37:51 +08:00
Pxl	6a566ccb74	[Enhancement][Vectorized] add constexpr_loop_match (#10283 )	2022-06-29 14:58:50 +08:00
huangzhaowei	abd10f0f3e	[feature-wip](multi-catalog) Impl FileScanNode in be (#10402 ) Define a new file scanner node for hms table in be. This file scanner node differs from the broker scan node as follows: 1. The broker scan node defines a src slot and a dest slot, so there are two memory copies: first from the file to the src slot, then from the src slot to the dest slot. FileScanNode has only one memory copy step, directly from the file to the dest slot. 2. The broker scan node reads all the fields in the file into the src slot, while FileScanNode reads only the needed fields. 3. The broker scan node converts values to string type for the src slot and then uses cast to convert to the dest slot type, while FileScanNode produces the final type directly. For now FileScanNode is standalone code, but we will unify the file scan and broker scan in the future.	2022-06-29 11:04:01 +08:00
Tiewei Fang	17eb8c00d3	[feature] add table valued function framework and numbers table valued function (#10214 )	2022-06-28 14:01:57 +08:00
Gabriel	ca94867b4e	[Feature-wip] add date v2 type (#9916 )	2022-06-26 16:07:56 +08:00
Gabriel	eebfbd0c91	Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323 )" (#10424 ) This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.	2022-06-25 22:18:08 +08:00

136 Commits