doris

Author	SHA1	Message	Date
camby	738da0b139	[bugfix](join) inner join return wrong result (#13608 ) * bug fix for vhash join * add regression test Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-10-27 11:48:41 +08:00
starocean999	c874931ac8	[fix](join)output all value from no-null side of outer join (#13655 ) * [fix](join)output all value from no-null side of outer join * add regression test	2022-10-27 10:48:36 +08:00
starocean999	40e122e5ef	[fix](join)the build and probe expr should be calculated before converting input block to nullable (#13436 ) * [fix](join)the build and probe expr should be calculated before converting input block to nullable * remove_nullable can be called on const column	2022-10-24 14:50:06 +08:00
luozenglin	e17c2416f0	[fix](join) fix be core dump when using right join with other join predicates (#13511 )	2022-10-24 10:35:07 +08:00
Gabriel	3006b258b0	[Improvement](bloomfilter) allocate memory for BF in open phase (#13494 )	2022-10-21 17:37:26 +08:00
Gabriel	d3f65aa746	[Improvement](join) remove unnecessary state for join (#13472 )	2022-10-21 09:59:34 +08:00
TengJianPing	b5cd167713	[fix](hashjoin) fix coredump of hash join in ubsan build (#13479 ) * [fix](hashjoin) fix coredump of hash join in ubsan build	2022-10-20 10:16:19 +08:00
Gabriel	cd3450bd9d	[Improvement](join) optimize join probing phase (#13357 )	2022-10-18 12:37:17 +08:00
xy720	c114d87d13	[Enhancement](array-type) Tuple is null predicate support array type (#13307 ) Issue Number: #12689	2022-10-17 18:50:56 +08:00
Gabriel	baf2689610	[Improvement](join) compute hash values by vectorized way (#13335 )	2022-10-13 16:04:58 +08:00
Gabriel	dfe308f501	[Improvement](join) refine prefetch strategy (#13286 )	2022-10-12 19:02:06 +08:00
Kikyou1997	9a74ad1702	[feature](Nereids)add the ability of projection on each ExecNode and add column prune on OlapScan (#11842 ) We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE. This PR: - add projections on each ExecNode in BE - translate PhysicalProject into projections on PlanNode in FE - do column prune on ScanNode in FE Co-authored-by: HappenLee <happenlee@hotmail.com>	2022-08-30 16:17:10 +08:00
Xinyi Zou	b1fd701493	[fix](memtracker) Improve memory tracking accuracy for exec nodes (#11947 )	2022-08-22 08:56:05 +08:00
Pxl	64dc3b360f	[Bug](function) fix dcheck fail on close vexpr ctx (#11908 )	2022-08-19 19:11:10 +08:00
wangbo	3a49156e30	[performance] (vectorization)optimize In Expr (#11826 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-08-17 10:46:37 +08:00
Xinyi Zou	ecbf87d77b	[bugfix](memtracker)fix exceed memory limit log (#11485 )	2022-08-04 10:22:20 +08:00
zhangstar333	1f30e563a7	[refactor][vectorized] refactor first/last value agg functions (#10661 ) * refactor first and last [refactor][vectorized] refactor first/last value agg functions * add some change * remove first/last about always nullable * remove always nullable and register it * refactor value remove bool null flag * refactor win first last to ptr and pos	2022-07-30 18:38:56 +08:00
Xinyi Zou	18864ab7fe	weak relationship between MemTracker and MemTrackerLimiter (#11347 )	2022-07-30 18:33:54 +08:00
Gabriel	3fe7b21ac8	[Improvement](vectorized) Remove row-based conjuncts on vectorized nodes (#11324 )	2022-07-29 15:42:06 +08:00
Gabriel	babab5d535	[feature-wip] support datetimev2 (#11085 )	2022-07-23 16:07:59 +08:00
HappenLee	fdb4193e1b	[Vectorized][Refactor] Refactor the function of `tuple_is_null`, only do work in hash join node (#11109 )	2022-07-23 11:50:07 +08:00
Mingyu Chen	7e3fc0d321	[enhancement](vec) Support outer join for vectorized exec engine (#11068 ) Hash join node adds three new attributes. The following SQL is used as an example to illustrate the meaning of these three attributes: ``` select t1.a from t1 left join t2 on t1.a=t2.b; ``` 1. vOutputTupleDesc: Tuple2(a'') 2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>) 3. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')> Each slot in the intermediate tuple corresponds one-to-one to a slot in the output tuple through the expr calculation of the left child in vSrcToOutputSMap. This code mainly merges the contents of two PRs: 1. [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323) 2. [Fix](Join) Fix the bug of outer join function under vectorization #9954 The following is the specific description of the first PR: In a vectorized scenario, the query plan will generate a new tuple for the join node. This tuple mainly describes the output schema of the join node. Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema. For example: 1. The case where the null side column caused by outer join is converted to nullable. 2. The projection of the outer tuple. The following is the specific description of the second PR: This PR mainly fixes the following problems: 1. Queries that combine an inline view with an outer join. After adding a tuple to the join operator, the position of the `tupleisnull` function is inconsistent with the row storage. Currently the vectorized `tupleisnull` is calculated in the HashJoinNode.computeOutputTuple() function. 2. Incorrect column nullable property. At present, once an outer join occurs, the columns on the null side are planned to be nullable in the semantic parsing stage. For example: ``` select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1 ``` At this time, the nullable property of column k1 in the `tmp` inline view should be true. In the vectorized code, the virtual `tableRef` of tmp will be used in constructing the output tuple of HashJoinNode (specifically, the function HashJoinNode.computeOutputTuple()). So the correctness of the column nullable property of this tableRef is very important. In the above case, since the tmp table needs to perform a right join with the b table, tmp is the null side, so it is necessary to change the column attributes involved in the tmp table to nullable. In non-vectorized code, since the virtual tableRef tmp is not used at all, it uses the `TupleIsNull` function in `outputsmp` to ensure data correctness. That is to say, the a column of the original table test is still non-null, and it does not affect the correctness of the result. The vectorized nullable attribute requirements are very strict. Outer join will change the nullable attribute of the join column, thereby changing the nullable attribute of the column in the upper operators layer by layer. Since FE has no mechanism to modify the nullable attribute in the upper operator tuples layer by layer after the analyzer, at present we can only preset the attributes before the lower join as nullable in the analyzer stage in advance, so as to avoid the problem. (At the same time, BE also includes some workaround code to deal with the problem of null to non-null.) Co-authored-by: EmmyMiao87 Co-authored-by: HappenLee Co-authored-by: morrySnow Co-authored-by: EmmyMiao87 <522274284@qq.com>	2022-07-21 23:39:25 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Xinyi Zou	d9095922d9	[Enhancement] [Memory] add strict memory usage compile option STRICT_MEMORY_USE (#10936 ) In strict memory usage mode (`STRICT_MEMORY_USE=ON`), once the capacity of the vectorized Hash Table is greater than 2G, it starts to grow when 75% of the capacity is filled, and the memory usage of the vectorized Join becomes 50% of the previous value. `STRICT_MEMORY_USE=ON` expects BE to use less memory, and gives priority to ensuring stability when the cluster memory is limited.	2022-07-18 16:16:43 +08:00
Gabriel	3b46242483	[feature-wip] Optimize Decimal type (#10794 ) * [feature-wip](decimalv3) support decimalv3 * [feature-wip] Optimize Decimal type Co-authored-by: liaoxin <liaoxinbit@126.com>	2022-07-14 10:50:50 +08:00
luozenglin	d5ea677282	[feature](tracing) Support query tracing to improve doris observability by introducing OpenTelemetry. (#10533 ) The collection of query traces is implemented in fe and be, and the spans are exported to zipkin. DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-012%3A+Introduce+opentelemetry	2022-07-09 15:50:40 +08:00
TengJianPing	3e87960202	[bugfix] fix bug of vhash join build (#10614 ) * [bugfix] fix bug of vhash join build * format code	2022-07-05 19:14:42 +08:00
TengJianPing	1f1bdaa9c3	[bugfix] fix coredump of left anti join (#10591 )	2022-07-04 22:29:41 +08:00
yiguolei	4ec6e3ee81	[refactor] Remove debug action since it is never used. (#10484 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-06-29 20:37:51 +08:00
Pxl	6a566ccb74	[Enhancement][Vectorized] add constexpr_loop_match (#10283 )	2022-06-29 14:58:50 +08:00
Gabriel	ca94867b4e	[Feature-wip] add date v2 type (#9916 )	2022-06-26 16:07:56 +08:00
Gabriel	eebfbd0c91	Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323 )" (#10424 ) This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.	2022-06-25 22:18:08 +08:00
Gabriel	14a9a676e7	[BUG] fix DCHECK failed (#10396 )	2022-06-25 17:08:40 +08:00
HappenLee	2cc670dba6	[fix](vectorized) Support outer join for vectorized exec engine (#10323 ) In a vectorized scenario, the query plan will generate a new tuple for the join node. This tuple mainly describes the output schema of the join node. Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema. For example: 1. The case where the null side column caused by outer join is converted to nullable. 2. The projection of the outer tuple.	2022-06-24 08:59:30 +08:00
HappenLee	fa13bef3da	[Bug][Vectorized] Fix coredump when other join conjunct is const expr (#10223 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-23 13:27:32 +08:00
Gabriel	60147ad7a5	[Improvement] build runtime filters asynchronously (#10186 )	2022-06-17 11:09:13 +08:00
Xinyi Zou	c8d303a82c	[bugfix] Fix BE core about vectorized join build thread memtracker switch, and FileStat duplicate	2022-05-31 19:12:42 +08:00
Amos Bird	63aab5ee5d	[Bugfix(Vec)] Fix some memory leak issues (#9824 )	2022-05-29 23:04:11 +08:00
jacktengg	f4dd3bf013	[bugfix] fix memleak in olapscannode(#9736 )	2022-05-26 15:06:54 +08:00
Dongyang Li	90e8cda5f2	[Enhancement](Vectorized)build hash table with new thread, as non-vec… (#9290 ) * [Enhancement][Vectorized]build hash table with new thread, as non-vectorized past do edit after comments * format code with clang format Co-authored-by: lidongyang <dongyang.li@rateup.com.cn> Co-authored-by: stephen <hello-stephen@qq.com>	2022-05-24 10:23:15 +08:00
HappenLee	5039ec4570	[vec][opt] opt hash join build resize hash table before insert data (#9735 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-05-23 15:13:57 +08:00
HappenLee	500c36717d	[Bug-Fix][Vectorized] Full join return error result (#9690 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-05-23 13:29:37 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
Xinyi Zou	519305cb22	[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669 ) Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.	2022-04-08 09:02:26 +08:00
HappenLee	71ac86b183	[improvement](join) Support join project in query engine (#8722 )	2022-03-31 23:00:07 +08:00
Xinyi Zou	aaaaae53b5	[feature] (memory) Switch TLS mem tracker to separate more detailed memory usage (#8605 ) In pr #8476, all memory usage of a process is recorded in the process mem tracker, and all memory usage of a query is recorded in the query mem tracker, and it is still necessary to manually call `transfer to` to track the cached memory size. We hope to separate out more detailed memory usage based on Hook TCMalloc new/delete + TLS mem tracker. In this pr, the more detailed mem tracker is switched to TLS, which automatically and accurately counts more detailed memory usage than before.	2022-03-24 14:29:34 +08:00
HappenLee	36c85d2f06	[fix][vectorized] Fix bug of left semi/anti with other join conjunct (#8596 )	2022-03-23 10:34:47 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
awakeljw	705989d239	[improvement](VHashJoin) add probe timer (#8233 )	2022-03-13 20:54:44 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool as the ancestor of all query and import trackers; it is used to track the local memory usage of all executing tasks; 3. Add a consume/release cache that triggers a consume/release when the accumulated memory exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (experimental feature, parameter memory_leak_detection) that throws an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and prints the accurate statistical value in HTTP; 5. Added Virtual MemTracker, whose consume/release will not sync to the parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00

59 Commits