doris

Author	SHA1	Message	Date
HappenLee	228e5afad8	[Load](Sink) remove validate the column data when data is NULL (#13919 )	2022-11-03 08:33:45 +08:00
Fy	e021705053	[feature](nereids) support common table expression (#12742 ) Support common table expressions (CTE) in Nereids: - Only inline CTE is implemented, which means the logicalPlan of the CTE is copied everywhere it is referenced; - If the name of a CTE is the same as an existing table or view, the CTE is chosen first;	2022-11-02 23:41:53 +08:00
qiye	b83744d2f6	[feature](function)add regexp functions: regexp_replace_one, regexp_extract_all (#13766 )	2022-11-02 23:15:57 +08:00
Mingyu Chen	0ea7f85986	[fix](keyword) add BIN as keyword (#13907 )	2022-11-02 22:30:43 +08:00
HappenLee	fbc8b7311f	[Opt](function) opt the function of ndv (#13887 )	2022-11-02 22:21:20 +08:00
Jerry Hu	62f765b7f5	[improvement](scan) speed up inserting strings into ColumnString (#13397 )	2022-11-02 22:19:02 +08:00
mch_ucchi	53814e466b	[Enhancement](Nereids)optimize merge group in memo #13900	2022-11-02 20:42:55 +08:00
zhangstar333	374303186c	[Vectorized](function) support topn_array function (#13869 )	2022-11-02 19:49:23 +08:00
ZenoYang	b26d8f284c	[fix](rpc) The proxy removed when rpc exception occurs is not an abnormal proxy (#13836 ) `BackendServiceProxy.getInstance()` uses the round robin strategy to obtain the proxy, so when the current RPC request is abnormal, the proxy removed by `BackendServiceProxy.getInstance().removeProxy(...)` is not an abnormal proxy.	2022-11-02 19:39:33 +08:00
924060929	6eea855e78	[feature](Nereids) Support lots of scalar function and fix some bug (#13764 ) Proposed changes: 1. Add function interfaces that can search for the matched signature, namely ComputeSignature. It is equivalent to Function.CompareMode. - IdenticalSignature: equal to Function.CompareMode.IS_IDENTICAL - NullOrIdenticalSignature: equal to Function.CompareMode.IS_INDISTINGUISHABLE - ImplicitlyCastableSignature: equal to Function.CompareMode.IS_SUPERTYPE_OF - ExplicitlyCastableSignature: equal to Function.CompareMode.IS_NONSTRICT_SUPERTYPE_OF 3. Generate lots of scalar functions. 4. Bug fix: the disassembled avg function computed a wrong result because of the wrong input type; AggregateParam.inputTypesBeforeDissemble is used to save the original input type and pass it to the backend to find the correct global aggregate function. 5. Bug fix: a subquery with OneRowRelation crashed because of a wrong nullable property. Note: 1. Currently there are no more unit tests/regression tests for the scalar functions; I will add the tests once aggregate functions are migrated, for unified processing. 2. A known problem is that variable-length functions cannot be invoked; I will fix it later.	2022-11-02 18:01:08 +08:00
shee	a871fef815	[Improve](Nereids): refactor eliminate outer join (#13402 ) Refactor eliminate outer join #12985 Evaluate the expression with ConstantFoldRule. If the evaluation result is NULL or FALSE, then the elimination condition is satisfied.	2022-11-02 17:39:05 +08:00
morrySnow	1bafb26217	[fix](Nereids) throw NPE when call getOutputExprIds in LogicalProperties (#13898 )	2022-11-02 16:52:18 +08:00
morrySnow	699ffbca0e	[enhancement](Nereids) generate correct distribution spec after project (#13725 ) After a project, some Slots may be projected to other ones, so we need to replace the ExprId in DistributionSpecHash with the new one. If the projection is anything other than an Alias, we need to return DistributionSpecAny rather than the child's DistributionSpec.	2022-11-02 16:50:44 +08:00
xueweizhang	f2a0adf34e	[fix](fe) Inconsistent behavior for string comparison in FE and BE (#13604 )	2022-11-02 15:32:13 +08:00
morrySnow	6f3db8b4b4	[enhancement](Nereids) add eliminate unnecessary project rule (#13886 ) This rule eliminates a project whose output set is the same as its child's. If the project is the root of the plan, the elimination condition is that the project's output is exactly the same as its child's. The reason to add this rule is that when we do join reorder in optimization, the root of the plan after transformation may be a Project whose output set is the same as that of the root of the plan before transformation. If we had a Project on top of the root whose output set is also the same as the root's, we will have two exactly identical projects in the memo, one the parent of the other. After MergeProject, we will get a new Project exactly the same as the child, which needs to be added to the parent's group. Then we trigger Merge Group. Since the merge would produce a cycle, it is denied and we get a final plan with two consecutive projects. ## for example: BEFORE OPTIMIZATION ``` LogicalProject1( projects=[c_custkey#0, c_name#1]) [GroupId#1] +--LogicalJoin(type=LEFT_SEMI_JOIN) [GroupId#2] |--LogicalProject(...) | +--LogicalJoin(type=INNER_JOIN) | ... +--LogicalOlapScan(...) ``` AFTER APPLY RULE: LOGICAL_SEMI_JOIN_LOGICAL_JOIN_TRANSPOSE_PROJECT ``` LogicalProject1( projects=[c_custkey#0, c_name#1]) [GroupId#1] +--LogicalProject2( projects=[c_custkey#0, c_name#1]) [GroupId#2] +--LogicalJoin(type=INNER_JOIN) [GroupId#10] |--LogicalProject(...) | +--LogicalJoin(type=LEFT_SEMI_JOIN) | ... +--LogicalOlapScan(...) ``` AFTER APPLY RULE: MERGE_PROJECTS ``` LogicalProject3( projects=[c_custkey#0, c_name#1]) [should be in GroupId#1, but in GroupId#2 in fact] +--LogicalJoin(type=INNER_JOIN) [GroupId#10] |--LogicalProject(...) | +--LogicalJoin(type=LEFT_SEMI_JOIN) | ... +--LogicalOlapScan(...) ``` Since we have exactly the same GroupExpression (LogicalProject3 and LogicalProject2) in GroupId#1 and GroupId#2, we need to do MergeGroup(GroupId#1, GroupId#2). But we have a child of GroupId#1 in GroupId#2, so the merge is denied. If the best GroupExpression in GroupId#2 is LogicalProject3, we will get two consecutive projects in the final plan.	2022-11-02 14:16:03 +08:00
Adonis Ling	ba918b40e2	[chore](macOS) Fix compilation errors caused by the deprecated function (#13890 )	2022-11-02 13:34:51 +08:00
Mingyu Chen	ee8dffbfb7	[meta](recover) change dropInfo and RecoverInfo to GSON (#13830 )	2022-11-02 13:32:46 +08:00
luozenglin	e6080a6e4c	[regression](join) add right anti join with other predicate regression case (#13815 )	2022-11-02 13:27:58 +08:00
Mingyu Chen	d5becdb4a1	[fix](dynamic-partition) fix wrong check of replication num (#13755 )	2022-11-02 12:55:33 +08:00
Mingyu Chen	667cfe5598	[community](collaborators) add more collaborators (#13880 )	2022-11-02 12:54:04 +08:00
Pxl	be124523f4	[enhancement](profile) add profile to show column predicates (#13862 )	2022-11-02 09:07:26 +08:00
starocean999	277025b046	[fix](join)ColumnNullable need handle const column with nullable const value (#13866 )	2022-11-02 08:52:49 +08:00
caoliang-web	bd6070d9b3	[doc](spark-doris-connector)Add spark Doris connector to support streamload documentation #13834	2022-11-02 08:43:52 +08:00
wxy	3fc1b27c40	[docs](tablet-docs) fix the tablet-repair-and-balance.md document. (#13853 ) Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>	2022-11-02 08:43:08 +08:00
wxy	947e67fa76	[enhancement](test) retry start be or fe when port has been bind. (#13860 ) Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>	2022-11-02 08:42:35 +08:00
Mingyu Chen	0eeb4d2881	[minor](log) remove some e.printStackTrace() (#13870 )	2022-11-02 08:42:10 +08:00
yiguolei	de1dc62843	[enhancement](olap scanner) Scanner row bytes buffer is too small bug (#13874 ) * [enhancement](olap scanner) Scanner row bytes buffer is too small, please try to increase be config Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-02 08:41:50 +08:00
jiafeng.zhang	7fedfdcf6a	[fix](spark load)The where condition does not take effect when spark load loads the file (#13803 )	2022-11-01 23:01:45 +08:00
Yulei-Yang	3924ecead5	[minor](load) Improve error message for string type in loading process (#13718 )	2022-11-01 22:02:33 +08:00
Yongqiang YANG	8b3afd431e	[improvement](memory) simplify memory config related to tcmalloc (#13781 ) There are several configs related to tcmalloc, and users do not know how to configure them. In practice users just want two modes, performance or compact: in performance mode, users want Doris to run queries and loads quickly, while in compact mode, users want Doris to run with less memory usage. If we want to configure tcmalloc individually, we can use the environment variables supported by tcmalloc.	2022-11-01 21:45:19 +08:00
Gabriel	287a739510	[javaudf](string) Fix string format in java udf (#13854 )	2022-11-01 21:25:12 +08:00
minghong	7f34698eef	[enhancement](Nereids) use join estimation v2 only when stats derive v2 is enable (#13845 ) join estimation V2 should be invoked when enableNereidsStatsDeriveV2=true	2022-11-01 20:38:39 +08:00
minghong	f0c9867af3	[fix](nereids) map literal to double in FilterSelectivityCalculator (#13776 ) fix literal to double bug: all literal type implements getDouble() function	2022-11-01 20:20:44 +08:00
morrySnow	01f9f8ad43	[enhancement](Nereids) add merge project rule to column prune rule set (#13835 ) When we do column pruning, we add a project on the child plan. If the child plan is a Project, we need to merge them.	2022-11-01 20:17:53 +08:00
qiye	61c817f4cc	[feature](syntax) support SELECT * EXCEPT (#13844 ) * [feature](syntax) support SELECT * EXCEPT: add regression test	2022-11-01 19:41:25 +08:00
minghong	1eef986e75	[feature](nereids) add rule for semi/anti join exploration, when there is project between them (#13756 )	2022-11-01 19:07:25 +08:00
Lightman	f30b974d54	[Bugfix](upgrade) Fix 1.1 upgrade 1.2 coredump when schema change (#13822 ) When upgrading from 1.1 to 1.2, the FE version will not match the BE version for a period of time. After the BE is upgraded, a schema change makes the BE use the field desc_tbl, which was added in the 1.2 FE. The BE will coredump because desc_tbl is nullptr, so it needs to refuse the request.	2022-11-01 17:35:24 +08:00
TengJianPing	c14277e587	[fix](analytic) fix coredump cause by empty analytic parameter types (#13808 ) * fix fe compile error	2022-11-01 17:25:36 +08:00
jakevin	83e55cade8	[feature](Nereids): add rule for matching plan into HyperGraph. (#13805 )	2022-11-01 14:57:25 +08:00
Mingyu Chen	942611c185	Revert "[enhancement](compaction) opt compaction task producer and quick compaction (#13495 )" (#13833 ) This reverts commit 4f2ea0776ca3fe5315ab5ef7e00eefabfb5771a0.	2022-11-01 14:22:12 +08:00
AlexYue	7db916fc85	[enhancement](metric)Add metric for exec_state prepare function (#13646 ) * add bvar metric for exec_state prepare function	2022-11-01 14:09:47 +08:00
HappenLee	e63608b556	[Bug](test) fix some test cases whose result is random (#13837 )	2022-11-01 14:06:47 +08:00
Gabriel	42b2725f03	[Bug](delete) Fix wrong delete operation (#13840 )	2022-11-01 13:38:43 +08:00
morrySnow	34e68a41dd	[enhancement](explain) add cardinality to explain string and explain graph (#13720 ) 1. set cardinality when translate Nereids plan to legacy planner's plan 2. print cardinality when use EXPLAIN GRAPH	2022-11-01 11:43:21 +08:00
Pxl	164ca1e1a8	[Bug](function) change log fatal to log warning to avoid core dump on nullable double column cast to decimal column (#13819 )	2022-11-01 09:54:35 +08:00
TengJianPing	7f2166b1fd	[fix](thrift) fix that thrift struct sequence number is not consistent in 1.1-lts and master (#13829 )	2022-11-01 09:14:33 +08:00
morrySnow	b27714542d	[fix](planner) infer predicate could generate predicates in another scope (#13691 )	2022-11-01 09:03:41 +08:00
minghong	d2c5c1af3b	[feature](regression) add custom config file for Regression: regression-conf-custom.groovy (#13783 )	2022-10-31 22:49:06 +08:00
carlvinhust2012	cc0fa5fef6	[fix](array-type) fix the be core dump when import array<largeint> (#13821 ) - this pr is used to fix the be core dump when import array. - before the change, we import array by rapidjson string will core dump under the non-vectorized scenario. - after the change, we can import array by rapidjson string successfully.	2022-10-31 22:08:55 +08:00
jakevin	36a47dfe16	[enhancement](Nereids): use ImmutableList explicitly in Plan (#13817 )	2022-10-31 20:23:30 +08:00

8276 Commits