doris

Author	SHA1	Message	Date
lihangyu	e7f143c266	[Fix](topn opt) forbid outfile when using two-phase read (#21991 ) Enabling two-phase read for queries such as `select * from tbl into outfile "file:/xxx/" format as orc;` can lead to performance issues due to the fetch operation.	2023-07-20 10:32:30 +08:00
zhangstar333	c364196577	[fuzzy](test) set topnOptLimitThreshold to 0 in fuzzy test temporarily (#21952 ) Some P0 pipeline test cases about topn currently fail but cannot be reproduced locally, so set this threshold to 0 temporarily.	2023-07-20 10:22:22 +08:00
Jack Drogon	28a6a2e44d	[Enhancement](binlog) Add partitionRange && indexIds in UpsertRecord && PartitionCommitInfo (#22005 )	2023-07-20 09:52:21 +08:00
zy-kkk	2daad2151d	[enhancement](jdbc catalog) Add mysql jdbc catalog function to filter push-down identification (#21745 )	2023-07-19 23:48:23 +08:00
LiBinfeng	58f2593ba1	[Fix](Nereids) Add cast comparison with slot reference when inferring predicate (#21171 ) Problem: When inferring predicates, we assume that slot references need to be inferred. But in this case: create table tb1(l1 smallint) ...; create table tb2(l2 int) ...; select * from tb1 inner join tb2 where tb1.l1 = tb2.l2 and tb2.l2 = 1; we cannot get the filter tb1.l1 = 1, because a cast is added to l1: (Cast smallint to int l1) = l2. Solved: Take casts into account when inferring predicates, and also update the check that compares a slot reference with a cast expression. However, inferring a predicate through a cast from a bigger type to a smaller type is a logical error. For example: select * from tb1 inner join tb2 where tb1.l1 = cast(tb2.l2 as smallint) and tb2.l2 = (number between smallint max and intmax); the tb2.l2 value cannot be inferred to the left side because tb1.l1 would get a false value, and if we add one more condition like tb1.l1 = tb3.l3 (smallint), that predicate would also be false.	2023-07-19 23:14:26 +08:00
zhangstar333	a51aab6d29	[FE](compile) fix master fe compile failed (#21971 ) fix master fe compile failed	2023-07-19 18:02:00 +08:00
jakevin	0fa3efae1d	[fix](Nereids): removePhysicalExpression() should clear empty Group. (#21951 )	2023-07-19 14:41:06 +08:00
minghong	bd40767754	[stats](nereids) dump col stats for all physical plan node and cost details in memo #21902 1. print cost detail 2. dump col stats in memo	2023-07-19 14:10:26 +08:00
mch_ucchi	f668b3965e	[Enhancement](Nereids)enable nereids DML by default. (#21539 ) TODO: fix cast agg_state type when do insert	2023-07-19 13:52:15 +08:00
Xiaocc	d8272b16e9	[fix](fe) fd leak of ssl #19645	2023-07-19 12:45:54 +08:00
morrySnow	d987f782d2	[refactor](Nereids) refactor cte analyze, rewrite and reuse code (#21727 ) REFACTOR: 1. Generate CTEAnchor, CTEProducer, CTEConsumer when analyzing. For example, take the statement `WITH cte1 AS (SELECT * FROM t) SELECT * FROM cte1`. Before this PR, we got an analyzed plan like this: ``` logicalCTE(LogicalSubQueryAlias(cte1)) +-- logicalProject() +-- logicalCteConsumer() ``` We only have LogicalCteConsumer in the plan, but not LogicalCteProducer. This is not a valid plan, and should not be the final result of analyze. After this PR, we get an analyzed plan like this: ``` logicalCteAnchor() |-- logicalCteProducer() +-- logicalProject() +-- logicalCteConsumer() ``` This is a valid plan with LogicalCteProducer and LogicalCteConsumer. 2. Replace re-analyzing the unbound plan with deep-copying the plan when doing CTEInline. Because we generate LogicalCteAnchor and LogicalCteProducer when analyzing, we can no longer re-analyze to generate the CTE inline plan. Another reason is that we reuse the relation id between unbound and bound relations. So, if we re-analyze an unresolved CTE plan, we will get two relations with the same RelationId. This is wrong, because we use RelationId to distinguish two different relations. This PR implements two helper classes to deep copy a new plan from CTEProducer: `LogicalPlanDeepCopier` and `ExpressionDeepCopier`. 3. New rewrite framework to ensure CTEInline is done in the right way. Before this PR, we did CTEInline before applying any rewrite rule. But sometimes, some CteConsumer could be eliminated after rewrite. After this PR, we do CTEInline after the plans relying on CTEProducer have been rewritten. So we can do CTEInline if the count of CTEConsumer decreases below the threshold of CTEInline. 4. add relation id to all relation plan nodes 5. let all relations generated from a table implement trait CatalogRelation 6. reuse relation id between unbound relation and relation after bind ENHANCEMENT: 1. Pull up CTEAnchor before RBO to avoid breaking other rules' patterns. Before this PR, we generated CTEAnchor and LogicalCTE in the middle of the plan, so all rules had to process LogicalCTEAnchor, otherwise they would generate an unexpected plan. For example, push down filter and push down project had to add patterns like: ``` logicalProject(logicalCTE) ... logicalFilter(logicalCteAnchor) ... ``` Project and filter must be pushed through these virtual plan nodes to ensure all projects and filters can be merged together in the right order. For example: ``` logicalProject +-- logicalFilter +-- logicalCteAnchor +-- logicalProject +-- logicalFilter +-- logicalOlapScan ``` The plan above leads to a translation error, because we cannot do filter and project twice on the bottom logicalOlapScan. BUGFIX: 1. Recursively analyze LogicalCTE to avoid binding an outer relation on an inner CTE. For example ```sql SELECT * FROM (WITH cte1 AS (SELECT * FROM t1) SELECT * FROM cte1)v1, cte1 v2; ``` Before this PR, we used the nested cte name to bind the outer plan, so the outer cte1 with alias v2 was bound to the inner cte1. After this PR, the sql throws a Table not exists exception when binding. 2. Use the right way to do withChildren in CTEProducer and remove projects from it. Before this PR, we added an attr named projects in CTEProducer to represent its output, because we could not get the right output by calling the `getOutput` method on it. The root cause was the wrong implementation of computeOutput in LogicalCteProducer. This PR fixes this problem and removes the projects attr of CTEProducer. 3. Adjust nullable rule updates CTEConsumer's output by CTEProducer's output. This PR processes nullable on LogicalCteConsumer to ensure CteConsumer's output has the right nullable info if the CteProducer's output nullable has been adjusted. 4. Bind set operation expression should not change children's output's nullable. This PR fixes a problem introduced by previous PR #21168. The nullable info of SetOperation's children should not be changed after binding SetOperation.	2023-07-19 11:41:41 +08:00
lihangyu	c28b90a301	[Bug](topn opt) disable topn 2 phase read when storage policy is not empty (#21909 )	2023-07-19 10:28:41 +08:00
zclllyybb	1818526fba	[fix](profile) Fix wrong instance number in query profile (#21808 )	2023-07-19 10:00:48 +08:00
xzj7019	c993663827	[fix](nereids) fix cte as bc right side hang bug (#21897 ) The original computeMultiCastFragmentParams process did not handle the case where the cte is the right side of a broadcast join, so the buildHashTableForBroadcastJoin flag was never set to true and the sql eventually hung.	2023-07-19 09:43:31 +08:00
starocean999	5b043a980e	[fix](planner)only forbid literal value in AnalyticExpr's order by list (#21819 ) * [fix](planner)only forbid literal value in AnalyticExpr's order by list	2023-07-19 09:40:55 +08:00
AKIRA	d349c955f0	[fix](nereids) Disable auto analyze temporarily #21919	2023-07-19 09:27:24 +08:00
Siyang Tang	24c00698f2	[fix](stmt-forward) fix should-be-required fields in forward params (#21945 ) * fix-optional-fields-in-forward-param * fix reviewed	2023-07-19 01:52:50 +08:00
Pxl	0de94e857f	[Bug](materialized view) fix wrong match mv when mv have where clause (#21797 )	2023-07-19 01:11:39 +08:00
herry2038	802d73f16d	[optimization](heartbeat) Remove startuptime from front heartbeat class (#21904 ) --------- Co-authored-by: yuxianbing <iloveqaz123>	2023-07-19 00:56:36 +08:00
Stalary	f6bfe058be	[Fix](information_schema) Schema table varchar len error #21308	2023-07-19 00:50:01 +08:00
starocean999	fff1983f40	[fix](planner)use tupleId of agg node to get its unsigned conjuncts (#21949 )	2023-07-19 00:46:49 +08:00
yujun	beec0e9169	[Improvement](tablet clone) impr tablet sched speed and fix tablet sched failed too many times (#21856 )	2023-07-18 23:25:22 +08:00
Ashin Gau	dcb165cc9f	[opt](hudi) get hudi split concurrently by using parallelStream (#21871 ) This PR contains two optimizations: 1. Using parallel stream to get hoodie splits concurrently. It reduces the split time from 1min20s to 12s when splitting 10,000 partitions. 2. Reading hoodie meta table to get table partitions. It reduces the time to get partitions from 12min to 3s when reading 10,000 partitions.	2023-07-18 23:19:34 +08:00
morrySnow	d6d27ef428	[fix](Nereids) join other conjuncts should get slot from join output (#21840 )	2023-07-18 18:22:40 +08:00
Siyang Tang	e654b5ddfc	[enhancement](broker-load) support special partition path pattern (#21778 ) Some users may have non-ACID paths like `/path/to/k=v/1/filename`, introduced by the HQL statement `insert into union all`, for which the path partition `k=v` should be parsed normally in broker load.	2023-07-18 14:50:37 +08:00
starocean999	ec12a4159a	[fix](planner) push conjuncts into SetOperationStmt inline view (#21718 ) * [fix](planner)push conjuncts into SetOperationStmt inline view	2023-07-18 14:17:07 +08:00
Xiangyu Wang	50b81a9c13	[Fix](multi-catalog) Filter invisible files for hive table. (#21867 ) Hive cannot read files whose names start with "." or "_", so we need to filter out these files.	2023-07-18 13:08:12 +08:00
Pxl	417e3e5616	[Feature](delete) support fold constant on delete stmt (#21833 ) support fold constant on delete stmt	2023-07-18 12:56:28 +08:00
Pxl	19492b06c1	[Bug](decimalv3) fix failed on test_dup_tab_decimalv3 due to wrong precision (#21890 ) fix failed on test_dup_tab_decimalv3 due to wrong precision	2023-07-18 12:53:09 +08:00
mch_ucchi	e1a116af94	[fix](planner)normalize the behavior of from_unixtime() according to Nereids planner (#21723 ) If from_unixtime() receives an integer out of int range, the function returns null.	2023-07-18 12:15:38 +08:00
starocean999	07e720e65d	[fix](planner)need recalculate nullable info of output slots for join node (#21650 ) * [fix](planner)need recalculate nullable info of output slots for join node	2023-07-18 12:10:27 +08:00
Jibing-Li	489171e4c1	[Fix](multi catalog)Fix hive partition value contains special character such as / bug (#21876 ) Hive escapes some special characters in partition values to %XX, for example, / is escaped to %2F. Doris didn't handle this case, which caused Doris to fail to list the files under partitions with special characters. This pr fixes this bug.	2023-07-18 11:20:38 +08:00
yujun	ebd2a4b707	[fix](dynamic partition) fix create hot partition failed without error response (#20996 )	2023-07-18 10:56:37 +08:00
Mryange	b656f31cf2	[Enhancement](compatible) show decimalv3 to decimal (#21782 )	2023-07-18 09:17:14 +08:00
zhangstar333	b6517ed83b	[Enhance](function) add boolean type for sum agg function (#21862 ) Previously, the sum agg function was not registered for boolean type, so the input had to be cast to another type before it could execute.	2023-07-18 08:06:52 +08:00
Tiewei Fang	83e5a29855	[Fix](Export) fix nullptr exception when upgrading from 1.2.3 to 2.0 (#21799 )	2023-07-18 00:07:09 +08:00
AKIRA	05cf095506	[feature](stats) Support full auto analyze (#21192 ) 1. Auto analyze all tables except for internal tables 2. make resource used by analyze configurable	2023-07-17 20:42:57 +08:00
yujun	be750e88b2	[fix](clone) fix cannot further repair clone replica which miss version data (#21382 )	2023-07-17 20:00:50 +08:00
zy-kkk	014b34bebb	[enhancement](jdbc catalog) Add mysql jdbc url param `rewriteBatchedStatements=true` (#21864 ) When `rewriteBatchedStatements=false`, the JDBC driver will not merge multiple insert statements into one larger insert statement. Therefore, during the batch insertion process, each insert statement needs to be sent to the MySQL server individually, leading to a higher number of network roundtrips. Network latency could potentially be a significant factor contributing to the performance degradation. For this reason, we propose to set this parameter to true by default, to enhance the performance of prepared statement batch inserts.	2023-07-17 17:39:26 +08:00
Jibing-Li	a92508c3f9	[Fix](statistics) Fix analyze db always use internal catalog bug (#21850 ) The `Analyze database db_name ` command couldn't use the current catalog; it always used the internal catalog. This caused the command to fail to find the db. This pr fixes this bug.	2023-07-17 15:28:54 +08:00
Mingyu Chen	5fc0a84735	[improvement](catalog) reduce the size of thrift params for external table query (#21771 ) ### 1 In the previous implementation, for each FileSplit, there was a `TFileScanRange`, and each `TFileScanRange` contained a list of `TFileRangeDesc` and a `TFileScanRangeParams`. So if there are thousands of FileSplits, there will be thousands of `TFileScanRange`, which makes the thrift data sent to BE too large, resulting in: 1. the rpc of sending fragment may fail due to timeout 2. FE will OOM. For a given query request, the `TFileScanRangeParams` is the common part and is the same for all `TFileScanRange`. So I moved it to the `TExecPlanFragmentParams`. After that, for each FileSplit, there is only a list of `TFileRangeDesc`. In my test, querying a hive table with 100000 partitions, the size of thrift data was reduced from 151MB to 15MB, and the above 2 issues are gone. ### 2 Support setting `max_external_file_meta_cache_num` <= 0, in which case the file meta cache for parquet footer will not be used. I found that for some wide tables the footer is too large (1MB after compact, and much more after being deserialized to thrift), so it consumes too much memory of BE when there are many files. This will be optimized later; for now I just support disabling this cache.	2023-07-17 13:37:02 +08:00
lihangyu	1101d7d947	[chore](topn opt) disable two phase read when light schema change is disabled (#21809 )	2023-07-17 12:46:28 +08:00
zy-kkk	03b575842d	[Feature](table function) support explode_json_array_json (#21795 )	2023-07-17 11:40:02 +08:00
zclllyybb	d0775f8209	[log](profile) add doris version info to query profile (#21501 )	2023-07-17 11:18:05 +08:00
Pxl	86841d8653	[Bug](materialized-view) fix some problems of mv and make ssb mv work on nereids (#21559 ) fix some problems of mv and make ssb mv work on nereids	2023-07-17 10:08:25 +08:00
herry2038	6fba092741	[optimization](show-frontends) Add start time in Show frontends (#21844 ) --------- Co-authored-by: yuxianbing <iloveqaz123>	2023-07-17 05:09:43 +08:00
starocean999	7a61953d17	[fix](nereids)SimplifyComparisonPredicate rule needs special care for decimalv3 and datetimev2 literal (#21575 )	2023-07-14 23:05:14 +08:00
mch_ucchi	c9a99ce171	[Feature](Nereids) support udf for Nereids (#18257 ) Support alias function, Java UDF, Java UDAF for Nereids. Implementation: UDFs (alias function, Java UD(A)F) are saved in the database object, and we get them by FunctionDesc, which requires the function name and arg types. So firstly we bind the expressions of its children so that we can get the return types of the args. Then we get the best selection. Secondly: For alias function: The original function of the alias function is represented as an original planner-style function. It's too hard to translate it to a nereids-style expression, hence we transfer it to the corresponding sql and parse it. Now we get the nereids-style function, and try to bind the function. The bound function will also change the type by adding cast nodes on its children to match its expected input types, so if we traverse a bound function more than once, the cast nodes will be different. To solve the problem, we add a flag isAnalyzedFunction. It's set to false by default and will be set to true when returning from the visitor function. If the flag is true, the visitor function returns immediately. Now we can ensure that the bound functions in children will be the same even though we traverse them more than once. We can then replace the alias function with its original function and bind the unbound functions. For JavaUDF and JavaUDAF: JavaUDF and JavaUDAF can be recognized as catalog functions and are hard to translate entirely to Nereids-style functions, so we create nereids expression objects JavaUdf and JavaUdaf to wrap them. All in all, Nereids now supports UDFs and nesting them.	2023-07-14 17:02:01 +08:00
DeadlineFen	d57bb84842	[Enhancement] (binlog) TBinlog and BinlogManager V2 (#21674 )	2023-07-14 16:59:32 +08:00
minghong	f95d728d3e	[shape](nereids) TPCDS check all query shape, except ds64 (#21742 ) There is a known bug in ds64 analyze. Add the ds64 shape check later.	2023-07-14 16:56:46 +08:00
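
The Hive partition fix listed above (#21876) describes partition values being escaped to %XX, with / becoming %2F. As a rough illustration only, and not the code from that PR, the sketch below decodes such a path in Python. The function name `parse_hive_partition_path` and the sample paths are assumptions made for this example; the key point is that the path is split on `/` before decoding, so a decoded `/` inside a value is not mistaken for a directory separator.

```python
from typing import List, Tuple
from urllib.parse import unquote


def parse_hive_partition_path(path: str) -> List[Tuple[str, str]]:
    """Return (key, value) pairs from a 'k1=v1/k2=v2' style path.

    Hypothetical helper for illustration; not part of Doris or Hive.
    """
    pairs = []
    # Split on the directory separator first: escaped values such as
    # '2023%2F07%2F18' still contain no literal '/' at this point.
    for segment in path.strip("/").split("/"):
        if "=" not in segment:
            # Not a partition directory (plain folder or file name).
            continue
        key, _, raw_value = segment.partition("=")
        # Decode %XX escapes only after the segment has been isolated.
        pairs.append((unquote(key), unquote(raw_value)))
    return pairs


if __name__ == "__main__":
    print(parse_hive_partition_path("dt=2023%2F07%2F18/region=us"))
```

A path segment without `=` is skipped, so a layout like the `/path/to/k=v/1/filename` pattern mentioned in #21778 yields only the `k=v` pair in this sketch. A file name that itself contains `=` would be misread, which a real implementation would need to handle.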

8289 Commits