Commit Graph

5755 Commits

Author SHA1 Message Date
ab11dea98d [Enhancement](config) optimize behavior of default_storage_medium (#20739) 2023-07-20 22:00:11 +08:00
7d488688b4 [fix](multi-catalog)fix minio default region and throw minio error msg, support s3 bucket root path (#21994)
1. check minio region, set default region if user region is not provided, and throw minio error msg
2. support read root path s3://bucket1
3. fix max compute public access
2023-07-20 20:48:55 +08:00
eabd5d386b [Fix](multi catalog)Fix nereids context table always use internal catalog bug (#21953)
The getTable function in CascadesContext only handles the internal catalog case (try to find table only in internal 
catalog and dbs). However, it should take all the external catalogs into consideration, otherwise, it will failed to find a 
table or get the wrong table while querying external table. This pr is to fix this bug.
2023-07-20 20:32:01 +08:00
e4ac52b2aa [Improvement](profile)Add init and finalize external scan node time in profile (#21749)
Add more profile information for external table plan time. Including init and finalize scan node time, getSplits time, create scan range time, get all partitions time and get all files for all partitions time. Also modified the Indentation to make it easier to read.

This is an example output of the new profile summary. 
```
    Execution  Summary:
          -  Analysis  Time:  3ms
          -  Plan  Time:  26s885ms
              -  JoinReorder  Time:  N/A
              -  CreateSingleNode  Time:  N/A
              -  QueryDistributed  Time:  N/A
              -  Init  Scan  Node  Time:  1ms
              -  Finalize  Scan  Node  Time:  26s868ms
                  -  Get  Splits  Time:  26s554ms
                      -  Get  PARTITIONS  Time:  20s189ms
                      -  Get  PARTITION  FILES  Time:  6s289ms
                  -  Create  Scan  Range  Time:  314ms
          -  Schedule  Time:  1s67ms
          -  Fetch  Result  Time:  56ms
          -  Write  Result  Time:  0ms
          -  Wait  and  Fetch  Result  Time:  57ms
```
2023-07-20 20:29:18 +08:00
0e8432526e [fix](multi-catalog) check properties when alter catalog (#20130)
When we were altering the catalog, we did not verify the new parameters of the catalog, and now we have added verification
My changes:
When We are altering the catalog, I have carried out a full inspection, and if an exception occurs, the parameters will be rolled back
2023-07-20 20:18:14 +08:00
aabe379527 [fix](stats) support utf-8 string range compare (#22024)
in previous version, some utf-8 string literal are mapped to negative double. this issue makes our range check misfunction.
2023-07-20 18:39:41 +08:00
ee65e0a6b1 [fix](Nereids) should not remove any limit from uncorrelated subquery (#21976)
We should not remove any limit from uncorrelated subquery. For Example
```sql
-- should return nothing, but return all tuple of t if we remove limit from exists
SELECT * FROM t WHERE EXISTS (SELECT * FROM t limit 0);

-- should return the tuple with smallest c1 in t,
-- but report error if we remove limit from scalar subquery
SELECT * FROM t WHERE c1 = (SELECT * FROM t ORDER BY c1 LIMIT 1);
```
2023-07-20 18:37:04 +08:00
367ad9164a [feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917) 2023-07-20 18:03:39 +08:00
be2754e1a2 [fuzzy](modify) enable pipeline and nereids in regression env by default (#21824)
enable pipeline and nereids in regression env by default
2023-07-20 17:12:21 +08:00
4cfe990095 [enhancement](Nereids) add test framework for otherjoin (#21887) 2023-07-20 16:35:55 +08:00
365afb5389 [fix](sparkdpp) Hive table properties not take effect when create spark session (#21881)
When creating a Hive external table for Spark loading, the Hive external table includes related information such as the Hive Metastore. However, when submitting a job, it is required to have the hive-site.xml file in the Spark conf directory; otherwise, the Spark job may fail with an error message indicating that the corresponding Hive table cannot be found.

The SparkEtlJob.initSparkConfigs method sets the properties of the external table into the Spark conf. However, at this point, the Spark session has already been created, and the Hive-related parameters will not take effect. To ensure that the Spark Hive catalog properly loads Hive tables, you need to set the Hive-related parameters before creating the Spark session.

Co-authored-by: zhangshixin <zhangshixin@youzan.com>
2023-07-20 14:36:00 +08:00
86d7233b06 [fix](nereids) ExtractAndNormalizeWindowExpression rule should push down correct exprs to child (#21827)
consider the window function:
```sql
substr(
ref_1.cp_type,
sum(CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END) OVER (),
1)
```
Before the pr, only "CASE WHEN ref_1.cp_type  = 0 THEN 3 ELSE 2 END" is pushed down.
But both "ref_1.cp_type" and "CASE WHEN ref_1.cp_type  = 0 THEN 3 ELSE 2 END"
should be pushed down.
This pr fix it
2023-07-20 11:47:55 +08:00
0f116ce148 Revert "[Enhancement](Nereids)enable nereids DML by default. (#21539)" (#22013)
This reverts commit f668b3965effbd5df4902f20b496cb6b6642414c.
2023-07-20 11:32:54 +08:00
a859a93b22 [fix](Nereids) should not push down project to the nullable side of outer join (#21999) 2023-07-20 11:10:11 +08:00
3d0832d973 [fix](stmt-forward) fix forward null packet (#21979) 2023-07-20 10:45:16 +08:00
5c7c4d90b4 [improvement](catalog) return the root cause of error when forwarding init request to master FE (#22001) 2023-07-20 10:42:29 +08:00
e7f143c266 [Fix](topn opt) forbit outfile when using 2phase read (#21991)
"Enabling two-phase query for similar select * from tbl into outfile "file:/xxx/" format as orc; queries can lead to performance issues due to the fetch operation."
2023-07-20 10:32:30 +08:00
c364196577 [fuzzy](test) set topnOptLimitThreshold to 0 in fuzzy test temporary (#21952)
Now P0 pipeline test have some failed cese about topn, but can't reproduce at local
So set this threshold to 0 temporary.
2023-07-20 10:22:22 +08:00
28a6a2e44d [Enhancement](binlog) Add partitionRange && indexIds in UpsertRecord && PartitionCommitInfo (#22005) 2023-07-20 09:52:21 +08:00
2daad2151d [enhancement](jdbc catalog) Add mysql jdbc catalog function to filter push-down identification (#21745) 2023-07-19 23:48:23 +08:00
58f2593ba1 [Fix](Nereids) Add cast comparison with slot reference when inferring predicate (#21171)
Problem:
When inferring predicate, we assume that slot reference need to be inferred. But in this case:
carete table tb1(l1 smallint) ...;
create table tb2(l2 int) ...;
select * from tb1 inner join tb2 where tb1.l1 = tb2.l2 and tb2.l2 = 1;
We can not get tb1.l1 = 1 filter because we will add a cast to l1 (Cast smallint to int l1) = l2.

Solved:
Add cast consideration when inferring predicate, also add change judgement when judging equals to slotreference and cast expression. But when we want to infer predicate from bigger type cast to smaller type, it is logical error.
For example:
select * from tb1 inner join tb2 where tb1.l1 = cast(tb2.l2 as smallint) and tb2.l2 = (number between smallint max and intmax);
tb2.l2 value can not infer to left side because tb1.l1 would be false value, and when we add one more condition like tb1.l1 = tb3.l3(smallint). It would cause this predicate be false.
2023-07-19 23:14:26 +08:00
a51aab6d29 [FE](compile) fix master fe compile failed (#21971)
fix master fe compile failed
2023-07-19 18:02:00 +08:00
0fa3efae1d [fix](Nereids): removePhysicalExpression() should clear empty Group. (#21951) 2023-07-19 14:41:06 +08:00
bd40767754 [stats](nereids) dump col stats for all physical plan node and cost details in memo #21902
1. print cost detail
2. dump col stats in memo
2023-07-19 14:10:26 +08:00
f668b3965e [Enhancement](Nereids)enable nereids DML by default. (#21539)
TODO: fix cast agg_state type when do insert
2023-07-19 13:52:15 +08:00
d8272b16e9 [fix](fe) fd leak of ssl #19645 2023-07-19 12:45:54 +08:00
d987f782d2 [refactor](Nereids) refactor cte analyze, rewrite and reuse code (#21727)
REFACTOR:

1. Generate CTEAnchor, CTEProducer, CTEConsumer when analyze.

For example, statement `WITH cte1 AS (SELECT * FROM t) SELECT * FROM cte1`.
Before this PR, we got analyzed plan like this:
```
logicalCTE(LogicalSubQueryAlias(cte1))
+-- logicalProject()
    +-- logicalCteConsumer()
```
we only have LogicalCteConsumer on the plan, but not LogicalCteProducer.
This is not a valid plan, and should not as the final result of analyze.
After this PR, we got analyzed plan like this:
```
logicalCteAnchor()
|-- logicalCteProducer()
+-- logicalProject()
    +-- logicalCteConsumer()
```
This is a valid plan with LogicalCteProducer and LogicalCteConsumer

2. Replace re-analyze unbound plan with deepCopy plan when do CTEInline

Because we generate LogicalCteAnchor and LogicalCteProducer when analyze.
So, we could not do re-analyze to gnerate CTE inline plan anymore.
The another reason is, we reuse relation id between unbound and bound relation.
So, if we do re-analyze on unresloved CTE plan, we will get two relation
with same RelationId. This is wrong, because we use RelationId to distinguish
two different relations.
This PR implement two helper class to deep copy a new plan from CTEProducer.
`LogicalPlanDeepCopier` and `ExpressionDeepCopier`

3. New rewrite framework to ensure do CTEInline in right way.

Before this PR, we do CTEInline before apply any rewrite rule.
But sometimes, some CteConsumer could be eliminated after rewrite.
After this PR, we do CTEInline after the plans relaying on CTEProducer have
been rewritten. So we could do CTEInline if some the count of CTEConsumer
decrease under the threshold of CTEInline.

4. add relation id to all relation plan node
5. let all relation generated from table implement trait CatalogRelation
6. reuse relation id between unbound relation and relation after bind


ENHANCEMENT:

1. Pull up CTEAnchor before RBO to avoid break other rules' pattern

Before this PR, we will generate CTEAnchor and LogicalCTE in the middle of plan.
So all rules should process LogicalCTEAnchor, otherwise will generate unexpected plan.
For example, push down filter and push down project should add pattern like:
```
logicalProject(logicalCTE)
...
logicalFilter(logicalCteAnchor)
...
```
project and filter must be push through these virtual plan node to ensure all project
and filter could be merged togather and get right order of them. for Example:
```
logicalProject
+-- logicalFilter
    +-- logicalCteAnchor
        +-- logicalProject
            +-- logicalFilter
                +-- logicalOlapScan
```
upper plan will lead to translation error. because we could not do twice filter and
project on bottom logicalOlapScan.


BUGFIX:

1. Recursive analyze LogicalCTE to avoid bind outer relation on inner CTE

For example
```sql
SELECT * FROM (WITH cte1 AS (SELECT * FROM t1) SELECT * FROM cte1)v1, cte1 v2; 
```
Before this PR, we will use nested cte name to bind outer plan.
So the outer cte1 with alias v2 will bound on the inner cte1.
After this PR, the sql will throw Table not exists exception when binding.

2. Use right way do withChildren in CTEProducer and remove projects in it

Before this PR, we add an attr named projects in CTEProducer to represent the output
of it. This is because we cannot get right output of it by call `getOutput` method on it.
The root reason of that is the wrong implementation of computeOutput of LogicalCteProducer.
This PR fix this problem and remove projects attr of CTEProducer.

3. Adjust nullable rule update CTEConsumer's output by CTEProducer's output

This PR process nullable on LogicalCteConsumer to ensure CteConsumer's output with right
nullable info, if the CteProducer's output nullable has been adjusted.

4. Bind set operation expression should not change children's output's nullable

This PR use fix a problem introduced by prvious PR #21168. The nullable info of
SetOperation's children should not changed after binding SetOperation.
2023-07-19 11:41:41 +08:00
c28b90a301 [Bug](topn opt) disable topn 2 phase read when storage policy is not emtpy (#21909) 2023-07-19 10:28:41 +08:00
1818526fba [fix](profile) Fix wrong instance number in query profile (#21808) 2023-07-19 10:00:48 +08:00
c993663827 [fix](nereids) fix cte as bc right side hang bug (#21897)
During original computeMultiCastFragmentParams process, we don't handle the scenario the cte as the broadcast right side, which will lead the missing setting of the buildHashTableForBroadcastJoin flag true and finally the sql hang.
2023-07-19 09:43:31 +08:00
5b043a980e [fix](planner)only forbid literal value in AnalyticExpr's order by list (#21819)
* [fix](planner)only forbid literal value in AnalyticExpr's order by list
2023-07-19 09:40:55 +08:00
d349c955f0 [fix](nereids) Disable auto analyze temporarily #21919 2023-07-19 09:27:24 +08:00
24c00698f2 [fix](stmt-forward) fix should-be-required fields in forward params (#21945)
* fix-optional-fields-in-forward-param

* fix reviewed
2023-07-19 01:52:50 +08:00
Pxl
0de94e857f [Bug](materialized view) fix wrong match mv when mv have where clause (#21797) 2023-07-19 01:11:39 +08:00
802d73f16d [optimization](heartbeart) Rm startuptime from front heart beart class (#21904)
---------

Co-authored-by: yuxianbing <iloveqaz123>
2023-07-19 00:56:36 +08:00
f6bfe058be [Fix](information_schema) Schema table varchar len error #21308 2023-07-19 00:50:01 +08:00
fff1983f40 [fix](planner)use tupleId of agg node to get its unsigned conjuncts (#21949) 2023-07-19 00:46:49 +08:00
beec0e9169 [Improvement](tablet clone) impr tablet sched speed and fix tablet sched failed too many times (#21856) 2023-07-18 23:25:22 +08:00
dcb165cc9f [opt](hudi) get hudi split concurrently by using parallelStream (#21871)
This PR contains two optimizations:
1. Using parallel stream to get hoodie splits concurrently. It reduce the split time from 1min20s to 12s when splitting 10,000 partitions.
2. Reading hoodie meta table to get table partitions. It reduce the getting partition time from 12min to 3s when reading 10,000 partitions.
2023-07-18 23:19:34 +08:00
d6d27ef428 [fix](Nereids) join other conjuncts should get slot from join output (#21840) 2023-07-18 18:22:40 +08:00
e654b5ddfc [enhancement](broker-load) support special partition path pattern (#21778)
Some users may have non-ACID path like `/path/to/k=v/1/filename`, introducing by HQL statement `insert into union all`, for which path partition `k=v` should be parsed normally in broker load.
2023-07-18 14:50:37 +08:00
ec12a4159a [fix](planner) push conjuncts into SetOperationStmt inline view (#21718)
* [fix](planner)push conjuncts into SetOperationStmt inline view
2023-07-18 14:17:07 +08:00
50b81a9c13 [Fix](multi-catalog) Filter invisible files for hive table. (#21867)
In fact, hive can not read files which startswith "." or "_", so we need filter these files.
2023-07-18 13:08:12 +08:00
Pxl
417e3e5616 [Feature](delete) support fold constant on delete stmt (#21833)
support fold constant on delete stmt
2023-07-18 12:56:28 +08:00
Pxl
19492b06c1 [Bug](decimalv3) fix failed on test_dup_tab_decimalv3 due to wrong precision (#21890)
fix failed on test_dup_tab_decimalv3 due to wrong precision
2023-07-18 12:53:09 +08:00
e1a116af94 [fix](planner)normalize the behavior of from_unixtime() according to Nereids planner (#21723)
if from_unixtime() receive an integer out of int range, the function returns null.
2023-07-18 12:15:38 +08:00
07e720e65d [fix](planner)need recalculate nullable info of output slots for join node (#21650)
* [fix](planner)need recalculate nullable info of output slots for join node
2023-07-18 12:10:27 +08:00
489171e4c1 [Fix](multi catalog)Fix hive partition value contains special character such as / bug (#21876)
Hive escapes some special characters in partition value to %XX, for example, / is escaped to %2F.
Doris didn't handle this case which will cause doris failed to list the files under partition with special characters.
This pr is to fix this bug.
2023-07-18 11:20:38 +08:00
ebd2a4b707 [fix](dynamic partition) fix create hot partition failed without error response (#20996) 2023-07-18 10:56:37 +08:00
b656f31cf2 [Enchancement](compatible) show decimalv3 to decimal (#21782) 2023-07-18 09:17:14 +08:00