Commit Graph

7218 Commits

Author SHA1 Message Date
d50c8b6d3a [Improvement](nereids) Query rewrite by mv support bitmap_union and bitmap_union_count roll up (#29418)
Query rewrite by mv now supports rolling up bitmap_union and bitmap_union_count. The aggregate functions that support roll up are listed below:

| Function in query  | Function in materialized view | Function after roll up |
|--------------------|-------------------------------|------------------------|
| max                | max                           | max                    |
| min                | min                           | min                    |
| sum                | sum                           | sum                    |
| count              | count                         | sum                    |
| count(distinct )   | bitmap_union                  | bitmap_union_count     |
| bitmap_union       | bitmap_union                  | bitmap_union           |
| bitmap_union_count | bitmap_union                  | bitmap_union_count     |
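
The roll-up rules in the table can be illustrated with a small sketch. It is not Doris code: the rows, column names, and the use of Python sets in place of bitmaps are assumptions made for illustration. It shows why a precomputed count must be rolled up with sum, and why count(distinct) needs a bitmap union followed by a cardinality count rather than a sum.

```python
# Roll-up rules from the table: (query function, MV function) -> function
# applied to the MV column when re-aggregating to a coarser grouping.
ROLL_UP = {
    ("max", "max"): "max",
    ("min", "min"): "min",
    ("sum", "sum"): "sum",
    ("count", "count"): "sum",
    ("count(distinct)", "bitmap_union"): "bitmap_union_count",
    ("bitmap_union", "bitmap_union"): "bitmap_union",
    ("bitmap_union_count", "bitmap_union"): "bitmap_union_count",
}

# Hypothetical MV rows grouped by (day, city): a precomputed count and a set
# standing in for a bitmap of user ids.
mv_rows = [
    {"day": "d1", "city": "a", "cnt": 2, "users": {1, 2}},
    {"day": "d1", "city": "b", "cnt": 3, "users": {2, 3, 4}},
    {"day": "d2", "city": "a", "cnt": 1, "users": {1}},
]


def roll_up_by_day(rows):
    """Re-aggregate the MV rows to the coarser grouping (day)."""
    merged = {}
    for row in rows:
        agg = merged.setdefault(row["day"], {"cnt": 0, "users": set()})
        agg["cnt"] += row["cnt"]      # count rolls up as sum
        agg["users"] |= row["users"]  # bitmap_union rolls up as bitmap_union
    # bitmap_union_count is the cardinality of the unioned bitmap.
    return {day: (agg["cnt"], len(agg["users"])) for day, agg in merged.items()}


result = roll_up_by_day(mv_rows)
```

User 2 appears in both `d1` rows, so summing per-row distinct counts would overcount; the union keeps the distinct count correct.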

This depends on https://github.com/apache/doris/pull/29256
2024-01-12 11:44:21 +08:00
2ca90b2bf1 [Refactor](dialect) Add sql dialect converter plugins (#28890)
The current logic for SQL dialect conversion is all in the `fe-core` module, which may lead to the following issues:
- The dialect conversion logic may change frequently. Because it lives in the fe-core module, users must upgrade the Doris version to pick up each change, which lengthens the change cycle.
- The cost of customized development is high, because users have to replace the fe-core JAR package.

Turning the converter into a plugin addresses both issues.
2024-01-12 11:44:20 +08:00
40badbf5c5 Fix analyze empty external NPE bug. (#29675) 2024-01-12 11:41:21 +08:00
54d2528c69 [Fix](Nereids) fix fe ut failed cause of getting statement context (#29683)
Problem:
FE UT failed because of a null pointer error.
Cause:
The FE UT failed to get the statement context from the connection context.
Resolved:
Add a null check.
2024-01-12 11:41:06 +08:00
a2da434e3b [refactor](Nereids): refactor PredicatePropagation & support to infer Equal Condition (#29644) 2024-01-12 11:40:57 +08:00
8fc9c18c85 [improvement](jdbc catalog) Put the jdbc connection pool parameters into catalog properties (#29195) 2024-01-12 11:40:28 +08:00
30e46ee5ad [Fix](Job)Fixed the problem of not deleting JOB during DROP JOB metadata playback (#29543) 2024-01-12 11:40:19 +08:00
3cd1c7745a [fix](jdbc catalog) Fix the precision of decimal type mapping to 0 (#29407) 2024-01-12 11:39:57 +08:00
c10bcb666d [Fix](Nereids) change log level of warning of converting error to debug (#29660)
Co-authored-by: libinfeng <libinfeng@selectdb.com>
2024-01-12 11:39:49 +08:00
eea657a610 [rf](nereids)prune rf for external db according to jump count (#29634)
* prune some rf for external db
2024-01-12 11:37:16 +08:00
b59a8c9365 [feature](Nereids): refresh view hypergraph after inferring join (#29469) 2024-01-12 11:36:21 +08:00
971bc804ac [fix](Nereids) update and delete may produce exprs with same exprid (#29656) 2024-01-12 11:35:49 +08:00
847898bf26 [fix](Nereids) delete using should support sql without where (#29518) 2024-01-12 11:35:29 +08:00
ddaa645a4f [improvement](statistics) Force to use zonemap for collecting string type min max. (#29631)
Force to use zonemap for collecting string type min max.
String type does not use zonemap for min max, because the zonemap value on the BE side is truncated at 512 bytes, which may make the value inaccurate. That is acceptable for statistics min max, and it also avoids scanning the whole table while sampling.
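
A minimal sketch of why a truncated value can be inaccurate yet still useful for statistics. It assumes the truncation is a plain prefix cut at 512 bytes; the helper name and sample values are hypothetical, and the real BE behavior may differ.

```python
ZONEMAP_MAX_LEN = 512  # truncation length stated in the commit message


def truncated(value: bytes, limit: int = ZONEMAP_MAX_LEN) -> bytes:
    """Keep only the first `limit` bytes (assumed zonemap behavior)."""
    return value[:limit]


# Hypothetical column values, both longer than 512 bytes.
values = [b"a" * 600 + b"b", b"y" * 600 + b"z"]

true_min, true_max = min(values), max(values)
zm_min = min(truncated(v) for v in values)
zm_max = max(truncated(v) for v in values)

# The truncated values are prefixes of the true values: close enough for
# statistics, but not equal to the exact min and max.
min_is_exact = zm_min == true_min
max_is_exact = zm_max == true_max
```

Under this assumption the truncated min is still a lower bound of the true min, while the truncated max can sort below the true max.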
2024-01-12 11:34:07 +08:00
223e466514 [fix](insert-into) fix insert into lose data (#29802) 2024-01-11 16:47:25 +08:00
443b79d6ba [pipelineX](bug) Fix correctness problem using multiple BE (#29765) 2024-01-10 17:13:13 +08:00
3675e0302c [fix](nereids) generate correct order for runtime filter when contains NullSafeEquals hash condition (#29726)
BE does not support RF for NullSafeEquals, so FE does not generate RF for them.
However, after we supported NullSafeEquals as a hash join condition,
the order of RF became wrong when generating RF in FE. This PR fixes it.
2024-01-10 10:33:45 +08:00
59d7f64360 [Fix](Nereids) fix pipelineX distribute expr list with child output expr ids (#29621) 2024-01-08 10:46:27 +08:00
0bdd007926 [improve](insert-into) add log when instance mark and done (#29636) 2024-01-08 10:11:12 +08:00
1ea51e9f20 [Feature](group commit) Support table property "group commit data bytes" (#29484) 2024-01-07 19:46:42 +08:00
2d89b7aed4 [fix](tablet sched) disable disk balance for single replica (#29576) 2024-01-07 19:21:42 +08:00
0b731800a0 [enhancement](group_commit) refector wal manager code (#29560) 2024-01-07 18:54:41 +08:00
eb4c389b0b [feature](function) support ip functions isipv4string and isipv6string (#28556) 2024-01-07 13:03:11 +08:00
734b258e15 [feature](create table) show create table print storage medium (#29080) 2024-01-06 22:40:51 +08:00
99754d7460 [improve](routine-load) remove maximum limit of routine load max_batch_interval (#29071) 2024-01-06 20:09:54 +08:00
db17f5fe79 [improve](move-memtbale) enable move memtable in routine load (#28974) 2024-01-06 18:22:01 +08:00
2adb0fcc50 [opt](hive) support orc generated from hive 1.x for all file scan node (#28806) 2024-01-06 17:33:16 +08:00
720bee7c1e [improve](stream-load) choose stream load coordinator by round robin (#28915) 2024-01-06 17:20:48 +08:00
bdc69a4175 [fix](nereids)index type should be converted to upper case for later comparasion (#29524) 2024-01-06 17:18:42 +08:00
612e0631ac Do not collect min max for agg table value columns while doing sample analyze. (#29483) 2024-01-06 17:15:40 +08:00
911635fac6 [feature](nereids) judge if the join is at bottom of join cluster (#29383) 2024-01-06 17:15:19 +08:00
cc7b9480cf [fix](polixy)support drop policy for user or role (#29488) 2024-01-06 17:14:47 +08:00
75efdd6e1f [fix](http) throw RejectedExecutionException to prevent http hanging by Future (#29607) 2024-01-06 16:17:07 +08:00
2c888667ed [improvement](function) standardize some ip functions' signatures #29614
The signatures of functions in these PRs should be more standard:
#27342,
#25510,
#20936,
including the following:
ipv4numtostring,
ipv4stringtonum,
ipv4stringtonumordefault,
ipv4stringtonumornull,
ipv6numtostring.

This PR will add necessary underscores between the words of each of them,
like changing ipv4numtostring to ipv4_num_to_string.
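
The renaming rule can be sketched as a word split over the old names. Only the `ipv4numtostring` to `ipv4_num_to_string` pair is stated above; the word list and the other resulting names are assumptions derived from the "add underscores between the words" description.

```python
# Words the old names are built from, longest first (assumed word list).
WORDS = ("default", "string", "ipv4", "ipv6", "null", "num", "or", "to")

OLD_NAMES = [
    "ipv4numtostring",
    "ipv4stringtonum",
    "ipv4stringtonumordefault",
    "ipv4stringtonumornull",
    "ipv6numtostring",
]


def add_underscores(name: str) -> str:
    """Split a run-together function name into known words joined by '_'."""
    parts, pos = [], 0
    while pos < len(name):
        word = next((w for w in WORDS if name.startswith(w, pos)), None)
        if word is None:
            raise ValueError(f"cannot split {name!r} at offset {pos}")
        parts.append(word)
        pos += len(word)
    return "_".join(parts)


renamed = {old: add_underscores(old) for old in OLD_NAMES}
```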
2024-01-06 16:16:38 +08:00
5789b7e380 [fix](jin) add datetimev2 precision (#29528) 2024-01-06 13:35:26 +08:00
8908a347bc [fix](nereids)need do type coercion after simplify comparasion predicate (#29546) 2024-01-06 13:34:04 +08:00
05d1f4f71d [fix](planner) Fix table sample not take effect (#29594)
Move the computeSampleTabletIds call. Table sample did not take effect because selectedIndexId was -1 in computeSampleTabletIds. The FE UT had special processing for this case, so it did not discover the bug in time.
2024-01-06 02:08:40 +08:00
7a0734dbd6 [feature](Nereids): InferPredicates support In (#29458) 2024-01-05 21:25:30 +08:00
7402fee1fc [feature](function) support ip function ipv6_string_to_num(_or_default, _or_null), inet6_aton (#28361) 2024-01-05 19:24:45 +08:00
67b9d38d83 [fix](delete) fix incorrect tablet schema of delete predicate rowset (#29536) 2024-01-05 18:26:30 +08:00
2b3e75bb27 [fix](Nereids) exists should not return null (#29435) 2024-01-05 18:13:21 +08:00
132ff6c6de [opt](Nereids) add float type signature for sum aggregate function (#29503)
* [opt](Nereids) add float type signature for sum aggregate function
2024-01-05 18:06:16 +08:00
39887a5cbf [ut](fe) remove useless and unsatbale tests (#29555) 2024-01-05 12:03:30 +08:00
64696829d1 [fix](Nereids) mark join should not eliminate join when child is empty (#29409) 2024-01-05 11:55:37 +08:00
f3bbc7b876 [enchancement](delete) fix delete stmt return error with fold on be (#28557) 2024-01-05 11:27:21 +08:00
baec2657dd [fix](Nereids) should cast NOT's child to boolean when analyze (#29433) 2024-01-05 11:20:39 +08:00
c0f63915f7 [chore](test) make configuartion of parallel scan be fuzzy (#29356) 2024-01-05 11:09:43 +08:00
c1ddcc5751 [opt](config) create custom conf dir if not exists (#29391) 2024-01-05 00:14:16 +08:00
9aafcb18bd [fix](move-memtable) disable move memtable when light schema change is false (#29362) 2024-01-04 23:03:35 +08:00
6a836a53df [feature](mv) add mv rewrite info to explain (#29153)
During query rewrite by mv, we may want to know details of the rewrite process,
such as which materialized views were tried, which were rewritten successfully, and
which one the optimizer finally chose by cost.

We can run the following sql to see a summary of the mv rewrite process:
`explain <your_query_sql>`

The materialized view rewrite info is under the **MATERIALIZATIONS** tag.
For example, in the output below
we can see that the materialized view named `mv2_3` was rewritten successfully and finally chosen,
and that the materialized views named `mv2_4` and `mv1_3` were available but failed to rewrite.

Materialized View

MaterializedViewRewriteFail:

  name: mv2_4
  FailSummary: The graph logic between query and view is not consistent

  name: mv1_3
  FailSummary: Match mode is invalid

MaterializedViewRewriteSuccessButNotChose:
  Names: 

MaterializedViewRewriteSuccessAndChose:
  Names: mv2_3

`MaterializedViewRewriteFail`:
the attempt to use this materialized view to represent the query failed.
`name` is the name of the MTMV.
`FailSummary` is a summary of the fail reason.

`MaterializedViewRewriteSuccessButNotChose`:
the materialized view can represent the query, but the CBO optimizer did not choose it in the end.

`MaterializedViewRewriteSuccessAndChose`:
the materialized view can represent the query, and the CBO optimizer chose it in the end.
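
The summary layout shown earlier is regular enough to parse in a script. The following is a rough sketch, not part of Doris: the parser, its return shape, and the assumption that multiple `Names` are comma-separated are all hypothetical, and the sample text is copied from the example output above.

```python
SAMPLE = """\
MaterializedViewRewriteFail:

  name: mv2_4
  FailSummary: The graph logic between query and view is not consistent

  name: mv1_3
  FailSummary: Match mode is invalid

MaterializedViewRewriteSuccessButNotChose:
  Names:

MaterializedViewRewriteSuccessAndChose:
  Names: mv2_3
"""


def parse_summary(text):
    """Return (failed: name -> summary, not_chosen: list, chosen: list)."""
    failed, not_chosen, chosen = {}, [], []
    section, current = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if not raw.startswith(" "):  # an unindented line opens a section
            section = key
        elif key == "name":
            current = value
        elif key == "FailSummary":
            failed[current] = value
        elif key == "Names":
            # Assumption: several names would be separated by commas.
            names = [n.strip() for n in value.split(",") if n.strip()]
            target = chosen if section.endswith("AndChose") else not_chosen
            target.extend(names)
    return failed, not_chosen, chosen


failed, not_chosen, chosen = parse_summary(SAMPLE)
```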


To see more detail, we can run the following sql to get the detailed mv rewrite process info:

`explain memo plan <your_query_sql>`

The materialized view rewrite info is under the **MATERIALIZATIONS** tag.
For example, in the output below

we can see that the materialized view named `mv2_3` was rewritten successfully and finally chosen,
and that the materialized views named `mv2_4` and `mv1_3` failed, each with its fail reason.

========== MATERIALIZATIONS ==========
materializationContexts:

MaterializationContext[mv1_3] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#257.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.

ObjectId : ObjectId#260.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.

ObjectId : ObjectId#251.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.

ObjectId : ObjectId#254.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.

] )

MaterializationContext[mv2_4] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#771.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
 query join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
 view join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
 {}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].

ObjectId : ObjectId#762.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
 query join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
 view join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
 {}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
] )

 MaterializationContext[mv2_3] ( rewriteSuccess=true, failReason=[
] )

`ObjectId` is the id of the group expression.
`Summary` is the summary of the fail reason.
`Reason` is the detailed fail reason.

Take the following info from above as an example:

MaterializationContext[mv2_4] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#762.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
 query join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
 view join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
 {}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
]

`0` represents table lineitem
`1` represents table orders
`[<{0} --LEFT_OUTER_JOIN-- {1}>]` is the edge for lineitem left outer join orders
`[<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]` means there is a filter above orders that cannot be pulled up because of the edge `[<{0} --LEFT_OUTER_JOIN-- {1}>]`.
The query cannot be rewritten because `[(o_orderdate#20 = 2023-12-01)]` in the query is not found in **mv2_4**
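
The mismatch described above can be pictured as a set comparison. This is a loose sketch under stated assumptions, not the Nereids implementation: predicates are simplified to strings, the orders predicates are taken from the failure message and the mv2_4 definition, and the function and variable names are hypothetical.

```python
# Table indexes as in the explanation: 0 = lineitem, 1 = orders.
LINEITEM, ORDERS = 0, 1

# Filters attached to each table (simplified to strings for illustration).
query_filters = {ORDERS: {"o_orderdate = 2023-12-01"}}
view_filters = {
    ORDERS: {"o_orderdate >= 2023-12-01", "o_orderdate <= 2023-12-05"},
}

# For `lineitem LEFT OUTER JOIN orders`, orders is the null-producing side,
# so a filter on it cannot be pulled up above the join.
non_pullable_tables = {ORDERS}


def error_filters(query, view, tables):
    """Query filters on non-pullable tables that the view does not contain."""
    errors = {}
    for table in tables:
        missing = query.get(table, set()) - view.get(table, set())
        if missing:
            errors[table] = missing
    return errors


errors = error_filters(query_filters, view_filters, non_pullable_tables)
```

Here `errors` holds the query predicate on orders that has no counterpart in mv2_4, which corresponds to the "error edge" reported in the fail reason.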



**mv1_3** is defined as follows:
CREATE MATERIALIZED VIEW mv1_3
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as 
select 
  o_orderstatus, 
  o_clerk 
from 
  orders 
where 
  O_ORDERDATE = '2023-12-01'
group by 
  o_orderstatus, 
 o_clerk;

**mv2_3** is defined as follows:
CREATE MATERIALIZED VIEW mv2_3
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as 
select 
  l_linestatus, 
  o_clerk 
from 
 (
   select 
     * 
   from 
     lineitem 
   where 
     l_shipdate = '2023-12-01'
 ) t1 
 left join (
   select 
     * 
   from 
     orders 
   where 
     o_orderdate = '2023-12-01'
 ) t2 on l_orderkey = o_orderkey 
group by 
 l_linestatus, 
 o_clerk;

**mv2_4** is defined as follows:
CREATE MATERIALIZED VIEW mv2_4
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as 
select 
 l_linestatus, 
 o_clerk 
from 
 (
   select 
     * 
   from 
     lineitem 
   where 
     l_shipdate >= '2023-12-01' and l_shipdate <= '2023-12-05'
 ) t1 
 left join (
   select 
     * 
   from 
     orders 
   where 
     o_orderdate >= '2023-12-01' and o_orderdate <= '2023-12-05'
 ) t2 on l_orderkey = o_orderkey 
group by 
 l_linestatus, 
 o_clerk;
2024-01-04 23:01:55 +08:00