Commit Graph

18429 Commits

Author SHA1 Message Date
99edbaf3cf [improve](move-memtable) add stream sink file writer finalize fault injection (#29552) 2024-01-04 23:02:58 +08:00
bfd23e30f6 [improve](load) handle EAGAIN in load stream (#29437) 2024-01-04 23:02:11 +08:00
6a836a53df [feature](mv) add mv rewrite info to explain (#29153)
When rewriting a query with materialized views, we may want to know details of the rewrite process,
such as which materialized views were tried for the rewrite, which were rewritten successfully, and
which one was finally chosen by cost.

Run the following SQL to see a summary of the MV rewrite process:
`explain <your_query_sql>`

MaterializedView rewrite info is under the **MATERIALIZATIONS** tag.
In the example below, the materialized view `mv2_3` is rewritten successfully and finally chosen,
while the materialized views `mv2_4` and `mv1_3` are available but fail to rewrite the query.

Materialized View

MaterializedViewRewriteFail:

  name: mv2_4
  FailSummary: The graph logic between query and view is not consistent

  name: mv1_3
  FailSummary: Match mode is invalid

MaterializedViewRewriteSuccessButNotChose:
  Names: 

MaterializedViewRewriteSuccessAndChose:
  Names: mv2_3

`MaterializedViewRewriteFail`:
the rewrite failed when trying to use this materialized view to represent the query.
`name` is the name of the MTMV.
`FailSummary` is a summary of the failure reason.

`MaterializedViewRewriteSuccessButNotChose`:
this materialized view was successfully used to represent the query, but the CBO optimizer did not choose it in the end.

`MaterializedViewRewriteSuccessAndChose`:
this materialized view was successfully used to represent the query, and the CBO optimizer finally chose it.


To see detailed information about the MV rewrite process, run the following SQL:

`explain memo plan <your_query_sql>`

MaterializedView rewrite info is under the **MATERIALIZATIONS** tag.

In the example below, the materialized view `mv2_3` is rewritten successfully and finally chosen,
while the materialized views `mv2_4` and `mv1_3` fail, each with its failure reasons.

========== MATERIALIZATIONS ==========
materializationContexts:

MaterializationContext[mv1_3] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#257.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.

ObjectId : ObjectId#260.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.

ObjectId : ObjectId#251.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.

ObjectId : ObjectId#254.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.

] )

MaterializationContext[mv2_4] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#771.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
 query join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
 view join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
 {}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].

ObjectId : ObjectId#762.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
 query join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
 view join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
 {}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
] )

 MaterializationContext[mv2_3] ( rewriteSuccess=true, failReason=[
] )

`ObjectId` is the ID of the group expression.
`Summary` is a summary of the failure reason.
`Reason` is the detailed failure reason.

For example, take this entry from the output above:

MaterializationContext[mv2_4] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#762.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
 query join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
 view join edges is
 [<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
 is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
 {}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
] )

`0` represents the table lineitem.
`1` represents the table orders.
`[<{0} --LEFT_OUTER_JOIN-- {1}>]` denotes the edge lineitem LEFT OUTER JOIN orders.
`[<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]` means there is a filter above orders that cannot be pulled up because of the edge `[<{0} --LEFT_OUTER_JOIN-- {1}>]`.
The rewrite fails because the predicate `[(o_orderdate#20 = 2023-12-01)]` in the query is not found in **mv2_4**.



**mv1_3** is defined as follows:
CREATE MATERIALIZED VIEW mv1_3
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as 
select 
  o_orderstatus, 
  o_clerk 
from 
  orders 
where 
  O_ORDERDATE = '2023-12-01'
group by 
  o_orderstatus, 
 o_clerk;

**mv2_3** is defined as follows:
CREATE MATERIALIZED VIEW mv2_3
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as 
select 
  l_linestatus, 
  o_clerk
from 
 (
   select 
     * 
   from 
     lineitem 
   where 
     l_shipdate = '2023-12-01'
 ) t1 
 left join (
   select 
     * 
   from 
     orders 
   where 
     o_orderdate = '2023-12-01'
 ) t2 on l_orderkey = o_orderkey 
group by 
 l_linestatus, 
 o_clerk;

**mv2_4** is defined as follows:
CREATE MATERIALIZED VIEW mv2_4
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as 
select 
 l_linestatus, 
  o_clerk
from 
 (
   select 
     * 
   from 
     lineitem 
   where 
     l_shipdate >= '2023-12-01' and l_shipdate <= '2023-12-05'
 ) t1 
 left join (
   select 
     * 
   from 
     orders 
   where 
     o_orderdate >= '2023-12-01' and o_orderdate <= '2023-12-05'
 ) t2 on l_orderkey = o_orderkey 
group by 
 l_linestatus, 
 o_clerk;
2024-01-04 23:01:55 +08:00
35150bbc22 [improve](move-memtable) log rpc failures in stream file writer (#29267) 2024-01-04 23:00:46 +08:00
96acef908a [fix](move-memtable) check eos when close stream (#29547) 2024-01-04 22:56:52 +08:00
7a4ef90110 [Improve](regresstests)add test cases for array functions (#28492) 2024-01-04 20:39:35 +08:00
d8ad6ebff2 [enhancement](disk) log disk path when creating tablet (#29464) 2024-01-04 20:36:37 +08:00
43b19fd99e [docs](timezone) refactor docs of timezone 2024-01-04 20:20:40 +08:00
92533d544f [LOG](exec) Add timer for hostname_to_ip (#29497) 2024-01-04 18:27:27 +08:00
6ae2a11d07 [ci](case) exclude case test_dump_image (#29539) 2024-01-04 17:58:33 +08:00
abd9000368 [Feat](Nereids) add distribute hint to leading hint (#28562)
This adds a distribute hint to the leading hint. After this commit, the leading hint can be used like:
/*+ leading(t1 broadcast{t2 t3}) */
2024-01-04 17:51:06 +08:00
Pxl
441fb49345 [Bug](load) fix load failed on stream load tvf into agg state (#28420)
fix load failed on stream load tvf into agg state
2024-01-04 17:38:31 +08:00
e0e34b8f93 [doc](fix) fix dead link of mv doc (#29530) 2024-01-04 17:20:46 +08:00
3cfc1507b2 [doc](mv) add mv docs to sidebar (#29506)
Followup #29370 and #27549
2024-01-04 16:07:46 +08:00
f28dbc702c [bugfix](scanner done) should not set process status to query context (#29512)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-04 15:18:10 +08:00
329d20df3b [fix](regression) spare .testfile to make disk checker happy when injecting fault (#29477)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2024-01-04 15:09:57 +08:00
bfe65565d8 [feature](paimon)support native reader (#29339)
Support native reader for Paimon.

Upgrade paimon 0.5 to 0.6 : apache/doris-shade#32
2024-01-04 14:31:48 +08:00
Pxl
d8a08dad90 [Bug](mark-join) fix wrong result on mark join + other conjunct (#29321)
fix wrong result on mark join + other conjunct
2024-01-04 11:58:39 +08:00
Pxl
b26f3c37bd [Bug](config) set enabe_agg_state to need forward (#29454)
set enabe_agg_state to need forward
2024-01-04 10:47:29 +08:00
5e39cdf053 [doc](nereids)Add query rewrite by materialized view feature summary and desc doc (#29370) 2024-01-04 10:38:30 +08:00
4ba4767eef [improvement](scan) make global runtime filter support in-list filter (#29394) 2024-01-04 09:12:30 +08:00
3b7d5feb84 fix: en doc list partition column must be NOT NULL (#29414) 2024-01-03 23:23:38 +08:00
3888a7cc0b [fix](group_commit) Fix check auth error when relaying wal (#29461) 2024-01-03 23:19:16 +08:00
0d0b9d64dd [improve](move-memtable) add move memtable too many segments fault injection (#29342) 2024-01-03 21:27:54 +08:00
afaefa3a9e [regression](decimalv2) add schema change test case for decimalv2 (#29474) 2024-01-03 21:02:10 +08:00
d6cb2d6d5c [improvement](compaction) start 1 cumu compaction thread each disk by default (#29430) 2024-01-03 20:48:11 +08:00
d93812d23f [fix](auditloader) support audit table millisecond and fix stmt truncated by '\r' (#29479)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2024-01-03 20:47:56 +08:00
bd8113f424 [bugfix](scannerscheduler) should minus num_of_scanners before check should schedule #28926 (#29331)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-03 20:47:35 +08:00
c84cd30223 [pipelineX](fix) Fix query cancel timeout (#29460)
There are 2 potential reasons for the pipelineX query cancel timeout:

1. Cancelling the fragment context first and then setting it ready to execute resets the cancel flag to false.
2. A deadlock.
2024-01-03 20:29:04 +08:00
49a3bab399 [fix](nereids) fix aggregate function roll up when expression arguments is not equals (#29256)
When rolling up an aggregate function, we should check that the arguments of the query function and the MV function are equal.
For example, with the MV definition and query SQL below, the rewrite should not succeed. The argument of the `bitmap_union_basic` field in the MV (`o_shippriority > 1`)
is not equal to the argument of `count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)` in the query (`o_shippriority > 10`).

mv def:
>      select l_shipdate, o_orderdate, l_partkey, l_suppkey, 
>            sum(o_totalprice) as sum_total, 
>            max(o_totalprice) as max_total, 
>            min(o_totalprice) as min_total, 
>            count(*) as count_all, 
>            bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) as bitmap_union_basic 
>           from lineitem 
>           left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate 
>            group by 
>         l_shipdate, 
>         o_orderdate, 
>          l_partkey, 
>         l_suppkey;

query sql:

>             select t1.l_partkey, t1.l_suppkey, o_orderdate,
>           sum(o_totalprice),
>            max(o_totalprice),
>           min(o_totalprice),
>           count(*),
>            count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)
>            from (select * from lineitem where l_shipdate = '2023-12-11') t1
>            left join orders on t1.l_orderkey = orders.o_orderkey and t1.l_shipdate = o_orderdate
>            group by
>            o_orderdate, 
>            l_partkey,
>            l_suppkey;
2024-01-03 18:58:18 +08:00
2386a1ce5a [fix](docs)Fix description for error code 1051 #27751 (#27977) 2024-01-03 18:15:00 +08:00
d19530c4c2 [Fix](Nereids) fix leading hint dealing with big brace (#29405)
Co-authored-by: libinfeng <libinfeng@selectdb.com>
2024-01-03 18:13:38 +08:00
44628d37c8 Enable minmax push down for unique table while doing analyze. (#29462) 2024-01-03 18:10:38 +08:00
b2a28fcfaa [fix](ci) exclude case load_stream_fault_injection (#29465) 2024-01-03 16:02:15 +08:00
e3c9f535dc [refactor](wal) refactor some wal code (#29434) 2024-01-03 14:45:57 +08:00
329d57fdd7 [regression](move-memtable) test LoadStream on_idle_timeout (#29354)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2024-01-03 14:07:51 +08:00
28ff349381 [doc](fix)invalid character 。 in en docs (#29355)
Co-authored-by: Rohit Satardekar <rohitrs1983@gmail.com>
2024-01-03 12:59:59 +08:00
193b7518ab [enhancement](nereids)throw readable exception when meet missing column in agg's output (#29243) 2024-01-03 12:59:32 +08:00
12286f0a63 [docs][hive-transactional-tables] Add hive transactional tables documents. (#29369) 2024-01-03 12:57:20 +08:00
2a9b4a0f76 [enhancement](paimon)support predict for null and notnull (#29134) 2024-01-03 12:53:39 +08:00
79eb575d7c [Improvement](nereids)Support ODBC table for new planner. (#29129) 2024-01-03 12:51:07 +08:00
1fbbff32b2 [fix](pipelinex) coredump caused by VRuntimeFilterSlots::_is_global was not set (#29446) 2024-01-03 12:40:41 +08:00
dbf61005df change boost thirdparty url to offical archives.boost.io (#29401) 2024-01-03 12:12:43 +08:00
c0db8533af [fix](load) fix single replica load with auto partition 2024-01-03 11:53:09 +08:00
5eb38301dd [Chore](CI)Re-configure branch-2.0 shell-check as a required check (#29448)
https://github.com/apache/doris/pull/29289 done
2024-01-03 11:44:38 +08:00
xy
fab1a627fc [fix](scan) _insert_data_normal should catch exception when BlockReader::_unique_key_next_block (#29426)
Co-authored-by: xingying01 <xingying01@corp.netease.com>
2024-01-03 11:44:02 +08:00
08353f6027 [Enhance](fe) Iceberg table in HMS catalog supports broker scan (#28107)
My organization uses the HMS catalog to accelerate lake queries. Since we have a custom distributed file system that is hard to integrate into FE/BE, we introduced HMS catalog broker scan support (#24830) and implemented a custom distributed file system adaptation in the broker.

We want to extend this to Iceberg table scans in the HMS catalog. This PR introduces the broker-scan-related `IcebergBrokerIO`, `BrokerInputFile`, and `BrokerInputStream` for Iceberg table scans.
2024-01-03 11:29:12 +08:00
1e8bb75182 [improve](move-memtable) add log on idle timeout (#29438) 2024-01-03 11:26:26 +08:00
be1d9c3358 [fix](memory) Fix mem tracker web page notice #29361 2024-01-03 11:25:00 +08:00
14e7eb7624 [Opt](rf) Opt broadcast join remote runtime filter merge and wait (#29439) 2024-01-03 11:21:28 +08:00