Support query rewrite by materialized view when a join input contains an aggregate; the aggregate should be simple.
For example:
The materialized view definition is:
> select
> l_linenumber,
> count(distinct l_orderkey),
> sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end),
> max(case when l_orderkey in (4, 5) then (l_quantity *2 + part_supp_a.qty_max) * 0.88 else 100 end),
> avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end)
> from lineitem
> left join orders on l_orderkey = o_orderkey
> left join
> (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max,
> min(ps_availqty) qty_min,
> avg(ps_supplycost) cost_avg
> from partsupp
> group by ps_partkey,ps_suppkey) part_supp_a
> on l_partkey = part_supp_a.ps_partkey
> and l_suppkey = part_supp_a.ps_suppkey
> group by l_linenumber;
A query like the following can then be rewritten using the materialized view above:
> select
> l_linenumber,
> sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end),
> avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end)
> from lineitem
> left join orders on l_orderkey = o_orderkey
> left join
> (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max,
> min(ps_availqty) qty_min,
> avg(ps_supplycost) cost_avg
> from partsupp
> group by ps_partkey,ps_suppkey) part_supp_a
> on l_partkey = part_supp_a.ps_partkey
> and l_suppkey = part_supp_a.ps_suppkey
> group by l_linenumber;
1. Make sure a new instance is created when the params of StructInfo or Predicates are changed.
2. Catch and record the exception for every materialization context, so that an exception thrown while rewriting with one materialization context does not affect the others.
3. Support mv rewrite when the query has a count function in an aggregate without group by.
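The exception isolation in item 2 can be sketched roughly as below. This is an illustration in Python, not the Doris implementation: `MaterializationContext`, `try_rewrite`, and `rewrite_all` are hypothetical names, and the `fail` flag only simulates a rewrite that throws.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MaterializationContext:
    """Hypothetical stand-in for one candidate materialized view."""
    name: str
    fail: bool = False  # simulates a rewrite that throws
    failures: List[str] = field(default_factory=list)


def try_rewrite(query: str, ctx: MaterializationContext) -> str:
    """Pretend to rewrite the query with one context; may raise."""
    if ctx.fail:
        raise ValueError(f"cannot rewrite with {ctx.name}")
    return f"{query} /* rewritten by {ctx.name} */"


def rewrite_all(query: str, contexts: List[MaterializationContext]) -> List[str]:
    """Try every context; record a failure on the context and keep going."""
    rewritten = []
    for ctx in contexts:
        try:
            rewritten.append(try_rewrite(query, ctx))
        except Exception as exc:  # isolate the failure to this context
            ctx.failures.append(str(exc))
    return rewritten


contexts = [
    MaterializationContext("mv1"),
    MaterializationContext("mv2", fail=True),
    MaterializationContext("mv3"),
]
plans = rewrite_all("select 1", contexts)
```

Here the failure of `mv2` is recorded on its own context while `mv1` and `mv3` still produce rewritten plans.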
1. do not change RuntimeFilter Type from IN-OR_BLOOM to BLOOM on broadcast join
tpcds1T, q48 improved from 4.x sec to 1.x sec
2. skip some redundant runtime filters
example: A join B on A.a1=B.b and A.a1 = A.a2
RF B.b->(A.a1, A.a2)
however, RF(B.b->A.a2) is implied by RF(B.b->A.a1) and A.a1=A.a2
we skip RF(B.b->A.a2)
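One way to picture the redundancy check is to group target columns into equivalence classes and keep a single RF per (source, class) pair. The sketch below assumes a plain union-find over column names; `prune_runtime_filters` and its inputs are hypothetical and do not mirror the actual Doris code.

```python
def find(parent, x):
    """Return the representative of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the classes of a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def prune_runtime_filters(filters, equal_pairs):
    """filters: (source, target) pairs; equal_pairs: known column equalities.

    Keeps the first RF for each (source, equivalence class of target).
    """
    parent = {}
    for a, b in equal_pairs:
        union(parent, a, b)
    seen = set()
    kept = []
    for source, target in filters:
        key = (source, find(parent, target))
        if key in seen:
            continue  # implied by an RF already kept
        seen.add(key)
        kept.append((source, target))
    return kept


kept = prune_runtime_filters(
    [("B.b", "A.a1"), ("B.b", "A.a2")],
    [("A.a1", "A.a2")],
)
```

For the example above, only RF(B.b->A.a1) survives.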
1. add volume for es logs
2. optimize health check, waiting for es status to be green
3. fix es6 volume path error
4. optimize disk watermark to avoid es disk watermark error
5. fix es6 create index error
6. add custom elasticsearch.yml for es6
7. add log4j2.properties for es6, es7, es8
This PR proposes mapping external catalog JSON types to String instead of JsonB in Apache Doris. The motivation is that JDBC retrieves JSON data as a JSON string regardless of its storage format (Json(String) or Json(Binary)). Mapping to String streamlines data retrieval, simplifies write-backs, and keeps compatibility with all JSON(String) and JSON(Binary) functions, although displaying JSON data as Strings in Doris may look misleading. This approach avoids the performance overhead and complexity of converting each row of data from JsonB to String.
About Upgrade
To ensure query compatibility with existing Catalogs in the upgraded version, we currently retain the capability to query external JSON types as JSONB. However, once you upgrade to the new version and either refresh the Catalog or create a new one, all external JSON types will be treated as Strings. To ensure consistent behavior, and because the code that queries JSON as JSONB may be removed in the future, it is highly recommended that you manually refresh your Catalog as soon as possible after upgrading to the new version.
* if column stats are unknown, do not use dphyp
tpcds query64 is optimized in case of no stats
sf500, query64 improved from 15sec to 7sec on hdfs, and from 4sec to 3.85sec on olaptable
In the previous logic, when restoring columns in pushed-down predicates from the logical syntax tree for JdbcScanNode, we added escape characters to column names to avoid query errors caused by keywords such as `key`. However, only binary predicates were processed, which is incomplete. We should add escape characters to every column that appears in a predicate to avoid errors with keywords or illegal characters.
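The idea of escaping columns in every predicate type, not only binary ones, can be illustrated with a small recursive renderer. This is a hedged sketch: the expression classes and `to_sql` are hypothetical, and MySQL-style backtick quoting is an assumption (other JDBC sources quote identifiers differently).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: Union[int, str]


@dataclass(frozen=True)
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class InList:
    expr: "Expr"
    items: Tuple["Expr", ...]


@dataclass(frozen=True)
class IsNull:
    expr: "Expr"


Expr = Union[Column, Literal, Binary, InList, IsNull]


def to_sql(e: Expr) -> str:
    """Render a predicate, escaping every column wherever it appears."""
    if isinstance(e, Column):
        # Assumption: MySQL-style backticks; embedded backticks are doubled.
        return "`" + e.name.replace("`", "``") + "`"
    if isinstance(e, Literal):
        if isinstance(e.value, str):
            return "'" + e.value.replace("'", "''") + "'"
        return str(e.value)
    if isinstance(e, Binary):
        return f"({to_sql(e.left)} {e.op} {to_sql(e.right)})"
    if isinstance(e, InList):
        return f"{to_sql(e.expr)} IN ({', '.join(to_sql(i) for i in e.items)})"
    if isinstance(e, IsNull):
        return f"{to_sql(e.expr)} IS NULL"
    raise TypeError(f"unsupported expression: {e!r}")


in_pred = to_sql(InList(Column("key"), (Literal(1), Literal(2))))
mixed = to_sql(
    Binary("AND", IsNull(Column("order")), Binary("=", Column("key"), Literal("a")))
)
```

Because the escaping lives in the `Column` branch, any predicate type that contains a column picks it up through recursion.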
estimate column stats for "cast(col, XXXType)"
-----cast-est------
query4 41169 40335 40267 40267
query58 463 361 401 361
Total cold run time: 41632 ms
Total hot run time: 40628 ms
----master------
query4 40624 40180 40299 40180
query58 487 389 420 389
Total cold run time: 41111 ms
Total hot run time: 40569 ms
When a varchar literal contains Chinese characters, the varchar length should not be the character count of the literal; it should be
the actual number of bytes used.
Chinese is represented in Unicode, and a Chinese character occupies at most 4 bytes. So when a varchar literal contains Chinese, we
set the length to 4 * the character count.
For example:
> CREATE MATERIALIZED VIEW test_varchar_literal_mv
> BUILD IMMEDIATE REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES ('replication_num' = '1')
> AS
> select case when l_orderkey > 1 then "一二三四" else "五六七八" end as field_1 from lineitem;
The definition of the materialized view is as follows:
mysql> desc test_varchar_literal_mv;
+---------+-------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-------+---------+-------+
| field_1 | VARCHAR(16) | No | false | NULL | NONE |
+---------+-------------+------+-------+---------+-------+
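The length rule above can be sketched as follows. This is a simplified illustration, not the Doris code: `varchar_literal_length` is a hypothetical name, and treating any non-ASCII character as "Chinese" is an assumption made to keep the check short.

```python
def varchar_literal_length(literal: str) -> int:
    """Return a conservative byte length for a varchar literal.

    Assumption: any non-ASCII character triggers the 4-bytes-per-char bound.
    """
    if any(ord(ch) > 127 for ch in literal):
        return 4 * len(literal)
    return len(literal)


chinese_len = varchar_literal_length("一二三四")
ascii_len = varchar_literal_length("abc")
utf8_bytes = len("一二三四".encode("utf-8"))
```

The 4x bound is an upper limit: in UTF-8 these four characters take fewer bytes than the reserved length, so the literal always fits in the resulting `VARCHAR(16)`.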
Fix incorrect cte rewrite by mv when the query has a scalar aggregate but the view does not
For example, the following query should not be rewritten by the materialized view:
// materialized view definition
def mv20_1 = """
select
l_shipmode,
l_shipinstruct,
sum(l_extendedprice),
count(*)
from lineitem
left join
orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY
group by
l_shipmode,
l_shipinstruct;
"""
// query sql
def query20_1 =
"""
select
sum(l_extendedprice),
count(*)
from lineitem
left join
orders
on lineitem.L_ORDERKEY = orders.O_ORDERKEY
"""
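The description does not say why this rewrite must be rejected. One likely reason, stated here as an assumption, is that a scalar aggregate and a group-by aggregate behave differently on empty input. The sketch below uses SQLite from Python (not Doris) with a simplified single-table `lineitem` and no join, purely to show the semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, empty stand-in for lineitem (assumption: no join needed here).
conn.execute("create table lineitem (l_shipmode text, l_extendedprice real)")

# Shape of the materialized view: aggregate with group by.
grouped = conn.execute(
    "select l_shipmode, sum(l_extendedprice), count(*) "
    "from lineitem group by l_shipmode"
).fetchall()

# Shape of the query: scalar aggregate, no group by.
scalar = conn.execute(
    "select sum(l_extendedprice), count(*) from lineitem"
).fetchall()

# Naive roll-up of the view rows to answer the scalar query.
rolled_up = conn.execute(
    "select sum(s), sum(c) from ("
    "  select l_shipmode, sum(l_extendedprice) as s, count(*) as c"
    "  from lineitem group by l_shipmode"
    ") as mv"
).fetchall()

conn.close()
```

On empty input the grouped view has no rows, the scalar query returns one row with a count of 0, and the naive roll-up returns NULL for the count, so the two results differ.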
Fix incorrect predicate compensation
For example, the following case now returns the correct result; previously the result was wrong.
// materialized view definition
def mv7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from lineitem
left join orders
on lineitem.l_orderkey = orders.o_orderkey
where l_shipdate = '2023-12-08' and o_orderdate = '2023-12-08';
"""
// query sql
def query7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from (select * from lineitem where l_shipdate = '2023-10-17' ) t1
left join orders
on t1.l_orderkey = orders.o_orderkey;
"""
Also optimize some code usage and add more comments for methods.
Currently, union rf push down only supports an rf from the parent join, not from an ancestor join.
This pr fixes the problem in the rf push down check for project/distribute nodes.