Commit Graph

2414 Commits

Author SHA1 Message Date
dfde10d4c8 [improvement](function) switch inet(6)_aton alias origin function (#30196) 2024-01-23 10:09:54 +08:00
4480f751e6 [Improve](Variant) support implicit cast to numeric and string type (#30029) 2024-01-23 10:09:54 +08:00
e5f1d8d7ec [fix](phrase_prefix) fix match_phrase_prefix query incorrect result (#29946) 2024-01-23 10:09:54 +08:00
332b9cb619 [opt](nereids) do not change RuntimeFilter Type from IN-OR_BLOOM to BLOOM on broadcast join (#30148)
1. do not change RuntimeFilter Type from IN-OR_BLOOM to BLOOM on broadcast join
    tpcds1T, q48 improved from 4.x sec to 1.x sec
2. skip some redundant runtime filters
    example: A join B on A.a1=B.b and A.a1 = A.a2
    RF B.b->(A.a1, A.a2)
    however, RF(B.b->A.a2) is implied by RF(B.b->A.a1) and A.a1=A.a2
    we skip RF(B.b->A.a2)
2024-01-23 10:07:51 +08:00
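The redundant-filter rule in #30148 above can be pictured with a small model. This is only a sketch of the idea, not Doris code: `prune_runtime_filters`, the `(source, target)` tuple layout, and the use of a union-find over column equalities are all assumptions made for illustration.

```python
def prune_runtime_filters(filters, equal_pairs):
    """Drop runtime filters whose target is known equal to an already kept target.

    filters: list of (source, target) column names, e.g. ("B.b", "A.a1").
    equal_pairs: list of (col, col) equalities, e.g. ("A.a1", "A.a2").
    Hypothetical helper, not part of Doris.
    """
    parent = {}

    def find(col):
        # Union-find lookup with path halving.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    # Merge columns that are known to be equal into one equivalence class.
    for left, right in equal_pairs:
        parent[find(left)] = find(right)

    kept, seen = [], set()
    for source, target in filters:
        key = (source, find(target))
        if key in seen:
            continue  # implied by a filter we already kept
        seen.add(key)
        kept.append((source, target))
    return kept


# The example from the commit message: A join B on A.a1 = B.b and A.a1 = A.a2.
remaining = prune_runtime_filters(
    [("B.b", "A.a1"), ("B.b", "A.a2")],
    [("A.a1", "A.a2")],
)
```

Here `remaining` keeps only `RF(B.b->A.a1)`, which matches the commit's statement that `RF(B.b->A.a2)` is skipped.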
ead3b4ac1d [feature](function) support ip function is_ipv4_compat, is_ipv4_mapped (#29954) 2024-01-23 10:07:51 +08:00
ddeed079d4 [opt](Nereids)make orToIn rule applicable to in-pred (#29990)
make orToIn rule applicable to in-pred
2024-01-19 15:48:56 +08:00
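As a rough illustration of what #29990 above describes, the sketch below merges equality and IN disjuncts on the same column into one IN list. The function name `or_to_in` and the `(column, op, value)` representation are hypothetical; the real Nereids rule works on expression trees and may apply further conditions not modelled here.

```python
def or_to_in(disjuncts):
    """Merge disjuncts on the same column into a single IN list.

    disjuncts: list of (column, op, value), where op is "=" (value is one
    literal) or "in" (value is a list of literals).
    Returns {column: sorted unique literals}. Hypothetical helper.
    """
    merged = {}
    for column, op, value in disjuncts:
        # Treat `col = v` as `col in (v)` so both forms merge the same way.
        values = value if op == "in" else [value]
        merged.setdefault(column, set()).update(values)
    return {column: sorted(vals) for column, vals in merged.items()}


# a = 1 OR a IN (2, 3) OR a = 3  becomes  a IN (1, 2, 3)
merged_in = or_to_in([("a", "=", 1), ("a", "in", [2, 3]), ("a", "=", 3)])
```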
97b2a3b993 [improvement](ip function) refactor some ip functions and remove dirty codes (#30080) 2024-01-19 15:48:56 +08:00
e560f31692 [fix](Nereids): fix eliminate join test for pk-fk constraint (#30094) 2024-01-19 15:48:56 +08:00
fac0580eae [opt](docker)optimize ES docker compose (#30068)
1. add volume for es logs
2. optimize health check, waiting for es status to be green
3. fix es6 volume path error
4. optimize disk watermark to avoid es disk watermark error
5. fix es6 create index error
6. add custom elasticsearch.yml for es6
7. add log4j2.properties for es6, es7, es8
2024-01-19 15:48:56 +08:00
097641b543 [fix](Nereids): fix AssertNumRows StatsCalculator (#30053) 2024-01-19 15:48:15 +08:00
Pxl
2ccb69dbed [Feature](materialized-view) support some cases unmatched to materialized-view (#30036)
the same column appears in both key and value, like select id,count(id) group by id;
complex expr in sum, like select sum(if(xxx));
2024-01-18 12:03:07 +08:00
0ccd706a30 [Enhancement](Jdbc Catalog) Map Jdbc Catalog JSON Type to String for Improved Performance and Compatibility (#30035)
This PR proposes mapping external catalog JSON types to String instead of JsonB in Apache Doris. This change is motivated by the realization that JDBC retrieves JSON data as a JSON string, regardless of its storage format (Json(String) or Json(Binary)). Mapping to String streamlines data retrieval, simplifies write-backs, and ensures compatibility with all JSON(String) and JSON(Binary) functions, despite potentially misleading displays of JSON data as Strings in Doris. This approach avoids the performance overhead and complexity of converting each row of data from JsonB to String, making the process more efficient and elegant.

About Upgrade
To ensure query compatibility with existing Catalogs in the upgraded version, we currently still retain the capability to query external JSON types as JSONB. However, once you upgrade to the new version and either refresh the Catalog or create a new one, all external JSON types will be treated as Strings. To ensure consistent behavior, and because the code that queries JSON as JSONB may be removed in the future, it is highly recommended that you manually refresh your Catalog as soon as possible after upgrading to the new version.
2024-01-18 12:03:07 +08:00
44ba9e102c [feature](statistics)support statistics for iceberg/paimon/hudi table (#29868) 2024-01-18 12:03:07 +08:00
ade720470d [Improve](config)delete confused config for nested complex type (#29988) 2024-01-18 12:03:07 +08:00
e894911cda [function](char) change char function behaviour same with mysql (#30034)
select char(0) = '\0';
should return true;
2024-01-18 10:04:21 +08:00
Pxl
b0c49024cb [Feature](materialized-view) support match function with alias in materialized-view (#30025)
support match function with alias in materialized-view
2024-01-18 10:04:21 +08:00
3deee14680 [fix](Nereids): find hash condition after infer predicate (#30026) 2024-01-18 10:03:01 +08:00
74991c4af2 [bugfix](paimon)support native and jni to read paimon for minio/cos #29933 2024-01-16 18:49:01 +08:00
4bf4239d7a [feature](Nereids): optimize logical group expression in dphyp (#30000) 2024-01-16 18:48:20 +08:00
f53d2c28cb [improvement](catalog) fix jdbc mysql catalog to_date fun pushdown (#29900) 2024-01-16 18:46:19 +08:00
22978726e3 [opt](nereids) if column stats are unknown, 10-20 table-join optimization use cascading instead of dphyp (#29902)
* if column stats are unknown, do not use dphyp
tpcds query64 is optimized in case of no stats
sf500, query64 improved from 15sec to 7sec on hdfs, and from 4sec to 3.85sec on olaptable
2024-01-16 18:46:19 +08:00
07de535c4c [fix](Nereids) should not fold constant when do ordinal group by (#29976) 2024-01-16 18:46:19 +08:00
66513d57f9 [feature](function) support ip function named ipv6_cidr_to_range(addr, cidr) (#29812) 2024-01-16 18:42:09 +08:00
d5dcdf3e07 [Improve](array) support array_enumerate_uniq and array_shuffle for nereids (#29936) 2024-01-16 18:40:32 +08:00
f6dc6ea13b [improvement](catalog) Escape characters for columns in recovery predicate pushdown in SQL (#29854)
In the previous logic, when we restored the columns of a pushed-down predicate from the logical syntax tree for JdbcScanNode, we added escape characters to avoid query errors caused by keywords such as `key`. However, only binary predicates were processed, which is incomplete. We should add escape characters to all columns that appear in the predicate to avoid errors with keywords or illegal characters.
2024-01-16 18:39:00 +08:00
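The escaping described in #29854 above can be sketched as follows. This is a minimal illustration assuming MySQL-style backtick quoting; `quote_identifier` and `render_predicate` are hypothetical names, and other target databases would need their own quoting characters.

```python
def quote_identifier(name):
    """Wrap a column name in backticks, doubling any embedded backtick.

    Assumes MySQL-style identifier quoting. Hypothetical helper.
    """
    return "`" + name.replace("`", "``") + "`"


def render_predicate(column, op, literal):
    """Render one pushed-down predicate with the column always escaped,
    regardless of the predicate kind (binary, IN, LIKE, IS NULL, ...)."""
    return f"{quote_identifier(column)} {op} {literal}"


binary_pred = render_predicate("key", "=", "1")
in_pred = render_predicate("key", "IN", "(1, 2)")
```

Escaping unconditionally, rather than only for known keywords, keeps the rendering logic independent of any keyword list.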
8ca807578f [fix](migrate disk) fix migrate disk lost data during publish version (#29887)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2024-01-16 18:37:06 +08:00
a69ce49b07 [fix](Nereids) adjust min/max stats for cast function if types are comparable (#28166)
estimate column stats for "cast(col, XXXType)"

-----cast-est------
query4 41169 40335 40267 40267
query58 463 361 401 361
Total cold run time: 41632 ms
Total hot run time: 40628 ms

----master------
query4 40624 40180 40299 40180
query58 487 389 420 389
Total cold run time: 41111 ms
Total hot run time: 40569 ms
2024-01-16 18:31:59 +08:00
0b16938b7f [Fix](Nereids) Fix datatype length wrong when string contains chinese (#29885)
When a varchar literal contains Chinese, the length of the varchar should not be the number of characters; it should be 
the actual length in bytes.
Chinese is represented in Unicode, and a Chinese character occupies at most 4 bytes. So if a varchar literal contains Chinese, we 
set the length to 4 * length.

For example:
>        CREATE MATERIALIZED VIEW test_varchar_literal_mv
>             BUILD IMMEDIATE REFRESH AUTO ON MANUAL
>             DISTRIBUTED BY RANDOM BUCKETS 2
>             PROPERTIES ('replication_num' = '1')
>             AS
>             select case when l_orderkey > 1 then "一二三四" else "五六七八" end as field_1 from lineitem;

The definition of the materialized view is as follows:
mysql> desc test_varchar_literal_mv;
+---------+-------------+------+-------+---------+-------+
| Field   | Type        | Null | Key   | Default | Extra |
+---------+-------------+------+-------+---------+-------+
| field_1 | VARCHAR(16) | No   | false | NULL    | NONE  |
+---------+-------------+------+-------+---------+-------+
2024-01-16 18:31:59 +08:00
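The length rule from #29885 above can be checked with a few lines. This is a sketch under assumptions: `varchar_literal_length` is a hypothetical name, and it treats any non-ASCII literal the way the commit treats Chinese text, which may be broader than the actual check in Doris.

```python
def varchar_literal_length(literal):
    """Length assigned to a varchar literal in this sketch.

    ASCII-only literals keep their character count; otherwise the
    conservative bound of 4 bytes per character is used.
    Hypothetical helper, not the Doris implementation.
    """
    if literal.isascii():
        return len(literal)
    return 4 * len(literal)


declared = varchar_literal_length("一二三四")      # 4 characters -> 16
actual_utf8 = len("一二三四".encode("utf-8"))      # bytes actually needed in UTF-8
```

The declared length of 16 matches the `VARCHAR(16)` shown in the `desc` output above, and it is an upper bound on the 12 bytes the literal takes in UTF-8.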
115815739c [bugfix](fe) add check for lag/lead function params (#29617) 2024-01-16 18:31:59 +08:00
d47adbb81f [Fix](nereids) Fix cte rewrite by mv failure and predicates compensation by mistake (#29820)
Fix the case where a cte was wrongly rewritten by mv when the query has a scalar aggregate but the view does not.
For example, the following query should not be rewritten by the materialized view:

// materialized view definition
def mv20_1 = """
select
l_shipmode,
l_shipinstruct,
sum(l_extendedprice),
count()
from lineitem
left join
orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY
group by
l_shipmode,
l_shipinstruct;
"""
// query sql
def query20_1 =
"""
select
sum(l_extendedprice),
count()
from lineitem
left join
orders
on lineitem.L_ORDERKEY = orders.O_ORDERKEY
"""

Fix mistaken predicates compensation
For example, the following now returns the right result; earlier it was wrong.

// materialized view definition
def mv7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from lineitem
left join orders
on lineitem.l_orderkey = orders.o_orderkey
where l_shipdate = '2023-12-08' and o_orderdate = '2023-12-08';
"""
// query sql
def query7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from (select * from lineitem where l_shipdate = '2023-10-17' ) t1
left join orders
on t1.l_orderkey = orders.o_orderkey;
"""

Also optimize some code usage and add more comments for methods
2024-01-16 18:31:27 +08:00
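One general SQL reason a scalar aggregate cannot simply be answered from a grouped view, as in the first example of #29820 above, is that the two differ on empty input. Whether this is the exact case the commit targets is an assumption; the sketch below only demonstrates the SQL behaviour, using Python's built-in `sqlite3` and a cut-down, hypothetical `lineitem` table.

```python
import sqlite3


def row_counts_on_empty_input():
    """Return (#rows from a scalar aggregate, #rows from a grouped aggregate)
    when the input table is empty."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lineitem (l_shipmode TEXT, l_extendedprice REAL)")
    # Scalar aggregate: no GROUP BY, always yields exactly one row.
    scalar = conn.execute(
        "SELECT sum(l_extendedprice), count(*) FROM lineitem"
    ).fetchall()
    # Grouped aggregate: one row per group, so no rows for empty input.
    grouped = conn.execute(
        "SELECT l_shipmode, sum(l_extendedprice), count(*) "
        "FROM lineitem GROUP BY l_shipmode"
    ).fetchall()
    conn.close()
    return len(scalar), len(grouped)


scalar_rows, grouped_rows = row_counts_on_empty_input()
```

The scalar query returns one row while the grouped query returns none, so a rewrite that reads only the grouped view's rows would have nothing to aggregate.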
e417128fb9 [bug](bitmap) should return error status when execute failed (#29841) 2024-01-16 18:30:23 +08:00
1998735432 [Improvement](function) enable ipv6_num_to_string function to support handling of IPv6 type (#29886)
Enable ipv6_num_to_string function to handle IPv6 type normally in addition to handling 16 byte string types
2024-01-16 18:30:23 +08:00
ee66f1563e [fix](Nereids) fix rf push down union (#29847)
Currently, union rf push down only supports rf from the parent join, not from an ancestor join.
This PR fixes the problem in the rf push down check on project/distribute nodes.
2024-01-16 18:30:22 +08:00
ebfbe0c8dd [opt](information_schema) support information_schema in external catalog (#28919)
Add `information_schema` database for all catalog.
This is useful when using BI tools to connect to Doris,
the tools can get meta info from `information_schema`.

This PR mainly changes:

1. There will be an `information_schema` db in each catalog.
2. Each `information_schema` db only stores the meta info of the catalog it belongs to.
3. For `information_schema`, the `TABLE_SCHEMA` column's value is the database name.
4. There is a new global variable `show_full_dbname_in_info_schema_db`, which defaults to false. If set to true,
    the `TABLE_SCHEMA` column's value is like `ctl.db`, because:

	When connecting to Doris, the `database` info in the connection url will be: `xxx?db=ctl.db`.
	
	Some BI tools will then try to query `information_schema` with sql like:
	
	`select * from information_schema.columns where TABLE_SCHEMA = "ctl.db"`
	
	So it has to be formatted as `ctl.db`.
	
	E.g., the `information_schema.columns` table in external catalog `doris` is like:
	
	```
	mysql> select * from information_schema.columns limit 1\G
	*************************** 1. row ***************************
	           TABLE_CATALOG: doris
	            TABLE_SCHEMA: doris.__internal_schema
	              TABLE_NAME: column_statistics
	             COLUMN_NAME: id
	        ORDINAL_POSITION: 1
	          COLUMN_DEFAULT: NULL
	             IS_NULLABLE: NO
	               DATA_TYPE: varchar
	CHARACTER_MAXIMUM_LENGTH: 4096
	  CHARACTER_OCTET_LENGTH: 16384
	       NUMERIC_PRECISION: NULL
	           NUMERIC_SCALE: NULL
	      DATETIME_PRECISION: NULL
	      CHARACTER_SET_NAME: NULL
	          COLLATION_NAME: NULL
	             COLUMN_TYPE: varchar(4096)
	              COLUMN_KEY:
	                   EXTRA:
	              PRIVILEGES:
	          COLUMN_COMMENT:
	             COLUMN_SIZE: 4096
	          DECIMAL_DIGITS: NULL
	   GENERATION_EXPRESSION: NULL
	                  SRS_ID: NULL
	```
	
5. Modify the behavior of

	- show tables
	- show databases
	- show columns
	- show table status

	The above statements may query the `information_schema` db if there is a `where` predicate after them.
2024-01-12 13:58:19 +08:00
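The effect of `show_full_dbname_in_info_schema_db` from #28919 above can be summarised in a tiny model. This is a sketch only: `table_schema_value` is a hypothetical function standing in for however Doris actually fills the column.

```python
def table_schema_value(catalog, database, show_full_dbname=False):
    """Value reported in TABLE_SCHEMA: the bare database name by default,
    or `catalog.database` when the switch is on. Hypothetical helper."""
    return f"{catalog}.{database}" if show_full_dbname else database


default_value = table_schema_value("doris", "__internal_schema")
full_value = table_schema_value("doris", "__internal_schema", show_full_dbname=True)
```

With the switch on, the value matches the `doris.__internal_schema` shown in the sample output above, which is the form a BI tool connected with `db=ctl.db` would filter on.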
f67a00ffbb [opt](nereids) prune runtime redundant filters (#29828)
1. expand_runtime_filter_by_inner_join will create some redundant rfs, e.g., in tpch q5 and q9; we need to remove one
2. hive: prune rf if target only used as probe
2024-01-12 13:58:19 +08:00
4d97f8ea75 [enhance](function) support two special format for str_to_date (#29823) 2024-01-12 12:00:32 +08:00
885d8b28ba [fix](Nereids): fix function deps when check unique and not null #29797 2024-01-12 11:59:52 +08:00
c9a949130b [Case](wal) Add wal group commit sink case with low disk space fault injection (#29731) 2024-01-12 11:59:52 +08:00
e93a16ac6e [fix](Nereids) support complex literal cast in fe (#29599) 2024-01-12 11:59:52 +08:00
17a2b89945 [runtimeFilter](nereids) expand runtime filter by join condition by default (#29633)
1. expand rf by join condition 
2. fix ignore_shape_nodes bug
2024-01-12 11:59:27 +08:00
e17809a684 [fix](nereids)logicalhaving is in wrong place after logicalagg and logicalwindow (#29463) 2024-01-12 11:48:39 +08:00
2c44951543 [fix](planner)only allow null safe equal when both children are nullable (#29470) 2024-01-12 11:46:29 +08:00
Pxl
7738eca6da [Bug](stream-load) fix stream load failed on table with rollup (#29665)
fix stream load failed on table with rollup
2024-01-12 11:46:29 +08:00
9cbb55d49b [fix](Nereids) create double literal when create decimal literal failed (#28959)
FIX
1. remove float and double literal toString and getStringValue introduced by
  PR #23504 and PR #23271
  These functions lead to wrong cast result of double and float literal
2. fix compute signature for datetimev2 always produce scale 6
3. fix stats calculator failed when generate node stats with two same column
4. constant fold on fe failed when cast double to integral

TODO
after fixing the first problem, some mv matching cases do not work well; fix them later
- test_dup_mv_div
- test_dup_mv_json
- test_tcu
2024-01-12 11:46:29 +08:00
fda001b6d3 [Improvement](nereids) Support join derivation when mv rewrite (#29609)
The materialized view definition is as follows:
>            select l_linenumber, o_custkey
>           from orders
>            left join lineitem on lineitem.L_ORDERKEY = orders.O_ORDERKEY
>            where o_custkey = 1;

When the query is as follows, it can be rewritten by the mv above.
This requires that the query has null-rejecting filters on the right input of the join; 
the currently supported filters are "=", "<", "<=", ">", ">=", "<=>" 
>            select IFNULL(orders.O_CUSTKEY, 0) as custkey_not_null,
>           case when l_linenumber in (1,2,3) then l_linenumber else o_custkey end as case_when
>            from orders
>            inner join lineitem on orders.O_ORDERKEY = lineitem.L_ORDERKEY
>            where o_custkey = 1 and l_linenumber > 0;
2024-01-12 11:44:21 +08:00
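The join derivation in #29609 above relies on a null-rejecting filter on the right input making a left join return the same rows as an inner join. The sketch below checks that equivalence on toy data with Python's built-in `sqlite3`; the table contents are invented for illustration and the queries are simplified versions of the ones in the commit message.

```python
import sqlite3


def left_vs_inner_with_null_rejecting_filter():
    """Run the same filtered query with LEFT JOIN and INNER JOIN and
    return both result lists."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER);
        CREATE TABLE lineitem (l_orderkey INTEGER, l_linenumber INTEGER);
        INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
        INSERT INTO lineitem VALUES (1, 1), (1, 2);
    """)
    template = """
        SELECT l_linenumber, o_custkey FROM orders
        {join} JOIN lineitem ON l_orderkey = o_orderkey
        WHERE o_custkey = 1 AND l_linenumber > 0
        ORDER BY l_linenumber
    """
    # `l_linenumber > 0` is never true for the NULL-extended rows a left
    # join adds, so those rows are filtered out again.
    left_rows = conn.execute(template.format(join="LEFT")).fetchall()
    inner_rows = conn.execute(template.format(join="INNER")).fetchall()
    conn.close()
    return left_rows, inner_rows


left_rows, inner_rows = left_vs_inner_with_null_rejecting_filter()
```

Order 2 has no line items, so the left join pads it with NULLs, and the filter `l_linenumber > 0` then discards that row, leaving both results identical.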
34fe5ee38b [feat](Nereids) support show constraint command (#29667)
show constraints from t1;
+------+-------------+-----------------------------------------+
| Name | Type        | Definition                              |
+------+-------------+-----------------------------------------+
| fk   | FOREIGN KEY | FOREIGN KEY (id) REFERENCES cir.t1 (id) |
| uk   | UNIQUE      | UNIQUE (id)                             |
| pk   | PRIMARY KEY | PRIMARY KEY (id)                        |
+------+-------------+-----------------------------------------+
2024-01-12 11:44:21 +08:00
be56bf06cf [feature](function) support ip function named is_ip_address_in_range(addr, cidr) (#29681) 2024-01-12 11:44:21 +08:00
028e59efab [refactor](Nereids): unify all replaceNamedExpressions (#28228)
Use a unified function `replaceNamedExpressions` instead of reimplementing it in several places.
2024-01-12 11:44:21 +08:00
d50c8b6d3a [Improvement](nereids) Query rewrite by mv support bitmap_union and bitmap_union_count roll up (#29418)
Query rewrite by mv supports bitmap_union and bitmap_union_count roll up. The aggregate functions that support roll up are listed below:

| Function in query  | Function in materialized view | Function after roll up |
|--------------------|-------------------------------|------------------------|
| max                | max                           | max                    |
| min                | min                           | min                    |
| sum                | sum                           | sum                    |
| count              | count                         | sum                    |
| count(distinct )   | bitmap_union                  | bitmap_union_count     |
| bitmap_union       | bitmap_union                  | bitmap_union           |
| bitmap_union_count | bitmap_union                  | bitmap_union_count     |

This depends on https://github.com/apache/doris/pull/29256
2024-01-12 11:44:21 +08:00
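The roll-up table in #29418 above can be read as a lookup from (query function, view function) to the function applied over the view's rows. The sketch below encodes that table and illustrates two rows of it; the names `ROLLUP`, `rolled_up_count`, and `rolled_up_distinct_count` are hypothetical, and Python sets stand in for bitmaps.

```python
# (function in query, function in materialized view) -> function after roll up
ROLLUP = {
    ("max", "max"): "max",
    ("min", "min"): "min",
    ("sum", "sum"): "sum",
    ("count", "count"): "sum",
    ("count(distinct)", "bitmap_union"): "bitmap_union_count",
    ("bitmap_union", "bitmap_union"): "bitmap_union",
    ("bitmap_union_count", "bitmap_union"): "bitmap_union_count",
}


def rolled_up_count(partial_counts):
    """count rolls up as sum: the total is the sum of the per-group
    counts already stored in the view."""
    return sum(partial_counts)


def rolled_up_distinct_count(partial_sets):
    """count(distinct) rolls up as union-then-count; sets model bitmaps."""
    return len(set().union(*partial_sets))


total_rows = rolled_up_count([2, 3])
distinct_ids = rolled_up_distinct_count([{1, 2}, {2, 3}])
```

Summing per-group distinct counts would count id 2 twice here, which is why the view has to keep the bitmap rather than a plain count.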
87023d3b7a [Fix](inverted index) fix memory leak in inverted index when encountering fault (#29676) 2024-01-12 11:44:21 +08:00