Commit Graph

741 Commits

Author SHA1 Message Date
471db80f69 [Bug](date) Fix invalid date (#16205)
Issue Number: close #15777
2023-01-31 10:08:44 +08:00
e7c1d81419 [fix](planner) Pushdown constant predicate to all scan node in the lieteral view. #16217
Before this PR, planner might push a constant FALSE predicate to the wrong scan nodes in the literal view, and make this predicate useless
2023-01-30 22:18:43 +08:00
75c8670286 [Feature-WIP](inverted index) Filter out and remain predicates that do not support apply by inverted index, and add inverted index regression case (#16167)
1. Filter out and remain predicates that do not support applying on inverted index,
    like `BF` predicate, `IS_NULL` predicate, `IS_NOT_NULL` predicate.
2. Add inverted index regression case that based on tpcds_sf1 data set.
2023-01-30 22:16:08 +08:00
b480db2e11 [test](pipeline) Run nereids cases in p1/p2 (#16130) 2023-01-30 18:33:31 +08:00
342f3168b5 [fix](Nereids) return null when encryption is invalid rather than throw exception (#16234) 2023-01-30 16:52:42 +08:00
6bebf92254 [fix][FE] fix be coredump when children of FunctionCallExpr is folded (#16064)
Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>
fix be coredump when children of FunctionCallExpr is folded
2023-01-30 15:25:00 +08:00
5c00caa259 [refactor](Nereids) refactor BindSlotReference for easy merge all bind process in one rule (#16156) 2023-01-30 10:57:39 +08:00
bd1b7e190c [fix](Nereids): fix field(). (#16214) 2023-01-30 10:55:02 +08:00
69e748b076 [fix](schema scanner)change schema_scanner::get_next_row to get_next_block (#15718) 2023-01-30 10:01:50 +08:00
98649ec9f8 [fix](Nereids): Fix some functions error (#16197)
* fix bugs in regexp_extract_all

* fix rpad

* fix weekofday

* fix cryptor

* fix timestamp

* fix st_ function
2023-01-30 00:41:31 +08:00
7d648a94d0 [fix](Nereids): fix scalar_function A-F. (#16209)
* [fix](Nereids): fix scalar_function A-F.

* [Fix](regression-test)fix regression test framework cannot compare double value nan and inf.

* revert dround()
2023-01-30 00:37:34 +08:00
5eaa995704 [refactor](some mempool) not memset 0 in default value iterator (#16194)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-29 22:50:39 +08:00
1db7882bb5 [Fix](Nereids): fix error of X-Z function for nereids (#16171) 2023-01-29 20:42:30 +08:00
1ec88cbff6 [fix](nereids) AggregationNode process null as key column in wrong way (#16125)
in AggregationNode, _merge_with_serialized_key_helper method should convert the key column to full column if the key column is null literal.
2023-01-29 20:12:07 +08:00
04ed83cb36 [fix](Nereids): remove DataV2Type in ConvertTz SIGNATURES (#16170)
* [fix](Nereids): remove `DataV2Type` in ConvertTz SIGNATURES

* remove it in doris_builtins_functions.py
2023-01-29 16:11:17 +08:00
b11d056fe5 [fix](Nereids): fix function name. (#16188) 2023-01-29 15:21:24 +08:00
c6bc0a03a4 [feature](Load)Suppot MySQL Load Data (#15511)
Main subtask of [DSIP-28](https://cwiki.apache.org/confluence/display/DORIS/DSIP-028%3A+Suppot+MySQL+Load+Data)

## Problem summary
Support mysql load syntax as below: 
```sql
LOAD DATA
    [LOCAL]
    INFILE 'file_name'
    INTO TABLE tbl_name
    [PARTITION (partition_name [, partition_name] ...)]
    [COLUMNS TERMINATED BY 'string']
    [LINES TERMINATED BY 'string']
    [IGNORE number {LINES | ROWS}]
    [(col_name_or_user_var [, col_name_or_user_var] ...)]
    [SET (col_name={expr | DEFAULT} [, col_name={expr | DEFAULT}] ...)]
    [PROPERTIES (key1 = value1 [, key2=value2]) ]
```

For example, 
```sql
            LOAD DATA 
            LOCAL
            INFILE 'local_test.file'
            INTO TABLE db1.table1
            PARTITION (partition_a, partition_b, partition_c, partition_d)
            COLUMNS TERMINATED BY '\t'
            (k1, k2, v2, v10, v11)
            set (c1=k1,c2=k2,c3=v10,c4=v11)
            PROPERTIES ("auth" = "root:", "strict_mode"="true")
```

Note that in this pr the property named `auth` must be set since stream load need auth. I will optimize it later.
2023-01-29 14:44:59 +08:00
eb7da1c0ee [fix](datatype) fix some bugs about data type array datetimev2 and decimalv3 (#16132) 2023-01-29 14:26:08 +08:00
578a855b3e [Bug](topn-opt) filter condition for analytic info for two phase read opt (#16173)
two phase read optimization should not be enabled when query has analytic info
2023-01-29 12:06:18 +08:00
ce487e2b11 [fix](Nereids): fix dceil() dfloor() (#16174) 2023-01-29 11:59:23 +08:00
b7379daffa [test](Nereids) changing test data for more effectively testing for nereids_function_p0 (#16163) 2023-01-28 21:23:40 +08:00
3151d94e9e [fix](Nereids): fix Ceiling. (#16164) 2023-01-28 20:26:20 +08:00
26fc7c8196 [Bug](decimalv3) fix BE crash for function if (#16152) 2023-01-28 19:37:50 +08:00
7cf7706eb1 [Bug](runtimefilter) Fix wrong runtime filter on datetime (#16102) 2023-01-28 18:16:06 +08:00
b919cbe487 [ehancement](nereids) Enhancement for limit clause (#16114)
support limit offset without order by.
the legacy planner supoort this feature in PR #15218
2023-01-28 11:04:03 +08:00
1589d453a3 [fix](multi catalog)Support parquet and orc upper case column name (#16111)
External hms catalog table column names in doris are all in lower case,
while iceberg table or spark-sql created hive table may contain upper case column name,
which will cause empty query result. This pr is to fix this bug.
1. For parquet file, transfer all column names to lower case while parse parquet metadata.
2. For orc file, store the origin column names and lower case column names in two vectors, use the suitable names in different cases.
3. FE side, change the column name back to the origin column name in iceberg while doing convertToIcebergExpr.
2023-01-27 23:52:11 +08:00
25046fabec [regression-test](sub query) add regression test for subquery with limit (#16051)
* [regression-test](sub query) add regression test for subquery with limit

* add lisence header
2023-01-21 08:06:49 +08:00
9ffd109b35 [fix](datetimev2) Fix BE datetimev2 type returning wrong result (#15885) 2023-01-20 22:25:20 +08:00
6b110aeba6 [test](Nereids) add regression cases for all functions (#15907) 2023-01-20 22:17:27 +08:00
3b08a22e61 [test](Nereids) add p0 regression test for Nereids (#15888) 2023-01-20 18:50:23 +08:00
1638936e3f [fix](oracle catalog) oracle catalog support TIMESTAMP dateType of oracle (#16113)
`TIMESTAMP` dateType of Oracle will map to `DateTime` dateType of Doris
2023-01-20 14:47:58 +08:00
101bc568d7 [fix](Nereids) fix bugs about date function (#16112)
1. when casting constant, check the value is whether in the range of targetType
2. change the scale of dateTimeV2 to 6
2023-01-20 14:11:17 +08:00
cbb203efd2 [fix](nereids) fix test_join regression test for nereids (#16094)
1. add TypeCoercion for (string, decimal) and (date, decimal)
2. The equality of LogicalProject node should consider children in some case
3. don't push down join condition like "t1 join t2 on true/false"
4. add PUSH_DOWN_FILTERS after FindHashConditionForJoin
5. nestloop join should support all kind of join
6. the intermediate tuple should contains slots from both children of nest loop join.
2023-01-20 14:02:29 +08:00
116e17428b [Enhancement](point query optimize) improve performace of point query on primary keys (#15491)
1. support row format using codec of jsonb
2. short path optimize for point query
3. support prepared statement for point query
4. support mysql binary format
2023-01-20 13:33:01 +08:00
3ebc98228d [feature wip](multi catalog)Support iceberg schema evolution. (#15836)
Support iceberg schema evolution for parquet file format.
Iceberg use unique id for each column to support schema evolution.
To support this feature in Doris, FE side need to get the current column id for each column and send the ids to be side.
Be read column id from parquet key_value_metadata, set the changed column name in Block to match the name in parquet file before reading data. And set the name back after reading data.
2023-01-20 12:57:36 +08:00
ba71516eba [feature](jdbc catalog) support SQLServer jdbc catalog (#16093) 2023-01-20 12:37:38 +08:00
60231454cc [fix](nereids) fix bug in multiply return data type (#15949) 2023-01-20 11:44:24 +08:00
2018b49ef0 [opt](test) scalar_types_p0 use 100k lines dataset and scalar_types_p2 use 1000k (#16104) 2023-01-19 22:59:29 +08:00
dd869077f8 [fix](nereids) do not generate compare between Date to Date (#16061)
BE storage Engine has some bug in Date comparison, and hence if we push down predicates like Date'x' < Date 'y', we get error results.
This pr just convert expr like ’Date'x' < Date 'y',‘ to DateTime'x' < DateTime 'y'

TODO:
do storage engine support date slot compare with datetime?
if it support, we could avoid add cast on the slot
and then, this expression could push down to storage engine.
2023-01-19 15:56:51 +08:00
21b78cb820 [fix](nereids) Fix bind failed of the slots in the group by clause (#16077)
Child's slot with same name to the slots in the outputexpression would be discarded which would cause the bind failed, since the slots in the group by expressions cannot find the corresponding bound slots from the child's output
2023-01-19 15:36:13 +08:00
0144c51ddb [fix](nereids) fix bug in CaseWhen.getDataType and add some missing case for findTightestCommonType (#15776) 2023-01-19 15:30:25 +08:00
6e090e4daf [Bug](predicate) fix date predicate (#16053) 2023-01-19 14:14:48 +08:00
c5beab39c0 [fix](nereids) Bind slot in having to its direct child instead of grand child (#16047)
For example, in this case, the `date` in having clause should be bind to alias which has same name, instead of `date` field of the relation

SELECT date_format(date, '%x%v') AS `date` FROM `tb_holiday` WHERE `date` between 20221111 AND 20221116 HAVING date = 202245 ORDER BY date;
2023-01-19 13:19:16 +08:00
abdf56bfa5 [fix](Nereids) wrong result of group_concat with order by or null args (#16081)
1. signatures without order element are wrong
2. signature with one arg is miss
3. group_concat should be NullableAggregateFunction
4. fold constant on fe should not fold NullableAggregateFunction with null arg

TODO
1. reorder rewrite rules, and then only forbid fold constant on NullableAggregateFunction with alwaysNullable == true
2023-01-19 11:22:30 +08:00
d8f598eeab [enhancement](Nereids) add timestampadd, timestampdiff functions (#16072) 2023-01-19 01:05:25 +08:00
baf62b4418 [test](Nereids) add regression-test for running_difference and regexp_extract_all (#16049) 2023-01-18 22:24:52 +08:00
feeb69438b [opt](Nereids) optimize DistributeSpec generator of OlapScan (#15965)
use the size of selected partitions instead of olap table partition size to decide whether generate hashDistributeSpec
2023-01-18 20:18:11 +08:00
0916cbcb10 [ehancement](nereids) Made the parse for named expression more complete (#16010)
After this PR, we could support such grammar.

SELECT SUBSTRING("dddd编", 0, 3) AS "测试";
SELECT SUBSTRING("dddd编", 0, 3) "测试";
2023-01-18 19:44:51 +08:00
4035bd83c3 [fix](jdbc) fix jdbc driver bug and external datasource p2 test case issue (#16033)
Fix bug that when create jdbc resource with only jdbc driver file name, it will failed to do checksum
This is because we forgot the pass the full driver url to JdbcClient.

Add ResultSet.FETCH_FORWARD and set AutoCommit to false to jdbc connection, so to avoid OOM when fetching large amount of data

set useCursorFetch in jdbc url for both MySQL and PostgreSQL.

Fix some p2 external datasource bug
2023-01-18 17:48:06 +08:00
1fa2b662cf [opt](Nereids) add date_add/sub function (#16048)
1. add week_add week_diff function
2. register all date_add/date_diff function
2023-01-18 17:11:44 +08:00