Commit Graph

3636 Commits

Author SHA1 Message Date
5c00caa259 [refactor](Nereids) refactor BindSlotReference for easy merge all bind process in one rule (#16156) 2023-01-30 10:57:39 +08:00
bd1b7e190c [fix](Nereids): fix field(). (#16214) 2023-01-30 10:55:02 +08:00
ec4a56922f [enhancement](memory) reduce memory usage for failed broker loads (#15895)
* [enhancement](memory) reduce memory usage for failed  broker loads
2023-01-30 10:22:31 +08:00
7d437d5706 [Bug](function) running_difference function coredump in regression test (#16215) 2023-01-30 09:58:27 +08:00
b8a7297109 [Enhancement](profile) fill user field for profile. (#16212)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-01-30 09:15:02 +08:00
d56043ab5a [feature-wip](MTMV) Support setting variables in query statement (#16060)
## Use case

```shell
mysql> CREATE TABLE t_user (
    ->   event_day DATE,
    ->   id bigint,
    ->   username varchar(20)
    -> )
    -> DISTRIBUTED BY HASH(id) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.07 sec)

mysql> CREATE TABLE t_user_pv(
    ->   event_day DATE,
    ->   id bigint,
    ->   pv bigint
    -> )
    -> DISTRIBUTED BY HASH(id) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.09 sec)

mysql> CREATE MATERIALIZED VIEW mv
    -> BUILD IMMEDIATE REFRESH COMPLETE
    -> KEY (username)
    -> DISTRIBUTED BY HASH(username) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1')
    -> AS SELECT /*+ SET_VAR(exec_mem_limit=1048576, query_timeout=3600) */ t1.username ,t2.pv FROM t_user t1 LEFT JOIN t_user_pv t2 on t1.id = t2.id;
Query OK, 0 rows affected (0.10 sec)
```
2023-01-30 01:05:41 +08:00
98649ec9f8 [fix](Nereids): Fix some functions error (#16197)
* fix bugs in regexp_extract_all

* fix rpad

* fix weekofday

* fix cryptor

* fix timestamp

* fix st_ function
2023-01-30 00:41:31 +08:00
7d648a94d0 [fix](Nereids): fix scalar_function A-F. (#16209)
* [fix](Nereids): fix scalar_function A-F.

* [Fix](regression-test)fix regression test framework cannot compare double value nan and inf.

* revert dround()
2023-01-30 00:37:34 +08:00
217db3e4c8 [refactor](built-in function) remove symbols for vectorized function (#16189)
* [refactor](built-in function) remove symbols for vectorized function

* update

* update
2023-01-29 21:30:09 +08:00
1db7882bb5 [Fix](Nereids): fix error of X-Z function for nereids (#16171) 2023-01-29 20:42:30 +08:00
1ec88cbff6 [fix](nereids) AggregationNode process null as key column in wrong way (#16125)
in AggregationNode, _merge_with_serialized_key_helper method should convert the key column to full column if the key column is null literal.
2023-01-29 20:12:07 +08:00
1ad6ef939b [refactor](Nereids) use immutable collections as far as possible (#16193) 2023-01-29 16:48:58 +08:00
04ed83cb36 [fix](Nereids): remove DataV2Type in ConvertTz SIGNATURES (#16170)
* [fix](Nereids): remove `DataV2Type` in ConvertTz SIGNATURES

* remove it in doris_builtins_functions.py
2023-01-29 16:11:17 +08:00
abc50c6fe5 [enhance](Nereids): remove duplicated alias Function. (#16187) 2023-01-29 14:56:20 +08:00
c6bc0a03a4 [feature](Load) Support MySQL Load Data (#15511)
Main subtask of [DSIP-28](https://cwiki.apache.org/confluence/display/DORIS/DSIP-028%3A+Suppot+MySQL+Load+Data)

## Problem summary
Support the MySQL load syntax shown below:
```sql
LOAD DATA
    [LOCAL]
    INFILE 'file_name'
    INTO TABLE tbl_name
    [PARTITION (partition_name [, partition_name] ...)]
    [COLUMNS TERMINATED BY 'string']
    [LINES TERMINATED BY 'string']
    [IGNORE number {LINES | ROWS}]
    [(col_name_or_user_var [, col_name_or_user_var] ...)]
    [SET (col_name={expr | DEFAULT} [, col_name={expr | DEFAULT}] ...)]
    [PROPERTIES (key1 = value1 [, key2=value2]) ]
```

For example, 
```sql
            LOAD DATA 
            LOCAL
            INFILE 'local_test.file'
            INTO TABLE db1.table1
            PARTITION (partition_a, partition_b, partition_c, partition_d)
            COLUMNS TERMINATED BY '\t'
            (k1, k2, v2, v10, v11)
            set (c1=k1,c2=k2,c3=v10,c4=v11)
            PROPERTIES ("auth" = "root:", "strict_mode"="true")
```

Note that in this PR the `auth` property must be set, since stream load requires authentication. I will optimize this later.
2023-01-29 14:44:59 +08:00
eb7da1c0ee [fix](datatype) fix some bugs about data type array datetimev2 and decimalv3 (#16132) 2023-01-29 14:26:08 +08:00
578a855b3e [Bug](topn-opt) filter condition for analytic info for two phase read opt (#16173)
two phase read optimization should not be enabled when query has analytic info
2023-01-29 12:06:18 +08:00
ce487e2b11 [fix](Nereids): fix dceil() dfloor() (#16174) 2023-01-29 11:59:23 +08:00
35398ad8d9 [fix](multi-catalog)Use -1 for column_statistics internal table idx_id default value instead of null, for external catalog (#16177)
The internal statistics table column_statistics has a non-null field idx_id, but the insert SQL for Hive tables set its default value to NULL, which caused the insert to fail. Change the default to -1.
2023-01-29 11:29:25 +08:00
Pxl
2b5f95f08a [Bug](function) remove datev2 signature of hour_ceil/hour_floor #16168 2023-01-29 11:27:56 +08:00
3151d94e9e [fix](Nereids): fix Ceiling. (#16164) 2023-01-28 20:26:20 +08:00
da28d2faee [deps](http)Upgrade springboot version to 2.7.8 (#16158)
* Upgrade springboot version to 2.7.8

* fix
2023-01-28 20:13:50 +08:00
c506b4a1e3 [bug](cooldown)add config for Cooldown Job 2023-01-28 19:58:50 +08:00
26fc7c8196 [Bug](decimalv3) fix BE crash for function if (#16152) 2023-01-28 19:37:50 +08:00
7e7fd5d049 [cleanup](fe) cleanup useless code. (#16129)
* [cleanup](Nereids): cleanup useless code.

* revert ErrorCode.java
2023-01-28 18:44:43 +08:00
49395390be [bugfix](metareader) meta reader could not load image (#16148)
This bug is introduced by PR #16009.
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-28 14:22:18 +08:00
b919cbe487 [enhancement](nereids) Enhancement for limit clause (#16114)
Support limit offset without order by.
The legacy planner gained support for this feature in PR #15218.
2023-01-28 11:04:03 +08:00
1589d453a3 [fix](multi catalog)Support parquet and orc upper case column name (#16111)
External HMS catalog table column names in Doris are all lower case,
while Iceberg tables or Hive tables created by spark-sql may contain upper-case column names,
which causes empty query results. This PR fixes the bug.
1. For parquet files, convert all column names to lower case while parsing the parquet metadata.
2. For orc files, store the original and the lower-case column names in two vectors, and use whichever is suitable in each case.
3. On the FE side, change the column name back to the original Iceberg column name in convertToIcebergExpr.
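
The lower-case matching described above can be illustrated with a minimal Python sketch. It is not the Doris implementation (the real change lives in the BE readers and the FE); `build_column_lookup` and `resolve_columns` are hypothetical names used only for illustration.

```python
def build_column_lookup(file_columns):
    """Map lower-cased column names to the original names stored in the file.

    Hypothetical helper: keeps the first occurrence if two file columns
    differ only by case.
    """
    lookup = {}
    for name in file_columns:
        lookup.setdefault(name.lower(), name)
    return lookup


def resolve_columns(table_columns, file_columns):
    """Resolve lower-case table column names to the file's original names.

    Returns a dict: table column -> original file column, or None if the
    file has no column with that name (ignoring case).
    """
    lookup = build_column_lookup(file_columns)
    return {col: lookup.get(col.lower()) for col in table_columns}


if __name__ == "__main__":
    # The table uses lower-case names, the file uses mixed case.
    print(resolve_columns(["id", "username"], ["ID", "UserName"]))
```

Keeping both the original and the lower-cased name is what lets a reader match by the lower-cased name while still addressing the file by its own spelling.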
2023-01-27 23:52:11 +08:00
a3cd0ddbdc [refactor](remove broker scan node) it is not useful any more (#16128)
remove broker scannode
remove broker table
remove broker scanner
remove json scanner
remove orc scanner
remove hive external table
remove hudi external table
remove broker external table, user could use broker table value function instead
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-23 19:37:38 +08:00
ab04a458aa [Enhancement](export) cancel all running coordinators when execute cancel-export statement. (#15801) 2023-01-22 23:11:32 +08:00
253445ca46 [vectorized](jdbc) fix jdbc executor for get result by batch and memo… (#15843)
1. The result set should be fetched by batch size.
2. Fix memory leak.
2023-01-21 08:22:22 +08:00
87c7f2fcc1 [Feature](profile) set sql and defaultDb fields in show-load-profile. (#15875)
When executing show load profile '/', the SQL and DefaultDb columns are all 'N/A', although these fields can be filled. The result of this PR is as follows:

Execute show load profile '/'\G:

MySQL [test_d]> show load profile '/'\G
*************************** 1. row ***************************
   QueryId: 652326
      User: N/A
 DefaultDb: default_cluster:test_d
       SQL: LOAD LABEL `default_cluster:test_d`.`xxx`  (APPEND DATA INFILE ('hdfs://xxx/user/hive/warehouse/xxx.db/xxx/*')  INTO TABLE xxx FORMAT AS 'ORC' (c1, c2, c3) SET (`c1` = `c1`, `c2` = `c2`, `c3` = `c3`))  WITH BROKER broker_xxx (xxx)  PROPERTIES ("max_filter_ratio" = "0", "timeout" = "30000")
 QueryType: Load
 StartTime: 2023-01-12 18:33:34
   EndTime: 2023-01-12 18:33:46
 TotalTime: 11s613ms
QueryState: N/A
1 row in set (0.01 sec)
2023-01-21 08:10:15 +08:00
8b40791718 [Feature](ES): catalog support mapping es _id #15943 2023-01-21 08:08:32 +08:00
01c001e2ac [refactor](javaudf) simplify UdfExecutor and UdafExecutor (#16050)
* [refactor](javaudf) simplify UdfExecutor and UdafExecutor

* update

* update
2023-01-21 08:07:28 +08:00
2daa5f3fef [fix](statistics) Fix statistics related threads continuously spawn as doing checkpoint #16088 2023-01-21 07:58:33 +08:00
7814d2b651 [Fix](Oracle External Table) fix that oracle external table can not insert batch values (#16117)
This PR fixes two bugs:

1. _jdbc_scanner may be nullptr in vjdbc_connector.cpp, so another method is used to count JDBC statistics. close [Enhencement](jdbc scanner) add profile for jdbc scanner #15914
2. In the batch insertion scenario, Oracle does not support the syntax insert into tables values (...),(...); what it supports is:
insert all
into table(col1,col2) values(c1v1, c2v1)
into table(col1,col2) values(c1v2, c2v2)
SELECT 1 FROM DUAL;
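
As a rough illustration of the statement shape above, the following Python sketch builds an `insert all` statement from a list of rows. It is not the code from this PR; `build_oracle_batch_insert` is a hypothetical helper, and it assumes the values are already rendered as SQL literals.

```python
def build_oracle_batch_insert(table, columns, rows):
    """Build an Oracle multi-row insert using the INSERT ALL form.

    Assumption: every value in `rows` is already a rendered SQL literal
    (for example "1" or "'abc'"); no escaping is done here.
    """
    if not rows:
        raise ValueError("at least one row is required")
    col_list = ",".join(columns)
    lines = ["insert all"]
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row width does not match column count")
        lines.append(f"into {table}({col_list}) values({', '.join(row)})")
    # Oracle requires a subquery to terminate INSERT ALL.
    lines.append("SELECT 1 FROM DUAL")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_oracle_batch_insert("t", ["col1", "col2"], [["1", "'a'"], ["2", "'b'"]]))
```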
2023-01-21 07:57:12 +08:00
5514b1c1b7 [enhancement](tablet_report) accelerate deleteFromBackend function to avoid tablet report task blocked (#16115) 2023-01-20 20:11:58 +08:00
0305aad097 [fix](privilege)fix grant resource bug (#16045)
GRANT USAGE_PRIV ON RESOURCE * TO user;
After this statement, the user can see all databases.

Set a PrivPredicate for show resources and remove USAGE under PrivPredicate in SHOW_ PRIV
2023-01-20 19:00:44 +08:00
a4265fae70 [enhancement](query) Make query scan nodes more evenly distributed (#16037)
Take replicaNumPerHost into consideration when scheduling scan nodes to hosts, so that the final query scan nodes are more evenly distributed across the cluster.
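
One plausible way to use a per-host replica count during scheduling is sketched below in Python. This is an assumption for illustration, not the actual coordinator logic: `assign_scan_ranges` is a hypothetical function that picks, for each tablet, the replica host with the fewest scans assigned so far, and breaks ties in favour of the host holding fewer replicas overall.

```python
from collections import Counter


def assign_scan_ranges(tablet_replicas):
    """Assign each tablet to one of the hosts that hold a replica of it.

    tablet_replicas: dict of tablet id -> list of hosts with a replica.
    Hosts with fewer replicas have fewer chances of being chosen later,
    so they win ties (assumed heuristic).
    """
    replica_num_per_host = Counter(
        host for hosts in tablet_replicas.values() for host in hosts
    )
    assigned = Counter()
    placement = {}
    for tablet, hosts in tablet_replicas.items():
        host = min(hosts, key=lambda h: (assigned[h], replica_num_per_host[h], h))
        placement[tablet] = host
        assigned[host] += 1
    return placement


if __name__ == "__main__":
    print(assign_scan_ranges({1: ["a", "b"], 2: ["a", "b"], 3: ["a", "c"]}))
```

Without the replica-count tie-break, the first tablet in this example would go to host `a`, which is also the only alternative to `c` for the last tablet.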
2023-01-20 16:24:49 +08:00
419f433d21 [fix](Nereids) topn arg check is not compatible with legacy planner (#16105) 2023-01-20 15:08:10 +08:00
72df283344 [fix](planner) extract common factor rule should consider not only where predicate (#16110)
PR #14381 limited `ExtractCommonFactorsRule` to handle only `WHERE` predicates,
but predicates in the `ON` clause should also be considered. For example:

```
CREATE TABLE `nation` (
  `n_nationkey` int(11) NOT NULL,
  `n_name` varchar(25) NOT NULL,
  `n_regionkey` int(11) NOT NULL,
  `n_comment` varchar(152) NULL
)
DUPLICATE KEY(`n_nationkey`)
DISTRIBUTED BY HASH(`n_nationkey`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);


select * from
nation n1 join nation n2
on (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
```

The following predicates should appear on the scan nodes:
```
PREDICATES: `n1`.`n_name` IN ('FRANCE', 'GERMANY')
PREDICATES: `n2`.`n_name` IN ('FRANCE', 'GERMANY')
```

This PR fixes the issue by removing that limit from `ExtractCommonFactorsRule`.
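
The idea behind the derived `IN` predicates can be shown with a simplified Python sketch. It is not the planner rule itself: `extract_in_predicates` is a hypothetical function that only handles a disjunction of conjunctions of `column = constant` terms, represented as dicts.

```python
def extract_in_predicates(disjuncts):
    """Derive per-column IN lists from (A and B) or (C and D) style predicates.

    disjuncts: list of dicts, each mapping column -> constant for one
    AND-group. A column yields an IN predicate only if every group
    constrains it; otherwise some group would allow any value.
    """
    if not disjuncts:
        return {}
    common = set(disjuncts[0])
    for group in disjuncts[1:]:
        common &= set(group)
    return {col: sorted({group[col] for group in disjuncts}) for col in sorted(common)}


if __name__ == "__main__":
    on_clause = [
        {"n1.n_name": "FRANCE", "n2.n_name": "GERMANY"},
        {"n1.n_name": "GERMANY", "n2.n_name": "FRANCE"},
    ]
    print(extract_in_predicates(on_clause))
```

The derived predicates are implied by the original condition but weaker than it, so they can be pushed down to the scans while the original `ON` condition is still evaluated at the join.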
2023-01-20 14:53:48 +08:00
1638936e3f [fix](oracle catalog) oracle catalog support TIMESTAMP dateType of oracle (#16113)
Oracle's `TIMESTAMP` data type is mapped to the `DateTime` data type of Doris.
2023-01-20 14:47:58 +08:00
726427b795 [refactor](fe) refactor and upgrade dependency tree of FE and support AWS glue catalog (#16046)
1. Spark dpp
 
	Move `DppResult` and `EtlJobConfig` to the sparkdpp package in the `fe-common` module,
	so that `fe-core` no longer depends on the `spark-dpp` module and `spark-dpp.jar`
	is not moved into `fe/lib`, which reduces the size of the FE output.
	
2. Modify start_fe.sh

	Modify the CLASSPATH to make sure that doris-fe.jar comes first, so that
	when classes share the same qualified name, they are loaded from doris-fe.jar first.
	
3. Upgrade hadoop and hive version

	hadoop: 2.10.2 -> 3.3.3
	hive: 2.3.7 -> 3.1.3
	
4. Override the IHiveMetastoreClient implementations from dependency

	`ProxyMetaStoreClient.java` for Aliyun DLF.
	`HiveMetaStoreClient.java` for origin Apache Hive metastore.

	This is because some of their methods need to be modified to be compatible with
	different versions of Hive.
	
5. Exclude some unused dependencies to reduce the size of FE output

	Now it is only 370MB (previously 600MB)
	
6. Upgrade aws-java-sdk version to 1.12.31

7. Support AWS Glue Data Catalog

8. Remove HudiScanNode (no longer supported)
2023-01-20 14:42:16 +08:00
3652cb3fe9 [test](Nereids) add test about dateType and dateTimeType (#16098) 2023-01-20 14:15:54 +08:00
101bc568d7 [fix](Nereids) fix bugs about date function (#16112)
1. when casting a constant, check whether the value is in the range of targetType
2. change the scale of dateTimeV2 to 6
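
The range check in item 1 can be pictured with a small Python sketch. It is only an illustration using integer types, not the Nereids code: `INTEGER_RANGES` and `fits_target_type` are hypothetical names, and the bounds are the usual signed 8/16/32/64-bit limits.

```python
# Signed integer bounds for the common integer types (illustrative table).
INTEGER_RANGES = {
    "TINYINT": (-2**7, 2**7 - 1),
    "SMALLINT": (-2**15, 2**15 - 1),
    "INT": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
}


def fits_target_type(value, target_type):
    """Return True if the constant can be cast to target_type without overflow."""
    low, high = INTEGER_RANGES[target_type]
    return low <= value <= high


if __name__ == "__main__":
    print(fits_target_type(127, "TINYINT"))   # within range
    print(fits_target_type(128, "TINYINT"))   # out of range
```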
2023-01-20 14:11:17 +08:00
cbb203efd2 [fix](nereids) fix test_join regression test for nereids (#16094)
1. add TypeCoercion for (string, decimal) and (date, decimal)
2. The equality of LogicalProject node should consider children in some cases
3. don't push down join conditions like "t1 join t2 on true/false"
4. add PUSH_DOWN_FILTERS after FindHashConditionForJoin
5. nestloop join should support all kinds of joins
6. the intermediate tuple should contain slots from both children of nest loop join.
2023-01-20 14:02:29 +08:00
116e17428b [Enhancement](point query optimize) improve performance of point query on primary keys (#15491)
1. support row format using codec of jsonb
2. short path optimize for point query
3. support prepared statement for point query
4. support mysql binary format
2023-01-20 13:33:01 +08:00
3ebc98228d [feature wip](multi catalog)Support iceberg schema evolution. (#15836)
Support Iceberg schema evolution for the parquet file format.
Iceberg uses a unique id for each column to support schema evolution.
To support this feature in Doris, the FE side needs to get the current column id for each column and send the ids to the BE side.
The BE reads the column ids from the parquet key_value_metadata and sets the changed column names in the Block to match the names in the parquet file before reading data, then sets the names back after reading data.
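
The rename-by-id step can be sketched in Python as follows. This is a simplified illustration, not the BE code: `build_rename_maps` and `rename_columns` are hypothetical helpers, and column lists stand in for the Block.

```python
def build_rename_maps(table_id_to_name, file_id_to_name):
    """Build name mappings for columns whose name changed since the file was written.

    Both arguments map column id -> column name: one for the current
    table schema, one for the schema recorded in the data file.
    """
    table_to_file = {}
    for col_id, table_name in table_id_to_name.items():
        file_name = file_id_to_name.get(col_id)
        if file_name is not None and file_name != table_name:
            table_to_file[table_name] = file_name
    file_to_table = {f: t for t, f in table_to_file.items()}
    return table_to_file, file_to_table


def rename_columns(names, mapping):
    """Apply a rename mapping, leaving unmapped names untouched."""
    return [mapping.get(name, name) for name in names]


if __name__ == "__main__":
    to_file, to_table = build_rename_maps(
        {1: "id", 2: "user_name"},   # current table schema
        {1: "id", 2: "username"},    # schema stored in the file
    )
    in_file = rename_columns(["id", "user_name"], to_file)   # before reading
    restored = rename_columns(in_file, to_table)             # after reading
    print(in_file, restored)
```

Matching by id rather than by name is what keeps a renamed column readable from files written under the old name.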
2023-01-20 12:57:36 +08:00
ab4127d0b2 [Fix][regression-test] Fix test_hdfs_tvf.groovy by update HDFS conf URI to uri and better error msg handling. (#16029)
Fix test_hdfs_tvf.groovy by updating HDFS conf URI to uri and handling error messages better.
Previously, test_hdfs_tvf.groovy did not pass.
2023-01-20 12:40:25 +08:00
ba71516eba [feature](jdbc catalog) support SQLServer jdbc catalog (#16093) 2023-01-20 12:37:38 +08:00