Commit Graph

5755 Commits

Author SHA1 Message Date
678f34cad3 [fix](planner) insert default value should not change return type of function object in function set (#17536)
The return type of function `now` was changed to datetimev2 by mistake.
It can be reproduced as follows:

CREATE TABLE `testdt` (
  `c1` int(11) NULL,
  `c2` datetimev2 NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=OLAP
DUPLICATE KEY(`c1`, `c2`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false"
);

 insert into testdt(c1) values(1);

select now();
2023-03-08 17:08:28 +08:00
b1ca87eb9b [FIX](complex-type) fix IS NULL predicate for map/struct (#17497)
Fix: the IS NULL predicate was not supported in SELECT statements for map and struct columns; a minimal sketch follows below.
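The sketch below uses hypothetical table and column names (m is a MAP column, s a STRUCT column); these are the kinds of predicates the fix enables:

```sql
SELECT id FROM t_complex WHERE m IS NULL;      -- map column
SELECT id FROM t_complex WHERE s IS NOT NULL;  -- struct column
```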
2023-03-08 17:03:06 +08:00
feacb15e71 [Improvement](datev2) push down datev2 predicates with date literal (#17522) 2023-03-08 16:54:54 +08:00
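A hedged sketch of the kind of query this improvement targets (hypothetical table t with a datev2 column dt; the date-literal predicate can now be pushed down to the storage layer):

```sql
SELECT COUNT(*) FROM t WHERE dt >= '2023-01-01';
```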
36b6cea462 [feature-wip](nereids) Support Q-Error to measure the accuracy of derived statistics (#17185)
Collect the estimated and actual output row counts for each plan node, and use them to measure the accuracy of derived statistics. The estimates are managed by ProfileManager; they can later be fetched over an HTTP request by query id. A sketch of the conventional Q-Error definition follows below.
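The commit message does not spell out the metric, but Q-Error is conventionally defined as the larger of the two ratios between estimated and actual cardinality (my assumption of the definition used here):

```latex
% Q-Error for estimated cardinality \hat{c} and actual cardinality c
q(\hat{c}, c) = \max\!\left(\frac{\hat{c}}{c}, \frac{c}{\hat{c}}\right) \ge 1,
\qquad q = 1 \iff \hat{c} = c
```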
2023-03-08 16:26:24 +08:00
d908d5fe01 [dependency](fe)Dependency Upgrade (#17377)
* Upgrade log4j to 2.X
  - bind log4j version to 2.18.0
  - use log4j-1.2-api to enable a smooth upgrade
* Upgrade commons-fileupload to 1.5
* Upgrade commons-io to 2.7
* Upgrade commons-compress to 1.22
* Upgrade gson to 2.8.9
* Upgrade guava to 30.0-jre
* Binding jackson version to 2.14.2
* Upgrade netty-all to 4.1.89.Final
* Upgrade protobuf to 3.21.12
* Upgrade kafka-clients to 3.4.0
* Upgrade calcite version to 1.33.0
* Upgrade aws-java-sdk to 1.12.302
* Upgrade hadoop to 3.3.4
* Upgrade zookeeper to 3.4.14
* Binding tomcat-embed-core to 8.5.86
* Upgrade apache parent pom to 25
* Use hive-exec-core as the hive dependency, and add the missing hive-serde jar separately
* Basic public dependencies are extracted to parent dependencies
* Use jackson uniformly as the basic json tool
* Remove springloaded, spring-boot-devtools has the same functionality
* Change the scope of spark-related dependencies to provided, since they are supplied at runtime
2023-03-08 14:28:40 +08:00
aab14922af [Feature](Nereids) support MarkJoin (#16616)
# Proposed changes
1. The new optimizer supports the combination of subqueries and disjunction. Using MarkJoin, it behaves the same as the old optimizer; for design details see https://emmymiao87.github.io/jekyll/update/2021/07/25/Mark-Join.html. A hedged query sketch follows below.
2. Implicit type conversion is performed when conjuncts are generated after subquery parsing.
3. Convert the unnesting of scalar subqueries in a filter from filter + join to join + conjuncts.
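A minimal sketch of such a query on a hypothetical schema: the OR between an ordinary predicate and a subquery prevents a plain semi join, so the subquery is evaluated as a mark join whose three-valued "mark" column feeds the disjunction:

```sql
SELECT *
FROM t1
WHERE t1.a > 10
   OR t1.b IN (SELECT c FROM t2);
```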
2023-03-08 14:26:24 +08:00
626fbc34f9 [bugfix](jsonb) Fix BE crash caused by creating mv using a jsonb key (#17430) 2023-03-08 14:18:26 +08:00
4ea0d6c5fa [feature](array_function) add support for array_popfront (#17416) 2023-03-08 13:57:38 +08:00
b1d65f855d [Feature](array-function) Support array_concat function (#17436) 2023-03-08 13:57:16 +08:00
2b6133f4d0 [feature](Nereids): pushdown complex project through inner/outer Join. (#17365) 2023-03-08 12:00:56 +08:00
4b743061b4 [feature](function) support type template in SQL function (#17344)
This PR proposes a new mechanism similar to C++ templates. Functions that previously required many variants can be defined much more simply as template functions.

    # array element extract template function
    [['element_at', '%element_extract%'], 'E', ['ARRAY<E>', 'BIGINT'], 'ALWAYS_NULLABLE', ['E']],

    # map element extract template function
    [['element_at', '%element_extract%'], 'V', ['MAP<K, V>', 'K'], 'ALWAYS_NULLABLE', ['K', 'V']],


By the way, plain (non-template) functions are not affected, and the legacy ARRAY_X / MAP_K_V forms are still supported for compatibility.
2023-03-08 10:51:31 +08:00
c97422bd3d [enhancement](regression-test) add sleep 3s for schema change and rollup (#17484)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-08 10:43:05 +08:00
a767472c56 [fix](DOE)Fix es p0 case error (#17502)
Fix ES array parse error introduced by #16806.
2023-03-08 08:06:30 +08:00
6b88df2bdd [enhancement](planner) support case conversion of the timestamp datatype when creating a table (#17305) 2023-03-07 21:03:25 +08:00
fd8adb492d [fix](nereids) fix bugs in nereids window function (#17284)
fix two problems:

1. Push aggregate functions inside a windowExpression down to the Aggregate node.
for example, sql:
select sum(sum(a)) over (order by b)
Plan:
windowExpression( sum(y) over (order by b))
+--- Agg(sum(a) as y, b)

2. Push other expressions into the upper Project.
for example, sql:
select sum(a+1) over ()
Plan:
windowExpression(sum(y) over ())
+--- Project(a + 1 as y,...)
+--- Agg(a,...)
2023-03-07 16:35:37 +08:00
fca567068e [Enhancement](spark load)Support for RM HA (#15000)
Add RM HA configuration to Spark load.
Spark can accept HA parameters via its config; we just need to accept them in the DDL:

CREATE EXTERNAL RESOURCE spark_resource_sinan_node_manager_ha
PROPERTIES
(
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.executor.memory" = "10g",
"spark.yarn.queue" = "XXXX",
"spark.hadoop.yarn.resourcemanager.address" = "XXXX:8032",
"spark.hadoop.yarn.resourcemanager.ha.enabled" = "true",
"spark.hadoop.yarn.resourcemanager.ha.rm-ids" = "rm1,rm2",
"spark.hadoop.yarn.resourcemanager.hostname.rm1" = "XXXX",
"spark.hadoop.yarn.resourcemanager.hostname.rm2" = "XXXX",
"spark.hadoop.fs.defaultFS" = "hdfs://XXXX",
"spark.hadoop.dfs.nameservices" = "hacluster",
"spark.hadoop.dfs.ha.namenodes.hacluster" = "mynamenode1,mynamenode2",
"spark.hadoop.dfs.namenode.rpc-address.hacluster.mynamenode1" = "XXX:8020",
"spark.hadoop.dfs.namenode.rpc-address.hacluster.mynamenode2" = "XXXX:8020",
"spark.hadoop.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"working_dir" = "hdfs://XXXX/doris_prd_data/sinan/spark_load/",
"broker" = "broker_personas",
"broker.username" = "hdfs",
"broker.password" = "",
"broker.dfs.nameservices" = "XXX",
"broker.dfs.ha.namenodes.XXX" = "mynamenode1, mynamenode2",
"broker.dfs.namenode.rpc-address.XXXX.mynamenode1" = "XXXX:8020",
"broker.dfs.namenode.rpc-address.XXXX.mynamenode2" = "XXXX:8020",
"broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);

Co-authored-by: liujh <liujh@t3go.cn>
2023-03-07 15:46:14 +08:00
704faaed84 [feature](Nereids) add rule to split limit into two phases (#16797)
1. Add a rule that splits a limit, like Limit(Origin) ==> Limit(Global) -> Gather -> Limit(Local)
2. Add a rule: limit -> sort ==> topN
3. Fix a bug in topN
4. Make the limit and offset fields of topN type long
Because the split rule is always beneficial, it is added in the rewrite phase. A hedged sketch of the rewrites follows below.
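A minimal sketch of the two rewrites on a hypothetical table t:

```sql
SELECT * FROM t LIMIT 10;              -- split: Limit(Global) -> Gather -> Limit(Local)
SELECT * FROM t ORDER BY k LIMIT 10;   -- limit over sort is rewritten to a single topN
```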
2023-03-07 15:34:12 +08:00
05c5ab5490 [fix](planner) only the table name should be converted to lowercase when creating a table (#17373)
We hit the error: Unknown column '__DORIS_DELETE_SIGN__' in 'default_cluster:db.table'.
That is because when we use an alias as the tableName to construct a Table, all parts of the name are lowercased if lower_case_table_names = 1.
To avoid this, we should extract the tableName from the alias and lowercase only the tableName.
2023-03-07 14:41:35 +08:00
b9bb28f22c [Enhancement](Planner) fix unclear exception msg when creating table. #17473 2023-03-07 13:38:20 +08:00
357d8c1746 [enhance](Nereids): remove rule flag in LogicalJoin (#17452) 2023-03-07 13:18:50 +08:00
b8c9875adb [refactor](Nereids): refactor PushdownLimit (#17355) 2023-03-07 12:04:20 +08:00
b0e3156f51 [enhance](Nereids): refactor code in Project (#17450) 2023-03-07 11:15:33 +08:00
f79b066790 [fix](resource)Add s3 checker for alter resource (#17467)
* add s3 validity checker for alter resource.
2023-03-07 11:07:15 +08:00
7e96b06e6c [Enhance](auth) Users support multiple roles (#17236)
1. Support GRANT role [, role] TO user_identity
2. Support REVOKE role [, role] FROM user_identity
3. 'SHOW GRANTS' adds a column to display the roles owned by a user
4. 'ALTER USER' prohibits deleting a user's role
5. Fix the logic that a roleName cannot start with RoleManager.DEFAULT_ROLE
A hedged syntax sketch follows below.
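A minimal sketch of the newly supported multi-role syntax (role and user names are hypothetical):

```sql
GRANT 'role_a', 'role_b' TO 'jack'@'%';
REVOKE 'role_a', 'role_b' FROM 'jack'@'%';
SHOW GRANTS FOR 'jack'@'%';
```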
2023-03-07 10:28:56 +08:00
bada731390 [fix](restore) fix bug when replay restore and reserve dynamic partition (#17326)
When replaying the restore of a table with reserve_dynamic_partition_enable=true,
we must call registerOrRemoveDynamicPartitionTable with isReplay=true; otherwise
an OBSERVER may fail to replay the restore audit log.
2023-03-07 10:13:08 +08:00
f85f89f240 [fix](planner) Fix inconsistency between group-by expressions and output of aggregation node (#17438) 2023-03-07 09:38:20 +08:00
50bf02024a [Improvement](meta) support return total statistics of all databases for command show proc '/jobs (#17342)
Currently the show proc jobs command can only be used on a specific database;
if a user wants to see overall data for the whole cluster, he has to look into
every database and sum the numbers up, which is troublesome.
Now he can achieve it simply by passing -1 as the dbId.

mysql> show proc '/jobs/-1';
+---------------+---------+---------+----------+-----------+-------+
| JobType       | Pending | Running | Finished | Cancelled | Total |
+---------------+---------+---------+----------+-----------+-------+
| load          | 0       | 0       | 0        | 2         | 2     |
| delete        | 0       | 0       | 0        | 0         | 0     |
| rollup        | 0       | 0       | 1        | 0         | 1     |
| schema_change | 0       | 0       | 2        | 0         | 2     |
| export        | 0       | 0       | 0        | 3         | 3     |
+---------------+---------+---------+----------+-----------+-------+

mysql> show proc '/jobs/-1/rollup';
+----------+------------------+---------------------+---------------------+------------------+-----------------+----------+---------------+----------+------+----------+---------+
| JobId    | TableName        | CreateTime          | FinishTime          | BaseIndexName    | RollupIndexName | RollupId | TransactionId | State    | Msg  | Progress | Timeout |
+----------+------------------+---------------------+---------------------+------------------+-----------------+----------+---------------+----------+------+----------+---------+
| 17826065 | order_detail     | 2023-02-23 04:21:01 | 2023-02-23 04:21:22 | order_detail     | rp1             | 17826066 | 6009          | FINISHED |      | NULL     | 2592000 |
+----------+------------------+---------------------+---------------------+------------------+-----------------+----------+---------------+----------+------+----------+---------+
1 row in set (0.01 sec)
2023-03-07 08:57:55 +08:00
440cf526c8 [fix](type compatibility) fix unsigned int type compatibility problem (#17427)
Fix the value-scope compatibility problem of the unsigned int type.

When defining columns, map UNSIGNED INT to BIGINT for compatibility.
The problem: the behavior was not consistent with the type-mapping table in the docs.
We support the unsigned int type for compatibility with MySQL types, but an unsigned int column was created as int at definition time, which can cause numerical overflow. A hedged sketch follows below.
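A minimal sketch (hypothetical table name) of what the fix enables: with UNSIGNED INT mapped to BIGINT, the maximum unsigned 32-bit value no longer overflows.

```sql
CREATE TABLE t_unsigned (
  `c1` int(11) UNSIGNED NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
DISTRIBUTED BY HASH(`c1`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

-- 2^32 - 1 overflows a signed 32-bit int but fits in BIGINT
INSERT INTO t_unsigned VALUES (4294967295);
```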
2023-03-07 08:55:38 +08:00
b68001aee5 [fix](priv) fix duplicated priv check when check column priv (#17446)
When executing a select stmt, the column privilege check was invoked multiple times (once per column in the select stmt).

2023-03-07 08:51:55 +08:00
48c2d806d7 [enhancement](jdbc catalog) Use Druid instead of HikariCP in JdbcClient (#17395)
This pr does three things:
1. Use Druid instead of HikariCP in JdbcClient
2. When downloading a UDF jar, append the jar package name to the local file name.
3. Refactor some jdbcResource code.
2023-03-07 08:51:10 +08:00
aedbc5fcb1 [fix](planner) Slots in the conjuncts of table function node didn't get materialized #17460 2023-03-07 08:50:33 +08:00
28c55f15c9 [Enhancement](Materialized-View) add more error information when selecting a materialized view fails (#17262)
2023-03-06 18:59:46 +08:00
dca16796ad [fix](ParquetReader) definition level of repeated parent is wrong (#17337)
Fix three bugs:
1. `repeated_parent_def_level` should be the definition level of its repeated parent.
2. Failed to parse schemas like `decimal(p, s)`.
3. Wrong offsets were filled for the array type.
2023-03-06 18:15:57 +08:00
0ad638f9fe [enhancement](transaction) Reduce writeLock hold time in DatabaseTransactionMgr when clearing transactions (#17414)
* [enhancement](transaction) Reduce writeLock hold time in DatabaseTransactionMgr when clearing transactions

* fix ut

* remove unnecessary field from the remove-txn bdbje log

---------

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-03-06 11:32:21 +08:00
56a3ead2d7 [Improvement](restore) make the timeout of a restore job's task-dispatching progress configurable (#17434)
When a restore job has plenty of replicas, it may fail due to timeout. The error message is:
[RestoreJob.checkAndPrepareMeta():782] begin to send create replica tasks to BE for restore. total 381344 tasks. timeout: 600000

Currently the maximum timeout is fixed, which is not suitable for such cases.
2023-03-06 10:05:31 +08:00
a8f20eb4ac [Enhancement](schema_scanner) Optimize the performance of reading information schema tables (#17371)
Batch-fill the block and batch the RPCs from FE to get table descriptors.
For 340,000 columns:

SELECT COUNT(*) FROM information_schema.columns;
time: 10.3s --> 0.4s
2023-03-06 09:53:01 +08:00
d8a231f340 [Improvement](auth)(step-2) add ranger authorizer for hms catalog (#17424) 2023-03-05 21:50:44 +08:00
afb5def385 [enhancement](timeout) replace query timeout with exec timeout (#17360) 2023-03-05 11:03:59 +08:00
627b5ee302 [enhancement](k8s) Support fqdn mode for fe in k8s environment (#17329) 2023-03-05 10:18:56 +08:00
b9b028099d [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id (#17362)
* [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id

NewLoadStreamMgr already has the pipe and other info, so we do not need to save the pipe into the fragment state, and FragmentState becomes clearer.

But this PR changes the behavior of BE.
I will pick the PR to Doris 1.2.3 and add load id support to FE, so users can upgrade from 1.2.3 to 2.x.
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-04 16:19:36 +08:00
82df2ae9d8 [feature](mysql) Support secure MySQL connection to FE (#17138)
Background:
Doris currently does not support SSL connections from MySQL clients, which is not secure enough in some cases, especially when accessing Doris via the public internet.

Solution:
- Use TLS1.2 protocol to encrypt information.
- Implementation details:
  * server <--- connect <--- client
  * if SSL is enabled:
    * server <--- SSL connection request packet <--- client
    * server <--- SSL exchange ---> client  (this `if` branch is added in this PR)
  * server ---> handshake request packet ---> client
  * server <--- encrypted data ---> client  (realized in this PR)
- reference1 https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_connection_phase.html#sect_protocol_connection_phase_initial_handshake_ssl_handshake
- reference2 https://www.rfc-editor.org/rfc/rfc5246

close #16313

Signed-off-by: Yukang Lian <yukang.lian2022@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: morningman <morningman@163.com>
2023-03-04 12:14:48 +08:00
9aecd517b0 [test](Nereids) turn on all tests in scalar function W (#17269)
Turn on all test cases in scalar function W except width_bucket (the BE bug will be fixed in the next PR).
Turn off all test cases for group_concat(distinct order by).
Fix return nullability in TimestampArithmetic.
2023-03-04 08:23:50 +08:00
eea0cbec74 [enhancement](transaction) Reduce writeLock hold time in DatabaseTransactionMgr to improve stream load stability (#17380)
Clearing the transaction state log takes too much time, so we change the clear-transaction log level from info to debug.


Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-03-03 19:06:39 +08:00
9f97cd029f [Feature](Nereids) add checks to disable unsupported types (#17196)
1. disable decimalv3
2. disable json
3. disable complex type: array, map, struct
4. disable switch: group_by_and_having_use_alias
2023-03-03 17:57:48 +08:00
b5b595519a [fix](log) use logger to replace printStackTrace() (#17382)
Use Logger to replace printStackTrace to better locate problems.
2023-03-03 14:51:30 +08:00
cc5fa509ad [fix](cooldown) Fix bug in concurrent update_cooldown_conf and operations that update cooldowned data (#17086) 2023-03-03 14:36:58 +08:00
f5d958ccf9 [fix](MTMV) Reset insert timeout in handleInsert (#17249)
In #16343, we split the timeout variable into two (one for queries and another for insertions).

The function `ConnectProcessor::handleQuery` uses the corresponding session variable to change the timeout for queries issued by a MySQL client. However, `StmtExecutor::handleInsert` did not use the session variable, so the timeout for CTAS and MTMV insertion jobs could not be changed. A hedged usage sketch follows below.
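A minimal sketch, assuming the insert-side session variable introduced in #16343 is named `insert_timeout`; with the fix it governs INSERT / CTAS / MTMV jobs (table names hypothetical):

```sql
SET insert_timeout = 3600;                     -- seconds
INSERT INTO target_tbl SELECT * FROM src_tbl;  -- now governed by insert_timeout
```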
2023-03-03 11:32:50 +08:00
f5232e5c01 [vectorized](bug) fix some failing cases when enable_fold_constant_by_be is enabled (#17240) 2023-03-03 10:30:20 +08:00
449f2953c9 [Improvement](auth)(step-1) add ranger authorizer for hms catalog (#17153) 2023-03-03 09:45:08 +08:00
ba82cd10c6 [Enhancement](Jdbc catalog) Add two optional properties for jdbc catalog (#17245)
1. The first property is `only_specified_database`:
In the past, `Jdbc Catalog` synchronized all databases from the source.
Now we add a parameter called `only_specified_database` to the jdbc catalog to allow only the specified database to be synchronized, e.g.:

```sql
create resource if not exists ${resource_name} properties(
    "type"="jdbc",
    "user"="root",
    "password"="123456",
    "jdbc_url" = "jdbc:mysql://172.18.0.1:${mysql_port}/doris_test?useSSL=false",
    "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "only_specified_database" = "true"
);
```
If `only_specified_database` is `true`, the jdbc catalog will only synchronize the database specified in `jdbc_url`.

2. The second property is `lower_case_table_names`:
This property synchronizes table names from the jdbc external data source in lower case.

```sql
create resource if not exists ${resource_name} properties(
  "type"="jdbc",
  "user"="doris_test",
  "password"="123456",
  "jdbc_url" = "jdbc:oracle:thin:@172.18.0.1:${oracle_port}:${SID}",
  "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/ojdbc8.jar",
  "driver_class" = "oracle.jdbc.driver.OracleDriver",
  "lower_case_table_names" = "true"
);
```
2023-03-03 00:47:46 +08:00