turn on all test cases in scalar function W except width_bucket (the BE bug will be fixed in the next PR)
turn off all test cases for group_concat(distinct order by)
fix return nullable in TimestampArithmetic
Logging while clearing transaction state takes too much time, so we change the log level for clearing transactions from info to debug
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
In #16343, we split the timeout variable into two (one for queries and another for insertions).
The function `ConnectProcessor::handleQuery` uses the corresponding session variable to change the timeout for queries requested by the MySQL client. However, the function `StmtExecutor::handleInsert` doesn't use the session variable to change the timeout, so we can't change the timeout for CTAS and MTMV insertion jobs.
1. The first property is `only_specified_database`:
In the past, `Jdbc Catalog` synchronized all databases from the source database.
Now we add a parameter called `only_specified_database` to the jdbc catalog so that only the specified database is synchronized, e.g.:
```sql
create resource if not exists ${resource_name} properties(
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:mysql://172.18.0.1:${mysql_port}/doris_test?useSSL=false",
"driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver",
"only_specified_database" = "true"
);
```
If `only_specified_database` is `true`, the jdbc catalog will only synchronize the database specified in `jdbc_url`.
2. The second property is `lower_case_table_names`:
When this property is `true`, the table names of the jdbc external data source are synchronized in lower case.
```sql
create resource if not exists ${resource_name} properties(
"type"="jdbc",
"user"="doris_test",
"password"="123456",
"jdbc_url" = "jdbc:oracle:thin:@172.18.0.1:${oracle_port}:${SID}",
"driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/ojdbc8.jar",
"driver_class" = "oracle.jdbc.driver.OracleDriver",
"lower_case_table_names" = "true"
);
```
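A minimal sketch of how the two properties above could take effect during synchronization. This is not the actual Doris implementation: the helper names (`database_from_jdbc_url`, `visible_databases`, `visible_table_names`) are hypothetical, and extracting the database from the URL path only fits MySQL-style URLs.

```python
from urllib.parse import urlparse


def database_from_jdbc_url(jdbc_url: str) -> str:
    """Extract the database name from a MySQL-style JDBC URL (assumption)."""
    if not jdbc_url.startswith("jdbc:"):
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    # Drop the "jdbc:" prefix so urlparse sees "mysql://host:port/db?params".
    parsed = urlparse(jdbc_url[len("jdbc:"):])
    return parsed.path.lstrip("/")


def visible_databases(all_databases, properties):
    """Keep only the database named in jdbc_url when only_specified_database is true."""
    if properties.get("only_specified_database", "false").lower() == "true":
        wanted = database_from_jdbc_url(properties["jdbc_url"])
        return [db for db in all_databases if db == wanted]
    return list(all_databases)


def visible_table_names(table_names, properties):
    """Lower-case the synchronized table names when lower_case_table_names is true."""
    if properties.get("lower_case_table_names", "false").lower() == "true":
        return [name.lower() for name in table_names]
    return list(table_names)
```

For example, with `jdbc:mysql://172.18.0.1:3306/doris_test?useSSL=false` and `only_specified_database` set to `true`, only `doris_test` would remain visible in this sketch.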
1. the static INSTANCE cannot be used for the FoldConstantOnBE rule, because the rule is stateful
2. if the expression root is an Alias, its child should be used for constant collection
This CL mainly changes:
Support specifying csv schema manually in s3/hdfs table valued function
s3 (
'URI' = 'https://bucket1/inventory.dat',
'ACCESS_KEY'= 'ak',
'SECRET_KEY' = 'sk',
'FORMAT' = 'csv',
'column_separator' = '|',
'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)',
'use_path_style'='true'
)
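To illustrate the `csv_schema` format used above (`name:type` pairs separated by `;`), here is a small sketch of how such a string could be split into columns. The function `parse_csv_schema` is hypothetical and only mirrors the format of the example value; it is not the parser used in Doris.

```python
def parse_csv_schema(csv_schema: str):
    """Split a 'name:type;name:type' string into (name, type) pairs."""
    columns = []
    for item in csv_schema.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        # Split on the first ':' only; types such as decimal(38,10) contain no ':'.
        name, sep, col_type = item.partition(":")
        if not sep or not name.strip() or not col_type.strip():
            raise ValueError(f"invalid csv_schema entry: {item!r}")
        columns.append((name.strip(), col_type.strip().lower()))
    return columns


schema = parse_csv_schema("k1:int;k2:int;k3:int;k4:decimal(38,10)")
```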
Add new session variable dry_run_query
If set to true, the real query result is not returned; only the number of returned rows is returned.
mysql> select * from bigtable;
+--------------+
| ReturnedRows |
+--------------+
| 10000000 |
+--------------+
This avoids the transmission time of large result sets, so that we can focus on the real execution time of the query engine.
It is intended for debugging and analysis.
We set the LIBHDFS3_CONF env in start_be.sh, so libhdfs3 tries to read this hdfs-site.xml.
If the file does not exist, it throws an error, but Doris does not handle this error, which causes BE to crash.
This CL mainly changes:
Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exists.
Refactor the HDFSCommonBuilder so that it can return error correctly.
Add BE IP info in status, so that we can get ip from error msg like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file 000.snappy.orc, err:
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml
The logic of preferring compute nodes is wrong, which causes external table queries to be assigned to at most 3 backends.
This CL refactors this logic and also changes some FE configs:
prefer_compute_node_for_external_table
If set to true, queries on external tables prefer to be assigned to compute nodes.
The max number of compute nodes is controlled by min_backend_num_for_external_table.
If set to false, queries on external tables are assigned to any node.
min_backend_num_for_external_table
Only takes effect when prefer_compute_node_for_external_table is true.
If the number of compute nodes is less than this value, queries on external tables will try to get some mix nodes
to assign, so that the total number of nodes reaches this value.
If the number of compute nodes is larger than this value, queries on external tables are assigned to compute nodes only.
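The interaction of the two configs can be sketched as follows. This is only an illustration of the rules described above, not the refactored FE code: `select_backends` and its arguments are hypothetical names, and taking the first mix nodes in list order is an assumption.

```python
def select_backends(compute_nodes, mix_nodes, prefer_compute_node, min_backend_num):
    """Pick candidate backends for an external table query."""
    if not prefer_compute_node:
        # prefer_compute_node_for_external_table = false: any node may be used.
        return list(compute_nodes) + list(mix_nodes)
    selected = list(compute_nodes)
    if len(selected) < min_backend_num:
        # Too few compute nodes: top up with mix nodes until the total
        # reaches min_backend_num_for_external_table (or mix nodes run out).
        needed = min_backend_num - len(selected)
        selected.extend(mix_nodes[:needed])
    return selected
```

With two compute nodes, three mix nodes and `min_backend_num_for_external_table = 3`, this sketch returns the two compute nodes plus one mix node.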
Check required properties when creating catalog.
To avoid strange errors when required properties are missing,
this PR adds checks for:
hms catalog: check the validity of the dfs.ha properties
jdbc catalog: check that jdbc_url, driver_url and driver_class are set.
Fix NPE when init MasterCatalogExecutor
The MasterCatalogExecutor may be called by FrontendServiceImpl from BE, which does not have ConnectionContext.
Add more jdbc url param to resolve Chinese issue
add useUnicode=true&characterEncoding=utf-8 by default in jdbc catalog when connecting to MySQL
Update FAQ doc of catalog
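A rough sketch of the jdbc catalog checks and the default URL parameters described above. The function names are hypothetical, and leaving a parameter untouched when the user already set it is an assumption, not confirmed behavior.

```python
REQUIRED_JDBC_PROPERTIES = ("jdbc_url", "driver_url", "driver_class")
DEFAULT_MYSQL_PARAMS = (("useUnicode", "true"), ("characterEncoding", "utf-8"))


def check_required_properties(properties):
    """Fail early with a clear message instead of a strange error later."""
    missing = [key for key in REQUIRED_JDBC_PROPERTIES if not properties.get(key)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))


def with_default_mysql_params(jdbc_url):
    """Append the unicode parameters to MySQL URLs that do not set them."""
    if not jdbc_url.startswith("jdbc:mysql:"):
        return jdbc_url  # other sources (e.g. Oracle) are left unchanged
    for key, value in DEFAULT_MYSQL_PARAMS:
        if key + "=" in jdbc_url:
            continue  # assumption: an explicit user setting wins
        separator = "&" if "?" in jdbc_url else "?"
        jdbc_url += f"{separator}{key}={value}"
    return jdbc_url
```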
Make database, table, column and other names support unicode by changing the LABEL_REGEX, COMMON_NAME_REGEX, COMMON_TABLE_NAME_REGEX and COLUMN_NAME_REGEX regular expressions in class FeNameFormat.
P.S. @SharpRay has transferred PR #13467 to me, and I'm responsible for the task now. There will be some modifications during the review period, so I created a new PR and the original #13467 can be closed. Thanks.
This PR refactors the rewrite framework from memo to plan tree, and speeds up the analyze/rewrite stage.
Changes:
- abandon memo in the analysis/rewrite stage, so that we can skip some actions, such as creating new GroupExpressions, deduplicating GroupExpressions in the memo (high cost), and updating children to GroupPlan
- change most of the rules to static rules, so that we can skip initializing lots of rules in Analyzer/Rewriter for every query. Some rules need context, such as visitor rules; creating these rules at runtime makes them easy to use, so the `custom` rule helps us create them.
- remove the `logger` field in Job. Jobs are generated in large quantities at runtime and do not need a logger, so this saves the considerable time spent initializing loggers.
- skip rules where possible, e.g. `SelectMaterializedIndexWithoutAggregate`: skip selecting mv if the table has no rollup.
- add some caches for frequent operations, like Job.getDisableRules and Plan.getUnboundExpression
- new bottom up rewrite rule, which can keep traversing the new plans returned by rules. This feature depends on `Plan.mutableState`, so it is necessary to add this variable field to the plan. If the plan were fully immutable, we would have to use withXxx to renew the plan and set the state for it, which takes more runtime overhead and development workload. Another reason is that we need multiple mutable states, e.g. whether the rule has been applied, and whether this plan is managed by the rewrite framework. The good side of mutable state is efficiency, but I suggest we avoid using mutable state directly in rules where possible; if we need it, please wrap the mutable state in the framework so that it is updated and released correctly. A good example is `AppliedAwareRuleCondition`, which can update and get the state of whether a rule has been applied to this plan before.
- merge some rules and invoke multiple rules in one traversal
- refactor `EliminateUnnecessaryProject` with CustomRewritor, and fix the problem that it eliminated some Projects which decided the query output order; the cases are limit(project) and sort(project).
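The idea of a bottom-up rewrite that keeps traversing the plans returned by rules, while remembering which rules were already tried on a node, can be sketched roughly as below. This is a toy model under stated assumptions, not the Nereids code: `Plan`, `applied_rules` (a stand-in for `Plan.mutableState`), `merge_projects` and `rewrite_bottom_up` are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    op: str
    children: list = field(default_factory=list)
    # Hypothetical stand-in for Plan.mutableState: names of rules already tried here.
    applied_rules: set = field(default_factory=set)


def merge_projects(plan):
    """Toy rule: collapse project(project(x)) into project(x)."""
    if plan.op == "project" and plan.children and plan.children[0].op == "project":
        return Plan("project", plan.children[0].children)
    return plan


def rewrite_bottom_up(plan, rules):
    # Rewrite the children first, then try the rules on this node.
    plan.children = [rewrite_bottom_up(child, rules) for child in plan.children]
    for name, rule in rules:
        if name in plan.applied_rules:
            continue  # the framework-managed state lets us skip repeated work
        plan.applied_rules.add(name)
        new_plan = rule(plan)
        if new_plan is not plan:
            # A rule produced a new plan: keep traversing it.
            return rewrite_bottom_up(new_plan, rules)
    return plan


tree = Plan("project", [Plan("project", [Plan("project", [Plan("scan")])])])
result = rewrite_bottom_up(tree, [("merge_projects", merge_projects)])
```

In this sketch the mutable state is touched only inside `rewrite_bottom_up`, not inside the rule, which matches the suggestion above to let the framework manage it.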
TODO: add trace for new rewrite framework
benchmark:
legacy optimizer:
```
+-----------+---------------+---------------+---------------+
| SQL ID | avg | min | max |
+-----------+---------------+---------------+---------------+
| SQL 1 | 1.39 ms | 0 ms | 9 ms |
| SQL 2 | 1.38 ms | 0 ms | 10 ms |
| SQL 3 | 2.05 ms | 1 ms | 18 ms |
| SQL 4 | 0.89 ms | 0 ms | 9 ms |
| SQL 5 | 1.74 ms | 1 ms | 11 ms |
| SQL 6 | 2.00 ms | 1 ms | 13 ms |
| SQL 7 | 1.83 ms | 1 ms | 15 ms |
| SQL 8 | 0.92 ms | 0 ms | 7 ms |
| SQL 9 | 2.60 ms | 1 ms | 19 ms |
| SQL 10 | 3.54 ms | 2 ms | 28 ms |
| SQL 11 | 3.04 ms | 1 ms | 18 ms |
| SQL 12 | 3.26 ms | 2 ms | 16 ms |
| SQL 13 | 1.10 ms | 0 ms | 10 ms |
| SQL 14 | 2.90 ms | 1 ms | 13 ms |
| SQL 15 | 1.18 ms | 0 ms | 9 ms |
| SQL 16 | 1.05 ms | 0 ms | 13 ms |
| SQL 17 | 1.03 ms | 0 ms | 7 ms |
| SQL 18 | 0.94 ms | 0 ms | 7 ms |
| SQL 19 | 1.47 ms | 0 ms | 13 ms |
| SQL 20 | 0.47 ms | 0 ms | 4 ms |
| SQL 21 | 0.54 ms | 0 ms | 5 ms |
| SQL 22 | 3.34 ms | 1 ms | 19 ms |
| SQL 23 | 7.97 ms | 4 ms | 44 ms |
| SQL 24 | 11.11 ms | 7 ms | 28 ms |
| SQL 25 | 0.98 ms | 0 ms | 8 ms |
| SQL 26 | 0.83 ms | 0 ms | 7 ms |
| SQL 27 | 0.93 ms | 0 ms | 16 ms |
| SQL 28 | 2.19 ms | 1 ms | 18 ms |
| SQL 29 | 3.23 ms | 1 ms | 20 ms |
| SQL 30 | 59.99 ms | 51 ms | 81 ms |
| SQL 31 | 2.65 ms | 1 ms | 18 ms |
| SQL 32 | 2.47 ms | 1 ms | 17 ms |
| SQL 33 | 2.30 ms | 1 ms | 16 ms |
| SQL 34 | 0.66 ms | 0 ms | 8 ms |
| SQL 35 | 0.63 ms | 0 ms | 6 ms |
| SQL 36 | 2.25 ms | 1 ms | 15 ms |
| SQL 37 | 5.97 ms | 3 ms | 20 ms |
| SQL 38 | 5.73 ms | 3 ms | 21 ms |
| SQL 39 | 6.32 ms | 4 ms | 23 ms |
| SQL 40 | 8.61 ms | 5 ms | 35 ms |
| SQL 41 | 6.29 ms | 4 ms | 28 ms |
| SQL 42 | 6.04 ms | 4 ms | 15 ms |
| SQL 43 | 5.81 ms | 3 ms | 16 ms |
+-----------+---------------+---------------+---------------+
| TOTAL AVG | 4.22 ms | 2.47 ms | 17.05 ms |
| TOTAL SUM | 181.62 ms | 106 ms | 733 ms |
+-----------+---------------+---------------+---------------+
```
nereids with memo rewrite framework(old):
```
+-----------+---------------+---------------+---------------+
| SQL ID | avg | min | max |
+-----------+---------------+---------------+---------------+
| SQL 1 | 3.61 ms | 1 ms | 20 ms |
| SQL 2 | 3.47 ms | 2 ms | 16 ms |
| SQL 3 | 3.27 ms | 1 ms | 18 ms |
| SQL 4 | 2.23 ms | 1 ms | 12 ms |
| SQL 5 | 3.60 ms | 1 ms | 20 ms |
| SQL 6 | 2.73 ms | 1 ms | 17 ms |
| SQL 7 | 3.04 ms | 1 ms | 23 ms |
| SQL 8 | 3.53 ms | 2 ms | 20 ms |
| SQL 9 | 3.74 ms | 2 ms | 22 ms |
| SQL 10 | 3.66 ms | 2 ms | 18 ms |
| SQL 11 | 3.93 ms | 2 ms | 15 ms |
| SQL 12 | 4.85 ms | 2 ms | 27 ms |
| SQL 13 | 4.41 ms | 2 ms | 28 ms |
| SQL 14 | 5.16 ms | 2 ms | 41 ms |
| SQL 15 | 4.33 ms | 2 ms | 33 ms |
| SQL 16 | 4.94 ms | 2 ms | 51 ms |
| SQL 17 | 3.27 ms | 1 ms | 25 ms |
| SQL 18 | 2.78 ms | 1 ms | 22 ms |
| SQL 19 | 3.51 ms | 1 ms | 42 ms |
| SQL 20 | 1.84 ms | 1 ms | 13 ms |
| SQL 21 | 3.47 ms | 1 ms | 66 ms |
| SQL 22 | 5.21 ms | 2 ms | 29 ms |
| SQL 23 | 5.55 ms | 3 ms | 25 ms |
| SQL 24 | 4.21 ms | 2 ms | 28 ms |
| SQL 25 | 3.47 ms | 1 ms | 23 ms |
| SQL 26 | 3.03 ms | 2 ms | 21 ms |
| SQL 27 | 3.07 ms | 1 ms | 17 ms |
| SQL 28 | 4.51 ms | 3 ms | 22 ms |
| SQL 29 | 4.97 ms | 3 ms | 21 ms |
| SQL 30 | 11.95 ms | 8 ms | 33 ms |
| SQL 31 | 3.92 ms | 2 ms | 23 ms |
| SQL 32 | 3.74 ms | 2 ms | 15 ms |
| SQL 33 | 3.62 ms | 2 ms | 22 ms |
| SQL 34 | 4.60 ms | 1 ms | 55 ms |
| SQL 35 | 3.47 ms | 2 ms | 25 ms |
| SQL 36 | 3.34 ms | 2 ms | 18 ms |
| SQL 37 | 4.77 ms | 2 ms | 23 ms |
| SQL 38 | 4.44 ms | 2 ms | 39 ms |
| SQL 39 | 4.52 ms | 2 ms | 23 ms |
| SQL 40 | 5.50 ms | 3 ms | 30 ms |
| SQL 41 | 5.01 ms | 2 ms | 24 ms |
| SQL 42 | 4.32 ms | 2 ms | 24 ms |
| SQL 43 | 4.29 ms | 2 ms | 42 ms |
+-----------+---------------+---------------+---------------+
| TOTAL AVG | 4.11 ms | 1.91 ms | 26.30 ms |
| TOTAL SUM | 176.88 ms | 82 ms | 1131 ms |
+-----------+---------------+---------------+---------------+
```
nereids with plan tree rewrite framework(new):
```
+-----------+---------------+---------------+---------------+
| SQL ID | avg | min | max |
+-----------+---------------+---------------+---------------+
| SQL 1 | 3.21 ms | 1 ms | 18 ms |
| SQL 2 | 3.99 ms | 1 ms | 76 ms |
| SQL 3 | 2.93 ms | 1 ms | 21 ms |
| SQL 4 | 2.13 ms | 1 ms | 21 ms |
| SQL 5 | 2.43 ms | 1 ms | 30 ms |
| SQL 6 | 2.08 ms | 1 ms | 11 ms |
| SQL 7 | 2.03 ms | 1 ms | 11 ms |
| SQL 8 | 2.27 ms | 1 ms | 22 ms |
| SQL 9 | 2.42 ms | 1 ms | 16 ms |
| SQL 10 | 2.65 ms | 1 ms | 14 ms |
| SQL 11 | 2.78 ms | 1 ms | 14 ms |
| SQL 12 | 3.09 ms | 1 ms | 19 ms |
| SQL 13 | 2.33 ms | 1 ms | 13 ms |
| SQL 14 | 2.66 ms | 1 ms | 16 ms |
| SQL 15 | 2.34 ms | 1 ms | 15 ms |
| SQL 16 | 2.04 ms | 1 ms | 30 ms |
| SQL 17 | 2.09 ms | 1 ms | 17 ms |
| SQL 18 | 1.87 ms | 1 ms | 15 ms |
| SQL 19 | 2.21 ms | 1 ms | 50 ms |
| SQL 20 | 1.32 ms | 0 ms | 12 ms |
| SQL 21 | 1.63 ms | 1 ms | 11 ms |
| SQL 22 | 2.75 ms | 1 ms | 30 ms |
| SQL 23 | 3.44 ms | 2 ms | 17 ms |
| SQL 24 | 2.01 ms | 1 ms | 14 ms |
| SQL 25 | 1.58 ms | 1 ms | 11 ms |
| SQL 26 | 1.53 ms | 0 ms | 13 ms |
| SQL 27 | 1.62 ms | 1 ms | 12 ms |
| SQL 28 | 2.90 ms | 1 ms | 21 ms |
| SQL 29 | 3.04 ms | 2 ms | 17 ms |
| SQL 30 | 10.54 ms | 7 ms | 49 ms |
| SQL 31 | 2.61 ms | 1 ms | 21 ms |
| SQL 32 | 2.42 ms | 1 ms | 14 ms |
| SQL 33 | 2.13 ms | 1 ms | 14 ms |
| SQL 34 | 1.69 ms | 1 ms | 14 ms |
| SQL 35 | 1.87 ms | 1 ms | 15 ms |
| SQL 36 | 2.37 ms | 1 ms | 21 ms |
| SQL 37 | 3.06 ms | 1 ms | 15 ms |
| SQL 38 | 4.09 ms | 1 ms | 31 ms |
| SQL 39 | 5.81 ms | 2 ms | 43 ms |
| SQL 40 | 4.55 ms | 2 ms | 34 ms |
| SQL 41 | 3.49 ms | 1 ms | 20 ms |
| SQL 42 | 2.75 ms | 1 ms | 26 ms |
| SQL 43 | 2.81 ms | 1 ms | 14 ms |
+-----------+---------------+---------------+---------------+
| TOTAL AVG | 2.78 ms | 1.19 ms | 21.35 ms |
| TOTAL SUM | 119.56 ms | 51 ms | 918 ms |
+-----------+---------------+---------------+---------------+
```
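As a quick reading aid for the three tables, the relative change in the `TOTAL SUM` of the `avg` column can be computed from the reported numbers. The helper name `reduction_percent` is only for illustration.

```python
# TOTAL SUM of the avg column, copied from the three tables above (ms).
total_avg_ms = {
    "legacy": 181.62,
    "nereids_memo": 176.88,
    "nereids_plan_tree": 119.56,
}


def reduction_percent(before, after):
    """Relative reduction of `after` compared with `before`, in percent."""
    return (before - after) / before * 100


vs_memo = reduction_percent(total_avg_ms["nereids_memo"], total_avg_ms["nereids_plan_tree"])
vs_legacy = reduction_percent(total_avg_ms["legacy"], total_avg_ms["nereids_plan_tree"])
print(f"vs memo framework: {vs_memo:.1f}%, vs legacy optimizer: {vs_legacy:.1f}%")
```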
1. Make sure all sub types supported by STRUCT work correctly;
2. remove unused variable `_need_validate_data`;
3. lazily init min or max decimal to support validation of nested DecimalV2 columns;
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Hive stores all data without partition column values in a default partition named __HIVE_DEFAULT_PARTITION__.
Doris fails to get this partition when the partition column type is INT or another type that
__HIVE_DEFAULT_PARTITION__ cannot be converted to.
This PR supports the hive default partition by setting the column value to NULL for the missing partition columns.
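The handling described above can be sketched as a small conversion step. `partition_value` and the `convert` callback are hypothetical names used only to show the idea; Python's `None` stands in for NULL.

```python
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_value(raw_value, convert):
    """Convert a partition value, mapping the Hive default partition to NULL."""
    if raw_value == HIVE_DEFAULT_PARTITION:
        # The placeholder cannot be converted to INT and similar types,
        # so the missing partition column is treated as NULL.
        return None
    return convert(raw_value)
```

For an INT partition column, `partition_value("2023", int)` gives `2023`, while the default partition name gives `None` instead of raising a conversion error.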
Loading a big local file causes the `[INTERNAL_ERROR]too many filtered rows` issue, because the ByteBuffer from the MySQL client always uses the same byte array.
Later bytes overwrite the previous ones, which results in a wrong byte order on the network.
The fix is to copy the byte array before filling it into the network.
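The aliasing problem can be reproduced in miniature. The sketch below is not the Doris code: `queue_packets` is a hypothetical function in which a `bytearray` plays the role of the shared backing array, and queued chunks are read only after all packets have arrived.

```python
def queue_packets(packets, copy_before_queue):
    """Queue incoming packets that all arrive through one reused buffer."""
    shared = bytearray()  # the single backing array reused for every packet
    queue = []
    for packet in packets:
        shared[:] = packet  # the client overwrites the same array each time
        # Without a copy, every queued entry aliases `shared`.
        queue.append(bytes(shared) if copy_before_queue else shared)
    # The consumer reads the queue later, after more packets have arrived.
    return [bytes(chunk) for chunk in queue]


broken = queue_packets([b"aaaa", b"bbbb"], copy_before_queue=False)
fixed = queue_packets([b"aaaa", b"bbbb"], copy_before_queue=True)
```

Without the copy, both queued entries end up holding the last packet's bytes; with the copy, each entry keeps its own data.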