Commit Graph

1955 Commits

Author SHA1 Message Date
397cc011c4 [fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)
In ECB mode, block_encryption_mode did not take effect; it took effect only when an initialization vector was provided.
Solved: the 192-bit and 256-bit variants now support computation without an initialization vector.

For other algorithms, an error should be reported when no initialization vector is provided.

Initialization Vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.

Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt

Note: This fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found
2023-03-09 09:51:41 +08:00
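The initialization vector rule quoted in the commit above can be sketched as a small check. This is an illustrative sketch only, not the Doris implementation: `requires_iv` and `check_iv` are hypothetical names, and the sketch assumes mode strings shaped like `aes-128-ecb` (or the underscore form `AES_128_ECB`).

```python
# Hypothetical helpers, not Doris code: they only model the rule that ECB
# needs no initialization vector while every other block mode does.
IV_FREE_MODES = {"ecb"}


def requires_iv(block_encryption_mode: str) -> bool:
    """Return True when a mode such as 'aes-256-cbc' needs an init vector."""
    # Assumed shape: '<cipher>-<key bits>-<mode>'; keep only the last part.
    normalized = block_encryption_mode.lower().replace("_", "-")
    mode = normalized.rsplit("-", 1)[-1]
    return mode not in IV_FREE_MODES


def check_iv(block_encryption_mode: str, iv) -> None:
    """Raise ValueError when a mode that needs an IV is used without one."""
    if requires_iv(block_encryption_mode) and not iv:
        raise ValueError(
            f"{block_encryption_mode} requires an initialization vector"
        )
```

With this sketch, `check_iv("aes-256-ecb", None)` passes silently, while `check_iv("aes-128-cbc", None)` raises an error.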
8a6a4b82aa [typo](docs) Add a hyperlink to facilitate user redirect. (#17563) 2023-03-09 09:47:10 +08:00
bd5ed2b0c2 [enhancement](histogram) optimize the histogram bucketing strategy, etc (#17264)
* optimize the histogram bucketing strategy, etc

* fix p0 regression of histogram
2023-03-08 20:12:05 +08:00
05b04e4c39 [BugFix](PG catalog) fix that pg catalog can not get all schemas that a pg user can access. (#17517)
Previously, the PG catalog used the SQL statement SELECT schema_name FROM information_schema.schemata where schema_owner='<UserName>'; to list the schemas of a user. However, this statement cannot find all the schemas that a user can access, because:

A user may not be the owner of a schema but may still have read permission on it.
A user may inherit the permissions of its user group and thus have read permission on a schema.
For these reasons, the statement is replaced with select nspname from pg_namespace where has_schema_privilege('<UserName>', nspname, 'USAGE');
2023-03-08 19:12:47 +08:00
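The difference between the two lookups in the commit above can be modeled with a toy in-memory catalog. This is a simplified sketch, not PostgreSQL's actual privilege resolution: the dictionary layout, `schemas_by_owner`, and `schemas_by_usage` are assumptions made for illustration.

```python
# Toy model (assumption): each schema records its owner and the set of
# users/groups that were granted USAGE on it.
def schemas_by_owner(schemas, user):
    """Old behavior: only schemas whose owner is the user."""
    return [s["name"] for s in schemas if s["owner"] == user]


def schemas_by_usage(schemas, user, member_of):
    """New behavior: schemas the user owns or can use, directly or via a group."""
    principals = {user} | set(member_of.get(user, ()))
    return [
        s["name"]
        for s in schemas
        if s["owner"] == user or principals & set(s["usage_granted_to"])
    ]


schemas = [
    {"name": "sales", "owner": "alice", "usage_granted_to": []},
    {"name": "reports", "owner": "bob", "usage_granted_to": ["alice"]},
    {"name": "shared", "owner": "bob", "usage_granted_to": ["analysts"]},
    {"name": "private", "owner": "bob", "usage_granted_to": []},
]
member_of = {"alice": ["analysts"]}
```

Here the owner-based lookup returns only `sales` for `alice`, while the privilege-based lookup also returns `reports` (direct grant) and `shared` (inherited from the group).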
4ea0d6c5fa [feature](array_function) add support for array_popfront (#17416) 2023-03-08 13:57:38 +08:00
b1d65f855d [Feature](array-function) Support array_concat function (#17436) 2023-03-08 13:57:16 +08:00
ae916f7cb3 [docs](doc) Add docs for Apache Kyuubi (#17481)
* add kyuubi doc of zh-CN & en
2023-03-08 09:36:50 +08:00
Pxl
d8f0ca7108 [Chore](schema change) remove some unused code in schema change (#17459)
remove some unused code in schema change.
remove some row-based config and code.
2023-03-07 09:18:34 +08:00
50bf02024a [Improvement](meta) support return total statistics of all databases for command show proc '/jobs (#17342)
Currently, the show proc jobs command can only be used on a specific database.
To see the overall data of the whole cluster, a user has to look into every database and sum the results,
which is troublesome.
Now this can be done simply by passing -1 as the dbId.

mysql> show proc '/jobs/-1';
+---------------+---------+---------+----------+-----------+-------+
| JobType | Pending | Running | Finished | Cancelled | Total |
+---------------+---------+---------+----------+-----------+-------+
| load | 0 | 0 | 0 | 2 | 2 |
| delete | 0 | 0 | 0 | 0 | 0 |
| rollup | 0 | 0 | 1 | 0 | 1 |
| schema_change | 0 | 0 | 2 | 0 | 2 |
| export | 0 | 0 | 0 | 3 | 3 |
+---------------+---------+---------+----------+-----------+-------+

mysql> show proc '/jobs/-1/rollup';
+----------+------------------+---------------------+---------------------+------------------+-----------------+----------+---------------+----------+------+----------+---------+
| JobId | TableName | CreateTime | FinishTime | BaseIndexName | RollupIndexName | RollupId | TransactionId | State | Msg | Progress | Timeout |
+----------+------------------+---------------------+---------------------+------------------+-----------------+----------+---------------+----------+------+----------+---------+
| 17826065 | order_detail | 2023-02-23 04:21:01 | 2023-02-23 04:21:22 | order_detail | rp1 | 17826066 | 6009 | FINISHED | | NULL | 2592000 |
+----------+------------------+---------------------+---------------------+------------------+-----------------+----------+---------------+----------+------+----------+---------+
1 row in set (0.01 sec)
2023-03-07 08:57:55 +08:00
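The effect of passing -1 as the dbId in the commit above can be sketched as a sum over per-database counters. This is an illustrative sketch, not the FE code: `job_stats` and the nested dictionary layout are assumptions.

```python
from collections import Counter

STATES = ("Pending", "Running", "Finished", "Cancelled")


def job_stats(per_db, db_id):
    """Build rows like `show proc '/jobs/<dbId>'`; db_id == -1 sums all databases.

    per_db maps dbId -> job type -> {state: count} (assumed layout).
    """
    selected = per_db.values() if db_id == -1 else [per_db[db_id]]
    totals = {}
    for db in selected:
        for job_type, counts in db.items():
            # Counter.update adds counts instead of replacing them.
            totals.setdefault(job_type, Counter()).update(counts)
    rows = []
    for job_type, counts in totals.items():
        row = {"JobType": job_type}
        for state in STATES:
            row[state] = counts[state]
        row["Total"] = sum(counts[state] for state in STATES)
        rows.append(row)
    return rows


per_db = {
    10001: {"load": {"Cancelled": 1}, "rollup": {"Finished": 1}},
    10002: {"load": {"Cancelled": 1}, "export": {"Cancelled": 3}},
}
cluster_rows = job_stats(per_db, -1)
```

In this sample, the `load` row of `cluster_rows` has 2 cancelled jobs, one from each database.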
bc48cbff83 [doc](auth)auth doc (#17358)
* auth doc

* auth en doc

* add note
2023-03-07 08:05:09 +08:00
78a1d630e4 [docs](typo) fix faq docs, already support rename column. (#17428)
* Update data-faq.md

Already support rename column.

* fix

---------

Co-authored-by: zhangyu209 <zhangyu209@meituan.com>
2023-03-07 08:03:51 +08:00
02015cf153 [docs](typo) Correct the wrong default value of DECIMAL type displayed in the Help CREATE TABLE #17422
Correct the wrong default value of DECIMAL type displayed in the Help CREATE TABLE
2023-03-06 12:50:30 +08:00
9617f46fa5 [improvement](memory) Modify mem_limit default value (#17322)
Modify the default value of mem_limit to auto. auto means process mem limit is equal to max(physical mem * 0.9, 6.4G).
6.4G is the maximum memory reserved for the system.
2023-03-06 10:53:27 +08:00
d8a231f340 [Improvement](auth)(step-2) add ranger authorizer for hms catalog (#17424) 2023-03-05 21:50:44 +08:00
7b4fc412c5 [typo](docs) Optimize documents so that users can better understand. (#17295) 2023-03-04 21:02:45 +08:00
17164cf7a8 [fix](docs) add logic for batch delete when sequence column exists (#17367)
* [fix](docs) add logic for batch delete when sequence column exists.

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

* add docs

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

* fix docs 2

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-03-03 16:28:31 +08:00
3b94ca5ceb [chore](macOS) Use LLVM Clang by default (#17292)
Use LLVM Clang by default
2023-03-03 14:18:02 +08:00
6ce8200d9e [doc](typo) external-table-load.md (#17234)
* fix: external-table-load.md

Fix the SQL that has a syntax error.

* fix: external-table-load.md (Chinese)

Fix the SQL that has a syntax error.
2023-03-03 14:11:19 +08:00
11994b76d7 add the tag <version since="dev"> for insert_timeout. (#17316)
Co-authored-by: smallhibiscus <844981280>
2023-03-03 14:10:49 +08:00
ba108d40d8 [docs](link) Fix some links in docs is broken (#17335)
* [docs](link) Fix some links in docs is broken

* fix_typo
2023-03-03 14:08:05 +08:00
ba82cd10c6 [Enhencement](Jdbc catalog) Add two optional properties for jdbc catalog (#17245)
1. The first property is `only_specified_database`:
Previously, `Jdbc Catalog` synchronized all databases from the source database.
Now a parameter called `only_specified_database` is added to the jdbc catalog so that only the specified database is synchronized, e.g.:

```sql
create resource if not exists ${resource_name} properties(
    "type"="jdbc",
    "user"="root",
    "password"="123456",
    "jdbc_url" = "jdbc:mysql://172.18.0.1:${mysql_port}/doris_test?useSSL=false",
    "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "only_specified_database" = "true"
);
```
If `only_specified_database` is `true`, the jdbc catalog synchronizes only the database specified in `jdbc_url`.

2. The second property is `lower_case_table_names`:
This property synchronizes the table names of the jdbc external data source in lower case.

```sql
create resource if not exists ${resource_name} properties(
  "type"="jdbc",
  "user"="doris_test",
  "password"="123456",
  "jdbc_url" = "jdbc:oracle:thin:@172.18.0.1:${oracle_port}:${SID}",
  "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/ojdbc8.jar",
  "driver_class" = "oracle.jdbc.driver.OracleDriver",
  "lower_case_table_names" = "true"
);
```
2023-03-03 00:47:46 +08:00
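The two properties in the commit above can be sketched as filters applied while synchronizing metadata. This is a minimal sketch, not the catalog code: the helper names are hypothetical, and the database extraction assumes a MySQL-style URL of the form `jdbc:mysql://host:port/db?params`.

```python
from urllib.parse import urlsplit


def database_from_jdbc_url(jdbc_url):
    """Extract the database name from a MySQL-style JDBC URL (assumed shape)."""
    # Drop the leading 'jdbc:' so urlsplit sees 'mysql://host:port/db?params'.
    path = urlsplit(jdbc_url[len("jdbc:"):]).path
    return path.lstrip("/")


def visible_databases(all_databases, properties):
    """Keep only the URL's database when only_specified_database is 'true'."""
    if properties.get("only_specified_database", "false").lower() == "true":
        wanted = database_from_jdbc_url(properties["jdbc_url"])
        return [db for db in all_databases if db == wanted]
    return list(all_databases)


def visible_table_names(table_names, properties):
    """Lower-case table names when lower_case_table_names is 'true'."""
    if properties.get("lower_case_table_names", "false").lower() == "true":
        return [name.lower() for name in table_names]
    return list(table_names)
```

For example, with `jdbc_url` set to `jdbc:mysql://172.18.0.1:3306/doris_test?useSSL=false` and `only_specified_database` set to `true`, only `doris_test` remains visible.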
39f59f554a [improvement](dry-run)(tvf) support csv schema in tvf and add "dry_run_query" variable (#16983)
This CL mainly changes:

Support specifying csv schema manually in s3/hdfs table valued function

s3 (
'URI' = 'https://bucket1/inventory.dat',
'ACCESS_KEY'= 'ak',
'SECRET_KEY' = 'sk',
'FORMAT' = 'csv',
'column_separator' = '|',
'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)',
'use_path_style'='true'
)
Add new session variable dry_run_query

If set to true, the real query result is not returned; only the number of returned rows is returned.

mysql> select * from bigtable;
+--------------+
| ReturnedRows |
+--------------+
| 10000000     |
+--------------+
This avoids the transmission time of large result sets and focuses on the real execution time of the query engine.
It is intended for debugging and analysis.
2023-03-02 16:51:27 +08:00
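The `csv_schema` value and the `dry_run_query` behavior in the commit above can be sketched as follows. This is an illustration under assumptions, not the BE or FE parser: `parse_csv_schema` and `dry_run_result` are hypothetical names, and the sketch assumes the `name:type` pairs are separated by `;` exactly as in the example value.

```python
def parse_csv_schema(csv_schema):
    """Split 'k1:int;k2:decimal(38,10)' into [(name, type), ...]."""
    columns = []
    for item in csv_schema.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first ':' only, so types keep their own punctuation.
        name, sep, col_type = item.partition(":")
        if not sep or not name.strip() or not col_type.strip():
            raise ValueError(f"invalid csv_schema entry: {item!r}")
        columns.append((name.strip(), col_type.strip()))
    return columns


def dry_run_result(rows):
    """Model dry_run_query=true: report only how many rows would be returned."""
    return [{"ReturnedRows": sum(1 for _ in rows)}]
```

Parsing the example value `k1:int;k2:int;k3:int;k4:decimal(38,10)` yields four columns, the last one being `("k4", "decimal(38,10)")`.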
9f088f6e90 [feature](json) add json_valid function (#17247)
add json_valid function

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-03-02 14:08:52 +08:00
30df268c1f [fix](hdfs)(catalog) fix BE crash when hdfs-site.xml not exist in be/conf and fix compute node logic (#17244)
LIBHDFS3_CONF is set in start_be.sh, so libhdfs3 tries to read hdfs-site.xml.
If the file does not exist, libhdfs3 throws an error, but Doris did not handle this error, which caused the BE to crash.
This CL mainly changes:

Modify start_be.sh to set LIBHDFS3_CONF only if hdfs-site.xml exists.
Refactor HDFSCommonBuilder so that it returns errors correctly.
Add BE IP info to the status, so that the IP can be read from an error message like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file  000.snappy.orc, err: 
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml
The logic for preferring compute nodes was wrong, which meant that an external table query could be assigned to at most 3 backends.
This CL refactors this logic and also changes some FE configs:

prefer_compute_node_for_external_table

If set to true, queries on external tables prefer to be assigned to compute nodes.
The max number of compute nodes is controlled by min_backend_num_for_external_table.
If set to false, queries on external tables are assigned to any node.

min_backend_num_for_external_table

Only takes effect when prefer_compute_node_for_external_table is true.
If the number of compute nodes is less than this value, queries on external tables try to get some mix nodes
as well, so that the total number of nodes reaches this value.
If the number of compute nodes is larger than this value, queries on external tables are assigned to compute nodes only.
2023-03-02 11:09:55 +08:00
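The backend selection rules described in the commit above can be sketched as one function. This is a reading of the commit message, not the FE scheduler: `choose_backends` is a hypothetical name, and the sketch assumes that all compute nodes are used once their count reaches the configured minimum.

```python
def choose_backends(compute_nodes, mix_nodes, prefer_compute_node, min_backend_num):
    """Pick candidate backends for an external table query.

    prefer_compute_node models prefer_compute_node_for_external_table,
    min_backend_num models min_backend_num_for_external_table.
    """
    if not prefer_compute_node:
        # Any node may be used.
        return list(compute_nodes) + list(mix_nodes)
    chosen = list(compute_nodes)
    missing = min_backend_num - len(chosen)
    if missing > 0:
        # Too few compute nodes: top up with mix nodes to reach the minimum.
        chosen.extend(list(mix_nodes)[:missing])
    return chosen
```

For example, with two compute nodes, three mix nodes, and a minimum of 3, the sketch picks both compute nodes plus one mix node.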
b0c5250bf9 [Enhancement](tvf) support trim_double_quotes and skip_lines for S3 and HDFS table valued function (#17224)
support trim_double_quotes and skip_lines for S3 and HDFS table valued function
2023-03-01 23:41:31 +08:00
d44c4b1300 [improvement][fix](catalog) check required properties when creating catalog and fix jdbc catalog issue (#17209)
Check required properties when creating a catalog,
to avoid strange errors when required properties are missing.

This PR adds checks for:

hms catalog: check the validity of the dfs.ha properties

jdbc catalog: check that jdbc_url, driver_url and driver_class are set.

Fix NPE when init MasterCatalogExecutor
The MasterCatalogExecutor may be called by FrontendServiceImpl from BE, which does not have ConnectionContext.

Add more jdbc url params to resolve Chinese character encoding issues

add useUnicode=true&characterEncoding=utf-8 by default in jdbc catalog when connecting to MySQL

Update FAQ doc of catalog
2023-03-01 17:08:36 +08:00
ff8902370c [improvement](doc) Supplementary Bulk Deletion Notes (#17113)
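* Supplement notes on batch deletion

* Following the earlier part of the batch deletion document, users may enable the `show_hidden_columns` session variable to check whether a table supports batch deletion.
* After then running DELETE/MERGE load jobs as in the examples, executing `select count(*) from xxx` in the same session may return results that differ from what is expected.
* It may not be immediately obvious that this happens because the previously enabled session variable causes deleted data to be returned as well.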

* supplement batch deletion notes for English doc
2023-03-01 13:35:20 +08:00
cfc2d45795 [typo](docs) fix typo (#17208) 2023-03-01 07:41:21 +08:00
eeca16d7a0 [fix](doc)adjust Flink connector document structure and add SchemaChange example (#17231) 2023-03-01 07:40:56 +08:00
475368c62d [typo](docs) Add some details about AES encryption. (#17243)
* [typo](docs) Add some details about AES encryption.

* Update aes.md

* Update aes.md

* Update aes.md

* Update aes.md
2023-03-01 07:40:11 +08:00
7369261f33 [typo](docs)update hight-concurrent-point-query.md (#17248)
Co-authored-by: liuxiaodong <liuxiaodong1@corp.netease.com>
2023-03-01 07:37:27 +08:00
b0de8d1925 [doc][community]correct the number of committers (#16905) 2023-02-28 10:48:06 +08:00
b51ce415e7 [Feature](load) Add submitter and comments to load job (#16878)
* [Feature](load) Add submitter and comments to load job
2023-02-28 09:06:19 +08:00
84413f33b8 [enhancement](merge-on-write) add skip_delete_bitmap session variable for debug purpose (#17127) 2023-02-27 23:31:28 +08:00
c807596c51 [Docs](docs) Modify plugin documents (#17161)
* modify plugin docs

* add qe_slow_log_ms description

* add version description
2023-02-27 18:42:02 +08:00
95837b7958 [Enhancement](ES): Support mapping es date format and replace simple json with jackson (#16806)
* Support mapping es date format, default/yyyy-MM-dd HH:mm:ss/yyyy-MM-dd/epoch_millis

* Replace simple json with jackson, resolve column order random problem

* Add es array doc version
2023-02-27 14:47:21 +08:00
c0360f80bb [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases (#15339)
Enhance the aggregate functions `collect_set` and `collect_list` to support an optional `max_size` param,
which limits the number of elements in the result array.
2023-02-27 14:22:30 +08:00
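The optional `max_size` param from the commit above can be sketched in plain Python. This only models the size limit: which elements a distributed engine keeps is not stated in the commit, so keeping the first ones seen is an assumption of this sketch.

```python
def collect_list(values, max_size=None):
    """Collect values into a list, stopping at max_size elements if given."""
    result = []
    for value in values:
        if max_size is not None and len(result) >= max_size:
            break
        result.append(value)
    return result


def collect_set(values, max_size=None):
    """Collect distinct values, stopping at max_size elements if given."""
    seen = set()
    result = []
    for value in values:
        if max_size is not None and len(result) >= max_size:
            break
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result
```

With the input `[1, 2, 2, 3]`, `collect_list(..., 3)` keeps three elements including the duplicate, while `collect_set(..., 2)` keeps two distinct ones.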
2626995fc1 [Doc](Load)Add mysql load document (#16483)
* Add doc

* 1

* doc2

* review again

* fix comment

* fix comment

* format

* add recommended dir

* client --local-infile

* add streaming_load_max_mb
2023-02-27 13:25:34 +08:00
f228cfdd00 [enhancement](session-variable)add a use_fix_replica session variable to fix query replica (#17101)
Add the use_fix_replica session variable to make it easier to debug replica inconsistency problems.
The default value of use_fix_replica is -1, which means no replica is fixed;
otherwise the {use_fix_replica} smallest replica is chosen.
2023-02-27 10:20:23 +08:00
aefcc98715 [Enhancement](datetimev2-enhance) support 'microseconds_sub' function for datetimev2 (#17130)
Based on #16970, introduce the microseconds_sub function for datetimev2
2023-02-27 08:47:30 +08:00
d8eb3ec6f7 fix set command example to enable_pipeline_engine (#17103) 2023-02-26 11:06:04 +08:00
14e80b18c8 Add csv file header filter documentation example (#17115) 2023-02-26 11:05:45 +08:00
32d08c9556 Update run-docker-cluster.md (#17116) 2023-02-26 11:05:28 +08:00
3a9aa03aab [BugFix](oracle-catalog) Modify the doris data type mapping of oracle NUMBER(p,s) type (#17051)
The Oracle data type `NUMBER(p,s)` differs semantically from the Doris decimal type.
For the Oracle `NUMBER(p,s)` type:
1. if s<0, the value is an integer. This `NUMBER(p,s)` has up to (p+|s|) digits,
and rounding is performed at position s.
e.g.: if we insert 1234567 into a `NUMBER(5,-2)` column, Oracle stores 1234600. In this case,
Doris uses an integer type (`TINYINT/SMALLINT/INT/.../LARGEINT`).

2. if s>=0 && s<p, it behaves just like Doris `Decimal(p,s)`.

3. if s>=0 && s>p, the value is a fraction (like 0.xxxxx).
p is the number of significant digits allowed after the decimal point,
and digits beyond position s after the decimal point are rounded. e.g.: we cannot insert 0.0123456 into a `NUMBER(5,7)` column,
because there must be two zeros immediately to the right of the decimal point,
but we can insert 0.0012345. In this case, Doris uses `DECIMAL(s,s)`.

4. if p and s are not specified, as in plain `NUMBER`,
the p and s of `NUMBER` are uncertain. In this case, Doris cannot determine p and s,
so Doris cannot determine the data type.
2023-02-26 09:05:41 +08:00
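The four `NUMBER(p,s)` cases in the commit above can be sketched as a mapping function. This is an illustration, not the catalog code: `oracle_number_to_doris` is a hypothetical name, the digit limits per integer type are an assumption based on how many decimal digits each type can always hold, and treating s == p like the `DECIMAL(p,s)` case is also an assumption (both formulas give the same type there).

```python
# Assumed limits: the largest digit count that always fits each integer type.
INT_TYPES = [
    (2, "TINYINT"),
    (4, "SMALLINT"),
    (9, "INT"),
    (18, "BIGINT"),
    (38, "LARGEINT"),
]


def oracle_number_to_doris(p=None, s=None):
    """Map Oracle NUMBER(p,s) to a Doris type name, or None if undetermined."""
    if p is None or s is None:
        # Plain NUMBER: p and s are uncertain, so no type can be chosen.
        return None
    if s < 0:
        # Integer with up to p + |s| digits.
        digits = p + abs(s)
        for limit, name in INT_TYPES:
            if digits <= limit:
                return name
        return None  # wider than LARGEINT in this sketch
    if s <= p:
        return f"DECIMAL({p},{s})"
    # s > p: a fraction such as 0.00xxxxx.
    return f"DECIMAL({s},{s})"
```

For example, `NUMBER(5,-2)` has up to 7 digits and maps to `INT`, and `NUMBER(5,7)` maps to `DECIMAL(7,7)`.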
4093ef9e4b [fix](auth) fix losing global priv bug and refactor default role name (#16966)
This PR mainly changes:

When upgrading from an old version to master, the ADMIN_PRIV of a normal user may be lost.
This may only happen if you:

Create a user with the ADMIN_PRIV privilege.
Upgrade Doris to v1.2.x or master before the meta image that contains the edit log from step 1 is generated.
The ADMIN_PRIV will then be lost in Global Privileges.
This PR fixes this bug and sets ADMIN_PRIV in the right place.

Refactor the user's implicit role name

In [feature](auth)Implementing privilege management with rbac model #16091, the Doris auth model was refactored by introducing RBAC, and each user has an implicit role
named with the prefix default_role_rbac_. But the name has a wrong format, like:
default_role_rbac_'default_cluster:user1'@'%'

This PR changes the role name's format to, for example:

default_role_rbac_user1@%
default_role_rbac_user2@[domain]
NOTICE: this change may cause incompatible metadata, but since [feature](auth)Implementing privilege management with rbac model #16091 is not released, we should fix it soon.

Add a new session variable show_user_default_role

When set to true, the implicit role of each user is shown in the result of the show roles statement. The default is false.
2023-02-24 23:36:53 +08:00
c39914c0a0 [feature](partition)add default list partition (#15509)
This PR implements the default list partition described in the related #15507.
It is similar to Greenplum's default partition, which stores all data that does not satisfy the constraints of the preceding partition keys.
The optimizer does not filter out the default partition, which means the default partition is scanned
each time you select data from a table that has a default partition.

Users can either create a table with a default partition or add a default partition with ALTER.

```sql
PARTITION LIST(key) {
PARTITION p1 values in (xx,xx),
PARTITION DEFAULT
}

ALTER TABLE XXX ADD PARTITION DEFAULT
```

Automatically migrating data from the default partition into a newly added partition is not supported: rows in the default partition
that satisfy the new partition key's constraint stay there when the partition is added. Users should select them from the default partition
using the new constraint as the predicate and insert them into the new partition.

```sql
insert into tbl select * from tbl partition default where partition_key=xx;
```
2023-02-24 15:24:59 +08:00
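The routing and manual migration behavior described in the commit above can be modeled in a few lines. This is a toy sketch, not the storage engine logic: `route_row`, `rows_to_migrate`, and the `"DEFAULT"` label are hypothetical names chosen for illustration.

```python
def route_row(key, partitions, has_default):
    """Return the name of the list partition that should store a row.

    partitions maps partition name -> set of key values it accepts.
    """
    for name, values in partitions.items():
        if key in values:
            return name
    if has_default:
        # No explicit partition matches: the row lands in the default one.
        return "DEFAULT"
    raise ValueError(f"no partition accepts key {key!r}")


def rows_to_migrate(default_rows, new_values, key_of):
    """Rows in the default partition that match a newly added partition."""
    return [row for row in default_rows if key_of(row) in new_values]


partitions = {"p1": {1, 2}}
default_rows = [{"k": 3}, {"k": 4}, {"k": 5}]
```

In this model, a row with key 3 goes to the default partition. After a partition accepting `{3, 4}` is added, `rows_to_migrate` returns the rows that the user would select and insert into the new partition by hand.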
7470198df6 [Docs](docs) Organize http documents (#16618)
1. Organize http documents
2. Add http interface authentication for FE
3. Support https interface for FE
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
2023-02-24 15:17:01 +08:00
Pxl
03f4c7a94d [Doc](Materialized-View) update documentation about materialized view enhancement (#17025)
update documentation about materialized view enhancement
2023-02-24 10:06:35 +08:00
37b9b038c4 [typo](docs) fix Fix incorrect url address in export-manual.md. (#17072) 2023-02-24 09:42:28 +08:00
1cce5782a0 [typo](docs) collect doc md language annotation (#17090) 2023-02-24 09:41:54 +08:00