Support delta encoding and RLE(bool) when reading Glue data:
* add delta bit-packed decoder
* add delta length byte array decoder
* add delta byte array decoder
* add RLE(bool) decoder
We found that some data types on AWS Glue are written with delta encoding, so they should be supported.
For the definition of delta encoding, refer to the delta encodings in the Parquet format specification.
If a table has no partitions, backup reports an error:
2023-03-06 17:35:32,971 ERROR (backupHandler|24) [Daemon.run():118] daemon thread got exception. name: backupHandler
java.util.NoSuchElementException: No value present
at java.util.Optional.get(Optional.java:135) ~[?:1.8.0_152]
at org.apache.doris.catalog.OlapTable.selectiveCopy(OlapTable.java:1259) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.backup.BackupJob.prepareBackupMeta(BackupJob.java:505) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.backup.BackupJob.prepareAndSendSnapshotTask(BackupJob.java:398) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.backup.BackupJob.run(BackupJob.java:301) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.backup.BackupHandler.runAfterCatalogReady(BackupHandler.java:188) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.0-SNAPSHOT]
Notable changes:
1. Support canceling queries for mysql load
2. Change the thread pool for the mysql load manager.
3. Fix the secret path check logic
4. Fix some doc errors
1. The 'insert into' profile has type 'insert', so it cannot be queried by type 'load'.
2. The 'insert into' profile has no job_id, so it cannot be queried by job_id; therefore all profiles are keyed by query_id.
3. The 'broker load' profile lacks some info, which caused an NPE.
With the ECB algorithm, block_encryption_mode did not take effect; it only took effect when an init vector was provided.
Solved: AES 192/256 now supports calculation without an init vector.
For the other algorithms, an error is now reported when there is no init vector.
From the MySQL manual: the default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.
Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt
Note: This fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found.
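A minimal sketch of the resulting behavior (the key, plaintext, and init vector are placeholders):
```sql
-- ECB modes need no init vector
set block_encryption_mode = "AES_256_ECB";
select to_base64(aes_encrypt('text', 'my_secret_key'));

-- the other modes (e.g. CBC) require one, and now report an error without it
set block_encryption_mode = "AES_256_CBC";
select to_base64(aes_encrypt('text', 'my_secret_key', '0123456789101112'));
```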
In the past, the pg catalog used the SQL `SELECT schema_name FROM information_schema.schemata WHERE schema_owner = '<UserName>';` to list the schemas of a user. However, this SQL cannot find all schemas that a user can access, because:
1. A user may not be the owner of a schema but may still have read permission on it.
2. A user may inherit the permissions of its user group and thus have read permission on a schema.
For these reasons, we replace the statement with `SELECT nspname FROM pg_namespace WHERE has_schema_privilege('<UserName>', nspname, 'USAGE');`, shown side by side below.
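Side by side (the `<UserName>` placeholder is filled in by the catalog):
```sql
-- old: only schemas owned by the user
SELECT schema_name FROM information_schema.schemata WHERE schema_owner = '<UserName>';

-- new: every schema the user has USAGE on, including privileges inherited from groups
SELECT nspname FROM pg_namespace WHERE has_schema_privilege('<UserName>', nspname, 'USAGE');
```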
Modify the default value of mem_limit to auto. auto means the process mem limit is max(physical memory * 0.9, physical memory - 6.4G);
6.4G is the maximum memory reserved for the system. For example, on a 128G machine the limit is 121.6G, and on a 32G machine it is 28.8G.
1. The first property is `only_specified_database`:
In the past, `Jdbc Catalog` would synchronize all databases from the source database.
Now we add a parameter called `only_specified_database` to the jdbc catalog so that only the specified database is synchronized, e.g.:
```sql
create resource if not exists ${resource_name} properties(
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:mysql://172.18.0.1:${mysql_port}/doris_test?useSSL=false",
"driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver",
"only_specified_database" = "true"
);
```
If `only_specified_database` is `true`, the jdbc catalog will only synchronize the database specified in `jdbc_url`.
2. The second property is `lower_case_table_names`:
If this property is `true`, table names from the jdbc external data source are synchronized in lower case:
```sql
create resource if not exists ${resource_name} properties(
"type"="jdbc",
"user"="doris_test",
"password"="123456",
"jdbc_url" = "jdbc:oracle:thin:@172.18.0.1:${oracle_port}:${SID}",
"driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/ojdbc8.jar",
"driver_class" = "oracle.jdbc.driver.OracleDriver",
"lower_case_table_names" = "true"
);
```
This CL mainly changes:
Support specifying a CSV schema manually in the s3/hdfs table valued functions, e.g.:
```sql
select * from s3 (
    'URI' = 'https://bucket1/inventory.dat',
    'ACCESS_KEY' = 'ak',
    'SECRET_KEY' = 'sk',
    'FORMAT' = 'csv',
    'column_separator' = '|',
    'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)',
    'use_path_style' = 'true'
);
```
Add new session variable dry_run_query
If set to true, the real query result is not returned; instead, only the number of rows that would be returned is reported:
mysql> select * from bigtable;
+--------------+
| ReturnedRows |
+--------------+
|     10000000 |
+--------------+
This avoids the transmission time of large result sets and focuses on the real execution time of the query engine, for debugging and analysis purposes.
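A minimal sketch of the usage (`bigtable` stands in for any large table):
```sql
-- enable dry run for the current session; queries then return only their row count
set dry_run_query = true;
select * from bigtable;
```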
We set the LIBHDFS3_CONF env in start_be.sh, so libhdfs3 will try to read this hdfs-site.xml;
if the file does not exist, it throws an error. But Doris did not handle this error, causing BE to crash.
This CL mainly changes:
Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exists.
Refactor the HDFSCommonBuilder so that it can return errors correctly.
Add BE IP info in the error status, so that we can get the IP from error messages like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file 000.snappy.orc, err:
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check be/conf/hdfs-site.xml
The logic of preferring compute nodes was wrong, causing external table queries to be assigned to at most 3 backends.
This CL refactors this logic and also changes some FE configs:
prefer_compute_node_for_external_table
If set to true, queries on external tables are preferentially assigned to compute nodes,
and the number of compute nodes used is controlled by min_backend_num_for_external_table.
If set to false, queries on external tables may be assigned to any node.
min_backend_num_for_external_table
Only takes effect when prefer_compute_node_for_external_table is true.
If the number of compute nodes is less than this value, queries on external tables will also pick some mix nodes,
so that the total number of nodes reaches this value.
If the number of compute nodes is greater than or equal to this value, queries on external tables are assigned to compute nodes only.
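A sketch of turning this on (assuming both configs are runtime-mutable; otherwise set them in fe.conf and restart):
```sql
admin set frontend config ("prefer_compute_node_for_external_table" = "true");
admin set frontend config ("min_backend_num_for_external_table" = "3");
```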
Check required properties when creating a catalog,
to avoid strange errors caused by missing required properties.
This PR adds checks for:
hms catalog: check the validity of the dfs.ha properties
jdbc catalog: check that jdbc_url, driver_url, and driver_class are set, as sketched below.
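A minimal sketch of a jdbc catalog that passes the new check (host, credentials, and driver paths are placeholders; exact property spelling may differ by version):
```sql
create catalog my_jdbc properties (
    "type" = "jdbc",
    "user" = "root",
    "password" = "123456",
    -- the three properties below are now required
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
```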
Fix NPE when initializing MasterCatalogExecutor.
The MasterCatalogExecutor may be called by FrontendServiceImpl from BE, which does not have a ConnectionContext.
Add more jdbc url params to resolve a Chinese character encoding issue:
add `useUnicode=true&characterEncoding=utf-8` by default in the jdbc catalog when connecting to MySQL, e.g. `jdbc:mysql://127.0.0.1:3306/demo?useUnicode=true&characterEncoding=utf-8`.
Update FAQ doc of catalog
* Support mapping ES date formats: default/yyyy-MM-dd HH:mm:ss/yyyy-MM-dd/epoch_millis
* Replace simple json with jackson, resolving the random column order problem
* Add es array doc version
Enhance the aggregate functions `collect_set` and `collect_list` to support an optional `max_size` parameter,
which limits the number of elements in the result array, e.g.:
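A minimal sketch (assuming a table `t` with an int column `k1`, and that `max_size` is passed as the second argument):
```sql
-- without max_size: all elements; with max_size = 3: at most 3 elements
select collect_list(k1), collect_list(k1, 3) from t;
```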