Metadata placeholder for statistics in version 2.1.x. Users can upgrade to this version, but rollback is not supported.
After this change, statistics-related functions no longer need to modify metadata in the 2.1 series.
In this PR, we control whether functions in filter conditions are pushed down to the external data source query, changing the FE config `enable_fun_pushdown` to the session variable `enable_ext_func_pred_pushdown`.
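A minimal usage sketch, assuming the new variable behaves like other Doris session variables:
```
set enable_ext_func_pred_pushdown = true;          -- for the current session
set global enable_ext_func_pred_pushdown = true;   -- as the cluster-wide default
```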
Followup #28890
Make HttpSqlConverterPlugin and AuditLoader Doris built-in plugins,
to make it simple for users to use SQL dialect conversion and the audit loader.
HttpSqlConverterPlugin
By default, nothing is changed.
There is a new global variable `sql_converter_service`, default empty. If set, the HttpSqlConverterPlugin will be enabled:
set global sql_converter_service = "http://127.0.0.1:5001/api/v1/convert"
AuditLoader
By default, nothing is changed.
There is a new global variable `enable_audit_plugin`, default false. If set to true, the audit loader plugin will be enabled.
Doris will create the `audit_log` table in `__internal_schema` on startup.
If `enable_audit_plugin` is true, audit logs will be inserted into the `audit_log` table.
Three other global variables are related to this plugin:
- `audit_plugin_max_batch_interval_sec`: the max interval at which the audit loader inserts a batch of audit logs.
- `audit_plugin_max_batch_bytes`: the max batch size for the audit loader to insert a batch of audit logs.
- `audit_plugin_max_sql_length`: the max length of a statement recorded in the audit log.
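A usage sketch, assuming these behave like ordinary global variables; the values below are illustrative, not the defaults:
```
set global enable_audit_plugin = true;
set global audit_plugin_max_batch_interval_sec = 60;   -- flush a batch at least every 60s
set global audit_plugin_max_batch_bytes = 52428800;    -- or once a batch reaches 50MB
set global audit_plugin_max_sql_length = 4096;         -- truncate longer statements
```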
The MySQL type code that the MAP type is mapped to is 400, but 400 is an unknown type code to MySQL.
When querying through the `/api/query` HTTP API or using the MariaDB JDBC driver, an exception occurs.
With the MySQL JDBC driver, the value is converted into binary form, and the correct data can be read through the string type.
Therefore, the custom MySQL type for MAP was removed and changed to the string type, so that both the MariaDB and MySQL JDBC drivers work normally.
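A sketch of the visible effect, with illustrative table and column names: clients now receive the MAP value with a string type code and can read it as plain text.
```
select properties from example_tbl limit 1;
-- properties is a MAP column; every MySQL-protocol client now reads it
-- as a string, e.g. {"color":"red", "size":"L"}
```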
Add an `information_schema` database for all catalogs.
This is useful when using BI tools to connect to Doris:
the tools can get meta info from `information_schema`.
This PR mainly changes:
1. There will be an `information_schema` db in each catalog.
2. Each `information_schema` db only stores the meta info of the catalog it belongs to.
3. For `information_schema`, the `TABLE_SCHEMA` column's value is the database name.
4. There is a new global variable `show_full_dbname_in_info_schema_db`, default false. If set to true,
the `TABLE_SCHEMA` column's value takes the form `ctl.db`, because:
when connecting to Doris, the `database` info in the connection URL will be `xxx?db=ctl.db`,
and some BI tools will then query `information_schema` with SQL like
`select * from information_schema.columns where TABLE_SCHEMA = "ctl.db"`,
so the value has to be formatted as `ctl.db`.
E.g., the `information_schema.columns` table in the external catalog `doris` looks like:
```
mysql> select * from information_schema.columns limit 1\G
*************************** 1. row ***************************
TABLE_CATALOG: doris
TABLE_SCHEMA: doris.__internal_schema
TABLE_NAME: column_statistics
COLUMN_NAME: id
ORDINAL_POSITION: 1
COLUMN_DEFAULT: NULL
IS_NULLABLE: NO
DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 4096
CHARACTER_OCTET_LENGTH: 16384
NUMERIC_PRECISION: NULL
NUMERIC_SCALE: NULL
DATETIME_PRECISION: NULL
CHARACTER_SET_NAME: NULL
COLLATION_NAME: NULL
COLUMN_TYPE: varchar(4096)
COLUMN_KEY:
EXTRA:
PRIVILEGES:
COLUMN_COMMENT:
COLUMN_SIZE: 4096
DECIMAL_DIGITS: NULL
GENERATION_EXPRESSION: NULL
SRS_ID: NULL
```
5. Modify the behavior of
- show tables
- show databases
- show columns
- show table status
The above statements may query the `information_schema` db if there is a `where` predicate after them; see the sketch below.
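A sketch of such statements (db and table names are illustrative); with a `where` predicate they are now answered from `information_schema`:
```
show tables where Tables_in_db1 like "tbl%";
show table status where Name = "tbl1";
```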
The current logic for SQL dialect conversion lives entirely in the `fe-core` module, which may lead to the following issues:
- Changes to the dialect conversion logic may occur frequently, requiring users to upgrade their Doris version frequently, leading to a long change cycle.
- The cost of customized development is high, requiring users to replace the fe-core JAR package.
Turning it into a plugin addresses both issues.
We change the memtable size from 200MB to 100MB to achieve smoother flush
performance. We change loadStreamPerNode from 20 to 60 to prevent stream
RPC from becoming the bottleneck when memtable_on_sink_node is enabled. We change
the default s3 & broker load parallelism to make the most of CPUs on modern
multi-core systems.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
We integrate the new load job manager into the new job scheduling framework, so that the insert-into task can be scheduled after the broker load SQL is converted to an insert-into TVF (table value function) SQL.
issue: https://github.com/apache/doris/issues/24221
Now supports:
1. loading data via TVF insert-into SQL, but only simple loads (the columns need to be defined in the table); see the sketch after this list
2. show load stmt
- job id, label name, job state, time info
- simple progress
3. cancelling a load from a db
4. enabling the new load path through `Config.enable_nereids_load`
5. replaying jobs after restarting Doris
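A sketch of the converted form and the management statements, assuming the standard s3 TVF; the URI, label, credentials, and names are illustrative:
```
-- a broker/s3 load rewritten as insert-into over a table value function
insert into db1.tbl1 (k1, k2)
select c1, c2 from s3(
    "uri" = "s3://bucket/path/data.csv",
    "format" = "csv",
    "s3.endpoint" = "http://s3.example.com",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk"
);

-- inspect and cancel jobs managed by the new framework
show load where label = "example_label";
cancel load from db1 where label = "example_label";
```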
TODO:
- support partition insert jobs
- support showing statistics from BE
- support multiple tasks and collect task statistics
- support transactional tasks
- add UT cases
Add a new FE Config `sql_convertor_service`.
If this config is set, and the session variable `sql_dialect` is set,
Doris will try to use a standalone SQL converter service to convert the user's input SQL
from the specified dialect to Doris SQL. E.g.:
```
mysql> set sql_dialect="presto";
Query OK, 0 rows affected (0.02 sec)
mysql> select * from db1.tbl1 where "k1" = 1; # will be converted to select * from db1.tbl1 where `k1` = 1;
+------+------+
| k1 | k2 |
+------+------+
| 1 | 2 |
+------+------+
1 row in set (0.08 sec)
```
The SQL converter service should be an HTTP service.
The request and response body formats can be found in `SQLDialectUtils.java`.
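A minimal setup sketch; the URL is illustrative, and the FE config line belongs in fe.conf (shown here as a comment since it is not SQL):
```
-- fe.conf: sql_convertor_service = http://127.0.0.1:5001/api/v1/convert
set sql_dialect = "presto";
-- from here on, each statement is sent to the converter service before parsing
```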
Remove the EXPERIMENTAL tag for `enable_query_hive_views` and set `enable_query_hive_views` to true by default.
This feature has been used for several months on our cluster, which has more than a hundred thousand tables; I think it is fine to enable it by default.
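A sketch for opting out, assuming the variable can be set per session or globally like other Doris variables:
```
set enable_query_hive_views = false;          -- current session only
set global enable_query_hive_views = false;   -- cluster-wide
```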
- Running tasks can now be shown, and a cancellation failure is fixed
- When an insert task's scheduling cycle is reached while a previous run is still executing, the new run of this task is canceled
- Refactor the SQL for job status changes
- Fix the timer job window error
- Support cancelling tasks
Set the upload/download task number per BE, improving the overall speed of upload/download and enhancing the performance of backup and recovery.
---------
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
Support where, group by, having, and order by clauses without a from clause in a query statement.
For example:
SELECT 1 AS a, COUNT(*), SUM(2), AVG(1), RANK() OVER() AS w_rank
WHERE 1 = 1
GROUP BY a, w_rank
HAVING COUNT(*) IN (1, 2) AND w_rank = 1
ORDER BY a;
This will return the result:
+------+----------+--------+--------+--------+
| a    | count(*) | sum(2) | avg(1) | w_rank |
+------+----------+--------+--------+--------+
|    1 |        1 |      2 |    1.0 |      1 |
+------+----------+--------+--------+--------+
Another example:
select 1 c1, 2 union (select "hell0", "") order by c1;
The second column's datatype will be varchar(65533); 65533 is the default varchar length.
This will return the result:
+-------+------+
| c1    | 2    |
+-------+------+
| 1     | 2    |
| hell0 |      |
+-------+------+