Change 1: Support an adaptive runtime filter: IN_OR_BLOOM_FILTER
The processing logic is:
If the number of rows in the right table < runtime_filter_max_in_num, the IN predicate takes effect
If the number of rows in the right table >= runtime_filter_max_in_num, the Bloom filter takes effect
Change 2: The default runtime filter type is changed to IN_OR_BLOOM_FILTER (see the sketch below)
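A minimal usage sketch with the two session variables named above; the threshold value shown is illustrative, not necessarily the shipped default:
```sql
-- Adaptive filter: picks IN or Bloom filter per join at runtime.
SET runtime_filter_type = 'IN_OR_BLOOM_FILTER';
-- Build-side row-count threshold for switching from IN to Bloom filter
-- (illustrative value).
SET runtime_filter_max_in_num = 1024;
```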
* [improvement](show) Allow users to run the SHOW DATA SKEW statement instead of ADMIN SHOW DATA SKEW
This PR mainly does two things:
1. Allow users to run the `SHOW DATA SKEW` statement instead of `ADMIN SHOW DATA SKEW` (see the sketch after this list)
2. Fix FE unit test failures caused by PR [improvement](rewrite) Make RewriteDateLiteralRule to be compatible with mysql #7876 and PR [feature-wip](iceberg) Step1: Support create Iceberg external table #7391
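A minimal sketch of the statement; the database, table, and partition names are hypothetical:
```sql
-- Previously this required the ADMIN prefix; now plain SHOW works too.
SHOW DATA SKEW FROM example_db.example_tbl PARTITION (p1);
```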
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
Close related #7389
Support creating Iceberg external tables in Doris.
This is the first step toward supporting Iceberg external tables.
### Create Iceberg external table
This PR provides two ways to create Iceberg external tables. Neither requires explicitly specifying column definitions; Doris converts them automatically based on Iceberg's column definitions.
1. Create an Iceberg external table directly
```sql
CREATE [EXTERNAL] TABLE table_name
ENGINE = ICEBERG
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.table" = "icberg_table_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
```
2. Create an Iceberg database and automatically create all the tables under that db.
```sql
CREATE DATABASE db_name
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
```
### Show table creation
1. For an individual table, you can view its definition with `show create table` (see `help show create table`).
```sql
mysql> show create table iceberg_db.logs_1;
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs_1 | CREATE TABLE `logs_1` (
`level` varchar(-1) NOT NULL COMMENT "null",
`event_time` datetime NOT NULL COMMENT "null",
`message` varchar(-1) NOT NULL COMMENT "null"
) ENGINE=ICEBERG
COMMENT "ICEBERG"
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.table" = "logs_1",
"iceberg.hive.metastore.uris" = "thrift://10.10.10.10:9087",
"iceberg.catalog.type" = "HIVE_CATALOG"
) |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2. For an Iceberg database, you can view its table creation records with `show table creation` (see `help show table creation`).
```sql
mysql> show table creation from iceberg_db;
+--------+---------+---------------------+---------------------------------------------------------+
| Table | Status | Create Time | Error Msg |
+--------+---------+---------------------+---------------------------------------------------------+
| logs | fail | 2021-12-14 13:50:10 | Cannot convert unknown type to Doris type: list<string> |
| logs_1 | success | 2021-12-14 13:50:10 | |
+--------+---------+---------------------+---------------------------------------------------------+
2 rows in set (0.00 sec)
```
This is a new statement that shows the table creation records in an Iceberg database.
Syntax:
```sql
SHOW TABLE CREATION [FROM db] [LIKE mask]
```
1. Add a new FE config `colocate_group_relocate_delay_second`
The relocation of a colocation group may involve a large number of tablets moving within the cluster.
Therefore, we should use a conservative strategy that avoids relocating colocation groups as much as possible.
Relocation usually occurs after a BE node goes offline or goes down.
This config is used to delay the decision that a BE node is unavailable.
The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group
will not be triggered (see the sketch after this list).
2. Change the priority of colocate tablet repair and balance tasks from HIGH to NORMAL
3. Add a new FE config `allow_replica_on_same_host`
If set to true, Doris will allow replicas of the same tablet to be located on the same host when creating a table,
and tablet repair and balance will be disabled.
This is intended only for local testing, so that multiple BEs can be deployed on the same host
and tables can be created with multiple replicas.
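A minimal sketch of setting these configs at runtime, assuming both are runtime-mutable (otherwise set them in fe.conf); the values shown are illustrative:
```sql
-- Delay the "BE unavailable" decision by 30 minutes (1800 seconds).
ADMIN SET FRONTEND CONFIG ("colocate_group_relocate_delay_second" = "1800");
-- Local testing only: allow replicas of one tablet on the same host.
ADMIN SET FRONTEND CONFIG ("allow_replica_on_same_host" = "true");
```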
# Proposed changes
Issue Number: close #6238
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
Co-authored-by: wangbo <506340561@qq.com>
Co-authored-by: emmymiao87 <522274284@qq.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: thinker <zchw100@qq.com>
Co-authored-by: Zeno Yang <1521564989@qq.com>
Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
Co-authored-by: xinghuayu007 <1450306854@qq.com>
Co-authored-by: weizuo93 <weizuo@apache.org>
Co-authored-by: yiguolei <guoleiyi@tencent.com>
Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
Co-authored-by: awakeljw <993007281@qq.com>
Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>
## Problem Summary:
### 1. Some code from ClickHouse
**ClickHouse is an excellent database built around a vectorized execution engine,
so we have referenced and learned a lot from its excellent implementation in terms of
data structures and function implementations.
Our work is based on ClickHouse v19.16.2.2, and we would like to thank the ClickHouse community and developers.**
The following comment has been added to code copied from ClickHouse, e.g.:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris
### 2. Supported exec nodes and queries:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node
The vectorized exec engine can run the SSB/TPC-H query sets and about 70% of the TPC-DS standard query set.
### 3. Data Model
The vectorized exec engine supports **Duplicate/Aggregate/Unique** tables and a vectorized Block Reader.
Vectorized segment reading is a work in progress.
### 4. How to use
1. Set the session variable `set enable_vectorized_engine = true;` (required; see the sketch below)
2. Set the session variable `set batch_size = 4096;` (recommended)
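A minimal sketch of enabling the engine in a client session, using exactly the two variables above:
```sql
-- Required: route queries through the vectorized exec engine.
SET enable_vectorized_engine = true;
-- Recommended: a larger batch amortizes per-batch overhead.
SET batch_size = 4096;
```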
### 5. Some differences from the original exec engine
https://github.com/doris-vectorized/doris-vectorized/issues/294
## Checklist(Required)
1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
If a load task has a relatively short timeout, we need to ensure that
each RPC of the task does not block for a long time.
An RPC is usually blocked for two reasons:
1. Handling "memory exceeds limit" in the RPC
If the system finds that the memory occupied by loads exceeds the threshold,
it selects the load channel that occupies the most memory and flushes its memtables.
This is done inside the RPC, which may be time consuming.
2. Closing the load channel
When the load channel receives the last batch, it ends the task.
It waits synchronously for all memtable flushes to finish, which is also time consuming.
Therefore, this PR solves the problem as follows:
1. Use the timeout to determine whether a load task is high priority:
if the timeout of a load task is relatively short, it is marked as a high-priority task (see the sketch after this list).
2. Do not process "memory exceeds limit" for high-priority tasks.
3. Use a separate flush thread to flush memtables for high-priority tasks.
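A hypothetical example of the kind of load this targets; the label, path, table, and broker name are made up, and the exact timeout threshold that marks a task high priority is internal:
```sql
-- A load with a short timeout like this would be marked high priority:
-- no in-RPC "memory exceeds limit" handling, and its memtables are
-- flushed by the dedicated flush thread.
LOAD LABEL example_db.short_timeout_label
(
    DATA INFILE("hdfs://host:port/path/to/file.csv")
    INTO TABLE example_tbl
)
WITH BROKER "broker_name"
PROPERTIES ("timeout" = "300");
```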
* 1. Remove importing into Doris via CSV in the DataX doriswriter; only JSON is now supported.
2. Format the doriswriter code.
3. Optimize exception handling and reduce repeated output of exception logs.
4. Update the doriswriter documentation.
* Delete DorisCsvCodec.java
Delete the unused file extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java
* 1. Remove the `format` config key.
2. Optimize the serialization code in the DorisJsonCodec class.
Support merging IN predicates when a remote target exists (e.g. shuffle hash join).
Remove the code that implicitly converts an IN predicate to a Bloom filter when a remote target exists.
Close related #7546
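A hedged sketch of a query shape this applies to; the table and column names are hypothetical, and `[shuffle]` is Doris's distributed-join hint:
```sql
SET runtime_filter_type = 'IN';
-- With a shuffle hash join, the IN filter built on each instance is
-- merged before being applied to the remote probe side, instead of
-- being implicitly converted to a Bloom filter.
SELECT count(*)
FROM fact_tbl t1 JOIN [shuffle] dim_tbl t2 ON t1.k1 = t2.k1;
```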
Add a new FE config `repair_slow_replica`.
When this config is true, Doris will try to delete the replica
with the largest number of versions, and then rebalance the replicas.
Usually, when the number of versions of a certain replica is much higher
than that of other replicas, there is a problem with compaction on the current BE.
Migrating the replica to another machine can typically solve this problem.
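A minimal sketch of turning it on at runtime, assuming the config is runtime-mutable (otherwise set it in fe.conf):
```sql
ADMIN SET FRONTEND CONFIG ("repair_slow_replica" = "true");
```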
Add partitionNum, tabletNum, and cardinality to SqlBlockRule to block large/slow SQL.
1. Use partitionNum, tabletNum, and cardinality as limits to block SQL (see the sketch after this list)
2. Stay compatible with lower versions
3. Add unit tests
4. Add docs
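A hedged sketch of creating such a rule; the rule name and threshold values are illustrative, and the snake_case property keys are assumed to mirror the new fields:
```sql
-- Block queries that would scan more than 30 partitions, 200 tablets,
-- or an estimated 10 billion rows.
CREATE SQL_BLOCK_RULE block_large_scan
PROPERTIES (
    "partition_num" = "30",
    "tablet_num" = "200",
    "cardinality" = "10000000000",
    "global" = "true",
    "enable" = "true"
);
```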