Fix bugs of lateral view with CTE or WHERE clause.
The failing cases can be found in the newly added tests in `TableFunctionPlanTest.java`.
However, some bugs are still not fixed, so the unit test is annotated with `@Ignore`.
This PR also contains the change in #7824:
> Issue Number: close #7823
>
> After the subquery is rewritten, the rewritten stmt needs to be reset
> (that is, the state produced by the first semantic analysis is cleared),
> and then the rewritten stmt can be re-analyzed.
>
> The lateral view ref in the previous implementation did not implement the reset function.
> As a result, it kept the stale state from the first analysis during the second analysis.
> Eventually, the new stmt contains two duplicate tupleIds that are bound to different tuples.
> As the explain output shows, the following query gets an additional, wrong join predicate:
> ```
> Query: explain select k1 from test_explode lateral view explode_split(k2, ",") tmp as e1 where k1 in (select k3 from tbl1);
> Error equal join conjunct: `k3` = `k3`
> ```
>
> This PR mainly adds the reset function to the lateral view ref
> to avoid possible errors in the second analysis
> when lateral view and subquery rewrite occur in the same query.
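To make the failure mode concrete, here is a minimal Java sketch of why a missing `reset()` leads to duplicate tuple ids. `ToyAnalyzer` and `ToyLateralViewRef` are hypothetical stand-ins, not Doris's `Analyzer` or `LateralViewRef`. The sketch assumes each analyzer numbers its tuples from 0 and models only a single tuple id as analysis state.

```java
// Hypothetical stand-in for the analyzer: hands out fresh tuple ids starting at 0.
final class ToyAnalyzer {
    private int nextTupleId = 0;

    int createTupleId() {
        return nextTupleId++;
    }
}

// Simplified, hypothetical model of a lateral view ref; not the real Doris class.
final class ToyLateralViewRef {
    private boolean isAnalyzed = false;
    private Integer tupleId = null; // state produced by analyze()

    void analyze(ToyAnalyzer analyzer) {
        if (isAnalyzed) {
            // Without reset(), the tuple id from the first analysis is silently kept,
            // even though it was allocated by a different analyzer.
            return;
        }
        tupleId = analyzer.createTupleId();
        isAnalyzed = true;
    }

    // Clears everything analyze() produced so the ref can be re-analyzed
    // after the statement containing it has been rewritten.
    void reset() {
        isAnalyzed = false;
        tupleId = null;
    }

    Integer getTupleId() {
        return tupleId;
    }
}
```

When the rewritten stmt is analyzed with a fresh analyzer, another ref may receive the same id that the stale lateral view ref still holds. Calling `reset()` before re-analysis avoids that collision.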
Close related #7389
Support creating Iceberg external tables in Doris.
This is the first step to support Iceberg external table.
### Create Iceberg external table
This PR provides two ways to create Iceberg external tables. Neither requires explicit column definitions: Doris converts them automatically from the Iceberg table's column definitions.
1. Create an Iceberg external table directly
```sql
CREATE [EXTERNAL] TABLE table_name
ENGINE = ICEBERG
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.table" = "iceberg_table_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
```
2. Create an Iceberg database, which automatically creates all the tables under it.
```sql
CREATE DATABASE db_name
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
```
### Show table creation
1. For an individual table, use `SHOW CREATE TABLE` (see `help show create table`).
```sql
mysql> show create table iceberg_db.logs_1;
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs_1 | CREATE TABLE `logs_1` (
`level` varchar(-1) NOT NULL COMMENT "null",
`event_time` datetime NOT NULL COMMENT "null",
`message` varchar(-1) NOT NULL COMMENT "null"
) ENGINE=ICEBERG
COMMENT "ICEBERG"
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.table" = "logs_1",
"iceberg.hive.metastore.uris" = "thrift://10.10.10.10:9087",
"iceberg.catalog.type" = "HIVE_CATALOG"
) |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2. For an Iceberg database, use `SHOW TABLE CREATION` (see `help show table creation`) to view the creation result of each table.
```sql
mysql> show table creation from iceberg_db;
+--------+---------+---------------------+---------------------------------------------------------+
| Table | Status | Create Time | Error Msg |
+--------+---------+---------------------+---------------------------------------------------------+
| logs | fail | 2021-12-14 13:50:10 | Cannot convert unknown type to Doris type: list<string> |
| logs_1 | success | 2021-12-14 13:50:10 | |
+--------+---------+---------------------+---------------------------------------------------------+
2 rows in set (0.00 sec)
```
`SHOW TABLE CREATION` is new syntax that shows the table creation records of an Iceberg database.
Syntax:
```sql
SHOW TABLE CREATION [FROM db] [LIKE mask]
```
An `insert` statement may return the error message `Execute timeout` after a load fails,
even though the real cause is an unhealthy backend rather than an execution timeout.
```
MySQL [ssb]> insert into lineorder_flat select * from lineorder_flat;
ERROR 1105 (HY000): errCode = 2, detailMessage = Execute timeout
```
Currently, if a replica of a tablet encounters a problem during loading,
such as a write error, an RPC error, or error -235, the entire load job fails,
which significantly reduces Doris's fault tolerance.
This PR mainly changes:
1. Refine the judgment of failed replicas during loading, so that the failure of a few replicas does not prevent the load job from completing normally.
2. Fix a bug introduced in #7754 that may cause a BE core dump.
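Below is a minimal sketch of the refined judgment. It assumes a majority quorum (`replicaNum / 2 + 1`) per tablet, so the load job fails only when some tablet has lost its quorum, not whenever any single replica fails. `LoadQuorum` and its methods are hypothetical names used for illustration.

```java
// Hypothetical sketch of a majority-quorum check for loads; names are illustrative.
final class LoadQuorum {
    // Majority of the replicas, e.g. 2 out of 3.
    static int quorum(int replicaNum) {
        return replicaNum / 2 + 1;
    }

    // A tablet's load still counts as successful if enough replicas succeeded.
    static boolean tabletSucceeded(int replicaNum, int failedReplicaNum) {
        return replicaNum - failedReplicaNum >= quorum(replicaNum);
    }

    // The job fails only if some tablet lost its quorum.
    static boolean jobSucceeded(int replicaNum, int[] failedReplicasPerTablet) {
        for (int failed : failedReplicasPerTablet) {
            if (!tabletSucceeded(replicaNum, failed)) {
                return false;
            }
        }
        return true;
    }
}
```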
This PR mainly changes:
1. Cancel the load job as soon as possible when unqualified data is encountered.
The solution is described in #6318.
Also replace some `std::stringstream` usages with `fmt::memory_buffer` to avoid performance issues.
2. Fix an NPE when creating a user with an empty host.
3. Fix compile warnings after rebasing onto master (vectorization).
1. Add a new FE config `colocate_group_relocate_delay_second`
The relocation of a colocation group may involve a large number of tablets moving within the cluster.
Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible.
Relocation usually occurs after a BE node goes offline or goes down.
This config is used to delay the determination of BE node unavailability.
The default is 30 minutes (1800 seconds), i.e., if a BE node recovers within 30 minutes, relocation of the colocation group
will not be triggered.
2. Change the priority of colocate tablet repair and balance tasks from HIGH to NORMAL.
3. Add a new FE config `allow_replica_on_same_host`.
If set to true, Doris allows replicas of a tablet to be located on the same host when creating a table,
and tablet repair and balance are disabled.
This is only for local testing, so that multiple BEs can be deployed on the same host and tables
can be created with multiple replicas.
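Here is a Java sketch of the two new FE configs. `ColocateRelocateDelay` and `ReplicaPlacement` are hypothetical classes whose static fields mirror `colocate_group_relocate_delay_second` and `allow_replica_on_same_host`. The placement logic is a simplified greedy selection, and the sketch does not model disabling repair and balance.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of colocate_group_relocate_delay_second; not Doris's code.
final class ColocateRelocateDelay {
    static long delaySecond = 1800; // 30 minutes

    // A down BE only counts as unavailable (and may trigger relocation)
    // after it has been down for longer than the configured delay.
    static boolean shouldRelocate(boolean isAlive, long lastDownTimeMs, long nowMs) {
        if (isAlive) {
            return false;
        }
        return nowMs - lastDownTimeMs > delaySecond * 1000L;
    }
}

// Hypothetical sketch of allow_replica_on_same_host; not Doris's code.
final class ReplicaPlacement {
    static boolean allowReplicaOnSameHost = false;

    // Picks backends (given by host, in preference order) for replicaNum replicas.
    // Returns indexes into backendHosts, or null if not enough backends qualify.
    static List<Integer> chooseBackends(List<String> backendHosts, int replicaNum) {
        List<Integer> chosen = new ArrayList<>();
        Set<String> usedHosts = new HashSet<>();
        for (int i = 0; i < backendHosts.size() && chosen.size() < replicaNum; i++) {
            String host = backendHosts.get(i);
            if (!allowReplicaOnSameHost && usedHosts.contains(host)) {
                continue; // another replica of this tablet is already on this host
            }
            usedHosts.add(host);
            chosen.add(i);
        }
        return chosen.size() == replicaNum ? chosen : null;
    }
}
```

With three BEs on one host, a 3-replica table can be created only when `allowReplicaOnSameHost` is true. This matches the local-testing use case described above.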
# Proposed changes
Issue Number: close #6238
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
Co-authored-by: wangbo <506340561@qq.com>
Co-authored-by: emmymiao87 <522274284@qq.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: thinker <zchw100@qq.com>
Co-authored-by: Zeno Yang <1521564989@qq.com>
Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
Co-authored-by: xinghuayu007 <1450306854@qq.com>
Co-authored-by: weizuo93 <weizuo@apache.org>
Co-authored-by: yiguolei <guoleiyi@tencent.com>
Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
Co-authored-by: awakeljw <993007281@qq.com>
Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>
## Problem Summary:
### 1. Some code from ClickHouse
**ClickHouse is a database with an excellent vectorized execution engine,
so we have referenced and learned a lot from its implementation of
data structures and functions.
Our code is based on ClickHouse v19.16.2.2, and we would like to thank the ClickHouse community and developers.**
The following comment has been added to code taken from ClickHouse, e.g.:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris
### 2. Supported exec nodes and queries
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node
With the vectorized exec engine, you can run the SSB and TPC-H standard query test sets and 70% of the TPC-DS standard query test set.
### 3. Data Model
The vectorized exec engine supports **Dup/Agg/Unq** tables, and the block reader is vectorized.
Segment vectorization is a work in progress.
### 4. How to use
1. Set the session variable: `set enable_vectorized_engine = true;` (required)
2. Set the session variable: `set batch_size = 4096;` (recommended)
### 5. Some differences from the original exec engine
https://github.com/doris-vectorized/doris-vectorized/issues/294
## Checklist (Required)
1. Does it affect the original behavior: (No)
2. Have unit tests been added: (Yes)
3. Have documents been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
1. Allow setting `cpu_resource_limit`.
`-1` means unlimited.
2. Drop replicas that are not in a valid tag.
Otherwise, the migration task from one resource group to another may never finish.
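Here is a minimal sketch of the second change. It assumes each backend carries exactly one tag and that the table lists the tags its replicas should live on. `TagReplicaCleanup` is hypothetical. A real scheduler would drop such replicas only after enough healthy replicas exist under the valid tags, and this sketch omits that check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model: each replica lives on a backend that carries one tag.
final class TagReplicaCleanup {
    static final class Replica {
        final long id;
        final String backendTag;

        Replica(long id, String backendTag) {
            this.id = id;
            this.backendTag = backendTag;
        }
    }

    // Replicas whose backend tag is not among the valid tags are drop candidates;
    // keeping them would leave a migration between resource groups unfinished.
    static List<Long> replicasToDrop(List<Replica> replicas, Set<String> validTags) {
        List<Long> toDrop = new ArrayList<>();
        for (Replica r : replicas) {
            if (!validTags.contains(r.backendTag)) {
                toDrop.add(r.id);
            }
        }
        return toDrop;
    }
}
```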
1. Fix a core dump when using multiple `explode_bitmap` #7716
2. Fix a bug where extracting a JSON array by JSON path returns a wrong result #7717
3. Fix a bug where null values become non-null after lateral view #7718
4. Fix a bug where lateral view may return the error `couldn't resolve slot descriptor 1` #7719
5. Fix wrong results when using lateral view with a WHERE predicate #7720
Support general hints.
SQL example:
```sql
SELECT /*+ one_hint(1000000) another_hint(k = "v")*/ 1;
```
The hint syntax is:
```
/*+ [ HINT_NAME( [ key [ =value ]? [ , key [ =value ]? ]* ]? ) ]+ */
```
- Multiple hints are supported, separated by spaces.
- A hint name can be any string in identifier format.
- A hint can have zero or more parameters, separated by commas.
- Each hint parameter must have exactly one key.
- Each hint parameter can have zero or one value.
- A parameter's key and value are connected by an equal sign.
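To illustrate the rules above, here is a simplified, hypothetical hint parser in Java. It is not Doris's SQL parser. It accepts identifiers, numbers, and double-quoted values (without escapes), and it requires parentheses after each hint name, as the syntax shows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical parser for the hint syntax above; not Doris's actual parser.
final class HintParser {
    // One parsed hint: its name plus ordered parameters (value is null when omitted).
    static final class Hint {
        final String name;
        final Map<String, String> params = new LinkedHashMap<>();

        Hint(String name) {
            this.name = name;
        }
    }

    private final String s;
    private int pos = 0;

    private HintParser(String body) {
        this.s = body;
    }

    // Parses a full "/*+ ... */" comment into its hints.
    static List<Hint> parse(String comment) {
        String t = comment.trim();
        if (!t.startsWith("/*+") || !t.endsWith("*/")) {
            throw new IllegalArgumentException("not a hint comment: " + comment);
        }
        HintParser p = new HintParser(t.substring(3, t.length() - 2));
        List<Hint> hints = new ArrayList<>();
        p.skipSpaces();
        while (p.pos < p.s.length()) { // hints are separated by whitespace
            hints.add(p.parseHint());
            p.skipSpaces();
        }
        return hints;
    }

    // HINT_NAME ( [key [= value]] [, key [= value]]* )
    private Hint parseHint() {
        Hint hint = new Hint(readToken());
        skipSpaces();
        expect('(');
        skipSpaces();
        if (peek() != ')') {
            do {
                skipSpaces();
                String key = readToken();
                skipSpaces();
                String value = null;
                if (tryConsume('=')) {
                    skipSpaces();
                    value = readToken();
                    skipSpaces();
                }
                hint.params.put(key, value);
            } while (tryConsume(','));
        }
        expect(')');
        return hint;
    }

    // Reads a double-quoted string (no escapes) or a run of letters, digits and '_'.
    private String readToken() {
        if (peek() == '"') {
            int end = s.indexOf('"', pos + 1);
            if (end < 0) throw new IllegalArgumentException("unterminated string at " + pos);
            String v = s.substring(pos + 1, end);
            pos = end + 1;
            return v;
        }
        int start = pos;
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a key or value at " + pos);
        return s.substring(start, pos);
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private boolean tryConsume(char c) {
        if (peek() != c) return false;
        pos++;
        return true;
    }

    private void expect(char c) {
        if (!tryConsume(c)) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }
}
```

For the SQL example above, the parser yields `one_hint` with the value-less key `1000000`, followed by `another_hint` with `k` mapped to `v`.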
Support merging IN predicates when a remote target exists (e.g., shuffle hash join).
Remove the code that implicitly converts an IN predicate into a Bloom filter when a remote target exists.
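The merge has to be a union. With a shuffle hash join, each build instance sees only its own partition of the build side, so the merged IN filter must contain every partial value set. An intersection would wrongly drop matching probe rows. `InFilterMerge` is a hypothetical sketch that ignores size limits and how partial filters are transported.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: merging partial IN-predicate filters from build instances.
final class InFilterMerge {
    // Union of all partial value sets; each set covers one build-side partition.
    static <T> Set<T> merge(List<Set<T>> partialFilters) {
        Set<T> merged = new HashSet<>();
        for (Set<T> partial : partialFilters) {
            merged.addAll(partial);
        }
        return merged;
    }
}
```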
Close related #7546
If the load result set is empty, or all the load data is filtered out by the `where` condition,
the load no longer fails with the message `all partitions have no load data`; it returns success directly.
Refer to issue #7528.
When the properties `default_storage_medium=ssd` and `storage_cooldown_second=xxx` are set in `fe.conf`,
`cooldownTime` is now `System.currentTimeMillis() + storage_cooldown_second` instead of always `MAX_COOLDOWN_TIME_MS`.
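Here is a minimal sketch of the intended computation. `CooldownTime` is hypothetical, and the value of `MAX_COOLDOWN_TIME_MS` is a placeholder. The explicit seconds-to-milliseconds conversion is an assumption about the intent, because the formula above adds `storage_cooldown_second` directly to a millisecond timestamp.

```java
// Hypothetical sketch of the corrected cooldown-time computation.
final class CooldownTime {
    // Placeholder for Doris's MAX_COOLDOWN_TIME_MS ("never cool down"); this value is assumed.
    static final long MAX_COOLDOWN_TIME_MS = Long.MAX_VALUE;

    // storage_cooldown_second is in seconds, so it is converted to milliseconds
    // before being added to the current time in milliseconds.
    static long cooldownTimeMs(String defaultStorageMedium, long storageCooldownSecond, long nowMs) {
        if ("SSD".equalsIgnoreCase(defaultStorageMedium)) {
            return nowMs + storageCooldownSecond * 1000L;
        }
        return MAX_COOLDOWN_TIME_MS; // data on HDD has nothing to cool down to
    }
}
```

A caller would pass `System.currentTimeMillis()` as `nowMs`. Taking the time as a parameter keeps the function deterministic and easy to test.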
Add a new FE config `repair_slow_replica`.
When this config is true, Doris will try to delete the replica
with the largest number of versions, and then rebalance the replica.
Usually, when the number of versions of a certain replica is much higher
than that of other replicas, there is a problem with compaction on the current BE.
Migrating the replica to another machine can typically solve this problem.
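Here is a sketch of how the replica to delete could be picked: the one with the strictly largest version count. `SlowReplicaPicker` is hypothetical. It does not model the "much higher than other replicas" threshold described above, so any unique maximum qualifies.

```java
import java.util.Map;

// Hypothetical sketch of choosing a "slow" replica; not Doris's scheduler code.
final class SlowReplicaPicker {
    // Returns the id of the replica with the strictly largest version count,
    // or -1 if no single replica has the maximum (e.g. all counts are equal).
    static long pickSlowReplica(Map<Long, Integer> versionCountByReplica) {
        long slowest = -1;
        int max = Integer.MIN_VALUE;
        boolean unique = false;
        for (Map.Entry<Long, Integer> e : versionCountByReplica.entrySet()) {
            int count = e.getValue();
            if (count > max) {
                max = count;
                slowest = e.getKey();
                unique = true;
            } else if (count == max) {
                unique = false; // tie: no replica stands out
            }
        }
        return unique ? slowest : -1;
    }
}
```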
Add `partitionNum`, `tabletNum`, and `cardinality` to `SqlBlockRule` to block large or slow SQL.
1. Set `partitionNum`, `tabletNum`, and `cardinality` as limits to block SQL statements.
2. Stay compatible with lower versions.
3. Add unit tests.
4. Add docs.
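Here is a minimal sketch of how the three new limits might be checked against a query's scan. `ScanLimitRule` is hypothetical. Two behaviors are assumptions: `0` means "no limit", and a query is blocked only when a value strictly exceeds its limit.

```java
// Hypothetical sketch of a scan-size check in the spirit of SqlBlockRule.
final class ScanLimitRule {
    final long partitionNum;
    final long tabletNum;
    final long cardinality;

    ScanLimitRule(long partitionNum, long tabletNum, long cardinality) {
        this.partitionNum = partitionNum;
        this.tabletNum = tabletNum;
        this.cardinality = cardinality;
    }

    // Returns why the query should be blocked, or null if it may run.
    String check(long scannedPartitions, long scannedTablets, long estimatedRows) {
        if (partitionNum > 0 && scannedPartitions > partitionNum) {
            return "too many partitions: " + scannedPartitions + " > " + partitionNum;
        }
        if (tabletNum > 0 && scannedTablets > tabletNum) {
            return "too many tablets: " + scannedTablets + " > " + tabletNum;
        }
        if (cardinality > 0 && estimatedRows > cardinality) {
            return "cardinality too large: " + estimatedRows + " > " + cardinality;
        }
        return null;
    }
}
```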
Partition pruning v2 uses the connection context in `OlapScanNode`.
Before this PR, an NPE would occur when running SQL without a `ConnectContext`, such as export and load.
For example:
```
EXPORT TABLE t TO "file:///home/data/export.txt"
```
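One way to guard against the missing context is sketched below in Java. `ToyConnectContext` is a hypothetical stand-in for `ConnectContext`, modeled here as a thread-local lookup, and `partitionPruneV2Enabled` is an invented field. The point is only that callers must handle a `null` context instead of dereferencing it.

```java
// Hypothetical stand-in for ConnectContext, modeled as a thread-local lookup.
final class ToyConnectContext {
    private static final ThreadLocal<ToyConnectContext> CURRENT = new ThreadLocal<>();

    // Invented session flag, used only for illustration.
    final boolean partitionPruneV2Enabled;

    ToyConnectContext(boolean partitionPruneV2Enabled) {
        this.partitionPruneV2Enabled = partitionPruneV2Enabled;
    }

    // Returns null on threads that never set a context (e.g. export or load jobs).
    static ToyConnectContext get() {
        return CURRENT.get();
    }

    static void set(ToyConnectContext ctx) {
        CURRENT.set(ctx);
    }

    static void remove() {
        CURRENT.remove();
    }
}

final class ToyScanNode {
    // Reads the flag when a context exists, otherwise falls back to a default
    // instead of throwing a NullPointerException.
    static boolean usePartitionPruneV2(boolean defaultValue) {
        ToyConnectContext ctx = ToyConnectContext.get();
        return ctx == null ? defaultValue : ctx.partitionPruneV2Enabled;
    }
}
```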
This PR is for #7096; it adds a rewrite rule for inferring predicates.
For example:
original stmt: `select * from t1, t2, t3 where t1.id=t2.id and t2.id=t3.id and t2.id = 1`
rewritten stmt: `select * from t1, t2, t3 where t1.id=t2.id and t2.id=t3.id and t2.id = 1 and t1.id=1 and t3.id=1`
+ Add a switch `enable_infer_predicate` to control whether to perform predicate inference.
+ Register a new rule `InferFiltersRule` and add it to `GlobalState`.
+ Traverse the conjuncts to collect the equivalence relations, numerical relations, and `IsNullPredicate`s in ON and WHERE clauses.
+ Infer all equivalence relations.
+ Construct additional numerical relations and `IsNullPredicate`s.
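Here is a minimal sketch of the inference in the example above. It uses union-find to build equivalence classes of columns, then propagates `column = literal` predicates within each class. `InferPredicates` is hypothetical and string-based. Union-find is our choice of data structure, not necessarily the rule's, and the sketch omits `IsNullPredicate` handling and non-equality numerical relations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of equivalence-based predicate inference using union-find.
final class InferPredicates {
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of the column's equivalence class.
    private String find(String col) {
        parent.putIfAbsent(col, col);
        String root = parent.get(col);
        if (!root.equals(col)) {
            root = find(root);
            parent.put(col, root); // path compression
        }
        return root;
    }

    private void union(String a, String b) {
        parent.put(find(a), find(b));
    }

    // columnEqualities: pairs such as {"t1.id", "t2.id"} taken from ON/WHERE conjuncts.
    // constants: column -> literal, e.g. "t2.id" -> "1".
    // Returns implied "col = literal" predicates that are not already present, sorted.
    static List<String> infer(List<String[]> columnEqualities, Map<String, String> constants) {
        InferPredicates uf = new InferPredicates();
        for (String[] eq : columnEqualities) {
            uf.union(eq[0], eq[1]);
        }
        // Group every column seen in an equality by its class representative.
        Map<String, List<String>> classes = new HashMap<>();
        for (String col : new ArrayList<>(uf.parent.keySet())) {
            classes.computeIfAbsent(uf.find(col), k -> new ArrayList<>()).add(col);
        }
        TreeSet<String> inferred = new TreeSet<>(); // sorted and de-duplicated
        for (Map.Entry<String, String> c : constants.entrySet()) {
            if (!uf.parent.containsKey(c.getKey())) {
                continue; // the column is not part of any equality
            }
            for (String col : classes.get(uf.find(c.getKey()))) {
                if (!constants.containsKey(col)) {
                    inferred.add(col + " = " + c.getValue());
                }
            }
        }
        return new ArrayList<>(inferred);
    }
}
```

With the equalities `t1.id = t2.id` and `t2.id = t3.id` and the constant `t2.id = 1`, this yields `t1.id = 1` and `t3.id = 1`, matching the rewritten stmt above.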