Commit Graph

5755 Commits

Author SHA1 Message Date
071be928f9 [fix](vectorized) fix bug multi distinct function get wrong type (#7900) 2022-01-28 22:31:41 +08:00
3a7bb7e144 [improvement](fe-meta-version)Some if conditions do not use the FeMetaVersion constant (#7879) 2022-01-28 22:25:17 +08:00
f93ac89a67 [fix](lateral-view) fix bugs of lateral view with CTE or where clause (#7865)
fix bugs of lateral view with CTE or where clause.
The error case can be found in newly added tests in `TableFunctionPlanTest.java`
But there are still some bugs not being fixed, so the unit test is annotated with @Ignore

This PR contains the change is #7824 :

> Issue Number: close #7823
> 
> After the subquery is rewritten, the rewritten stmt needs to be reset
> (that is, the content of the first analyze semantic analysis is cleared),
> and then the rewritten stmt can be reAnalyzed.
> 
> The lateral view ref in the previous implementation forgot to implement the reset function.
> This caused him to keep the first error message in the second analyze.
> Eventually, two duplicate tupleIds appear in the new stmt and are marked with different tuple.
> From the explain string, the following syntax will have an additional wrong join predicate.
> ```
> Query: explain select k1 from test_explode lateral view explode_split(k2, ",") tmp as e1  where k1 in (select k3 from tbl1);
> Error equal join conjunct: `k3` = `k3`
> ```
> 
> This pr mainly adds the reset function of the lateral view
> to avoid possible errors in the second analyze
> when the lateral view and subquery rewrite occur at the same time.
2022-01-28 22:24:23 +08:00
22830ea498 [feature](show) add new statement show proc '/current_query_stmts' (#7487)
To show the the query statement at first level.
2022-01-28 22:23:13 +08:00
dee79d98a8 [improvement](explain) Displays cast information with implicit conversions in verbose (#7851)
Displays cast information with implicit conversions in verbose.
2022-01-27 10:37:38 +08:00
d2386dd85d [improvement](rewrite) Make RewriteDateLiteralRule to be compatible with mysql (#7876) 2022-01-27 10:32:18 +08:00
d69b7bff2e [feature](meta) Support show compactionTooSlowTablets and oversizeTablets (#7821)
Add more columns in `show proc "/statistic"`
2022-01-27 10:26:41 +08:00
3b8d48f08b [feature-wip](iceberg) Step1: Support create Iceberg external table (#7391)
Close related #7389

Support create Iceberg external table in Doris. 

This is the first step to support Iceberg external table.

### Create Iceberg external table
This pr describes two ways to create Iceberg external tables. Both ways do not require explicitly specifying column definitions, Doris automatically converts them based on Iceberg's column definitions.

1. Create an Iceberg external table directly

```sql
    CREATE [EXTERNAL] TABLE table_name 
    ENGINE = ICEBERG
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.table" = "icberg_table_name",
    "iceberg.hive.metastore.uris"  =  "thrift://192.168.0.1:9083",
    "iceberg.catalog.type"  =  "HIVE_CATALOG"
    );
```

2. Create an Iceberg database and automatically create all the tables under that db.

```sql
    CREATE DATABASE db_name 
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
```

### Show table creation

1. For individual tables you can view them with `help show create table`.

```sql 
mysql> show create table iceberg_db.logs_1;
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table  | Create Table                                                                                                                                                                                                                                                                                                                                                 |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs_1 | CREATE TABLE `logs_1` (
  `level` varchar(-1) NOT NULL COMMENT "null",
  `event_time` datetime NOT NULL COMMENT "null",
  `message` varchar(-1) NOT NULL COMMENT "null"
) ENGINE=ICEBERG
COMMENT "ICEBERG"
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.table" = "logs_1",
"iceberg.hive.metastore.uris"  =  "thrift://10.10.10.10:9087",
"iceberg.catalog.type"  =  "HIVE_CATALOG"
) |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

2. For Iceberg database, you can view it with `help show table creation`.

```sql
mysql> show table creation from iceberg_db;
+--------+---------+---------------------+---------------------------------------------------------+
| Table  | Status  | Create Time         | Error Msg                                               |
+--------+---------+---------------------+---------------------------------------------------------+
| logs   | fail    | 2021-12-14 13:50:10 | Cannot convert unknown type to Doris type: list<string> |
| logs_1 | success | 2021-12-14 13:50:10 |                                                         |
+--------+---------+---------------------+---------------------------------------------------------+
2 rows in set (0.00 sec)
```

  This is a new syntax.
  
  Show table creation records in Iceberg database:
  
  Syntax:
  ```sql
      SHOW TABLE CREATION [FROM db] [LIKE mask]
  ```
2022-01-27 10:22:47 +08:00
015371ac72 [fix](grouping-set) Fix the bug of grouping set core in both vec and non vec query engine (#7800) 2022-01-26 16:15:30 +08:00
4bdeef3b64 [chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)
1. fix problems when build fe_plugins
2. format
3. add docs about dump data using mysql dump
2022-01-26 09:11:23 +08:00
b435a54304 [fix] Consider backend status when more than one backends exists in same host (#7784) 2022-01-26 09:10:34 +08:00
461b352d3e [fix](function) Change digital_masking function arg type to BIGINT (#7888)
Change digital_masking function arg type to BIGINT to fix the wrong result.
2022-01-25 22:28:05 +08:00
ee0037e1af [fix](ldap) fix ldap password logic (#7862)
1. write ldap info to image;
2. optimizing LdapClient class thread safety.
2022-01-25 09:59:24 +08:00
8aa9faa7cb [chore](docker) Add docker dev image with ldb-toolchain (#7838)
Add docker images `apache/incubator-doris:build-env-ldb-toolchain-latest`,
which is built with ldb-toolchain
2022-01-24 21:12:15 +08:00
be9ebbc14d [fix] Fix bug that wrong exception message returned by insert statement (#7832)
`insert` statement may return exception message `Execute timeout` after load data failed.
But the real reason is that there exists unhealthy backend, not execute timeout.

```
MySQL [ssb]> insert into lineorder_flat select * from lineorder_flat;
ERROR 1105 (HY000): errCode = 2, detailMessage = Execute timeout
```
2022-01-24 21:11:45 +08:00
60c6bb4f92 [Feature][flink-connector] support flink delete option (#7457)
* Flink Connector supports delete option on Unique models
Co-authored-by: wudi <wud3@shuhaisc.com>
2022-01-23 20:24:41 +08:00
d1b1723c74 [fix](export) fix export failed when table has hidden columns (#7813)
fix export failed when table has hidden columns
2022-01-22 10:19:15 +08:00
ed39ff1500 [feature](compaction) Support triggering compaction for a specific partition manually (#7521)
Add statement to trigger cumulative or base compaction for a specified partition.
2022-01-21 09:27:06 +08:00
ef984a6a72 [improvement](load) Improve load fault tolerance (#7674)
Currently, if we encounter a problem with a replica of a tablet during the load process,
such as a write error, rpc error, -235, etc., it will cause the entire load job to fail,
which results in a significant reduction in Doris' fault tolerance.

This PR mainly changes:

1. refined the judgment of failed replicas in the load process, so that the failure of a few replicas will not affect the normal completion of the load job.
2. fix a bug introduced from #7754 that may cause BE coredump
2022-01-20 09:23:21 +08:00
5fc0a9f40d [improvement](Load) Cancel the load job ASAP when encounter unqualified data (#6319)
This PR mainly changes:

1. Help to Cancel the load job ASAP when encounter unqualified data.
    Solution is described in #6318 .
    Also replace some std::stringstream with fmt::memory_buffer to avoid performance issues.

2. fix a NPE bug when create user with empty host
3. fix compile warning after rebasing the master(vectorization)
2022-01-18 13:13:55 +08:00
efb4e189df [fix](lateral-view) Fix some lateral view bugs (#7772)
1. Fix bug that BE may crash when input node of TableFunctionNode has non-null column
2. Fix bug that TableFunctionNode may not return all results
2022-01-18 12:09:32 +08:00
3494c8973b [improvement](colocation) Add a new config to delay the relocation of colocation group (#7656)
1. Add a new FE config `colocate_group_relocate_delay_second`

    The relocation of a colocation group may involve a large number of tablets moving within the cluster.
    Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible.
    Relocation usually occurs after a BE node goes offline or goes down.
    This config is used to delay the determination of BE node unavailability.
    The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group
    will not be triggered.

2. Change the priority of colocate tablet repair and balance task from HIGH to NORMAL

3. Add a new FE config allow_replica_on_same_host

    If set to true, when creating table, Doris will allow to locate replicas of a tablet
    on same host. And also the tablet repair and balance will be disabled.
    This is only for local test, so that we can deploy multi BE on same host and create table
    with multi replicas.
2022-01-18 10:26:36 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
5c7863c683 [improvement](fe-unit-test) Fix port in use when the cluster starts in UT. (#7768) 2022-01-16 10:42:56 +08:00
88a3d08fee [fix] fix NPE in SysVariableDesc::equal (#7766) 2022-01-16 10:42:24 +08:00
8b7d7e4dac [improvement] create/drop index support if [not] exist (#7748)
create or drop index clause support if [not] exist
2022-01-16 10:40:44 +08:00
4a3cbf52e3 [fix](show-load) fix show load with the same column name in Where Clause (#7523) 2022-01-15 09:54:43 +08:00
fe80d1417f [style] replace Chinese comments with English comments (#7732) 2022-01-14 09:35:06 +08:00
902ab93043 [fix](session-variable) fix bug that checkpoint may overwrite the global variables (#7526)
We should create temporary object for some static fields when doing checkpoint,
to avoid there variables to be overwritten by the checkpoint process.
2022-01-14 09:25:10 +08:00
d1a994eff9 [fix](cpu-resource)(resource-tag) Allow set cpu_resource_limit to -1 and fix resource tag bug(#6830)
1. Allow set cpu_resource_limit

    -1 means unlimited

2. Drop replica not in valid tag

    Otherwise, the migration task from a resource group to another may never finish.
2022-01-13 23:11:37 +08:00
5e1caea2b1 [fix](lateral-view) Fix some bugs about lateral view (#7721)
1.  fix core dump when using multi explode_bitmap #7716 
2. fix bug that json array extract by json path is wrong #7717 
3. fix bug that after lateral view, the null value become non-null value #7718 
4. fix bug that lateral view may return error: couldn't resolve slot descriptor 1. #7719 
5. fix error result when using lateral view with where predicate #7720
2022-01-13 15:30:38 +08:00
8ac32041e4 [fix](show) fix ConcurrentModificationException for show proc '/current_queries' (#7707) 2022-01-13 15:28:19 +08:00
4ac8b3c9a9 [fix][s3] Fix bug that can not visit aliyun oss with aws s3 sdk (#7691)
Close #7690

1. Exclude httpclient and httpcore dependencies from thrift@0.13

    Explicitly use httpclient@4.5.13 and httpcore@4.4.15
    https://stackoverflow.com/questions/59265959/java-lang-bootstrapmethoderror-call-site-initialization-exception-from-athena-j

2. Exclude aws-java-sdk-s3 dependency from hadoop-aws

    Explicitly use aws-java-sdk-s3@1.11.95
    https://github.com/aws/aws-sdk-java/issues/1032
2022-01-11 15:00:31 +08:00
ff4284f3fa [feature](hint)(mysql-compatibility) Support general hints in select statement (#7664)
Support general hints.

Sql example:

```sql
SELECT /*+ one_hint(1000000) another_hint(k = "v")*/ 1;
```

hints syntax is:

```
/*+ [ HINT_NAME( [ key [ =value ]? ]* ) ]+ */
```

- support multi hints, sep with space
- hint name could be any string in identifier format
- hint could have zero or more parameters, sep with comma
- hint parameter must have one key
- hint parameter could have zero or one value
- hint parameter‘s key and value connected by equal sign
2022-01-09 16:59:08 +08:00
ad35067a2a [chore][docs] add deploy spark/flink connectors to maven release repo docs (#7616) 2022-01-06 23:23:33 +08:00
482bf05da7 [refactor](log) remove RewriteClasses unused LOG reference (#7609) 2022-01-06 23:22:09 +08:00
90aa6c8a72 [fix](syntax) Add STRUCT to keywords (#7606) 2022-01-06 23:20:20 +08:00
831f4cd71e [improvement](website)(proc) Make web page base proc dir and variables orderly (#7535) 2022-01-06 23:16:50 +08:00
563545475e [Optimize](Runtime Filter) Support merge in runtime filter(#7546) (#7547)
Support merge IN predicate when exist remote target(e.g. shuffle hash join).
Remote the code that IN predicate implicit conversion to Bloom filter then exist  remote target.

Close related #7546
2022-01-06 19:08:35 +08:00
e1374d8536 [fix](tablet-scheduler) Fix decommission backend bug (#7563)
Fix bug that decommission backend operation blocked with error:
`no proper tag is chose for tablet.`
2022-01-06 00:08:06 +08:00
738d2d2e07 [refactor] update parent pom version and optimize build scripts (#7548) 2022-01-05 10:45:11 +08:00
9ddcf0625c [improvement](load) Transaction for load job with no data for all partitions should be considered as normal and should not be aborted (#7240)
If the load result set is empty, or the load data is all filtered by the `where` condition,
it will not return failed with msg `all partitions have no load data`, but will return success directly.
2022-01-05 10:38:33 +08:00
5c104ec2d1 [Improvement] use "storage_cooldown_seconds" property when storage medium is SSD (#7532)
Refer to this issue #7528

When setting property `default_storage_medium=ssd` and `storage_cooldown_second=xxx` in `fe.conf`
`cooldownTime=System.currentTimeMillis()+ storage_cooldown_second` , not always `MAX_COOLDOWN_TIME_MS`
2022-01-04 10:32:57 +08:00
bf4a867e85 [improvement](tablet-repair) add a config repair_slow_replica (#7423)
Add a new FE config `repair_slow_replica`    
when this config is true, Doris will try to delete the replica
with the largest number of versions, and then rebalance the replica.
Usually, when the number of versions of a certain replica is much higher
then that of other replicas, there are some problems with the current be's compilation.
Migrating to other machines can typically solve this problem.
2022-01-04 10:28:14 +08:00
6657524c51 [feature](sql-block-rule) add partition_num, tablet_num, cardinality in SqlBlockRule to block big/slow sql (#7403)
Add partitionNum, tabletNum, cardinality in SqlBlockRule to block large/slow sql.

1. set partitionNum, tabletNum, cardinality as limitations to block sqls
2. compatible with lower version
3. add unit tests
4. add docs
2022-01-04 09:59:41 +08:00
d457ab3122 [imporvement] remove unused method from AggregateFunction (#7496) 2021-12-31 16:35:23 +08:00
d6cc3fdf03 [fix](materialized-view) forbidden create materialized view with distinct (#7494) 2021-12-31 16:08:37 +08:00
7903e6491a [Bug](partition pruning v2) Fix NPE when calling Analyzer.getContext() in partition pruning related logic. (#7542)
The partition pruning v2 use connection context in `OlapScanNode`. 
Before this PR, NPE would occur when running SQL without ConnectContext such as export, load.
For example:
```
EXPORT TABLE t TO "file:///home/data/export.txt"
```
2021-12-30 17:03:14 +08:00
723ee84a66 [feature] (planner) InferPredicate (#7096)
This pr is for #7096 , which is add a rewrite rule for infer predicate.

For example:
origin stmt: select * from t1, t2, t3 where t1.id=t2.id and t2.i=t3.id and t2.id = 1
rewrite stmt: select * from t1, t2, t3 where t1.id=t2.id and t2.i=t3.id and t2.id = 1 and t1.id=1 and t3.id=1

+ Add a switch enable_infer_predicate to control whether to perform predicate expansion.
+ Register a new rule InferFiltersrule and add it to GlobalState.
+ Traverse Conjunct to construct on/where equivalence connection, numerical connection and isNullPredicate.
+ Infer all equivalence connections
+ Construct additional numerical connections and isNullPredicate
2021-12-30 13:24:30 +08:00
8da2e8b91b [fix](cache) Int overflow causes the wrong latest table to be obtained (#7533) 2021-12-30 11:16:50 +08:00