Commit Graph

5948 Commits

Author SHA1 Message Date
0efef1b332 [fix](schema-change) Fix bug that schema change may return -102 error (#7808)
When using linked schema change, we need to check if all rowsets are of the same type,
ALPHA or BETA. Otherwise, we need to use direct schema change to convert the data.
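
The check described above can be sketched as follows. This is an illustrative Python sketch, not the Doris BE code: `RowsetType` and `resolve_schema_change_mode` are hypothetical names, and the only rule encoded is the one stated in the message (mixed rowset types force a direct schema change).

```python
from enum import Enum


class RowsetType(Enum):
    ALPHA = "ALPHA"
    BETA = "BETA"


def resolve_schema_change_mode(rowset_types):
    """Return "linked" if all rowsets share one type, otherwise "direct".

    Hypothetical helper: assumes a linked schema change was requested and
    only decides whether it must fall back to a direct schema change.
    """
    distinct_types = set(rowset_types)
    return "linked" if len(distinct_types) <= 1 else "direct"
```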
2022-01-21 10:59:54 +08:00
ed39ff1500 [feature](compaction) Support triggering compaction for a specific partition manually (#7521)
Add statement to trigger cumulative or base compaction for a specified partition.
2022-01-21 09:27:06 +08:00
4c17c370e7 [Doc]Modify the audit log plugin to record the SQL statement field type as String (#7798)
Modify the audit log plugin to record the SQL statement field type as String
2022-01-20 21:36:02 +08:00
4768dd4efa [docs] Documentation corrections (#7787) 2022-01-20 16:18:48 +08:00
ef984a6a72 [improvement](load) Improve load fault tolerance (#7674)
Currently, if we encounter a problem with a replica of a tablet during the load process,
such as a write error, rpc error, -235, etc., it will cause the entire load job to fail,
which results in a significant reduction in Doris' fault tolerance.

This PR mainly changes:

1. Refine the handling of failed replicas in the load process, so that the failure of a few replicas does not prevent the load job from completing normally.
2. Fix a bug introduced in #7754 that may cause a BE core dump.
2022-01-20 09:23:21 +08:00
7574d39d14 [fix](bitmap-index) Fix bug that bitmap index may return wrong result. (#7788)
Fix the following bug:

1. A bitmap index is created on `column1`.
2. `column1` has many index entries in the bitmap index, so the index page is split into two levels.
3. The value range of `column1` is `[1000, 10000000]`.
4. The query condition is `column1 > 0`.
5. An empty result is returned, while the expected result is 9999000 rows.
2022-01-19 12:27:08 +08:00
aacbc960c8 [fix][chore](thrift) Fix warning when generate cpp code by thrift IDL file and use strict mode (#7773) 2022-01-19 12:26:44 +08:00
5fc0a9f40d [improvement](Load) Cancel the load job ASAP when encounter unqualified data (#6319)
This PR mainly changes:

1. Cancel the load job as soon as possible when unqualified data is encountered.
    The solution is described in #6318.
    Also replace some std::stringstream with fmt::memory_buffer to avoid performance issues.

2. Fix an NPE when creating a user with an empty host.
3. Fix compile warnings after rebasing onto master (vectorization).
2022-01-18 13:13:55 +08:00
efb4e189df [fix](lateral-view) Fix some lateral view bugs (#7772)
1. Fix bug that BE may crash when the input node of TableFunctionNode has a non-null column
2. Fix bug that TableFunctionNode may not return all results
2022-01-18 12:09:32 +08:00
3494c8973b [improvement](colocation) Add a new config to delay the relocation of colocation group (#7656)
1. Add a new FE config `colocate_group_relocate_delay_second`

    The relocation of a colocation group may involve a large number of tablets moving within the cluster.
    Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible.
    Relocation usually occurs after a BE node goes offline or goes down.
    This config is used to delay the determination of BE node unavailability.
    The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group
    will not be triggered.

2. Change the priority of colocate tablet repair and balance task from HIGH to NORMAL

3. Add a new FE config `allow_replica_on_same_host`

    If set to true, Doris allows replicas of a tablet to be placed on the same host
    when creating a table. Tablet repair and balance are also disabled.
    This is only for local testing, so that we can deploy multiple BEs on the same host
    and create tables with multiple replicas.
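
The delay introduced by `colocate_group_relocate_delay_second` (item 1 above) can be sketched as follows. This is a minimal Python illustration, not the FE implementation: the function name and the timestamp arguments are hypothetical, and only the 30-minute default comes from the message.

```python
# Default taken from the commit message: 30 minutes, expressed in seconds.
COLOCATE_GROUP_RELOCATE_DELAY_SECOND = 30 * 60


def backend_unavailable_for_colocation(
    last_alive_ts,
    now_ts,
    delay_second=COLOCATE_GROUP_RELOCATE_DELAY_SECOND,
):
    """Treat a BE as unavailable only after it has been down longer than the delay.

    Hypothetical helper: both timestamps are seconds since an arbitrary epoch.
    """
    return (now_ts - last_alive_ts) > delay_second
```

With the default, a BE that comes back after 10 minutes does not trigger relocation, while one that has been down for an hour does.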
2022-01-18 10:26:36 +08:00
946fa2960d [improvement](broker) add some properties that can be set in the broker conf file (#7499) 2022-01-18 10:24:54 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from ClickHouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code taken from ClickHouse, e.g.:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

The vectorized exec engine can run the SSB and TPC-H standard query sets and 70% of the TPC-DS standard query set.

### 3. Data Model

The vectorized exec engine supports **Dup/Agg/Unq** tables and a vectorized block reader.
Segment vectorization is a work in progress.

### 4. How to use

1. Set the session variable `set enable_vectorized_engine = true;` (required)
2. Set the session variable `set batch_size = 4096;` (recommended)

### 5. Some differences from the original exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
ebc27a40d7 [docs] Split the FAQ And Revert auto-label action (#7770) 2022-01-17 10:34:56 +08:00
e80c34b6fe [docs][typo] fix some typos in documents (#7769) 2022-01-16 10:43:42 +08:00
5c7863c683 [improvement](fe-unit-test) Fix port in use when the cluster starts in UT. (#7768) 2022-01-16 10:42:56 +08:00
88a3d08fee [fix] fix NPE in SysVariableDesc::equal (#7766) 2022-01-16 10:42:24 +08:00
36d6d236ad [refactor] remove duplicate if that will never be used (#7761) 2022-01-16 10:41:59 +08:00
5f8d91257b [improvement](routine-load) Reduce the probability that the routine load task rpc timeout (#7754)
If a load task has a relatively short timeout, then we need to ensure that
each RPC of this task does not get blocked for a long time.
And an RPC is usually blocked for two reasons.

1. handling "memory exceeds limit" in the RPC
    
    If the system finds that the memory occupied by the load exceeds the threshold,
    it will select the load channel that occupies the most memory and flush the memtable in it.
    This operation is done in the RPC, which may be time consuming.

2. close the load channel

    When the load channel receives the last batch, it will end the task.
    It will wait for all memtables flushes to finish synchronously. This process is also time consuming.

Therefore, this PR solves this problem by:

1. Using the timeout to determine whether a load task is high-priority

    If the timeout of a load task is relatively short, we mark it as a high-priority task.

2. Not processing "memory exceeds limit" for high-priority tasks
3. Using a separate flush thread to flush memtables for high-priority tasks
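
The three steps above can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions, not the BE code: `LoadTask`, `plan_rpc`, and the 120-second cutoff are hypothetical, since the message only says the timeout is "relatively short".

```python
from dataclasses import dataclass

# Hypothetical cutoff; the commit message does not give a concrete value.
HIGH_PRIORITY_TIMEOUT_S = 120


@dataclass
class LoadTask:
    timeout_s: int


def is_high_priority(task, cutoff_s=HIGH_PRIORITY_TIMEOUT_S):
    """Step 1: a short timeout marks the task as high priority."""
    return task.timeout_s <= cutoff_s


def plan_rpc(task, memory_exceeded):
    """Decide how one RPC of a load task is handled.

    Step 2: high-priority tasks skip "memory exceeds limit" handling.
    Step 3: high-priority tasks flush memtables on a separate flush thread.
    """
    high = is_high_priority(task)
    return {
        "handle_memory_limit_in_rpc": memory_exceeded and not high,
        "flush_pool": "high_priority" if high else "default",
    }
```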
2022-01-16 10:41:31 +08:00
8b7d7e4dac [improvement] create/drop index support if [not] exist (#7748)
`CREATE INDEX` now supports `IF NOT EXISTS`, and `DROP INDEX` supports `IF EXISTS`.
2022-01-16 10:40:44 +08:00
5b0f11b665 [feature](mysql-compatibility)(function) add WEEKDAY function (#7673)
`WEEKDAY` in MySQL: returns an index from 0 to 6 for Monday to Sunday.
`DAYOFWEEK` in MySQL: returns an index from 1 to 7 for Sunday to Saturday.

Doris only has the `DAYOFWEEK` function, so this PR adds the `WEEKDAY` function.
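
The relationship between the two indexings can be checked with a short Python sketch. The helper names `mysql_weekday` and `mysql_dayofweek` are hypothetical; the sketch relies on Python's `date.weekday()`, which also counts from 0 for Monday to 6 for Sunday.

```python
from datetime import date


def mysql_weekday(d):
    """WEEKDAY semantics: 0 = Monday ... 6 = Sunday."""
    return d.weekday()


def mysql_dayofweek(d):
    """DAYOFWEEK semantics: 1 = Sunday ... 7 = Saturday."""
    return (d.weekday() + 1) % 7 + 1
```

For example, 2022-01-16 was a Sunday, so `mysql_weekday` gives 6 and `mysql_dayofweek` gives 1.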

Thanks for the following materials:
- https://github.com/apache/incubator-doris/pull/6982/files
- https://www.bilibili.com/video/BV1V44y1Y7Ro
2022-01-16 10:39:21 +08:00
4a3cbf52e3 [fix](show-load) fix show load with the same column name in Where Clause (#7523) 2022-01-15 09:54:43 +08:00
be43316f20 [docs] add doc for community feedback and fix CI (#7759)
add doc for community feedback and fix CI
2022-01-14 22:19:28 +08:00
a6ff1bd79e Flink / Spark connector compilation problem (#7725)
Flink / Spark connector compilation problem
2022-01-14 22:14:48 +08:00
e7d65e488c [style] translate code annotations into english (#7752)
Translate Chinese code comments into English. The following files have been modified:
1. be/src/olap/row_cursor.h
2. be/src/olap/compress.h
2022-01-14 09:37:46 +08:00
5c4055ac3a [style] Translate Chinese to English in be_olap_field.h (#7738) 2022-01-14 09:36:58 +08:00
fe80d1417f [style] replace Chinese comments with English comments (#7732) 2022-01-14 09:35:06 +08:00
f3817829bb [fix] fix malloc and free mismatch issue (#7702)
Memory allocated by `malloc` should be freed by `free`.
2022-01-14 09:32:33 +08:00
6188ab20df [docs](faq) add multiple FE WEB UI login issues (#7654) 2022-01-14 09:26:39 +08:00
902ab93043 [fix](session-variable) fix bug that checkpoint may overwrite the global variables (#7526)
We should create temporary objects for some static fields when doing a checkpoint,
to prevent these variables from being overwritten by the checkpoint process.
2022-01-14 09:25:10 +08:00
d03151bda2 [chore](be) Add -Werror (#7744)
All warnings are treated as errors when compiling BE.
2022-01-14 09:21:57 +08:00
10709f315a [fix](github-action) fix the action of set-label-based-on-pr-title (#7758) 2022-01-14 09:20:42 +08:00
3da4425af5 [fix](github-action) fix the action of set-label-based-on-pr-title (#7757) 2022-01-13 23:35:00 +08:00
d1a994eff9 [fix](cpu-resource)(resource-tag) Allow set cpu_resource_limit to -1 and fix resource tag bug(#6830)
1. Allow setting `cpu_resource_limit` to -1

    -1 means unlimited.

2. Drop replicas that are not in a valid tag

    Otherwise, the migration task from one resource group to another may never finish.
2022-01-13 23:11:37 +08:00
b51121fe86 [chore](github-action) Add label auto for pull requests (#7663) 2022-01-13 20:07:16 +08:00
ccb6c6ac2e [docs] update seatunnel.md (#7731)
Correct some misspelled words and update the document format.
2022-01-13 15:31:17 +08:00
5e1caea2b1 [fix](lateral-view) Fix some bugs about lateral view (#7721)
1. Fix core dump when using multiple explode_bitmap functions #7716
2. Fix bug that extracting a JSON array by JSON path returns a wrong result #7717
3. Fix bug that null values become non-null after a lateral view #7718
4. Fix bug that lateral view may return the error: couldn't resolve slot descriptor 1. #7719
5. Fix wrong result when using lateral view with a where predicate #7720
2022-01-13 15:30:38 +08:00
8ac32041e4 [fix](show) fix ConcurrentModificationException for show proc '/current_queries' (#7707) 2022-01-13 15:28:19 +08:00
db2649525f [docs](website) Add Database ODBC version correspondence (#7675) 2022-01-13 15:28:02 +08:00
a034c20d16 [fix](website) Add trademarks footer on official website (#7696) 2022-01-11 15:07:56 +08:00
2de79832fc [docs](hive)(function) fix Hive type error and optimize alias function example (#7694)
1. fix Hive type error 
2. optimize alias function example
2022-01-11 15:07:32 +08:00
1b2acb6acd [docs] update the document format (#7689) 2022-01-11 15:06:42 +08:00
2cf574dc01 [docs] Improve instructions for the configuration of BE. (#7620) 2022-01-11 15:06:05 +08:00
8685b6b985 [improvement](executor) Optimize lock of client cache (#7543) 2022-01-11 15:05:24 +08:00
6864a376ca [improvement](spark-connector) Throw an exception when the data push fails and there are too many retries (#7531) 2022-01-11 15:03:06 +08:00
d4188877f1 [comminity](github) Polish PR template (#7638)
Improving PR templates.
2022-01-11 15:01:50 +08:00
4ac8b3c9a9 [fix][s3] Fix bug that can not visit aliyun oss with aws s3 sdk (#7691)
Close #7690

1. Exclude httpclient and httpcore dependencies from thrift@0.13

    Explicitly use httpclient@4.5.13 and httpcore@4.4.15
    https://stackoverflow.com/questions/59265959/java-lang-bootstrapmethoderror-call-site-initialization-exception-from-athena-j

2. Exclude aws-java-sdk-s3 dependency from hadoop-aws

    Explicitly use aws-java-sdk-s3@1.11.95
    https://github.com/aws/aws-sdk-java/issues/1032
2022-01-11 15:00:31 +08:00
83f6eef506 [improvement](routine-load) Make routine load work with old kafka version (#7630)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-01-10 17:30:24 +08:00
68c87de69e [fix](website) fix CaseList component bug (#7683) 2022-01-10 14:46:05 +08:00
ff4284f3fa [feature](hint)(mysql-compatibility) Support general hints in select statement (#7664)
Support general hints.

SQL example:

```sql
SELECT /*+ one_hint(1000000) another_hint(k = "v")*/ 1;
```

The hint syntax is:

```
/*+ [ HINT_NAME( [ key [ =value ]? ]* ) ]+ */
```

- Multiple hints are supported, separated by spaces
- A hint name can be any string in identifier format
- A hint can have zero or more parameters, separated by commas
- A hint parameter must have one key
- A hint parameter can have zero or one value
- A hint parameter's key and value are connected by an equal sign
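
The rules above can be illustrated with a small parser. This is a minimal Python sketch, not the parser added in this PR: `parse_hints` is a hypothetical helper built on regular expressions, and it does not handle values that contain commas or parentheses.

```python
import re

# Matches the first /*+ ... */ hint block in a statement.
HINT_BLOCK = re.compile(r"/\*\+(.*?)\*/", re.S)
# Matches one hint: an identifier followed by a parenthesized parameter list.
HINT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(([^()]*)\)")


def parse_hints(sql):
    """Return {hint_name: {key: value_or_None}} for the first hint block."""
    block = HINT_BLOCK.search(sql)
    if block is None:
        return {}
    hints = {}
    for name, args in HINT.findall(block.group(1)):
        params = {}
        for part in args.split(","):
            part = part.strip()
            if not part:
                continue  # a hint may have zero parameters
            key, sep, value = part.partition("=")
            # A parameter always has a key; the value is optional.
            params[key.strip()] = value.strip().strip('"') if sep else None
        hints[name] = params
    return hints
```

Applied to the SQL example above, `1000000` is parsed as a key without a value, and `k` maps to `v`.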
2022-01-09 16:59:08 +08:00
7254bcc8ca [refactor](spark-connector) delete useless maven dependencies and some code variable definition issues (#7655) 2022-01-09 16:58:16 +08:00