Commit Graph

13073 Commits

Author SHA1 Message Date
f227472db2 [chore] fix error while compiling with -O3 (#7890) 2022-01-26 12:53:56 +08:00
Pxl
cd73a6b84b [chore] fix clang compile error (#7883) 2022-01-26 12:53:35 +08:00
e3bc232578 [docs] OS Installation Requirements (#7830) 2022-01-26 09:11:41 +08:00
4bdeef3b64 [chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)
1. fix problems when build fe_plugins
2. format
3. add docs about dump data using mysql dump
2022-01-26 09:11:23 +08:00
b435a54304 [fix] Consider backend status when more than one backend exists on the same host (#7784) 2022-01-26 09:10:34 +08:00
461b352d3e [fix](function) Change digital_masking function arg type to BIGINT (#7888)
Change digital_masking function arg type to BIGINT to fix the wrong result.
2022-01-25 22:28:05 +08:00
a6831535e9 [Vectorized][Bug] fix bug of coalesce function (#7827) 2022-01-25 20:44:16 +08:00
ee0037e1af [fix](ldap) fix ldap password logic (#7862)
1. Write LDAP info to the image.
2. Improve the thread safety of the LdapClient class.
2022-01-25 09:59:24 +08:00
40f993ca15 [docs][seatunnel] add seatunnel flink doris sink doc (#7844) 2022-01-24 21:13:06 +08:00
8aa9faa7cb [chore](docker) Add docker dev image with ldb-toolchain (#7838)
Add the docker image `apache/incubator-doris:build-env-ldb-toolchain-latest`,
which is built with ldb-toolchain.
2022-01-24 21:12:15 +08:00
be9ebbc14d [fix] Fix bug that insert statement returns a wrong exception message (#7832)
The `insert` statement may return the exception message `Execute timeout` after a load fails.
The real cause, however, is an unhealthy backend, not an execution timeout.

```
MySQL [ssb]> insert into lineorder_flat select * from lineorder_flat;
ERROR 1105 (HY000): errCode = 2, detailMessage = Execute timeout
```
2022-01-24 21:11:45 +08:00
4e9bc5cb65 [doc] add documents for bitwise functions (#7790) 2022-01-24 21:08:41 +08:00
86f34323a8 [Doc]Doris compile and install JDK version incompatibility problem (#7797)
* Doris compile and install JDK version incompatibility problem
2022-01-24 13:57:18 +08:00
ca0fac0722 [chore](ldb-toolchain) Support ldb_toolchain on ubuntu 20 (#7846) 2022-01-24 13:16:37 +08:00
d7b40c3136 [Doc]wrong directory structure (#7842)
wrong directory structure
2022-01-23 23:22:52 +08:00
60c6bb4f92 [Feature][flink-connector] support flink delete option (#7457)
* Flink Connector supports delete option on Unique models
Co-authored-by: wudi <wud3@shuhaisc.com>
2022-01-23 20:24:41 +08:00
c2520c878c [Improvement](Vectorized) optimize SegmentIterator predication evaluate (#7795)
* [Improvement](Vectorized) optimize SegmentIterator predication evaluate

* fix bug

* move bytes32_mask_to_bits32_mask to util/simd/bits.h
2022-01-22 15:31:07 +08:00
c5ec6dbc51 [docs](install-deploy) fix misplaced whitespace(#7814) (#7816)
Misplaced whitespace causes unexpected output. Fix it so that newcomers
are not misled by the docs.
2022-01-22 10:20:24 +08:00
d1b1723c74 [fix](export) fix export failed when table has hidden columns (#7813)
fix export failed when table has hidden columns
2022-01-22 10:19:15 +08:00
1c711705d7 [chore] Use ccache to speed recompiling test code up. (#7811) 2022-01-22 10:18:52 +08:00
cf02e43ec1 [improvement](vectorized) optimize dict read (#7805) 2022-01-22 10:18:30 +08:00
Pxl
b56c568a8d [fix](vectorized) fix fold const value fail at datetime type (#7803) 2022-01-22 10:16:38 +08:00
b14d1c54fd [fix](function) fix vec round reference #7421 (#7801)
reference #7421
2022-01-22 10:09:10 +08:00
f2cbf0a8d2 [chore] Improve the ldb toolchain compilation documentation (#7829)
Add document for compiling Doris with ldb toolchain
2022-01-21 21:36:43 +08:00
800a36343a [chore] Prolog of hermetic build with GCC 11 and Clang 13. (#7712)
Prepare to generate hermetic build using GCC 11 and Clang 13.
The ideal toolchain would be ldb toolchain generated by [ldb_toolchain_gen.sh](https://github.com/amosbird/ldb_toolchain_gen/releases/download/v0.3/ldb_toolchain_gen.sh)

To kick off a clang build, set `DORIS_TOOLCHAIN=clang` before running any build scripts.
2022-01-21 12:12:04 +08:00
0efef1b332 [fix](schema-change) Fix bug that schema change may return -102 error (#7808)
When using linked schema change, we need to check whether all rowsets are of the same type,
ALPHA or BETA. Otherwise, we need to use direct schema change to convert the data.
2022-01-21 10:59:54 +08:00
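The linked-versus-direct decision described in #7808 can be sketched as follows. This is a minimal illustration, not the BE code: `RowsetType` and `choose_schema_change` are hypothetical names, and treating an empty rowset list as "linked" is an assumption.

```python
from enum import Enum
from typing import Iterable


class RowsetType(Enum):
    ALPHA = "ALPHA"
    BETA = "BETA"


def choose_schema_change(rowset_types: Iterable[RowsetType]) -> str:
    """Return "linked" only when every rowset has the same type."""
    distinct = set(rowset_types)
    # Mixed ALPHA/BETA rowsets cannot be linked; the data must be converted
    # with a direct schema change instead.
    return "linked" if len(distinct) <= 1 else "direct"
```

For example, `choose_schema_change([RowsetType.ALPHA, RowsetType.BETA])` selects the direct path.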
ed39ff1500 [feature](compaction) Support triggering compaction for a specific partition manually (#7521)
Add statement to trigger cumulative or base compaction for a specified partition.
2022-01-21 09:27:06 +08:00
4c17c370e7 [Doc]Modify the audit log plugin to record the SQL statement field type as String (#7798)
Modify the audit log plugin to record the SQL statement field type as String
2022-01-20 21:36:02 +08:00
4768dd4efa [docs] Documentation corrections (#7787) 2022-01-20 16:18:48 +08:00
ef984a6a72 [improvement](load) Improve load fault tolerance (#7674)
Currently, if we encounter a problem with a replica of a tablet during the load process,
such as a write error, rpc error, -235, etc., it will cause the entire load job to fail,
which results in a significant reduction in Doris' fault tolerance.

This PR mainly changes:

1. Refine how failed replicas are judged during the load process, so that the failure of a few replicas does not prevent the load job from completing normally.
2. Fix a bug introduced in #7754 that may cause a BE coredump.
2022-01-20 09:23:21 +08:00
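The idea in #7674 that a few failed replicas should not fail the whole load can be sketched as a quorum check. The commit message does not state the exact threshold, so the majority rule below is an assumption, and `load_can_commit` is a hypothetical name.

```python
def load_can_commit(replica_num: int, failed_replicas: int) -> bool:
    """Decide whether a tablet's load may succeed despite failed replicas."""
    # Assumption: a majority of replicas must be written successfully.
    # The commit message only says "a few replicas" may fail.
    quorum = replica_num // 2 + 1
    return replica_num - failed_replicas >= quorum
```

Under this assumption, a tablet with three replicas tolerates one failed replica but not two.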
7574d39d14 [fix](bitmap-index) Fix bug that bitmap index may return wrong result. (#7788)
Fix the bug reproduced by the following scenario:

1. `column1` has a bitmap index.
2. `column1` has a lot of index items in the bitmap index, and the index page is divided into two levels.
3. `column1`'s value range is `[1000, 10000000]`.
4. The query condition is `column1 > 0`.
5. An empty result is returned, while 9999000 rows are expected.
2022-01-19 12:27:08 +08:00
aacbc960c8 [fix][chore](thrift) Fix warnings when generating cpp code from thrift IDL files and use strict mode (#7773) 2022-01-19 12:26:44 +08:00
5fc0a9f40d [improvement](Load) Cancel the load job ASAP when encountering unqualified data (#6319)
This PR mainly changes:

1. Cancel the load job ASAP when encountering unqualified data.
    The solution is described in #6318.
    Also replace some std::stringstream with fmt::memory_buffer to avoid performance issues.

2. Fix an NPE when creating a user with an empty host.
3. Fix compile warnings after rebasing onto master (vectorization).
2022-01-18 13:13:55 +08:00
efb4e189df [fix](lateral-view) Fix some lateral view bugs (#7772)
1. Fix bug that BE may crash when the input node of TableFunctionNode has a non-null column
2. Fix bug that TableFunctionNode may not return all results
2022-01-18 12:09:32 +08:00
3494c8973b [improvement](colocation) Add a new config to delay the relocation of colocation group (#7656)
1. Add a new FE config `colocate_group_relocate_delay_second`

    The relocation of a colocation group may involve a large number of tablets moving within the cluster.
    Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible.
    Relocation usually occurs after a BE node goes offline or goes down.
    This config is used to delay the determination of BE node unavailability.
    The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group
    will not be triggered.

2. Change the priority of colocate tablet repair and balance task from HIGH to NORMAL

3. Add a new FE config `allow_replica_on_same_host`

    If set to true, Doris allows replicas of a tablet to be placed on the same host
    when creating a table. Tablet repair and balance are also disabled.
    This is only for local testing, so that multiple BEs can be deployed on the same host
    and tables can be created with multiple replicas.
2022-01-18 10:26:36 +08:00
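The effect of `colocate_group_relocate_delay_second` in #7656 can be sketched as a simple time check. This is an illustration only: `should_relocate` is a hypothetical function, and treating exactly 30 minutes as "recovered in time" is an assumption about the boundary.

```python
# Default from the commit message: 30 minutes, expressed in seconds.
DEFAULT_RELOCATE_DELAY_SECOND = 30 * 60


def should_relocate(unavailable_seconds: float,
                    delay_second: int = DEFAULT_RELOCATE_DELAY_SECOND) -> bool:
    """Relocate the colocation group only after the BE has been unavailable
    for longer than the configured delay."""
    return unavailable_seconds > delay_second
```

A BE node that comes back after 10 minutes does not trigger relocation, while one that is still down after 31 minutes does.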
946fa2960d [improvement](broker) add some properties that can be set in the broker conf file (#7499) 2022-01-18 10:24:54 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from ClickHouse

**ClickHouse is an excellent vectorized execution engine database,
so we have referenced and learned a lot from its implementation in terms of
data structures and function implementations.
This work is based on ClickHouse v19.16.2.2, and we would like to thank the ClickHouse community and developers.**

The following comment has been added to code taken from ClickHouse, e.g.:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Supported exec nodes and queries:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

The vectorized exec engine can run the SSB and TPC-H standard query test sets and 70% of the TPC-DS standard query test set.

### 3. Data Model

The vectorized exec engine supports **Dup/Agg/Unq** tables and a vectorized Block Reader.
Segment vectorization is a work in progress.

### 4. How to use

1. Set the session variable `set enable_vectorized_engine = true;` (required)
2. Set the session variable `set batch_size = 4096;` (recommended)

### 5. Some differences from the original exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
ebc27a40d7 [docs] Split the FAQ And Revert auto-label action (#7770) 2022-01-17 10:34:56 +08:00
e80c34b6fe [docs][typo] fix some typos in documents (#7769) 2022-01-16 10:43:42 +08:00
5c7863c683 [improvement](fe-unit-test) Fix port in use when the cluster starts in UT. (#7768) 2022-01-16 10:42:56 +08:00
88a3d08fee [fix] fix NPE in SysVariableDesc::equal (#7766) 2022-01-16 10:42:24 +08:00
36d6d236ad [refactor] remove duplicate if that will never be used (#7761) 2022-01-16 10:41:59 +08:00
5f8d91257b [improvement](routine-load) Reduce the probability that the routine load task rpc timeout (#7754)
If a load task has a relatively short timeout, then we need to ensure that
each RPC of this task does not get blocked for a long time.
An RPC is usually blocked for two reasons:

1. Handling "memory exceeds limit" in the RPC
    
    If the system finds that the memory occupied by the load exceeds the threshold,
    it will select the load channel that occupies the most memory and flush the memtable in it.
    This operation is done in the RPC, which may be time consuming.

2. Closing the load channel

    When the load channel receives the last batch, it will end the task.
    It will wait synchronously for all memtable flushes to finish. This process is also time consuming.

Therefore, this PR solves the problem as follows:

1. Use the timeout to determine whether it is a high-priority load task

    If the timeout of a load task is relatively short, then we mark it as a high-priority task.

2. Do not process "memory exceeds limit" for high-priority tasks
3. Use a separate flush thread to flush memtables for high-priority tasks.
2022-01-16 10:41:31 +08:00
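The three measures in #7754 can be sketched as one planning step. This is a sketch under stated assumptions, not the BE implementation: the 120-second threshold, `LoadPlan`, `plan_load`, and the pool names are all hypothetical, since the commit message does not say what counts as a "relatively short" timeout.

```python
from dataclasses import dataclass

# Hypothetical threshold; the commit message gives no concrete value.
HIGH_PRIORITY_TIMEOUT_SECOND = 120


@dataclass(frozen=True)
class LoadPlan:
    high_priority: bool
    handle_mem_limit_in_rpc: bool  # whether "memory exceeds limit" is handled in the RPC
    flush_pool: str                # which flush thread pool serves the memtables


def plan_load(timeout_second: int,
              threshold: int = HIGH_PRIORITY_TIMEOUT_SECOND) -> LoadPlan:
    """Mark short-timeout tasks as high priority and keep their RPCs short."""
    high = timeout_second <= threshold
    return LoadPlan(
        high_priority=high,
        # High-priority tasks skip the in-RPC memory-limit handling.
        handle_mem_limit_in_rpc=not high,
        # High-priority tasks flush on a separate thread pool.
        flush_pool="high_priority" if high else "default",
    )
```

With these assumed values, `plan_load(60)` yields a high-priority plan, while `plan_load(3600)` keeps the default behavior.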
8b7d7e4dac [improvement] create/drop index support if [not] exists (#7748)
The create and drop index clauses support if [not] exists
2022-01-16 10:40:44 +08:00
5b0f11b665 [feature](mysql-compatibility)(function) add WEEKDAY function (#7673)
`WEEKDAY` in MySQL: returns an index from 0 to 6 for Monday to Sunday.
`DAYOFWEEK` in MySQL: returns an index from 1 to 7 for Sunday to Saturday.

Doris only has the `DAYOFWEEK` function, so this PR adds the `WEEKDAY` function.

Thanks for the following materials:
- https://github.com/apache/incubator-doris/pull/6982/files
- https://www.bilibili.com/video/BV1V44y1Y7Ro
2022-01-16 10:39:21 +08:00
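The two index conventions described in #7673 can be illustrated in Python, whose `date.weekday()` uses the same Monday-is-0 convention as `WEEKDAY`. The function names below are only for illustration and are not the Doris implementation.

```python
from datetime import date


def weekday(d: date) -> int:
    """MySQL-style WEEKDAY: 0 = Monday ... 6 = Sunday."""
    return d.weekday()


def dayofweek(d: date) -> int:
    """MySQL-style DAYOFWEEK: 1 = Sunday ... 7 = Saturday."""
    # Shift the Monday-based index so that Sunday becomes 1.
    return (d.weekday() + 1) % 7 + 1
```

For 2022-01-16, a Sunday, `weekday` returns 6 and `dayofweek` returns 1.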
4a3cbf52e3 [fix](show-load) fix show load with the same column name in Where Clause (#7523) 2022-01-15 09:54:43 +08:00
be43316f20 [docs] add doc for community feedback and fix CI (#7759)
add doc for community feedback and fix CI
2022-01-14 22:19:28 +08:00
a6ff1bd79e Flink / Spark connector compilation problem (#7725)
Flink / Spark connector compilation problem
2022-01-14 22:14:48 +08:00
e7d65e488c [style] translate code annotations into english (#7752)
Translate Chinese code comments into English. The following files have been modified:
1. be/src/olap/row_cursor.h
2. be/src/olap/compress.h
2022-01-14 09:37:46 +08:00
5c4055ac3a [style] Translate Chinese to English in be_olap_field.h (#7738) 2022-01-14 09:36:58 +08:00