Commit Graph

537 Commits

Author SHA1 Message Date
2b1aec871d [docs] Modify the SSB document (#8101)
Add one step of preparation before load
2022-02-18 11:14:20 +08:00
7471873e6f Add version upgrade instructions (#8057)
Add version upgrade instructions
2022-02-17 18:06:41 +08:00
289aacb78c [improvement] enable check_java_version (#8034)
Enable to check the Java version when Doris starts, to prevent the user experience caused by the inconsistency 
between the compiled version and the running version.
If the Java version is compiled and the Java version is run, it will not start, and a prompt message will be given.
2022-02-17 11:16:45 +08:00
486a0586ac [community] modify the doc of verifying apache release (#8084) 2022-02-17 10:53:31 +08:00
GRX
d9535c29f6 [doc] update alter table docs (#8076) 2022-02-17 10:52:59 +08:00
26289c28b0 [fix](load)(compaction) Fix NodeChannel coredump bug and modify some compaction logic (#8072)
1. Fix the problem of BE crash caused by destruct sequence. (close #8058)
2. Add a new BE config `compaction_task_num_per_fast_disk`

    This config specify the max concurrent compaction task num on fast disk(typically .SSD).
    So that for high speed disk, we can execute more compaction task at same time,
    to compact the data as soon as possible

3. Avoid frequent selection of unqualified tablet to perform compaction.
4. Modify some log level to reduce the log size of BE.
5. Modify some clone logic to handle error correctly.
2022-02-17 10:52:08 +08:00
79fd81f035 [doc] Added be -238 error code description (#8048)
Added be -238 error code description
2022-02-17 10:47:52 +08:00
264f38471c [feature](spark-load) add Hive Bitmap UDFs (#8036)
Hive Bitmap UDF provides UDFs for generating bitmap and bitmap operations in hive tables.
The bitmap in Hive is exactly the same as the Doris bitmap.
The bitmap in Hive can be imported into Doris through spark bitmap load.
2022-02-17 10:45:20 +08:00
e6fedff68f [Refactor][heartbeat] Make get fe heart response by thrift (#8035)
* [Refactor] Make get fe heart response by thrift

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-02-17 10:25:51 +08:00
a6bf8c13eb [Feature](Transaction) Support two phase commit (2PC) for stream load (#7473)
The two phase batch commit means:
During Stream load, after data is written, the message will be returned to the client,
the data is invisible at this point and the transaction status is PRECOMMITTED.
The data will be visible only after COMMIT is triggered by client.
    
1. User can invoke the following interface to trigger commit operations for transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc

    
2.User can invoke the following interface to trigger abort operations for transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc
2022-02-16 11:55:04 +08:00
884fddbf33 [fix](compatibility) Fix compatibility issue of PRowBatch and some tablet sink bugs (#8000)
1. set both `tuple_offsets` and `new_tuple_offsets` in PRowBatch for compatibility
2. set FE config `repair_slow_replica` default to false
   Avoid impacting the load process after upgrading.
   Eg, if there are only 2 replicas, one is with high version count. After upgrade,
   that replica will be set to bad, so that the load process will be stopped
   because only 1 replica is alive.
3. Fix a bug that NodeChannel may be blocked at `close_wait()`
   Forget to set `add_batch_finish` flag after the last rpc finished.
4. Fix a NPE of RoutineLoadScheduler
2022-02-15 11:23:19 +08:00
aea3e4e59b [refactor] Remove version hash from BE and related test in BE (#8027) 2022-02-14 09:29:27 +08:00
8d7a0d9747 [docs](routine-load)Update routine-load-manual.md (#8006) 2022-02-14 09:28:08 +08:00
Pxl
64fb8dab39 [feature] (function)(vec) support pmod function (#7977) 2022-02-12 16:00:11 +08:00
76483038f4 [docs](config) correct mysql jdbc auto-retry url format (#8009)
Correct format of mysql jdbc failover auto-retry url.
2022-02-11 22:07:02 +08:00
0b2b328c7b [doc] remove useless word 'To' in materialized view (#7985) 2022-02-10 15:08:58 +08:00
92b690f3eb [feature-wip](iceberg) Step2: add table creation strict mode and support refresh iceberg table or db. (#7981)
1. Add `iceberg_table_creation_strict_mode` in `fe.conf` to control iceberg external table creation, when data type  is not supported in Doris.
2. Add `REFRESH` syntax to synchronize the Iceberg table and database.
3. Support create Iceberg external table with specific column definitions.
2022-02-10 15:08:04 +08:00
2e27827c73 [doc] Added http interface return example to obtain the specified table structure information (#7955)
1. Added http interface return example in table-schema-action.md.
2. Correct typos in the document in error.md.
3. Modify the content of the code comments in the text_converter.hpp file.
2022-02-10 15:07:28 +08:00
c3b010b277 [refactor] Remove flink/spark connectors (#8004)
As we discussed in dev@doris[1]
Flink/Spark connectors has been moved to new repo: https://github.com/apache/incubator-doris-connectors

[1] https://lists.apache.org/thread/hnb7bf0l6y6rzb9pr6lhxz3jjoo04skl
2022-02-10 15:00:36 +08:00
4c22f3c6e1 (docs) Improve Flink Connector documentation description (#7946) 2022-02-08 09:50:38 +08:00
f0a3c852f1 [Docs] fix typo (#7964) 2022-02-08 09:46:31 +08:00
c1fef37399 [improvement](runtime-filter) Support adaptive runtime filter(#7546) (#7645)
Change 1: Support an adaptive runtime filter: IN_OR_BLOOM_FILTER
    The processing logic is
    If the number of rows in the right table < runtime_filter_max_in_num, then IN predicate will work
    If the number of rows in the right table >= runtime_filter_max_in_num, then Bloom filter can take effect

Change 2: The default runtime filter is changed to filter: IN_OR_BLOOM_FILTER
2022-01-30 16:46:52 +08:00
4c7525cf2c [improvement](show) Support that user can use show data skew statement instead of admin (#7914)
* [improvement](show) Support that user can use show data skew statement instead of admin
This PR mainly do two things:
1. Support that user can use show data skew statement instead of admin
2. Fix fe ut failed caused by pr [improvement](rewrite) Make RewriteDateLiteralRule to be compatible with mysql #7876 and pr [feature-wip](iceberg) Step1: Support create Iceberg external table #7391

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-01-29 10:45:03 +08:00
2d99041ec0 [docs] fix typo (#7907) 2022-01-28 22:32:50 +08:00
f9ac302807 [docs] add function substring document (#7895) 2022-01-28 22:26:11 +08:00
0d3fe8f07b [typo](docs) fix document backup-restore.md typo (#7868)
Refactor the document format and improve the content of the English version
2022-01-27 10:34:34 +08:00
d69b7bff2e [feature](meta) Support show compactionTooSlowTablets and oversizeTablets (#7821)
Add more columns in `show proc "/statistic"`
2022-01-27 10:26:41 +08:00
3b8d48f08b [feature-wip](iceberg) Step1: Support create Iceberg external table (#7391)
Close related #7389

Support create Iceberg external table in Doris. 

This is the first step to support Iceberg external table.

### Create Iceberg external table
This pr describes two ways to create Iceberg external tables. Both ways do not require explicitly specifying column definitions, Doris automatically converts them based on Iceberg's column definitions.

1. Create an Iceberg external table directly

```sql
    CREATE [EXTERNAL] TABLE table_name 
    ENGINE = ICEBERG
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.table" = "icberg_table_name",
    "iceberg.hive.metastore.uris"  =  "thrift://192.168.0.1:9083",
    "iceberg.catalog.type"  =  "HIVE_CATALOG"
    );
```

2. Create an Iceberg database and automatically create all the tables under that db.

```sql
    CREATE DATABASE db_name 
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
```

### Show table creation

1. For individual tables you can view them with `help show create table`.

```sql 
mysql> show create table iceberg_db.logs_1;
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table  | Create Table                                                                                                                                                                                                                                                                                                                                                 |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs_1 | CREATE TABLE `logs_1` (
  `level` varchar(-1) NOT NULL COMMENT "null",
  `event_time` datetime NOT NULL COMMENT "null",
  `message` varchar(-1) NOT NULL COMMENT "null"
) ENGINE=ICEBERG
COMMENT "ICEBERG"
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.table" = "logs_1",
"iceberg.hive.metastore.uris"  =  "thrift://10.10.10.10:9087",
"iceberg.catalog.type"  =  "HIVE_CATALOG"
) |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

2. For Iceberg database, you can view it with `help show table creation`.

```sql
mysql> show table creation from iceberg_db;
+--------+---------+---------------------+---------------------------------------------------------+
| Table  | Status  | Create Time         | Error Msg                                               |
+--------+---------+---------------------+---------------------------------------------------------+
| logs   | fail    | 2021-12-14 13:50:10 | Cannot convert unknown type to Doris type: list<string> |
| logs_1 | success | 2021-12-14 13:50:10 |                                                         |
+--------+---------+---------------------+---------------------------------------------------------+
2 rows in set (0.00 sec)
```

  This is a new syntax.
  
  Show table creation records in Iceberg database:
  
  Syntax:
  ```sql
      SHOW TABLE CREATION [FROM db] [LIKE mask]
  ```
2022-01-27 10:22:47 +08:00
38660d5095 [docs]correct SeaTunnel website hyperlink and PPMC into the Member List (#7886)
* [docs]correct SeaTunnel website hyperlink
* [docs]Add the new PPMC into the Member List

* Update members.md
2022-01-26 15:24:52 +08:00
e3bc232578 [docs] OS Installation Requirements (#7830) 2022-01-26 09:11:41 +08:00
4bdeef3b64 [chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)
1. fix problems when build fe_plugins
2. format
3. add docs about dump data using mysql dump
2022-01-26 09:11:23 +08:00
461b352d3e [fix](function) Change digital_masking function arg type to BIGINT (#7888)
Change digital_masking function arg type to BIGINT to fix the wrong result.
2022-01-25 22:28:05 +08:00
40f993ca15 [docs][seatunnel] add seatunnel flink doris sink doc (#7844) 2022-01-24 21:13:06 +08:00
8aa9faa7cb [chore](docker) Add docker dev image with ldb-toolchain (#7838)
Add docker images `apache/incubator-doris:build-env-ldb-toolchain-latest`,
which is built with ldb-toolchain
2022-01-24 21:12:15 +08:00
4e9bc5cb65 [doc] add documents for bitwise functions (#7790) 2022-01-24 21:08:41 +08:00
86f34323a8 [Doc]Doris compile and install JDK version incompatibility problem (#7797)
* Doris compile and install JDK version incompatibility problem
2022-01-24 13:57:18 +08:00
60c6bb4f92 [Feature][flink-connector] support flink delete option (#7457)
* Flink Connector supports delete option on Unique models
Co-authored-by: wudi <wud3@shuhaisc.com>
2022-01-23 20:24:41 +08:00
c5ec6dbc51 [docs](install-deploy) fix misplaced whitespace(#7814) (#7816)
Misplaced whitespace causes unexpected output. Fix it so newcomers need not to
worry if they are lost by the docs.
2022-01-22 10:20:24 +08:00
f2cbf0a8d2 [chore] Improve the ldb toolchain compilation documentation (#7829)
Add document for compiling Doris with ldb toolchain
2022-01-21 21:36:43 +08:00
800a36343a [chore] Prolog of hermetic build with GCC 11 and Clang 13. (#7712)
Prepare to generate hermetic build using GCC 11 and Clang 13.
The ideal toolchain would be ldb toolchain generated by [ldb_toolchain_gen.sh](https://github.com/amosbird/ldb_toolchain_gen/releases/download/v0.3/ldb_toolchain_gen.sh)

To kick off a clang build, set `DORIS_TOOLCHAIN=clang` before running any build scripts.
2022-01-21 12:12:04 +08:00
ed39ff1500 [feature](compaction) Support triggering compaction for a specific partition manually (#7521)
Add statement to trigger cumulative or base compaction for a specified partition.
2022-01-21 09:27:06 +08:00
4c17c370e7 [Doc]Modify the audit log plugin to record the SQL statement field type as String (#7798)
Modify the audit log plugin to record the SQL statement field type as String
2022-01-20 21:36:02 +08:00
4768dd4efa [docs] Documentation corrections (#7787) 2022-01-20 16:18:48 +08:00
3494c8973b [improvement](colocation) Add a new config to delay the relocation of colocation group (#7656)
1. Add a new FE config `colocate_group_relocate_delay_second`

    The relocation of a colocation group may involve a large number of tablets moving within the cluster.
    Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible.
    Relocation usually occurs after a BE node goes offline or goes down.
    This config is used to delay the determination of BE node unavailability.
    The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group
    will not be triggered.

2. Change the priority of colocate tablet repair and balance task from HIGH to NORMAL

3. Add a new FE config allow_replica_on_same_host

    If set to true, when creating table, Doris will allow to locate replicas of a tablet
    on same host. And also the tablet repair and balance will be disabled.
    This is only for local test, so that we can deploy multi BE on same host and create table
    with multi replicas.
2022-01-18 10:26:36 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
ebc27a40d7 [docs] Split the FAQ And Revert auto-label action (#7770) 2022-01-17 10:34:56 +08:00
e80c34b6fe [docs][typo] fix some typos in documents (#7769) 2022-01-16 10:43:42 +08:00
5f8d91257b [improvement](routine-load) Reduce the probability that the routine load task rpc timeout (#7754)
If an load task has a relatively short timeout, then we need to ensure that
each RPC of this task does not get blocked for a long time.
And an RPC is usually blocked for two reasons.

1. handling "memory exceeds limit" in the RPC
    
    If the system finds that the memory occupied by the load exceeds the threshold,
    it will select the load channel that occupies the most memory and flush the memtable in it.
    this operation is done in the RPC, which may be more time consuming.

2. close the load channel

    When the load channel receives the last batch, it will end the task.
    It will wait for all memtables flushes to finish synchronously. This process is also time consuming.

Therefore, this PR solves this problem by.

1. Use timeout to determine whether it is a high-priority load task

    If the timeout of an load task is relatively short, then we mark it as a high-priority task.

2. not processing "memory exceeds limit" for high priority tasks
3. use a separate flush thread to flush memtable for high priority tasks.
2022-01-16 10:41:31 +08:00
8b7d7e4dac [improvement] create/drop index support if [not] exist (#7748)
create or drop index clause support if [not] exist
2022-01-16 10:40:44 +08:00
5b0f11b665 [feature](mysql-compatibility)(function) add WEEKDAY function (#7673)
`WEEKDAY` in MySQL: returns an index from 0 to 6 for Monday to Sunday.
`DAYOFWEEK` in MySQL: returns an index from 1 to 7 for Sunday to Saturday.

Doris only have `DAYOFWEEK` function, so I add `WEEKDAY` function.

Thanks for the following materials:
- https://github.com/apache/incubator-doris/pull/6982/files
- https://www.bilibili.com/video/BV1V44y1Y7Ro
2022-01-16 10:39:21 +08:00