123 Commits

Author SHA1 Message Date
869fdff2f0 [refactor] add reference path for source file from impala (#9115)
According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.
2022-04-20 12:29:57 +08:00
a2d6724fa7 [fix] change parameter type of hll_cardinality from STRING to HLL (#9002)
In #8882, we disabled the conversion from string to hll. So we need to change the parameter type of hll_cardinality() too
2022-04-15 15:17:11 +08:00
7f7172807f [feature](function)(vectorized) Support all geolocation functions on vectorized engine (#8846) 2022-04-11 09:36:53 +08:00
f3c6ddf651 [feature](function) Support geolocation functions on vectorized engine (#8790) 2022-04-03 10:50:54 +08:00
4d516bece8 [feature-wip](array-type)Add element_at and subscript functions (#8597)
Describe the overview of changes.
1. Add the function element_at;
2. Support element_subscript([]) to get an element of an array: col_array[N] <==> element_at(col_array, N);
3. Return an error message instead of crashing the BE when an array function fails to execute;

element_at(array, index) desc:
>   Returns element of array at given **(1-based)** index. 
  If **index < 0**, accesses elements from the last to the first. 
  Returns NULL if the index exceeds the length of the array or the array is NULL.

Usage example:
1. create table with ARRAY type column and insert some data:
```
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
|    2 |    3 | NULL   |
|    4 | NULL | []     |
|    3 | NULL | NULL   |
+------+------+--------+
```
2. enable vectorized:
```
set enable_vectorized_engine=true;
```
3. element_subscript([]) usage example:
```
> select k1,k3,k3[1] from array_test;
+------+--------+----------------------------+
| k1   | k3     | %element_extract%(`k3`, 1) |
+------+--------+----------------------------+
|    3 | NULL   |                       NULL |
|    1 | [1, 2] |                          1 |
|    2 | NULL   |                       NULL |
|    4 | []     |                       NULL |
+------+--------+----------------------------+
```
4. element_at function usage example:
```
> select k1,k3 from array_test where element_at(k3, -1) = 2;
+------+--------+
| k1   | k3     |
+------+--------+
|    1 | [1, 2] |
+------+--------+
```
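The indexing rules described above can be modeled in a few lines of Python. This is only a sketch of the documented semantics, not the BE implementation; a Python `None` stands in for SQL NULL, and the handling of index 0 is an assumption, since the description does not cover it.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def element_at(array: Optional[Sequence[T]], index: int) -> Optional[T]:
    """Model of element_at: 1-based index, negative index counts from the end."""
    if array is None:
        return None  # NULL array -> NULL
    n = len(array)
    if 0 < index <= n:
        return array[index - 1]  # convert 1-based to Python's 0-based
    if index < 0 and -index <= n:
        return array[index]  # Python negative indexing also counts from the end
    # Index exceeds the array length -> NULL.
    # Index 0 also lands here; that choice is an assumption of this sketch.
    return None
```

With the sample rows above, `element_at([1, 2], 1)` gives 1 and `element_at([1, 2], -1)` gives 2, while the empty and NULL arrays give `None`.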
2022-04-02 12:03:56 +08:00
bea9a7ba4f [feature] Support pre-aggregation for quantile type (#8234)
Add a new column type to speed up the approximation of quantiles.
1. The new column type is named `quantile_state`, with the fixed aggregation function `quantile_union`; it stores the intermediate results of pre-aggregated approximate quantile calculations.
2. Support pre-aggregation of the new column type and the `quantile_state`-related functions.
2022-03-24 09:11:34 +08:00
71ce3c4a6e [feature-wip](array-type) Add codes and UT for array_contains and array_position functions (#8401) (#8589)
array_contains function Usage example:
1. create table with ARRAY column, and insert some data:
```
> select * from array_test;
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
|    2 |    3 | NULL   |
|    4 | NULL | []     |
|    3 | NULL | NULL   |
+------+------+--------+
```
2. enable vectorized:
```
> set enable_vectorized_engine=true;
```
3. select with array_contains:
```
> select k1,array_contains(k3,1) from array_test;
+------+-------------------------+
| k1   | array_contains(`k3`, 1) |
+------+-------------------------+
|    3 |                    NULL |
|    1 |                       1 |
|    2 |                    NULL |
|    4 |                       0 |
+------+-------------------------+
```
4. array_contains can also be used in a where condition:
```
> select * from array_test where array_contains(k3,1);
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
+------+------+--------+
```
5. array_position usage example
```
> select k1,k3,array_position(k3,2) from array_test;
+------+--------+-------------------------+
| k1   | k3     | array_position(`k3`, 2) |
+------+--------+-------------------------+
|    3 | NULL   |                    NULL |
|    1 | [1, 2] |                       2 |
|    2 | NULL   |                    NULL |
|    4 | []     |                       0 |
+------+--------+-------------------------+
```
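The result tables above suggest the following behavior, sketched here in Python with `None` standing in for SQL NULL. It is a model inferred from the example output only: NULL elements inside an array are not considered, and the function bodies are not taken from the BE code.

```python
from typing import Optional, Sequence


def array_contains(array: Optional[Sequence[int]], value: int) -> Optional[int]:
    """Return 1 if value is in the array, 0 if not, NULL for a NULL array."""
    if array is None:
        return None
    return 1 if value in array else 0


def array_position(array: Optional[Sequence[int]], value: int) -> Optional[int]:
    """Return the 1-based position of value, 0 if absent, NULL for a NULL array."""
    if array is None:
        return None
    for position, item in enumerate(array, start=1):
        if item == value:
            return position
    return 0
```

Applied to the `k3` column of `array_test`, these reproduce the rows shown: `[1, 2]` gives 1 and 2, `[]` gives 0 and 0, and NULL stays NULL.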
2022-03-22 15:42:40 +08:00
e0ef9b8f6c [refactor](vectorized) to_bitmap(-1) return NULL instead of return parse failed error_message (#8373) 2022-03-11 17:21:47 +08:00
454b45bea3 [feature](vectorize)(function) support regexp&&sm4&&aes functions (#8307) 2022-03-08 13:14:02 +08:00
Pxl
668188b91f [improvement][vectorized] support es node predicate peel (#8174) 2022-02-26 17:02:54 +08:00
a6bc9cbe53 [Function] Refactor the function code of log (#8199)
1. Support returning null when the input is invalid
2. Delete the useless code in the vec function

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-24 11:06:58 +08:00
Pxl
90a8ca808a [Bug][Vectorized] fix bitmap_min(empty) not return null (#8190) 2022-02-24 11:06:27 +08:00
31ab569c1d [Vectorized][Feature] support some bitmap functions (#8138) 2022-02-23 11:42:16 +08:00
Pxl
87e555c27d [Feature][Vectorized] support function json_array/json_object/json_quote (#8158) 2022-02-22 09:29:56 +08:00
bcde1f265a [Function][Vectorized] Support least/greatest function (#8107)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-18 11:57:07 +08:00
f6e2a4fe16 [Vectorized][Function] Support year/month/week/hour/minute/day/second floor/ceil function (#8068)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-17 14:18:02 +08:00
bef1b55c1f [feature][fix](vec)(function) Fix multi args function call the DATETIME type not effective in DATE type and add the alias function (#8050)
1. Support some function aliases: mod/fmod, adddate/add_data
2. Support some multi-argument functions: week, yearweek
3. Fix the bug that multi-argument functions defined for the DATETIME type did not take effect for the DATE type
2022-02-17 10:49:25 +08:00
Pxl
64fb8dab39 [feature] (function)(vec) support pmod function (#7977) 2022-02-12 16:00:11 +08:00
071be928f9 [fix](vectorized) fix bug multi distinct function get wrong type (#7900) 2022-01-28 22:31:41 +08:00
f2cbf0a8d2 [chore] Improve the ldb toolchain compilation documentation (#7829)
Add document for compiling Doris with ldb toolchain
2022-01-21 21:36:43 +08:00
800a36343a [chore] Prolog of hermetic build with GCC 11 and Clang 13. (#7712)
Prepare to generate hermetic build using GCC 11 and Clang 13.
The ideal toolchain would be ldb toolchain generated by [ldb_toolchain_gen.sh](https://github.com/amosbird/ldb_toolchain_gen/releases/download/v0.3/ldb_toolchain_gen.sh)

To kick off a clang build, set `DORIS_TOOLCHAIN=clang` before running any build scripts.
2022-01-21 12:12:04 +08:00
ef984a6a72 [improvement](load) Improve load fault tolerance (#7674)
Currently, if we encounter a problem with a replica of a tablet during the load process,
such as a write error, rpc error, -235, etc., it will cause the entire load job to fail,
which results in a significant reduction in Doris' fault tolerance.

This PR mainly changes:

1. Refine the judgment of failed replicas in the load process, so that the failure of a few replicas does not affect the normal completion of the load job.
2. Fix a bug introduced by #7754 that may cause a BE coredump
2022-01-20 09:23:21 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from ClickHouse

**ClickHouse is an excellent implementation of a database with a vectorized execution engine,
so we have referenced and learned a lot from it in terms of
data structures and function implementations.
Our work is based on ClickHouse v19.16.2.2, and we would like to thank the ClickHouse community and developers.**

The following comment has been added to the code taken from ClickHouse, e.g.:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

The vectorized exec engine can run the SSB and TPC-H standard query test sets and 70% of the TPC-DS standard query test set.

### 3. Data Model

The vectorized exec engine supports **Dup/Agg/Unq** tables and a vectorized Block Reader.
Segment Vec is a work in progress.

### 4. How to use

1. Set the session variable `set enable_vectorized_engine = true;` (required)
2. Set the session variable `set batch_size = 4096;` (recommended)

### 5. Some differences from the original exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
5b0f11b665 [feature](mysql-compatibility)(function) add WEEKDAY function (#7673)
`WEEKDAY` in MySQL: returns an index from 0 to 6 for Monday to Sunday.
`DAYOFWEEK` in MySQL: returns an index from 1 to 7 for Sunday to Saturday.

Doris only has the `DAYOFWEEK` function, so I added the `WEEKDAY` function.
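
The difference between the two numbering schemes can be illustrated with Python's `datetime` module, whose `date.weekday()` happens to use the same 0 (Monday) to 6 (Sunday) convention as MySQL's `WEEKDAY`. The helper names below are illustrative only and are not the Doris implementation.

```python
from datetime import date


def weekday(d: date) -> int:
    """MySQL-style WEEKDAY: 0 = Monday ... 6 = Sunday."""
    return d.weekday()


def dayofweek(d: date) -> int:
    """MySQL-style DAYOFWEEK: 1 = Sunday ... 7 = Saturday."""
    # Shift Monday-based 0..6 so that Sunday becomes 1 and Saturday becomes 7.
    return (d.weekday() + 1) % 7 + 1
```

For example, 2022-01-16 is a Sunday, so `weekday` gives 6 and `dayofweek` gives 1 for that date.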

Thanks for the following materials:
- https://github.com/apache/incubator-doris/pull/6982/files
- https://www.bilibili.com/video/BV1V44y1Y7Ro
2022-01-16 10:39:21 +08:00
bc4ceeca44 [improvement] optimize java cmd find (#7428)
* Optimize how the java command is found: if java_home is not set, use the java in PATH
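
The lookup order described here can be sketched as follows. This is a Python model of the idea, not the actual startup script; the `JAVA_HOME` variable name, the `bin/java` layout and the `find_java` helper are assumptions made for illustration.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_java(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Prefer $JAVA_HOME/bin/java; otherwise fall back to the java found in PATH."""
    java_home = env.get("JAVA_HOME")
    if java_home:
        return os.path.join(java_home, "bin", "java")
    # JAVA_HOME is unset or empty: search PATH instead (None if nothing is found).
    return which("java")
```

The `env` and `which` parameters exist only so the lookup can be exercised without touching the real environment.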
2021-12-30 10:16:56 +08:00
07e2acb2f3 [feature] Support national secret (national commercial password) algorithm SM3/SM4 (#7464)
SM3 is a cryptographic hash algorithm.
SM4 is a block cipher used to replace DES / AES and other international algorithms.
2021-12-28 10:39:54 +08:00
0c154733e0 [feature](function) support bitmap_union/intersect have more columns parameters (#7379)
Support multiple bitmap parameters for all bitmap aggregation functions
2021-12-26 11:03:20 +08:00
d8ba6e3eb6 1. Fix an error where fetching a string type field may cause a malformed packet error. (#7262)
This is because the constant MAX_PHYSICAL_PACKET_LENGTH in FE should be 2^24 - 1,
   but it was set to 2^24 - 2 by mistake.
2. Fix bitmap_to_string, which may fail when the result is larger than 2G
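To see why the off-by-one matters: in the MySQL wire protocol, a payload of 2^24 - 1 bytes or more is split into physical packets of exactly 2^24 - 1 bytes, followed by a final packet that is shorter (possibly empty). The sketch below computes those packet lengths in Python; `split_payload_lengths` is a hypothetical helper, not FE code. If the sender split at 2^24 - 2 instead, a client would presumably treat the first chunk as a complete packet, which is consistent with the malformed packet error.

```python
from typing import List

MAX_PHYSICAL_PACKET_LENGTH = 2**24 - 1  # 16777215, i.e. 0xFFFFFF


def split_payload_lengths(
    total: int, max_len: int = MAX_PHYSICAL_PACKET_LENGTH
) -> List[int]:
    """Return the payload length of each physical packet for `total` bytes."""
    lengths = []
    remaining = total
    # Full-size packets signal "more data follows".
    while remaining >= max_len:
        lengths.append(max_len)
        remaining -= max_len
    # The last packet is always shorter than max_len and may be empty.
    lengths.append(remaining)
    return lengths
```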
2021-12-01 10:02:34 +08:00
dcad6ff5e5 [License] Add License header for missing files (#7130)
1. Add License header for missing files
2. Modify the spark pom.xml to correct the location of `thrift`
2021-11-16 18:37:54 +08:00
e69249c082 sub_bitmap (#6977)
Starting from the offset position, take up to the specified limit of bitmap elements and return them as a bitmap subset.

Types of chang
2021-11-06 13:31:03 +08:00
599ecb1f30 [Function] Add bitmap function bitmap_subset_limit (#6980)
Add bitmap function bitmap_subset_limit.
This function will return subset in specified index.
2021-11-04 12:14:47 +08:00
aeec9c45e6 [Function] Add bitmap-xor-count function for doris (#6982)
Add bitmap-xor-count function for doris

relate to #6875
2021-11-02 16:37:00 +08:00
1ff3d708ca [Function] add functions of bitmap_and/or_count (#6912)
issue #6875
add bitmap_and_count/ bitmap_or_count
2021-11-01 14:00:07 +08:00
c7a3116f98 [Function] add bitmap function of bitmap_has_all (#6918)
The 'bitmap_has_all' function returns true if the first bitmap contains all the elements of the second bitmap.
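Treating a bitmap as a set of integers, the check amounts to a superset test. The Python sketch below models only that meaning; how the BE function treats empty or NULL bitmaps is not stated here, so the empty-set behavior simply follows from set semantics and is an assumption.

```python
from typing import Set


def bitmap_has_all(first: Set[int], second: Set[int]) -> bool:
    """True if every element of `second` is also in `first`."""
    return second.issubset(first)
```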
2021-11-01 12:50:47 +08:00
65ded82778 [Function] add BE bitmap function bitmap_subset_in_range (#6917)
Add bitmap function bitmap_subset_in_range.
This function returns the subset within the specified range (excluding range_end).
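Modeling the bitmap as a Python set, the half-open range can be sketched as below. The parameter name `range_start` is an assumption (only `range_end` is named in the description), and the sketch says nothing about how the BE validates its arguments.

```python
from typing import Set


def bitmap_subset_in_range(bitmap: Set[int], range_start: int, range_end: int) -> Set[int]:
    """Keep elements v with range_start <= v < range_end (range_end excluded)."""
    return {v for v in bitmap if range_start <= v < range_end}
```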
2021-11-01 11:05:19 +08:00
Pxl
28030294f7 [Feature] Support bitmap_and_not & bitmap_and_not_count (#6910)
Support bitmap_and_not & bitmap_and_not_count.
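Going by the function names, the operation reads as "A AND (NOT B)", that is, the elements of the first bitmap that are absent from the second. The Python set model below rests on that reading, which is an assumption rather than something spelled out in this commit.

```python
from typing import Set


def bitmap_and_not(first: Set[int], second: Set[int]) -> Set[int]:
    """Elements of `first` that are not in `second` (set difference)."""
    return first - second


def bitmap_and_not_count(first: Set[int], second: Set[int]) -> int:
    """Cardinality of bitmap_and_not(first, second)."""
    return len(first - second)
```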
2021-11-01 10:11:54 +08:00
a842d41b87 [Function] add BE bitmap function bitmap_max (#6942)
Support bitmap_max.
2021-10-30 18:16:38 +08:00
7a15e583a7 [Feature]Support functions of json_array, json_object, json_quote (#6504) 2021-09-02 09:59:02 +08:00
acc5fd2f21 [BUG] Fix string type cast bug and runtime filter may core when not support avx2 (#6495)
* fix the string type cast bug and the runtime filter using instructions that may not be supported

* add arm support
2021-08-26 09:14:31 +08:00
9216735cfa [New Feature] Support Vectorization Execution Engine Interface For Doris (#6329)
1. FE vectorized plan code
2. Function register vec function
3. Diff function nullable type
4. New thirdparty code and new thrift struct
2021-08-11 14:54:06 +08:00
ed3ff470ce [ARRAY] Support array type load and select not include access by index (#5980)
This is part of the array type support and has not been fully completed.
The following functions are implemented:
1. FE array type support and implementation of array functions; support for array syntax analysis and planning
2. Support importing array type data through insert into
3. Support selecting array type data
4. The array type is supported only on value columns of duplicate tables

This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979
2021-07-13 14:02:39 +08:00
f93a272956 [Bug] Fix bug that nondeterministic functions should not be rewrote in create view stmt (#6096)
create view v1 as select now() should not be rewritten as:
create view v1 as select "2021-06-26 12:11:11";
2021-07-13 11:35:35 +08:00
c929a8935a [Feature][Function] support bit_length function (#6140)
Support the bit_length function, like MySQL
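In MySQL, `BIT_LENGTH` returns the length of a string in bits, i.e. eight times its length in bytes. A minimal Python model, assuming the string is stored as UTF-8 (the encoding is an assumption of this sketch):

```python
def bit_length(s: str) -> int:
    """Length of the string in bits, assuming UTF-8 storage."""
    return len(s.encode("utf-8")) * 8
```

Under that assumption, `bit_length("text")` is 32, matching the example in the MySQL documentation.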
2021-07-08 09:40:30 +08:00
739c0268ff [refactor] Remove decimal v1 related code from code base (#6079)
Remove all DECIMAL V1 type code; this is part of #6073
2021-07-07 10:26:32 +08:00
d33a6d1b98 [Function] Support date function: yearweek(), week(), makedate(). (#6000) 2021-06-10 17:38:25 +08:00
629e440a67 [Bug] Fix the bug of nullif function: (#5882)
1. Prevent nullif(98, null) from returning NULL in FE
2. Support DecimalV2 in the nullif function to get the right result
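For reference, standard SQL `NULLIF(a, b)` returns NULL only when `a = b` evaluates to true; comparing with NULL is never true, so `nullif(98, null)` should yield 98. A Python sketch of that rule, with `None` standing in for NULL (this models the SQL semantics, not the FE code):

```python
from typing import Optional


def nullif(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Return NULL if a equals b, otherwise a."""
    if a is None or b is None:
        # A comparison involving NULL is never true, so the first argument is returned.
        return a
    return None if a == b else a
```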
2021-05-26 10:01:17 +08:00
1e8c4584ab [Function] Add BE udf bitmap_min (#2538) (#5581)
This function returns the minimum value in the input bitmap.
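As a set-based sketch in Python: the minimum of the elements, with `None` (NULL) for an empty bitmap. The empty case follows the later fix "bitmap_min(empty) not return null" (#8190) listed above; the helper itself is illustrative and not the BE code.

```python
from typing import Optional, Set


def bitmap_min(bitmap: Set[int]) -> Optional[int]:
    """Smallest element of the bitmap, or NULL (None) if it is empty."""
    return min(bitmap) if bitmap else None
```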
2021-04-08 09:11:32 +08:00
a1808c1a71 [Function] Add BE udf bitmap_not (#5346) (#5357)
This function returns the NOT result of the two input bitmaps.
2021-02-07 22:39:17 +08:00
05ac7fcd4a [Function] Add BE udf bitmap_xor (#5098)
This function returns the XOR result of the two input bitmaps.
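With bitmaps modeled as Python sets, XOR corresponds to the symmetric difference: the elements present in exactly one of the two inputs. This is a sketch of the meaning only, not the BE implementation.

```python
from typing import Set


def bitmap_xor(first: Set[int], second: Set[int]) -> Set[int]:
    """Elements that appear in exactly one of the two bitmaps."""
    return first ^ second
```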
2021-01-04 09:27:46 +08:00
1267d6bf66 [Bug][MultiLoad] Fix multiload missing userinfo and rebase error (#5058) 2020-12-11 12:01:32 +08:00