Commit Graph

35 Commits

Author SHA1 Message Date
84792d0886 fix compile of master (#23467) 2023-08-25 11:47:39 +08:00
9cacf9535a [Opt](functions) Use preloaded cache to accelerate timezone parsing (#22694)
* opt

* bugfix

* fix ut

* fix stylecheck
2023-08-25 10:00:48 +08:00
7cfb3cc0aa [fix](functions) fix function substitute for datetimeV1/V2 (#23344)
* fix

* function fe
2023-08-25 09:59:38 +08:00
Pxl
8ed4045df9 [Chore](primitive-type) remove VecPrimitiveTypeTraits (#22842) 2023-08-23 08:37:40 +08:00
86e6f5d039 [FIX](decimal)fix decimal precision (#22364)
Now we make wrong for decimal parse from string
if given string precision is bigger than defined decimal precision, we will return a overflow error, but only digit part is bigger than typed digit length , we should return overflow error when we traverse given string to decimal value
2023-08-03 21:13:58 +08:00
f2919567df [feature](datetime) Support timezone when insert datetime value (#21898) 2023-07-31 13:08:28 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
5300b21db7 [Bug](DECIMALV3) report failure if a decimal value is overflow (#18336) 2023-04-17 13:18:14 +08:00
7ae51c856e [refactor](unify exception) unify exception definition and error code (#18006)
* [refactor](unify exception) unify exception definition and error code


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-25 12:41:07 +08:00
Pxl
16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
remove some unused static on inline function to reduce compile time
2023-03-13 11:11:59 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00
9114896178 [DecimalV3](opt) opt the function of decimalv3 to_string logic (#16427) 2023-02-07 13:28:07 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
9dd1d989e8 [test](decimalv3) add regression test cases for decimalv3 (#14672) 2022-12-01 15:18:40 +08:00
1ec7f45fb6 [Bug](avg) Fix avg for bigint (#14433) 2022-11-22 10:29:59 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
b6ba654f5b [Feature](Sequence) Support sequence_match and sequence_count functions (#13785) 2022-11-11 13:38:45 +08:00
32b1456b28 [feature-wip](array) remove array config and check array nested depth (#13428)
1. remove FE config `enable_array_type`
2. limit the nested depth of array in FE side.
3. Fix bug that when loading array from parquet, the decimal type is treated as bigint
4. Fix loading array from csv(vec-engine), handle null and "null"
5. Change the csv array loading behavior, if the array string format is invalid in csv, it will be converted to null. 
6. Remove `check_array_format()`, because it's logic is wrong and meaningless
7. Add stream load csv test cases and more parquet broker load tests
2022-10-20 15:52:31 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the  development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
59699a4321 [feature](JSON datatype)Support JSON datatype (#10322)
Add `JSON` datatype, following features are implemented by this PR:
1. `CREATE` tables with `JSON` type columns
2. `INSERT` values containing `JSON` type value stored in `String`, which is represented as binary format(AKA `JSONB`) at BE 
3. `SELECT` JSON columns

Detail design refers [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type)

* add JSONB data storage format type

* fix JsonLiteral resolve bug

* add DataTypeJson case in data_type_factory

* add JSON syntax check in FE

* add operators for jsonb_document, currently not support comparison between any JSON type value

* add ColumnJson and DataTypeJson

* add JsonField to store JsonValue

* add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral

* add push_json for MysqlResultWriter

* JSON column need no zone_map_index

* Revert "JSON column need no zone_map_index"

This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79.

* add JSON writer and reader, ignore zone-map for JSON column

* add json_to_string for DataTypeJson

* add olap_data_convertor for JSON type

* add some enum

* add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions

* fix column_json offsets overflow bug, format code

* remove useless TODOs, add CmpType cases for JSON type

* add license header

* format license

* format be codes

* resolve rebase master conflicts

* fix bugs for CREATE and meta related code

* refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes

* modification be codes along code review advice

* fix rebase conflicts with master

* add unit test for json_value and column_json

* fix rebase error

* rename json to jsonb

* fix some data convert bugs, set Mysql type to JSON
2022-09-25 14:06:49 +08:00
Pxl
1b4a2c287e [Improvement][chore] replace from_decv2_to_packed128 to decv2.value (#11261) 2022-07-28 10:41:27 +08:00
d67029c830 [feature-wip] (datetimev2) support cast between datetimev2 with different scales (#11198)
* [feature-wip] (datetimev2) support `cast` between datetimev2 with different scale
2022-07-26 22:36:13 +08:00
babab5d535 [feature-wip] support datetimev2 (#11085) 2022-07-23 16:07:59 +08:00
3b46242483 [feature-wip] Optimize Decimal type (#10794)
* [feature-wip](decimalv3) support decimalv3

* [feature-wip] Optimize Decimal type

Co-authored-by: liaoxin <liaoxinbit@126.com>
2022-07-14 10:50:50 +08:00
ec6620ae3e [feature-wip](array-type) add function arrays_overlap (#10233) 2022-06-30 08:12:29 +08:00
ca94867b4e [Feature-wip] add date v2 type (#9916) 2022-06-26 16:07:56 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
7b3865b524 [fix](ut)(vectorized) fix a potential stack overflow bug and some unit test (#9140) 2022-04-21 12:17:03 +08:00
Pxl
a9d185fcc4 [Enhancement] add clang-tidy config && add C++ Code Diagnostic document (#8642)
add clang-tidy config && add C++ Code Diagnostic document
2022-03-29 18:17:09 +08:00
be4dd0c820 [fix][vectorized] Fix error cast to boolean (#8345) 2022-03-06 13:47:46 +08:00
a6bc9cbe53 [Function] Refactor the function code of log (#8199)
1. Support return null when input is invalid
2. Del the unless code in vec function

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-24 11:06:58 +08:00
7d7e3a39f5 [refactor] Remove snapshot converter and unused Protobuf Definitions (#8026)
1. remove snapshot converter
2. remove unused protobuf definitions
3. move some macro as const variables
2022-02-12 16:06:04 +08:00
7a73645eee [refactor] remove some unused code (#8022) 2022-02-12 15:17:28 +08:00
3048ce8a4f [improvement][refactor](vec) Refactor serde of vec block and using brpc attachment (#7939)
This PR mainly changes:

1. Change the define of PBlock

    The new PBlock consists of a set of PColumnMeta and a binary buffer.
    The PColumnMeta records the metadata information of all columns in the Block,
    while the buffer stores the serialized binary data of all columns.
    
2. Refactor the serialize/deserialize method of data type

    Rewrite the `serialize()/deserialize()` of IDataType. And also add
    a new method `get_uncompressed_serialized_bytes()` to get the total length
    of uncompressed serialized data of a column.
    
3. Rewrite the serialize/deserialize method of Block

    Now, when serializing a Block to PBlock, it will first get the total length
    of uncompressed serialized data of all columns in this Block, and then allocate
    the memory to write the serialized data to the buffer.
    
4. Use brpc attachment to transmit the serialized column data
2022-02-08 11:11:42 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00