Commit Graph

54 Commits

Author SHA1 Message Date
d17ed5e27a [vectorization](storage)support seq column in storage layer (#8186)
2022-02-23 12:23:31 +08:00
31ab569c1d [Vectorized][Feature] support some bitmap functions (#8138) 2022-02-23 11:42:16 +08:00
b1e7343532 [Vectorized] [HashJoin] Opt HashJoin Performance (#8119)
Co-authored-by: lihaopeng <happenlee@hotmail.com>
2022-02-23 10:28:16 +08:00
802fcbbb05 (#8162)refactor binary dict
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-02-22 11:23:54 +08:00
Pxl
87e555c27d [Feature][Vectorized] support function json_array/json_object/json_quote (#8158) 2022-02-22 09:29:56 +08:00
c47368f80c [fix] (udf) fix mismatch between check_fn and fn_call function names (#8132) 2022-02-22 09:18:07 +08:00
56adc7f56b [Bug][vec] Fix coredump caused by converting a nullable const value to an argument (#8139)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-20 20:05:23 +08:00
50864aca7d [refactor] fix warnings when compiling with clang (#8069) 2022-02-19 11:29:02 +08:00
8892780091 [Vectorized][Feature] support agg function percentile&&percentile_approx (#8066) 2022-02-18 13:42:24 +08:00
bcde1f265a [Function][Vectorized] Support least/greatest function (#8107)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-18 11:57:07 +08:00
68b24d608f [fix] (vectorization) Fix error when computing the hash value of a nullable column (#8105)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-18 11:20:47 +08:00
b9f0b5565c [refactor](storage) refactor some interfaces of storage layer column (#8064)
1 format binary plain
2 remove batch_set_null_bitmap
3 fix segiter return value
4 set insert_many_binary_data args
2022-02-18 10:54:51 +08:00
936da4f10a [feature](thread-pool) Support thread pool per disk for scanners (#7994)
Support a thread pool per disk for scanners to prevent poor performance caused by a few disks with high I/O utilization.

Key points:
1. Each disk has its own thread pool for scanners.
2. Whenever the thread pool of one disk runs out of local work, it can retrieve tasks from the other thread pools (disks). This is done round-robin.

Performance testing:
vec version: 25% faster than a single thread pool in a test case with a high I/O utilization disk
normal version: 8% faster than a single thread pool in a test case with a high I/O utilization disk
2022-02-18 09:40:58 +08:00
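The round-robin fallback described for the per-disk scanner pools (#7994) can be sketched as follows. This is a minimal illustration, not the code from the commit: `Task`, `DiskQueues`, and `next_task` are hypothetical names, and the locking a real pool needs is left out so the selection order stays easy to follow.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical stand-in for a scanner task; here it is just an id.
using Task = int;

// One FIFO queue per disk. A real pool would protect each queue with a
// mutex; that is omitted in this sketch.
using DiskQueues = std::vector<std::deque<Task>>;

// Pick the next task for a worker that belongs to disk `home`.
// The worker looks at its own queue first, then probes the remaining
// disks in round-robin order, starting with the disk after its own.
std::optional<Task> next_task(DiskQueues& queues, std::size_t home) {
    assert(home < queues.size());
    const std::size_t n = queues.size();
    for (std::size_t step = 0; step < n; ++step) {
        std::deque<Task>& q = queues[(home + step) % n];
        if (!q.empty()) {
            Task t = q.front();
            q.pop_front();
            return t;
        }
    }
    return std::nullopt;  // no work left on any disk
}
```

A worker for disk 0 with an empty local queue would, under these assumptions, take work from disk 1 if it has any, then disk 2, and so on.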
bdd78f20c8 [Vectorized][HashJoin] Eliminate hashjoin branch prediction (#8051)
Co-authored-by: jewisliu <jewisliu@tencent.com>
2022-02-17 19:00:26 +08:00
Pxl
e0dbf48682 [Vectorized] [AggFunction] Support group_concat (#8086) 2022-02-17 14:19:07 +08:00
f6e2a4fe16 [Vectorized][Function] Support year/month/week/hour/minute/day/second floor/ceil function (#8068)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-17 14:18:02 +08:00
f8411f3c6a [refactor](mysql_table_writer)split into two parts of vectorized and row mode (#8081) 2022-02-17 11:29:25 +08:00
Pxl
f06c13a828 [feature](vec)(function) support function convert_tz() (#8060) 2022-02-17 10:51:32 +08:00
bef1b55c1f [feature][fix](vec)(function) Fix multi-argument functions on the DATETIME type not taking effect on the DATE type, and add function aliases (#8050)
1. Support some function aliases: mod/fmod, adddate/add_data
2. Support some multi-argument functions: week, yearweek
3. Fix the bug that multi-argument functions on the DATETIME type do not take effect on the DATE type
2022-02-17 10:49:25 +08:00
0003822da7 [feature](vec) add ColumnHLL to support hll type (#7828) 2022-02-17 10:44:42 +08:00
Pxl
143c4085ee [Feature][Vectorized] support aggregate function ndv()/approx_count_distinct() (#8044) 2022-02-16 14:30:13 +08:00
25d64775d1 [Vectorized][Feature] Support mysql external table insert into stm (#7979) 2022-02-15 14:58:58 +08:00
7d7e3a39f5 [refactor] Remove snapshot converter and unused Protobuf Definitions (#8026)
1. remove snapshot converter
2. remove unused protobuf definitions
3. move some macro as const variables
2022-02-12 16:06:04 +08:00
Pxl
b26e7e3c28 [feature](function)(vec) support locate function (#7988)
* support function locate in vectorized engine

* add ut and fix some bug
2022-02-12 16:00:37 +08:00
Pxl
64fb8dab39 [feature] (function)(vec) support pmod function (#7977) 2022-02-12 16:00:11 +08:00
7a73645eee [refactor] remove some unused code (#8022) 2022-02-12 15:17:28 +08:00
5029ef46c9 [fix] fix ltrim result that may be incorrect in some cases (#7963)
According to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html:
Built-in Function: int __builtin_clz (unsigned int x) / int __builtin_ctz (unsigned int x)
If x is 0, the result is undefined.
So we handle the case of 0 separately.

These builtins return different results under gcc and clang when x is 0.
2022-02-09 13:06:37 +08:00
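The zero handling mentioned in the ltrim fix (#7963) can be sketched with small wrappers around the GCC/Clang builtins. The wrapper names and the choice of returning 32 for a zero input are assumptions made for illustration; the commit only states that the zero case is handled separately.

```cpp
#include <cstdint>

static_assert(sizeof(unsigned int) == 4, "sketch assumes a 32-bit unsigned int");

// Count trailing zero bits with a defined result for x == 0.
// __builtin_ctz has an undefined result for 0, so that case is handled
// before the builtin is reached.
inline int safe_ctz(std::uint32_t x) {
    if (x == 0) {
        return 32;  // assumption: "no bit set" maps to the full width
    }
    return __builtin_ctz(x);
}

// Same idea for leading zero bits.
inline int safe_clz(std::uint32_t x) {
    if (x == 0) {
        return 32;  // assumption: "no bit set" maps to the full width
    }
    return __builtin_clz(x);
}
```

With the zero case branched out first, the result no longer depends on which compiler built the code.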
db20e1f323 [refactor](storage) VGenericIterator to reuse Schema (#7858)
1. reuse Schema to avoid copying, because cloning a Schema generates a lot of sub Field objects
2. call interface provided by Block to reduce code lines
2022-02-09 13:06:03 +08:00
Pxl
0553ce2944 [feature](vectorization) support function topn && remove some unused code (#7793) 2022-02-09 13:05:31 +08:00
3048ce8a4f [improvement][refactor](vec) Refactor serde of vec block and using brpc attachment (#7939)
This PR mainly changes:

1. Change the definition of PBlock

    The new PBlock consists of a set of PColumnMeta and a binary buffer.
    The PColumnMeta records the metadata information of all columns in the Block,
    while the buffer stores the serialized binary data of all columns.
    
2. Refactor the serialize/deserialize method of data type

    Rewrite the `serialize()/deserialize()` of IDataType. And also add
    a new method `get_uncompressed_serialized_bytes()` to get the total length
    of uncompressed serialized data of a column.
    
3. Rewrite the serialize/deserialize method of Block

    Now, when serializing a Block to PBlock, it will first get the total length
    of uncompressed serialized data of all columns in this Block, and then allocate
    the memory to write the serialized data to the buffer.
    
4. Use brpc attachment to transmit the serialized column data
2022-02-08 11:11:42 +08:00
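The serialization order described in #7939 (measure all columns first, allocate once, then write) can be sketched like this. All types here are simplified, hypothetical stand-ins: `ColumnMeta`, `Column`, and `SerializedBlock` are not the real PColumnMeta, IColumn, or PBlock, compression is ignored, and each column is assumed to hold plain 64-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical per-column metadata: a name and the data length in bytes.
struct ColumnMeta {
    std::string name;
    std::size_t byte_size;
};

// Hypothetical column holding fixed-width values.
struct Column {
    std::string name;
    std::vector<std::int64_t> values;
    std::size_t serialized_bytes() const { return values.size() * sizeof(std::int64_t); }
};

// Hypothetical result: metadata for every column plus one shared buffer.
struct SerializedBlock {
    std::vector<ColumnMeta> metas;
    std::string buffer;
};

SerializedBlock serialize(const std::vector<Column>& columns) {
    SerializedBlock out;
    // Pass 1: total uncompressed length of all columns.
    std::size_t total = 0;
    for (const Column& c : columns) {
        total += c.serialized_bytes();
    }
    // Allocate the buffer once, up front.
    out.buffer.resize(total);
    // Pass 2: copy each column's bytes into its slice and record its metadata.
    std::size_t pos = 0;
    for (const Column& c : columns) {
        const std::size_t n = c.serialized_bytes();
        if (n > 0) {
            std::memcpy(out.buffer.data() + pos, c.values.data(), n);
        }
        out.metas.push_back({c.name, n});
        pos += n;
    }
    assert(pos == total);
    return out;
}
```

Knowing the total length before writing avoids repeated buffer growth, which is the point the commit message makes about allocating the memory first.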
ef233701b3 [feature](vec)(load) Support vtablet sink to enable insert into by using vec query engine (#7957)
Support vtablet sink to enable insert into query in vec query engine
2022-02-08 11:04:09 +08:00
505acae931 [fix](vectorization) make sure the memory address used in agg is properly aligned before use (#7960) 2022-02-08 10:05:03 +08:00
f8d086d87f [feature](rpc) (experimental) Support implementing UDFs through the gRPC protocol. (#7519)
Support implementing UDFs through the gRPC protocol. This brings several benefits:
1. The UDF implementation language is not limited to C++; users can implement a UDF in any language they are familiar with.
2. The UDF is decoupled from Doris: a UDF cannot cause a Doris coredump, UDF computing resources are separate from Doris, and Doris services are not affected.

However, an RPC UDF has a fixed overhead, so it is much slower than a C++ UDF, especially when the amount of data is large.

Create a function like this:

```
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
  "SYMBOL"="add_int",
  "OBJECT_FILE"="127.0.0.1:9999",
  "TYPE"="RPC"
);
```
The function service needs to implement the `check_fn` and `fn_call` methods.
Note:
THIS IS AN EXPERIMENTAL FEATURE; THE INTERFACE AND DATA STRUCTURES MAY CHANGE IN THE FUTURE!
2022-02-08 09:25:09 +08:00
9eb1d1df27 [fix](vec) fix block mem use-after-free bug in agg table read (#7944) 2022-02-06 00:34:38 +08:00
51abaa89f3 [fix](vec) Fix some bugs about vec engine (#7884)
1. mem leak in vcollector iter
2. query slow in agg table limit 10
3. query slow in SSB q4,q5,q6
2022-02-03 19:21:17 +08:00
c0e59e59aa [fix][refactor] fix bugs and refactor some code by lint (#7871)
1. Fix some `passedByValue` issues.
2. Fix some `dereferenceBeforeCheck` issues.
3. Fix some `uninitMemberVar` issues.
4. Fix some iterator `eraseDereference` issues.
5. Fix compile issue introduced from #7923 #7905 #7848
2022-02-01 14:31:14 +08:00
82f421a019 [fix](brpc-attachment) Fix bug that may cause BE crash when transfer_data_by_brpc_attachment is enabled (#7921)
This PR mainly changes:

1. Fix bug when `transfer_data_by_brpc_attachment` is enabled

    In `data_stream_sender`, we will send a serialized PRowBatch data to multiple Channels.
    And if `transfer_data_by_brpc_attachment` is enabled, we will mistakenly clear the data in PRowBatch
    after sending PRowBatch to the first Channel.
    As a result, the following Channel cannot receive the correct data, causing an error.

    So I use a separate buffer instead of `tuple_data` in PRowBatch to store the serialized data
    and reuse it in multiple channels.

2. Fix bug that the offset in a serialized row batch may overflow

    Use int64 to replace int32 offset. And for compatibility, add a new field `new_tuple_offsets` in PRowBatch.
2022-02-01 08:51:16 +08:00
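The offset overflow fixed in #7921 can be illustrated with a small sketch. The function names are hypothetical and the layout is an assumption (each tuple's offset is the running sum of the sizes before it); the only point shown is that such a running sum can exceed the int32 range while still fitting in int64.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Start offset of each tuple, accumulated in 64-bit arithmetic.
std::vector<std::int64_t> tuple_offsets(const std::vector<std::int64_t>& tuple_sizes) {
    std::vector<std::int64_t> offsets;
    offsets.reserve(tuple_sizes.size());
    std::int64_t pos = 0;
    for (std::int64_t size : tuple_sizes) {
        assert(size >= 0);
        offsets.push_back(pos);
        pos += size;
    }
    return offsets;
}

// True if every offset would still fit in a 32-bit signed field.
bool fits_int32(const std::vector<std::int64_t>& offsets) {
    for (std::int64_t off : offsets) {
        if (off > std::numeric_limits<std::int32_t>::max()) {
            return false;
        }
    }
    return true;
}
```

Three tuples of 1 GiB each already put the third offset at 2^31, one past the largest int32 value, which is the kind of batch the wider `new_tuple_offsets` field accommodates.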
358bd79fb1 [improvement](vec)(Join) Mem reuse to speed up join operator (#7905)
1. Reuse the mem of output block in vec join node
2. Add the function `replicate` in column
2022-01-31 22:14:12 +08:00
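A minimal sketch of what a `replicate` operation for a join output might look like, in relation to #7905. This assumes that replicating means repeating row i of a column a given number of times and writing into a caller-owned destination so its allocation can be reused; the signature below is hypothetical and is not taken from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Repeat src[i] counts[i] times, appending the results to dst in order.
// dst is cleared first; std::vector::clear() keeps the existing capacity,
// so a destination reused across calls does not reallocate every time.
template <typename T>
void replicate(const std::vector<T>& src,
               const std::vector<std::uint32_t>& counts,
               std::vector<T>& dst) {
    assert(src.size() == counts.size());
    dst.clear();
    std::size_t total = 0;
    for (std::uint32_t c : counts) {
        total += c;
    }
    dst.reserve(total);
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst.insert(dst.end(), static_cast<std::size_t>(counts[i]), src[i]);
    }
}
```

In a hash join, the counts would correspond to how many build-side rows each probe-side row matched, so a row with zero matches disappears from the output.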
Pxl
3ee000c13c [chore] support build with libc++ && add some build config (#7903)
support LIBCPP/LDD/BUILD_META_TOOL for build.sh
2022-01-30 16:47:22 +08:00
fb6e22f4ca [Fix] fix memory leak in be unit test (#7857)
1. fix be unit test memory leak
2. ignore the minidump test when running with ASAN
2022-01-29 01:00:38 +08:00
071be928f9 [fix](vectorized) fix bug where multi distinct function gets the wrong type (#7900) 2022-01-28 22:31:41 +08:00
1ba20b1dbb [improvement](storage) improving Column inserter (#7855)
* optimize Column inserter

* DCHECK

* DCHECK

Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-01-27 14:18:15 +08:00
ec5ecd1604 handle conflict (#7836)
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-01-26 16:33:37 +08:00
015371ac72 [fix](grouping-set) Fix the bug of grouping set core in both vec and non vec query engine (#7800) 2022-01-26 16:15:30 +08:00
f227472db2 [chore] fix error while compiling with -O3 (#7890) 2022-01-26 12:53:56 +08:00
a6831535e9 [Vectorized][Bug] fix bug of coalesce function (#7827) 2022-01-25 20:44:16 +08:00
c2520c878c [Improvement](Vectorized) optimize SegmentIterator predicate evaluation (#7795)
* [Improvement](Vectorized) optimize SegmentIterator predicate evaluation

* fix bug

* move bytes32_mask_to_bits32_mask to util/simd/bits.h
2022-01-22 15:31:07 +08:00
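The helper name `bytes32_mask_to_bits32_mask` in #7795 suggests packing 32 byte-sized flags into one 32-bit mask. A scalar sketch of that idea follows; it assumes bit i is set when byte i is non-zero, uses a hypothetical function name, and does not reproduce the SIMD code that a file under `util/simd` would presumably contain.

```cpp
#include <cstdint>

// Pack 32 byte flags into a 32-bit mask: bit i is set when bytes[i] != 0.
// Scalar reference version; the caller must pass at least 32 readable bytes.
inline std::uint32_t bytes32_to_bits32_scalar(const std::uint8_t* bytes) {
    std::uint32_t mask = 0;
    for (int i = 0; i < 32; ++i) {
        mask |= static_cast<std::uint32_t>(bytes[i] != 0) << i;
    }
    return mask;
}
```

A compact bit mask like this lets predicate evaluation test 32 rows with a single integer comparison, for example skipping a group when the mask is 0.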
Pxl
b56c568a8d [fix](vectorized) fix fold const value fail at datetime type (#7803) 2022-01-22 10:16:38 +08:00
b14d1c54fd [fix](function) fix vec round reference #7421 (#7801)
reference #7421
2022-01-22 10:09:10 +08:00
800a36343a [chore] Prolog of hermetic build with GCC 11 and Clang 13. (#7712)
Prepare to generate hermetic build using GCC 11 and Clang 13.
The ideal toolchain would be ldb toolchain generated by [ldb_toolchain_gen.sh](https://github.com/amosbird/ldb_toolchain_gen/releases/download/v0.3/ldb_toolchain_gen.sh)

To kick off a clang build, set `DORIS_TOOLCHAIN=clang` before running any build scripts.
2022-01-21 12:12:04 +08:00