
584 Commits

Author SHA1 Message Date
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Add MemTrackerTaskPool as the ancestor of all query and import trackers; it tracks the local memory usage of all executing tasks;
3. Add a consume/release cache: a consume/release is triggered only when the accumulated memory exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (experimental feature), controlled by the parameter memory_leak_detection: when a MemTracker is destructed and its remaining statistical value is greater than the specified range, throw an exception and print the accurate statistical value over HTTP;
5. Add Virtual MemTracker, whose consume/release does not sync to its parent. It will be used later, when a TCMalloc hook is introduced to record memory, to record specified memory independently;
6. Modify the GC logic: register the buffer cached in DiskIoMgr as a GC function; other GC functions will be added later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor that creates a temporary tracker, to avoid a lot of redundant code;
2. Add trackers for global objects such as ChunkAllocator and StorageEngine;
3. Add more fine-grained trackers, such as for ExprContext;
4. RuntimeState removes FragmentMemTracker, i.e. the PlanFragmentExecutor mem_tracker, which was previously used to track scan process memory independently, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
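The consume/release cache (item 3) and the Virtual MemTracker (item 5) above can be illustrated with a minimal C++ sketch. This is not the Doris implementation. The class `SketchMemTracker`, its methods, and the `min_sync_bytes` threshold (standing in for `mem_tracker_consume_min_size_bytes`) are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical, simplified tracker (not the Doris MemTracker API).
class SketchMemTracker {
public:
    SketchMemTracker(SketchMemTracker* parent, int64_t min_sync_bytes, bool is_virtual = false)
            : _parent(parent), _min_sync_bytes(min_sync_bytes), _virtual(is_virtual) {
        assert(min_sync_bytes > 0);
    }

    // Record an allocation (bytes > 0) or a release (bytes < 0) in the local cache.
    // The shared counters are only updated once the cached delta reaches the threshold.
    void cache_consume(int64_t bytes) {
        _untracked += bytes;
        if (std::llabs(_untracked) >= _min_sync_bytes) {
            flush();
        }
    }

    // Apply the cached delta to this tracker and, unless it is virtual, to all ancestors.
    void flush() {
        if (_untracked == 0) return;
        _consumption += _untracked;
        if (!_virtual) {
            for (SketchMemTracker* t = _parent; t != nullptr; t = t->_parent) {
                t->_consumption += _untracked;
            }
        }
        _untracked = 0;
    }

    int64_t consumption() const { return _consumption; }

private:
    SketchMemTracker* _parent;
    int64_t _min_sync_bytes;
    bool _virtual;             // virtual trackers never propagate to their parent
    int64_t _consumption = 0;  // synced value
    int64_t _untracked = 0;    // cached, not yet synced delta
};
```

Batching small consume/release calls this way trades a bounded accounting lag (roughly `min_sync_bytes` per tracker) for far fewer updates to the shared ancestor trackers.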
e0ef9b8f6c [refactor](vectorized) to_bitmap(-1) returns NULL instead of a parse-failed error message (#8373) 2022-03-11 17:21:47 +08:00
7cfcddd8df [fix] brpc checks required fields in proto, so moving need_gen_rollup throws an exception (#8420) 2022-03-11 00:28:33 +08:00
d880559214 [refactor] remove old schema change code on BE (#8342) 2022-03-09 13:05:44 +08:00
0ff7de4157 [refactor] remove agent status (#8273)
There are 3 error code types in BE: OLAPStatus, AgentStatus and Status.
This is very confusing, and they sometimes conflict when writing code.
I will try to unify them into Status.
2022-03-09 13:04:50 +08:00
Pxl
cd8694e532 [feature][vectorized] support replace() (#8384) 2022-03-08 18:57:12 +08:00
454b45bea3 [feature](vectorize)(function) support regexp&&sm4&&aes functions (#8307) 2022-03-08 13:14:02 +08:00
f52d479cbc [fix](ut) fix be ut fragment_mgr_test compile failed (#8344) 2022-03-05 14:43:20 +08:00
e7c417505c [fix] fix hash table insert() failures that were not handled (#8207) 2022-03-03 22:33:05 +08:00
f622ce0497 [refactor] remove types_test (#8289)
* [refactor] remove types_test
1. remove types_test: it causes a core dump with newer versions of GCC or
   Clang, because these compilers vectorize some of the code, which then
   requires aligned memory
2. Change the string type length to 2 GB instead of -1
3. modify inaccessible code
2022-03-03 09:31:35 +08:00
8be71b69d5 [refactor] remove pusher.cpp and related mock test code (#8288) 2022-03-03 09:30:54 +08:00
246ac4e37a [fix] fix a bug that encryption functions with an IV may return wrong results (#8277) 2022-03-02 17:26:44 +08:00
b40e9144cb [feature-wip][array-type] Refactor type info for nested array. (#8279) 2022-03-02 14:20:39 +08:00
2b9b0fc1ec [Fix] Function percentile returns null for null input (#8238) 2022-03-01 14:42:48 +08:00
Pxl
7d0e36a054 [fix](be-ut) fix bitmap_ut result wrong && fix schema_change compile error (#8261) 2022-03-01 11:11:02 +08:00
c66a9bf64b [fix](be-ut) fix unit test bug for tablet_info_test (#8253)
introduced by #8041
2022-02-27 10:44:20 +08:00
a6bc9cbe53 [Function] Refactor the function code of log (#8199)
1. Support returning null when the input is invalid
2. Delete the useless code in vec function

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-24 11:06:58 +08:00
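The "return null when the input is invalid" behavior can be sketched in C++ with `std::optional`, where `std::nullopt` plays the role of SQL NULL. The function name `log_or_null` and the exact set of invalid inputs (non-positive `x`, non-positive base, base equal to 1) are assumptions for illustration, not taken from the commit.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical sketch: log_base(x), returning "NULL" (std::nullopt) instead of
// NaN or -inf when the logarithm is undefined.
std::optional<double> log_or_null(double base, double x) {
    // log_base(x) is undefined for x <= 0, base <= 0, or base == 1.
    if (x <= 0.0 || base <= 0.0 || base == 1.0) {
        return std::nullopt;
    }
    return std::log(x) / std::log(base);
}
```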
9a7931cfed [fix](mem-pool) fix bug that mem pool failed to allocate in ASAN mode (#8216)
Also fix BE ut:

1. fix schema_change_test memory leak
2. fix mem_pool_test
    Do not use DEFAULT_PADDING_SIZE = 0x10 in mem_pool when running UT.
3. remove plugin_test
2022-02-24 10:52:58 +08:00
0726a43a2a [fix](be-ut) Fix unused-but-set-variable errors. (#8211) 2022-02-23 21:43:15 +08:00
01fb25a498 [UT] Fix the UT of column_nullable_test (#8180)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-23 15:37:40 +08:00
e3f1efcbbf [Vec][Storage] Support delete condition;ut (#8091)
Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-02-23 12:48:18 +08:00
d17ed5e27a [vectorization](storage)support seq column in storage layer (#8186)
2022-02-23 12:23:31 +08:00
31ab569c1d [Vectorized][Feature] support some bitmap functions (#8138) 2022-02-23 11:42:16 +08:00
802fcbbb05 (#8162)refactor binary dict
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-02-22 11:23:54 +08:00
f13fd13e1b [fix] (schema change) Fix BE crash after schema change int column to varchar column(#8073) (#8142)
Co-authored-by: jianping.teng <tengjp@outlook.com>
2022-02-22 09:22:00 +08:00
5f50d9ae3b predicate test bugfix (#8134)
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-02-19 12:05:26 +08:00
50864aca7d [refactor] fix warnings when compiling with clang (#8069) 2022-02-19 11:29:02 +08:00
bcde1f265a [Function][Vectorized] Support least/greatest function (#8107)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-18 11:57:07 +08:00
68b24d608f [fix] (vectorization)Fix error when a nullable column computes its hash value (#8105)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-18 11:20:47 +08:00
a162f56284 (test) resolve unit test failure in VGenericIteratorsTest
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-02-17 20:03:07 +08:00
Pxl
f06c13a828 [feature](vec)(function) support function convert_tz() (#8060) 2022-02-17 10:51:32 +08:00
bef1b55c1f [feature][fix](vec)(function) Fix multi-args functions called with the DATETIME type not taking effect on the DATE type, and add alias functions (#8050)
1. Support some function aliases: mod/fmod, adddate/add_data
2. Support some multi-args functions: week, yearweek
3. Fix bug where multi-args functions called with the DATETIME type do not take effect on the DATE type
2022-02-17 10:49:25 +08:00
aea3e4e59b [refactor] Remove version hash from BE and related test in BE (#8027) 2022-02-14 09:29:27 +08:00
Pxl
64f71ddae3 [fix](be-ut) fix segmentation fault at unaligned address int128 (#8021) 2022-02-14 09:29:05 +08:00
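A hedged sketch of one common way to avoid this class of crash (the commit message does not show its actual fix): dereferencing a misaligned `__int128*` is undefined behavior, and the compiler may emit an aligned SSE load that faults, so the value is copied through `std::memcpy`, which is well-defined at any alignment. The helper names are hypothetical; `__int128` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helpers: read/write a 128-bit integer at an address with any alignment.
inline __int128 load_int128_unaligned(const void* p) {
    __int128 v;
    std::memcpy(&v, p, sizeof(v));  // no alignment requirement, unlike *(const __int128*)p
    return v;
}

inline void store_int128_unaligned(void* p, __int128 v) {
    std::memcpy(p, &v, sizeof(v));
}
```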
18e2071278 [fix](be-unit-test) Fix memory problems in agg_test.cpp. (#8019) 2022-02-14 09:23:40 +08:00
7d7e3a39f5 [refactor] Remove snapshot converter and unused Protobuf Definitions (#8026)
1. remove snapshot converter
2. remove unused protobuf definitions
3. convert some macros to const variables
2022-02-12 16:06:04 +08:00
Pxl
b26e7e3c28 [feature](function)(vec) support locate function (#7988)
* support function locate in vectorized engine

* add ut and fix some bugs
2022-02-12 16:00:37 +08:00
7a73645eee [refactor] remove some unused code (#8022) 2022-02-12 15:17:28 +08:00
5029ef46c9 [fix] fix ltrim result that may be incorrect in some cases (#7963)
fix ltrim result that may be incorrect in some cases
According to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html,
for the built-in functions int __builtin_clz (unsigned int x) and int __builtin_ctz (unsigned int x),
if x is 0, the result is undefined.
So we handle the case of 0 separately.

These functions return different results in GCC and Clang when x is 0.
2022-02-09 13:06:37 +08:00
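To illustrate why the 0 case matters, here is a hypothetical scalar stand-in for a SIMD trimming loop: a bitmask marks the non-space bytes, and the number of trailing zero bits gives the number of leading spaces. When the whole block is spaces the mask is 0, which is exactly the undefined `__builtin_ctz(0)` case. The helper names are assumptions; this is not the Doris ltrim code.

```cpp
#include <cassert>
#include <cstdint>

// __builtin_ctz(0) and __builtin_clz(0) are undefined, so 0 is handled explicitly.
inline int safe_ctz32(uint32_t x) { return x == 0 ? 32 : __builtin_ctz(x); }
inline int safe_clz32(uint32_t x) { return x == 0 ? 32 : __builtin_clz(x); }

// Hypothetical helper: count leading spaces in the first n (<= 32) bytes of s.
inline int leading_spaces_in_block(const char* s, int n) {
    assert(n >= 0 && n <= 32);
    uint32_t mask = 0;
    for (int i = 0; i < n; ++i) {
        if (s[i] != ' ') mask |= (uint32_t{1} << i);  // bit i set: byte i is not a space
    }
    // An all-space block gives mask == 0; safe_ctz32 returns 32 instead of garbage.
    int first = safe_ctz32(mask);
    return first < n ? first : n;
}
```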
Pxl
0553ce2944 [feature](vectorization) support function topn && remove some unused code (#7793) 2022-02-09 13:05:31 +08:00
3048ce8a4f [improvement][refactor](vec) Refactor serde of vec block and using brpc attachment (#7939)
This PR mainly changes:

1. Change the define of PBlock

    The new PBlock consists of a set of PColumnMeta and a binary buffer.
    The PColumnMeta records the metadata information of all columns in the Block,
    while the buffer stores the serialized binary data of all columns.
    
2. Refactor the serialize/deserialize method of data type

    Rewrite `serialize()/deserialize()` of IDataType, and add
    a new method `get_uncompressed_serialized_bytes()` to get the total length
    of the uncompressed serialized data of a column.
    
3. Rewrite the serialize/deserialize method of Block

    Now, when serializing a Block to PBlock, it will first get the total length
    of uncompressed serialized data of all columns in this Block, and then allocate
    the memory to write the serialized data to the buffer.
    
4. Use brpc attachment to transmit the serialized column data
2022-02-08 11:11:42 +08:00
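The two-pass scheme in item 3 (compute the total uncompressed size, allocate once, then write every column) can be sketched as follows. `SketchColumn` is a hypothetical fixed-width int32 column whose `uncompressed_serialized_bytes()` plays the role of `get_uncompressed_serialized_bytes()`; PColumnMeta, compression and the brpc attachment are deliberately omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical column: a row count followed by raw int32 values.
struct SketchColumn {
    std::vector<int32_t> values;

    // Exact number of bytes serialize() will write.
    std::size_t uncompressed_serialized_bytes() const {
        return sizeof(uint32_t) + values.size() * sizeof(int32_t);
    }

    // Writes [row count][values...] at buf and returns the position just past the data.
    char* serialize(char* buf) const {
        uint32_t rows = static_cast<uint32_t>(values.size());
        std::memcpy(buf, &rows, sizeof(rows));
        buf += sizeof(rows);
        if (!values.empty()) {
            std::memcpy(buf, values.data(), values.size() * sizeof(int32_t));
        }
        return buf + values.size() * sizeof(int32_t);
    }
};

// Pass 1 sums the sizes; pass 2 writes into a buffer allocated exactly once.
std::string serialize_block(const std::vector<SketchColumn>& columns) {
    std::size_t total = 0;
    for (const auto& c : columns) total += c.uncompressed_serialized_bytes();
    std::string buffer(total, '\0');
    char* pos = buffer.data();
    for (const auto& c : columns) pos = c.serialize(pos);
    assert(pos == buffer.data() + buffer.size());
    return buffer;
}
```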
f8d086d87f [feature](rpc) (experimental)Support implementing UDFs through the GRPC protocol. (#7519)
Support implementing UDFs through the GRPC protocol. This brings several benefits:
1. The UDF implementation language is not limited to C++; users can implement UDFs in any language they are familiar with
2. UDFs are decoupled from Doris: a UDF cannot cause a Doris coredump, UDF computing resources are separated from Doris, and Doris services are not affected

However, an RPC UDF has a fixed overhead, so its performance is much lower than that of a C++ UDF, especially when the amount of data is large.

Create a function like this:

```
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
  "SYMBOL"="add_int",
  "OBJECT_FILE"="127.0.0.1:9999",
  "TYPE"="RPC"
);
```
The function service needs to implement the `check_fn` and `fn_call` methods.
Note:
THIS IS AN EXPERIMENTAL FEATURE; THE INTERFACE AND DATA STRUCTURES MAY CHANGE IN THE FUTURE!!!
2022-02-08 09:25:09 +08:00
c0e59e59aa [fix][refactor] fix bugs and refactor some code by lint (#7871)
1. Fix some `passedByValue` issues.
2. Fix some `dereferenceBeforeCheck` issues.
3. Fix some `uninitMemberVar` issues.
4. Fix some iterator `eraseDereference` issues.
5. Fix compile issue introduced from #7923 #7905 #7848
2022-02-01 14:31:14 +08:00
82f421a019 [fix](brpc-attachment) Fix bug that may cause BE crash when transfer_data_by_brpc_attachment is enabled (#7921)
This PR mainly changes:

1. Fix bug when `transfer_data_by_brpc_attachment` is enabled

    In `data_stream_sender`, we send serialized PRowBatch data to multiple Channels.
    If `transfer_data_by_brpc_attachment` is enabled, we mistakenly clear the data in PRowBatch
    after sending PRowBatch to the first Channel.
    As a result, the following Channels cannot receive the correct data, causing an error.

    So I use a separate buffer instead of `tuple_data` in PRowBatch to store the serialized data
    and reuse it in multiple channels.

2. Fix bug that the offset in serialized row batch may overflow

    Use int64 to replace int32 offset. And for compatibility, add a new field `new_tuple_offsets` in PRowBatch.
2022-02-01 08:51:16 +08:00
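Item 2 above can be illustrated with a minimal sketch that computes per-tuple offsets from tuple sizes using `int64_t`; once the running total passes `INT32_MAX` (about 2 GiB), an `int32_t` offset would overflow. `compute_tuple_offsets` is a hypothetical helper, and the actual layout of `new_tuple_offsets` in PRowBatch is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: offsets[i] is where tuple i starts in the serialized buffer.
std::vector<int64_t> compute_tuple_offsets(const std::vector<int64_t>& tuple_sizes) {
    std::vector<int64_t> offsets;
    offsets.reserve(tuple_sizes.size());
    int64_t offset = 0;  // 64-bit running total: safe past INT32_MAX
    for (int64_t size : tuple_sizes) {
        assert(size >= 0);
        offsets.push_back(offset);
        offset += size;
    }
    return offsets;
}
```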
c1fef37399 [improvement](runtime-filter) Support adaptive runtime filter(#7546) (#7645)
Change 1: Support an adaptive runtime filter: IN_OR_BLOOM_FILTER
    The processing logic is:
    If the number of rows in the right table < runtime_filter_max_in_num, the IN predicate is used
    If the number of rows in the right table >= runtime_filter_max_in_num, the Bloom filter is used

Change 2: The default runtime filter type is changed to IN_OR_BLOOM_FILTER
2022-01-30 16:46:52 +08:00
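The IN_OR_BLOOM_FILTER decision rule above reduces to a single comparison. In this hypothetical C++ sketch, `build_side_rows` stands for the number of rows in the right table and `runtime_filter_max_in_num` for the configured threshold; the enum and function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

enum class SketchFilterKind { IN_PREDICATE, BLOOM_FILTER };

// Small right table: exact IN predicate. Otherwise: Bloom filter.
SketchFilterKind choose_in_or_bloom(int64_t build_side_rows, int64_t runtime_filter_max_in_num) {
    assert(build_side_rows >= 0);
    return build_side_rows < runtime_filter_max_in_num ? SketchFilterKind::IN_PREDICATE
                                                       : SketchFilterKind::BLOOM_FILTER;
}
```

A plausible rationale, not stated in the commit: an IN predicate is exact and cheap when the right table is small, while a Bloom filter keeps memory bounded when it is large.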
fb6e22f4ca [Fix] fix memory leak in be unit test (#7857)
1. fix be unit test memory leak
2. skip minidump test in ASAN test
2022-01-29 01:00:38 +08:00
Pxl
cd73a6b84b [chore] fix clang compile error (#7883) 2022-01-26 12:53:35 +08:00
cf02e43ec1 [improvement](vectorized) optimize dict read (#7805) 2022-01-22 10:18:30 +08:00
800a36343a [chore] Prolog of hermetic build with GCC 11 and Clang 13. (#7712)
Prepare to generate hermetic build using GCC 11 and Clang 13.
The ideal toolchain would be the ldb toolchain generated by [ldb_toolchain_gen.sh](https://github.com/amosbird/ldb_toolchain_gen/releases/download/v0.3/ldb_toolchain_gen.sh)

To kick off a clang build, set `DORIS_TOOLCHAIN=clang` before running any build scripts.
2022-01-21 12:12:04 +08:00
0efef1b332 [fix](schema-change) Fix bug that schema change may return -102 error (#7808)
When using linked schema change, we need to check whether all rowsets are of the same type,
ALPHA or BETA. Otherwise, we need to use direct schema change to convert the data.
2022-01-21 10:59:54 +08:00
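The rowset-type check above can be sketched as follows, assuming linked schema change is otherwise applicable so that only the type check is modeled. The enums and `choose_schema_change_mode` are hypothetical names, not Doris APIs.

```cpp
#include <cassert>
#include <vector>

enum class SketchRowsetType { ALPHA, BETA };
enum class SketchSchemaChangeMode { LINKED, DIRECT };

// Linked schema change only if every rowset has the same type; mixed types need direct conversion.
SketchSchemaChangeMode choose_schema_change_mode(const std::vector<SketchRowsetType>& rowsets) {
    for (SketchRowsetType t : rowsets) {
        if (t != rowsets.front()) {
            return SketchSchemaChangeMode::DIRECT;
        }
    }
    return SketchSchemaChangeMode::LINKED;  // also covers an empty rowset list
}
```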