Support thread pool per disk for scanners to prevent pool performance from some high ioutil disks happening
key point:
1. each disk has a thread pool for scanners
2. whenever a thread pool of one disk runs out of local work, tasks can be retrieved from other threads(disks). This is done round-robin.
performance testing:
vec version: 25% faster than single thread pool in a high io util disk test case
normal version: 8% faster than single thread pool in a high io util disk test case
1. Support some function alias of mod/fmod, adddate/add_data
2. Support some function of multi args: week, yearweek
3. Fix bug of multi args function call the DATETIME type not effective in DATE type
fix ltrim result may incorrect in some case
according to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Built-in Function: int __builtin_cl/tz (unsigned int x)
If x is 0, the result is undefined.
So we handle the case of 0 separately
this function return different between gcc and clang when x is 0
1. reuse Schema to avoid copying, because clone Schema will generate a lot of sub Field object
2. call interface provided by Block to reduce code lines
This PR mainly changes:
1. Change the define of PBlock
The new PBlock consists of a set of PColumnMeta and a binary buffer.
The PColumnMeta records the metadata information of all columns in the Block,
while the buffer stores the serialized binary data of all columns.
2. Refactor the serialize/deserialize method of data type
Rewrite the `serialize()/deserialize()` of IDataType. And also add
a new method `get_uncompressed_serialized_bytes()` to get the total length
of uncompressed serialized data of a column.
3. Rewrite the serialize/deserialize method of Block
Now, when serializing a Block to PBlock, it will first get the total length
of uncompressed serialized data of all columns in this Block, and then allocate
the memory to write the serialized data to the buffer.
4. Use brpc attachment to transmit the serialized column data
Support implement UDF through GRPC protocol. This brings several benefits:
1. The udf implementation language is not limited to c++, users can use any familiar language to implement udf
2. UDF is decoupled from Doris, udf will not cause doris coredump, udf computing resources are separated from doris, and doris services are not affected
But RPC's UDF has a fixed overhead, so its performance is much slower than C++ UDF, especially when the amount of data is large.
Create function like
```
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
"SYMBOL"="add_int",
"OBJECT_FILE"="127.0.0.1:9999",
"TYPE"="RPC"
);
```
Function service need to implement `check_fn` and `fn_call` methods
Note:
THIS IS AN EXPERIMENTAL FEATURE, THE INTERFACE AND DATA STRUCTURE MAY BE CHANGED IN FUTURE !!!
This PR mainly changes:
1. Fix bug when enable `transfer_data_by_brpc_attachment`
In `data_stream_sender`, we will send a serialized PRowBatch data to multiple Channels.
And if `transfer_data_by_brpc_attachment` is enabled, we will mistakenly clear the data in PRowBatch
after sending PRowBatch to the first Channel.
As a result, the following Channel cannot receive the correct data, causing an error.
So I use a separate buffer instead of `tuple_data` in PRowBatch to store the serialized data
and reuse it in multiple channels.
2. Fix bug that the the offset in serialized row batch may overflow
Use int64 to replace int32 offset. And for compatibility, add a new field `new_tuple_offsets` in PRowBatch.