The ptr and len of a tuple's String Slots are not always assigned on the send side, so the receive side may crash in some situations.
Detailed description:
On the send side, RowBatch::serialize(PRowBatch* output_batch) packs the RowBatch and calls Tuple::deep_copy().
For each String Slot, ptr and len are set to proper values only when the slot is not null. Null String
Slots keep their original state: ptr may point anywhere and len may hold an unexpected value.
On the receive side, unpacking is done by RowBatch::RowBatch(const RowDescriptor&, const PRowBatch&...). This
constructor converts each String Slot's offset into a valid string_val->ptr whether or not the slot is null.
However, some logic depends on string_val->len being 0, for example AggregateFuncTraits::init(), and HyperLogLog::deserialize()
returns correctly if slice.size <= 0. So if string_val->len is set to 0 on the send side, everything works; otherwise the server
may crash.
From a network communication viewpoint, the sender is responsible for transferring correct data: it should set every field to a proper
value and make no assumptions about how the receive side will use it.
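The sender-side rule above can be illustrated with a minimal sketch. It does not reproduce the real Tuple::deep_copy(); `StringValue`, `SimpleTuple`, and `deep_copy_strings` are hypothetical stand-ins, and the only point shown is that null slots are explicitly reset to a null pointer and zero length instead of being left untouched.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a string slot value: pointer plus length.
struct StringValue {
    const char* ptr;
    std::size_t len;
};

// Hypothetical simplified tuple: one null flag per string slot.
struct SimpleTuple {
    std::vector<bool> is_null;
    std::vector<StringValue> slots;
};

// Copies every non-null string slot into `pool` and returns the copied tuple.
// A std::deque is used as the pool because appending to it never moves
// existing elements, so the stored pointers stay valid.
SimpleTuple deep_copy_strings(const SimpleTuple& src, std::deque<std::string>& pool) {
    SimpleTuple dst;
    dst.is_null = src.is_null;
    dst.slots.resize(src.slots.size());
    for (std::size_t i = 0; i < src.slots.size(); ++i) {
        if (src.is_null[i]) {
            // Null slot: never read the source ptr/len, write a defined value.
            dst.slots[i] = StringValue{nullptr, 0};
            continue;
        }
        pool.emplace_back(src.slots[i].ptr, src.slots[i].len);
        dst.slots[i] = StringValue{pool.back().data(), pool.back().size()};
    }
    return dst;
}
```

With this shape, a receiver that checks len == 0 behaves the same whether or not it also checks the null flag.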
Add the `sink.batch.size` and `sink.max-retries` options to `Doris Spark-connector`.
These are consistent with the `flink-connector` options.
For example:
```scala
df.write
  .format("doris")
  // maximum number of rows written in a single flush
  .option("sink.batch.size", 2048)
  // number of retries after a write fails
  .option("sink.max-retries", 3)
  .save()
```
In an HA environment, JE retains as many reserved files as possible,
so the bdbje log can become too large.
We should therefore limit the total size of reserved files; the default limit is 1GB.
Support lateral view on a result column of a subquery.
For example:
```
select e1 from (select k2 as a from test_explode group by a) tmp1
lateral view explode_split(a, ",") tmp2 as e1;
```
The lateral view will parse the inline view column
and put the table function node above the subquery.
Transfer the RowBatch from the Protobuf Request to the Controller Attachment
when it exceeds the maximum length allowed for a RowBatch in the Protobuf Request.
This avoids reaching the upper limit of the Protobuf Request length (2G),
and it is expected to improve performance.
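The size-based switch described above could look roughly like the following sketch. `Request`, `Attachment`, the `batch_in_attachment` flag, and the `max_embedded_bytes` threshold are all hypothetical names introduced for illustration; the actual request layout and threshold are not stated here.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the Protobuf Request.
struct Request {
    std::string row_batch;             // serialized RowBatch carried in the request
    bool batch_in_attachment = false;  // assumed marker so the receiver knows where to look
};

// Hypothetical stand-in for the Controller Attachment.
struct Attachment {
    std::string data;
};

// Moves the serialized batch into the attachment when it is larger than
// max_embedded_bytes. Returns true when the attachment path was taken.
bool place_row_batch(Request& req, Attachment& att, std::size_t max_embedded_bytes) {
    if (req.row_batch.size() <= max_embedded_bytes) {
        return false;  // small enough: keep it inside the request
    }
    att.data = std::move(req.row_batch);
    req.row_batch.clear();  // leave the request field in a defined, empty state
    req.batch_in_attachment = true;
    return true;
}
```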
This is because the constant MAX_PHYSICAL_PACKET_LENGTH in FE should be 2^24 - 1,
but it was set to 2^24 - 2 by mistake.
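To make the off-by-one concrete, here is a small sketch. It assumes the constant refers to the MySQL wire protocol's per-packet payload limit, where the length field is 3 bytes wide; the FE code itself is Java, and `physical_packet_count` is a hypothetical helper, not an FE function.

```cpp
#include <cstdint>

// Largest payload a single physical packet can carry with a 3-byte length
// field: 2^24 - 1 = 16777215 bytes.
constexpr std::uint64_t MAX_PHYSICAL_PACKET_LENGTH = (1ULL << 24) - 1;

// Number of physical packets needed for one logical payload.
// A payload that is an exact multiple of the maximum must be followed by an
// empty packet, so the count is always payload / max + 1.
constexpr std::uint64_t physical_packet_count(std::uint64_t payload_bytes) {
    return payload_bytes / MAX_PHYSICAL_PACKET_LENGTH + 1;
}
```

If the sender splits at 2^24 - 2 instead, a full-size chunk no longer has the length that tells the peer another packet follows, so large payloads may be misread.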
2. Fix bitmap_to_string, which may fail when the result is larger than 2G
The broker scan node has two tuple descriptors:
one is the dest tuple and the other is the src tuple.
The src tuple is used to read the lines of the original file,
and the dest tuple is used to save the converted lines.
The preceding filter is executed on the src tuple, so the src tuple descriptor should be used
to initialize the filter expression.
Once the calculation of the lateral view function is completed,
the result is returned directly to the upper layer.
This causes a lot of memory copying and network transmission,
because the original column that participates
in the lateral view is very likely to hold a very long value.
If Doris retains this column after calculating the lateral view,
it needs to perform a memory copy.
However, in many cases the upper plan node does not need the original columns of the lateral view,
so columns should be pruned after the lateral view is calculated
to avoid useless memory copying and network transmission.
For example, the following query can prune the original column v1:
```
select k1, e1 from table lateral view explode_split(v1, ",") tmp as e1;
```
The `outputSlotIds` in TableFunctionNode is used to store the columns that should be retained after pruning.
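A minimal sketch of how such a retained-slot set can drive pruning is shown here. The `Row` type and `prune_row` function are hypothetical and much simpler than the real TableFunctionNode; the sketch only shows that slots outside the output set are dropped before the row is passed upward.

```cpp
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical row representation: (slot id, value) pairs.
using Row = std::vector<std::pair<int, std::string>>;

// Keeps only the slots whose ids appear in output_slot_ids, preserving order.
Row prune_row(const Row& row, const std::unordered_set<int>& output_slot_ids) {
    Row pruned;
    for (const auto& slot : row) {
        if (output_slot_ids.count(slot.first) > 0) {
            pruned.push_back(slot);
        }
    }
    return pruned;
}
```

For the query above, the set would contain the slots of k1 and e1 only, so the long v1 value is never copied into the output row.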
* Support scalar functions in lateral view
Child 0 of the explode_split function can be a scalar function,
such as concat(k1, ",", k2).
This PR mainly checks whether a lateral view with a function satisfies the following semantic rules:
1. The columns in the function must all belong to the original table
2. The function must be a scalar function
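The two rules can be expressed as a small check. This sketch uses hypothetical types (`FunctionKind`, `ColumnRef`, `FunctionArg`) rather than the FE's analyzer classes, and it only illustrates the shape of the validation.

```cpp
#include <string>
#include <vector>

enum class FunctionKind { Scalar, Aggregate, Table };

// Hypothetical column reference resolved to its owning table.
struct ColumnRef {
    std::string table;
    std::string column;
};

// Hypothetical description of the function used as child 0 of the table function.
struct FunctionArg {
    FunctionKind kind;
    std::vector<ColumnRef> columns;
};

// Returns true only if the function is scalar and every column it references
// belongs to original_table.
bool is_valid_lateral_view_arg(const FunctionArg& arg, const std::string& original_table) {
    if (arg.kind != FunctionKind::Scalar) {
        return false;
    }
    for (const auto& col : arg.columns) {
        if (col.table != original_table) {
            return false;
        }
    }
    return true;
}
```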
1. The clang-format action is triggered when a PR is submitted.
2. The SkyWalking Eyes action is triggered when a PR is submitted and after it is merged into the master branch.
We found that many commit messages submitted at present carry ambiguous information.
Clear commit messages help developers make pull requests more readable,
help committers merge easily, and make releases easier for the Release Manager.
Therefore, we have put together a commit format specification.
We hope that subsequent contributors will write commit messages according to
the specification when submitting a Pull Request.