code refactor: improve code's readability, avoid const_cast
1. Simplify loops by using range-based for loops, which are clearer and safer than the old loop style
2. Iterate over _row_desc.tuple_descriptors() by index only, instead of mixing index and iterator
3. Add a new function To cast_to(From from) that uses union-based casting between two types to replace reinterpret_cast; the new cast is more readable
4. Avoid reusing the same variable name in nested loops, which is error-prone
5. Add the const keyword to member functions, following the CppCoreGuidelines
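A minimal sketch of items 1, 3 and 5, not the code from this change: `cast_to` follows the signature named above, while `RowDescSketch` and its members are hypothetical. Reading a union member other than the one last written is supported by GCC and Clang but is not guaranteed by ISO C++, where `std::memcpy` is the portable alternative.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Union-based cast between two equally sized, trivially copyable types.
// Relies on compiler support for type punning through a union.
template <typename To, typename From>
To cast_to(From from) {
    static_assert(sizeof(To) == sizeof(From), "cast_to requires equally sized types");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "cast_to requires trivially copyable types");
    union {
        From src;
        To dst;
    } pun;
    pun.src = from;
    return pun.dst;
}

// Hypothetical descriptor holder, used only to show the loop and const style.
class RowDescSketch {
public:
    explicit RowDescSketch(std::vector<int> tuple_sizes)
            : _tuple_sizes(std::move(tuple_sizes)) {}

    // const member function: it only reads the descriptor.
    int total_size() const {
        int total = 0;
        // Range-based loop: no index or iterator bookkeeping.
        for (int size : _tuple_sizes) {
            total += size;
        }
        return total;
    }

private:
    std::vector<int> _tuple_sizes;
};
```

For example, `cast_to<std::uint32_t>(1.0f)` yields the IEEE 754 bit pattern of `1.0f` on platforms where `float` is a 32-bit IEEE type.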
Close related #7334
1. Fix the bug described in [Bug] show frontends cause FE oom #7334
2. Fix the incorrect CurrentConnected field in the show frontends result.
3. Add more FAQ entries
Fix the problem that when the source column of the lateral view comes from an inline view,
the column in the inline view cannot be materialized correctly.
At the same time, fix the problem that the correct output column cannot be projected
when the source column of the lateral view comes from an inline view.
Note that when a column in the query comes from an inline view,
it needs to be converted from the virtual tuple to the real tuple during semantic analysis and planning.
I found some small problems while reading the code, so I added some small enhancements.
1. Modify the PR template. The current template isn't simple or clear, so it is worth refactoring.
2. Some small changes (typos, formatting, ...)
issue: #7230
When getting the latest update time of a table, only compare the partitions of this query,
not all partitions of a table.
The goal is to improve the SqlCache hit rate.
When a broker load task fails, it may be retried while holding the
LoadJob's write lock and submitting the load task to a thread pool.
But submitting a task to the thread pool may block for up to 60 seconds
(depending on the BlockPolicy), so the write lock can be held for too long.
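The fix itself is not shown here. One common way to avoid this kind of problem is to update the job state under the lock and submit the task only after the lock is released. The sketch below illustrates that pattern under stated assumptions: `LoadJobSketch`, `retry_task` and the `Submit` callback are hypothetical names, and the real code lives in the Java FE.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a load job guarded by a write lock.
struct LoadJobSketch {
    std::mutex write_lock;
    int retry_count = 0;
    std::string state = "LOADING";
};

using Task = std::function<void()>;
// Stand-in for a thread pool submit call that may block for a long time.
using Submit = std::function<void(Task)>;

inline void retry_task(LoadJobSketch& job, const Submit& submit) {
    Task task;
    {
        // Hold the write lock only while mutating job state.
        std::lock_guard<std::mutex> guard(job.write_lock);
        ++job.retry_count;
        job.state = "PENDING";
        task = [&job] {
            std::lock_guard<std::mutex> inner(job.write_lock);
            job.state = "LOADING";
        };
    }
    // The lock is released here, so a blocking submit cannot stall
    // other threads that need the job's write lock.
    submit(std::move(task));
}
```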
The ptr and len of a tuple's String Slot are not assigned properly on the send side, so the receive side may crash in some situations.
Detailed description:
On the send side, when we call RowBatch::serialize(PRowBatch* output_batch) to pack a RowBatch, Tuple::deep_copy()
is called. Only String Slots that are not null get ptr and len set to proper values; null String
Slots keep their original state, so the ptr member may point anywhere and the len member may hold an unexpected value.
On the recv side, unpacking is done by RowBatch::RowBatch(const RowDescriptor&, const PRowBatch&...). In this
function, the offset of each String Slot is converted to a valid string_val->ptr whether the String Slot is null or not.
But some business logic depends on string_val->len=0; for example, AggregateFuncTraits::init() and HyperLogLog::deserialize()
return correctly if slice.size<=0. So if string_val->len is set to 0 on the send side, everything is fine; otherwise the server
may crash.
From a network communication viewpoint, we should make sure that correct data is transferred. It is the sender's responsibility to set the data to proper
values, without making any assumption about how the recv side will use it.
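A minimal sketch of the sender-side rule described above, assuming a simplified slot layout: `SerializedStringSlot` and `serialize_string_slot` are hypothetical names, not the actual Tuple::deep_copy() code. The point is that a null slot is always written as offset 0 and len 0, and its stale ptr and len are never read.

```cpp
#include <cstddef>
#include <string>

// Hypothetical wire form of a string slot: an offset into the shared
// tuple data buffer plus a length.
struct SerializedStringSlot {
    std::size_t offset;
    std::size_t len;
};

// Appends the slot's bytes to `buffer` and returns its wire form.
// For a null slot, ptr and len may be garbage, so they are ignored and
// the slot is written with len == 0.
inline SerializedStringSlot serialize_string_slot(const char* ptr, std::size_t len,
                                                  bool is_null, std::string& buffer) {
    if (is_null) {
        return SerializedStringSlot{0, 0};
    }
    const std::size_t offset = buffer.size();
    buffer.append(ptr, len);
    return SerializedStringSlot{offset, len};
}
```

With this rule, receivers that treat `len == 0` as "nothing to read" stay safe regardless of what the null slot held before serialization.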
Add the `sink.batch.size` and `sink.max-retries` options to the `Doris Spark-connector`,
consistent with the `flink-connector` options.
e.g.:
```scala
df.write
  .format("doris")
  // maximum number of rows written in a single flush
  .option("sink.batch.size", 2048)
  // number of retries after a write fails
  .option("sink.max-retries", 3)
  .save()
```
In an HA environment, JE retains as many reserved files as possible,
so the bdbje log becomes too large.
We should therefore limit the size of the reserved files; the default is set to 1GB.
Support lateral view on a result column of a subquery.
For example:
```
select e1 from (select k2 as a from test_explode group by a) tmp1
lateral view explode_split(a, ",") tmp2 as e1;
```
The lateral view will parse the inline view column
and put the table function node above the subquery.
Move the RowBatch from the Protobuf Request to the Controller Attachment
when the RowBatch exceeds the maximum length allowed in the Protobuf Request.
This avoids reaching the upper limit of the Protobuf Request length (2G),
and it is also expected to improve performance.
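The size-based choice can be sketched as follows. This is an illustration under stated assumptions, not the actual RPC code: `RequestSketch`, `ControllerSketch`, `place_row_batch` and the `max_embedded_bytes` threshold are all hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the protobuf request.
struct RequestSketch {
    std::string row_batch;             // serialized batch embedded in the request
    bool batch_in_attachment = false;  // tells the receiver where to look
};

// Hypothetical stand-in for the RPC controller and its attachment.
struct ControllerSketch {
    std::string attachment;
};

// Embeds small batches in the request and moves oversized ones to the
// attachment, so the request itself never grows past the threshold.
inline void place_row_batch(std::string serialized, std::size_t max_embedded_bytes,
                            RequestSketch& request, ControllerSketch& controller) {
    if (serialized.size() > max_embedded_bytes) {
        controller.attachment = std::move(serialized);
        request.row_batch.clear();
        request.batch_in_attachment = true;
    } else {
        request.row_batch = std::move(serialized);
        request.batch_in_attachment = false;
    }
}
```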
This is because the constant MAX_PHYSICAL_PACKET_LENGTH in fe should be 2^24 - 1,
but it was set to 2^24 - 2 by mistake.
2. Fix bitmap_to_string, which may fail when the result is larger than 2G
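To see why the packet length constant matters, assume MAX_PHYSICAL_PACKET_LENGTH refers to the MySQL wire protocol, where the payload length is stored in a 3-byte field. The largest payload of one packet is then 2^24 - 1 bytes, and a payload of that size or more is split, ending with a packet shorter than the maximum (possibly empty). The sketch below computes the packet lengths; `packet_lengths` and `kMaxPhysicalPacketLength` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Largest payload that fits in a 3-byte length field: 2^24 - 1.
constexpr std::size_t kMaxPhysicalPacketLength = (std::size_t{1} << 24) - 1;

// Splits a payload of `payload_len` bytes into per-packet lengths.
// Every full packet carries exactly kMaxPhysicalPacketLength bytes, and
// the sequence always ends with a packet shorter than that.
inline std::vector<std::size_t> packet_lengths(std::size_t payload_len) {
    std::vector<std::size_t> lengths;
    while (payload_len >= kMaxPhysicalPacketLength) {
        lengths.push_back(kMaxPhysicalPacketLength);
        payload_len -= kMaxPhysicalPacketLength;
    }
    lengths.push_back(payload_len);
    return lengths;
}
```

If the sender used 2^24 - 2 as the full packet size, a client following the protocol would treat that packet as the last one and misread the data that follows.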
The broker scan node has two tuple descriptors:
One is dest tuple and the other is src tuple.
The src tuple is used to read the lines of the original file,
and the dest tuple is used to save the converted lines.
The preceding filter is executed on the src tuple, so the src tuple descriptor should be used
to initialize the filter expression.
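A minimal sketch of why the filter must be bound to the src tuple, using hypothetical types (`SrcRow`, `DestRow`, `scan_with_preceding_filter`) rather than the real scan node: the filter may reference a source field that no longer exists after conversion.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical src tuple: raw text fields of one line in the file.
struct SrcRow {
    std::vector<std::string> fields;
};

// Hypothetical dest tuple: only the converted first field is kept.
struct DestRow {
    long id;
};

using SrcFilter = std::function<bool(const SrcRow&)>;

// Applies the preceding filter to the src row, then converts the
// surviving rows to dest rows. Assumes field 0 holds a valid integer.
inline std::vector<DestRow> scan_with_preceding_filter(const std::vector<SrcRow>& lines,
                                                       const SrcFilter& preceding_filter) {
    std::vector<DestRow> out;
    for (const SrcRow& line : lines) {
        if (!preceding_filter(line)) {
            continue;  // dropped before any conversion happens
        }
        out.push_back(DestRow{std::stol(line.fields.at(0))});
    }
    return out;
}
```

Here a filter on `fields[1]` can only be evaluated against the src layout, because `DestRow` has no such field.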
Once the lateral view function has been computed,
the result is returned directly to the upper layer.
This causes a lot of memory copying and network transmission.
The reason is that the original column that participates
in the lateral view is very likely to hold a very long value.
If Doris still retains this column after computing the lateral view,
it needs to perform a memory copy.
However, in many cases, the upper plan node does not need the original columns of the lateral view,
so it is necessary to perform column pruning after the lateral view is computed,
so as to avoid useless memory copying and network transmission.
For example, the following query can prune the original column v1:
```select k1, e1 from table lateral view explode_split(v1, ",") tmp as e1;```
The `outputSlotIds` in TableFunctionNode is used to store the columns that should be retained after pruning.
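The effect of pruning can be sketched as below. This is an illustration under stated assumptions, not the TableFunctionNode code: rows are plain string vectors indexed by slot id, the delimiter is a single character, and `split` and `explode_and_prune` are hypothetical helpers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Splits `s` on `delim`, keeping empty pieces.
inline std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = s.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));
            break;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Explodes input[source_slot] and emits one row per piece. Only the input
// slots listed in output_slot_ids are copied, followed by the piece itself,
// so a long source column that nobody reads is never copied.
inline std::vector<Row> explode_and_prune(const Row& input, std::size_t source_slot, char delim,
                                          const std::vector<std::size_t>& output_slot_ids) {
    std::vector<Row> out;
    for (const std::string& piece : split(input.at(source_slot), delim)) {
        Row row;
        for (std::size_t slot : output_slot_ids) {
            row.push_back(input.at(slot));
        }
        row.push_back(piece);
        out.push_back(row);
    }
    return out;
}
```

For the query above, the input row is (k1, v1), the source slot is v1, and the retained slots contain only k1.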
* Support scalar functions in lateral view
Child 0 of the explode_split function can be a scalar function,
such as: concat(k1, ",", k2)
This PR mainly checks whether a lateral view with a function satisfies the following semantic rules.
1. The columns in the function must all belong to the original table
2. The function must be a scalar function
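The two rules can be sketched as a recursive check over an expression tree. The types below (`ExprKind`, `ExprSketch`) and the function `is_valid_lateral_view_arg` are hypothetical and only illustrate the rules; the real analysis happens in the FE.

```cpp
#include <string>
#include <vector>

enum class ExprKind { Column, Literal, ScalarFunction, AggregateFunction };

// Hypothetical expression node.
struct ExprSketch {
    ExprKind kind;
    std::string name;   // column, literal or function name
    std::string table;  // owning table, meaningful for columns only
    std::vector<ExprSketch> children;
};

// Returns true if the expression uses only scalar functions, literals and
// columns that belong to `base_table`.
inline bool is_valid_lateral_view_arg(const ExprSketch& expr, const std::string& base_table) {
    switch (expr.kind) {
    case ExprKind::Column:
        return expr.table == base_table;  // rule 1
    case ExprKind::Literal:
        return true;
    case ExprKind::ScalarFunction:
        for (const ExprSketch& child : expr.children) {
            if (!is_valid_lateral_view_arg(child, base_table)) {
                return false;
            }
        }
        return true;
    case ExprKind::AggregateFunction:
        return false;  // rule 2
    }
    return false;
}
```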