Define a new file scan node (FileScanNode) for HMS tables in BE.
This file scan node differs from the broker scan node as follows:
1. The broker scan node defines a src slot and a dest slot, so there are two memory copies: first from the file to the src slot,
and then from the src slot to the dest slot. In contrast, FileScanNode has only one memory copy, directly from the file to the dest slot.
2. The broker scan node reads all the fields in the file into the src slot, while FileScanNode reads only the needed fields.
3. The broker scan node converts every value to a string for the src slot and then uses a cast to convert it to the dest slot type,
whereas FileScanNode reads values directly as their final type.
For now, FileScanNode is standalone code, but we will unify the file scan and broker scan in the future.
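The three differences can be sketched on a single text row. This is a hypothetical illustration, not Doris code: broker_style and file_scan_style are made-up names, a row is assumed to be a list of raw text fields, and every destination column is assumed to be a 64-bit integer.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical sketch, not Doris code. "row" holds the raw text fields of one
// line in the file; "needed" holds the indexes of the columns the query uses.

// Broker-scan style: copy every field into a string src slot (copy 1), then
// cast the needed ones to the dest type (copy 2).
std::vector<int64_t> broker_style(const std::vector<std::string_view>& row,
                                  const std::vector<std::size_t>& needed) {
    std::vector<std::string> src_slots;
    for (std::string_view field : row) {
        src_slots.emplace_back(field);  // all fields, even unused ones
    }
    std::vector<int64_t> dest;
    for (std::size_t idx : needed) {
        dest.push_back(std::stoll(src_slots[idx]));  // string -> int64 cast
    }
    return dest;
}

// FileScanNode style: parse only the needed fields, straight into the dest type.
std::vector<int64_t> file_scan_style(const std::vector<std::string_view>& row,
                                     const std::vector<std::size_t>& needed) {
    std::vector<int64_t> dest;
    dest.reserve(needed.size());
    for (std::size_t idx : needed) {
        assert(idx < row.size());
        std::string_view field = row[idx];
        int64_t value = 0;
        auto [ptr, ec] = std::from_chars(field.data(), field.data() + field.size(), value);
        assert(ec == std::errc() && ptr == field.data() + field.size());
        dest.push_back(value);
    }
    return dest;
}
```

Both functions return the same values; the difference is that the second one skips the intermediate string copy and never touches unused fields.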
Doris on ES does not work with ES8 because of the change to mapping types. The use of types has been deprecated since ES7,
and support for types has been removed in ES8:
1. /_mapping does not support include_type_name
2. /_search does not support specifying a type
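A minimal sketch of how the request paths change, assuming the caller already knows the cluster's major version. mapping_path and search_path are hypothetical helpers, not Doris functions, and the pre-ES8 branches simply reproduce the typed requests described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers (not Doris code): build ES REST paths for a given
// major version. Types are deprecated in ES 7 and removed in ES 8.
std::string mapping_path(const std::string& index, int es_major) {
    assert(!index.empty());
    if (es_major >= 8) {
        return "/" + index + "/_mapping";  // include_type_name is rejected in ES 8
    }
    return "/" + index + "/_mapping?include_type_name=true";
}

std::string search_path(const std::string& index, const std::string& type, int es_major) {
    assert(!index.empty());
    if (es_major >= 8 || type.empty()) {
        return "/" + index + "/_search";  // no type segment in the path
    }
    return "/" + index + "/" + type + "/_search";
}
```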
In vectorized execution, the query planner generates a new tuple for the join node.
This tuple describes the output schema of the join node.
It mainly solves the problem that the input schema of the join node differs from its output schema.
For example:
1. Columns on the null-producing side of an outer join are converted to nullable.
2. The projection of the outer tuple.
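Both cases can be modeled with a toy schema. This is a hypothetical sketch, not the Doris planner API: Slot, JoinOp and join_output_tuple are made-up names, and columns are matched by name for simplicity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model (not the Doris planner API): a slot has a name and nullability.
struct Slot {
    std::string name;
    bool nullable;
};

enum class JoinOp { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER };

// Build the join's output schema from its two input schemas:
// 1. columns on the null-producing side of an outer join become nullable;
// 2. only the projected columns (looked up by name) are kept, in projection order.
std::vector<Slot> join_output_tuple(const std::vector<Slot>& left,
                                    const std::vector<Slot>& right,
                                    JoinOp op,
                                    const std::vector<std::string>& projection) {
    bool left_nullable = op == JoinOp::RIGHT_OUTER || op == JoinOp::FULL_OUTER;
    bool right_nullable = op == JoinOp::LEFT_OUTER || op == JoinOp::FULL_OUTER;

    std::vector<Slot> input;
    for (const Slot& s : left) input.push_back({s.name, s.nullable || left_nullable});
    for (const Slot& s : right) input.push_back({s.name, s.nullable || right_nullable});

    std::vector<Slot> output;
    for (const std::string& name : projection) {
        for (const Slot& s : input) {
            if (s.name == name) {
                output.push_back(s);
                break;
            }
        }
    }
    assert(output.size() == projection.size());  // every projected name must exist
    return output;
}
```

For a left outer join, a non-nullable right-side column comes out nullable, while left-side columns keep their original nullability.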
When I built Doris BE with UBSan and vectorization enabled,
BE core dumped at doris::DecimalV2Value::operator long(). It crashed
because an SSE instruction accessed a non-aligned address.
With UBSan enabled, the compiler generates different assembly code, including
SSE instructions.
A sender serializes tuples into a contiguous memory area, while a receiver
just copies it. So we should align each tuple offset to 16 bytes.
For compatibility, we should use a config option to control it.
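A minimal sender-side sketch, assuming tuples are opaque byte strings. serialize_tuples is a hypothetical function, and its align_tuples flag stands in for the proposed config option. Offsets are relative to the start of the buffer, so they only become aligned addresses if the receiver's buffer base is itself 16-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Round offset up to the next multiple of alignment (must be a power of two).
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical serializer (not Doris code): packs tuples into one contiguous
// buffer and returns each tuple's offset. With align_tuples set, every tuple
// starts at a 16-byte boundary; padding bytes are zero-filled by resize().
std::vector<std::size_t> serialize_tuples(const std::vector<std::vector<uint8_t>>& tuples,
                                          bool align_tuples,
                                          std::vector<uint8_t>& buffer) {
    buffer.clear();
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    for (const auto& t : tuples) {
        if (align_tuples) offset = align_up(offset, 16);
        offsets.push_back(offset);
        buffer.resize(offset + t.size());
        if (!t.empty()) std::memcpy(buffer.data() + offset, t.data(), t.size());
        offset += t.size();
    }
    return offsets;
}
```

With three tuples of 3, 2 and 1 bytes, the aligned layout puts them at offsets 0, 16 and 32, while the packed layout uses 0, 3 and 5; the padding is the price paid for SSE-safe access on the receiver.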
BTW: with tools like UBSan, ASan, and TSan we can find bugs more easily,
e.g. #8815. It is difficult to find the bug without UBSan.
Anyway, we should use modern tools to be more productive.
SEQ_COL is used on tables with a unique key to order data within one transaction (rowset).
When there is only one rowset and it has been compacted, the rows in the rowset are sorted
and rows with the same keys have already been resolved by compaction, so a scanner sets direct_mode
to let the read iterator skip sorting and aggregation; in this case the iterators do not need SEQ_COL.
However, init_return_columns adds SEQ_COL to return_columns, which is passed to SegmentIterator.
SegmentIterator is then called via get_next with a block that lacks SEQ_COL, and it
creates the columns that are in return_columns but not in the block. SEQ_COL is nullable, which SegmentIterator
does not handle, so a core dump happens.
Actually, in the above case, SegmentIterator does not need to read SEQ_COL.
When SEQ_COL is really needed, the iterators create the SEQ_COL column in the block,
so SegmentIterator does not need to create SEQ_COL at all.
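The rule in the last paragraph can be sketched as a column filter. This is a hypothetical illustration, not the actual Doris patch: columns_to_read is a made-up helper, columns are identified by name, and kSeqCol is a placeholder for the hidden sequence column's real name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Placeholder name for the hidden sequence column (assumption, not Doris's constant).
const std::string kSeqCol = "SEQ_COL";

// Hypothetical helper: the columns the segment iterator should fill. SEQ_COL is
// kept only if the caller's block already contains it; the segment iterator
// never creates SEQ_COL itself, so direct mode no longer hits the nullable case.
std::vector<std::string> columns_to_read(const std::vector<std::string>& return_columns,
                                         const std::vector<std::string>& block_columns) {
    bool block_has_seq =
        std::find(block_columns.begin(), block_columns.end(), kSeqCol) != block_columns.end();
    std::vector<std::string> result;
    for (const std::string& c : return_columns) {
        if (c == kSeqCol && !block_has_seq) continue;  // not needed in direct mode
        result.push_back(c);
    }
    return result;
}
```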
```cpp
for (uint16_t i = 0; i < *size; ++i) {
    // some code here
}
```
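The value of *size is re-read on every evaluation of the loop condition: unless the compiler can prove that the loop body never writes to *size (for example through an aliasing pointer), it must reload it each time, which can also prevent vectorization.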
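One common remedy is to read the bound into a local once, before the loop. The sketch below assumes the loop body never modifies *size. increment_all and its body are hypothetical stand-ins for the elided "some code here".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop body: add 1 to each element. data and size could alias,
// so with "i < *size" the compiler would have to reload *size after every
// store to data[i]. Copying it into a local lets the bound stay in a register
// and leaves the loop eligible for vectorization. This is only equivalent to
// the original loop if the body never changes *size.
void increment_all(uint16_t* data, const uint16_t* size) {
    const uint16_t n = *size;  // read once, before the loop
    for (uint16_t i = 0; i < n; ++i) {
        data[i] += 1;
    }
}
```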