A SQL query that joins tables with ARRAY columns calls `ColumnArray::replicate`. In this PR, we implement `replicate` for the ARRAY type to support SQL like this:
`SELECT count(lo_array),count(d_array),SUM(lo_extendedprice*lo_discount) AS REVENUE FROM lineorder, date WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;`
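For context, `replicate` expands a column by repeating each source row a given number of times, which is what the join needs when one row matches several rows on the other side. Below is a minimal, self-contained sketch of that semantics on a simplified offset-based array column; `SimpleArrayColumn` and its per-row counts are illustrative assumptions, not the actual Doris `ColumnArray` interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Illustrative only: an array column stored as flat element data plus row
// offsets, in the spirit of ColumnArray (not the real Doris class).
struct SimpleArrayColumn {
    std::vector<int32_t> data;    // flattened element values
    std::vector<size_t> offsets;  // offsets[i] = end of row i in `data`

    size_t row_begin(size_t row) const { return row == 0 ? 0 : offsets[row - 1]; }
    size_t row_end(size_t row) const { return offsets[row]; }

    // replicate: repeat row i `counts[i]` times, preserving element order.
    SimpleArrayColumn replicate(const std::vector<size_t>& counts) const {
        SimpleArrayColumn out;
        for (size_t row = 0; row < offsets.size(); ++row) {
            for (size_t rep = 0; rep < counts[row]; ++rep) {
                for (size_t i = row_begin(row); i < row_end(row); ++i) {
                    out.data.push_back(data[i]);
                }
                out.offsets.push_back(out.data.size());
            }
        }
        return out;
    }
};

int main() {
    // Two rows: [1, 2] and [3]; replicate the first twice and the second once.
    SimpleArrayColumn col{{1, 2, 3}, {2, 3}};
    SimpleArrayColumn rep = col.replicate({2, 1});
    for (size_t row = 0; row < rep.offsets.size(); ++row) {
        for (size_t i = rep.row_begin(row); i < rep.row_end(row); ++i) {
            std::cout << rep.data[i] << ' ';
        }
        std::cout << '\n';  // prints "1 2", "1 2", "3"
    }
}
```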
## Design:
For now, there are two categories of types in Doris: scalar types (such as int and char) and composite types (such as array). For performance, we can cache the type info of scalar types globally as unique objects, because the number of scalar types is limited. For composite types, the type info is normally generated at runtime (a cache strategy could also be used to speed this up), so the memory allocated for composite type info has to be reclaimed.
There are many interfaces for getting the type info of a specific type. I reorganized them as described below.
1. `const TypeInfo* get_scalar_type_info(FieldType field_type)`
The function returns the type info of a scalar type. Because the result comes from a global cache, the caller uses it **WITHOUT** worrying about memory reclamation (a minimal sketch of this caching approach follows the list).
2. `const TypeInfo* get_collection_type_info(FieldType sub_type)`
The function returns the type info of an array type with exactly **ONE** level of nesting. Because the result is cached, the caller uses it **WITHOUT** worrying about memory reclamation.
3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)`
4. `TypeInfoPtr get_type_info(const TabletColumn* col)`
These functions return the type info of **BOTH** scalar types and composite types. The caller is responsible for managing the returned resources.
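A minimal sketch of the global caching idea for scalar type info, assuming a small closed `FieldType` enum; the enum values and the trivial `TypeInfo` struct here are illustrative, not the real Doris definitions.

```cpp
#include <array>
#include <cstddef>

// Illustrative stand-ins for the real FieldType / TypeInfo.
enum class FieldType { INT = 0, BIGINT, VARCHAR, NUM_TYPES };

struct TypeInfo {
    FieldType type;
};

// One global, immutable TypeInfo per scalar type: callers receive a raw
// pointer into the cache and never free it.
const TypeInfo* get_scalar_type_info(FieldType field_type) {
    static const std::array<TypeInfo, static_cast<std::size_t>(FieldType::NUM_TYPES)>
            kScalarTypeInfos = {TypeInfo{FieldType::INT}, TypeInfo{FieldType::BIGINT},
                                TypeInfo{FieldType::VARCHAR}};
    return &kScalarTypeInfos[static_cast<std::size_t>(field_type)];
}

int main() {
    const TypeInfo* a = get_scalar_type_info(FieldType::INT);
    const TypeInfo* b = get_scalar_type_info(FieldType::INT);
    // The same cached object is returned both times; no ownership transfer.
    return (a == b) ? 0 : 1;
}
```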
#### About the new type `TypeInfoPtr`
`TypeInfoPtr` is an alias for a `unique_ptr` with a custom deleter; a minimal sketch of the pattern follows the two cases below.
1. For scalar types, the deleter does nothing.
2. For composite types, the deleter reclaims the memory.
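A minimal sketch of the alias-with-custom-deleter pattern, assuming a hypothetical `is_scalar` flag to tell cached scalar objects apart from heap-allocated composite ones; the real implementation may distinguish the two cases differently.

```cpp
#include <memory>

// Illustrative stand-in, not the real Doris TypeInfo.
struct TypeInfo {
    bool is_scalar;  // hypothetical flag: cached scalar vs. heap-allocated composite
};

// Custom deleter: a no-op for cached scalar type info, `delete` for composite
// type info that was created at runtime.
struct TypeInfoDeleter {
    void operator()(const TypeInfo* type_info) const {
        if (type_info != nullptr && !type_info->is_scalar) {
            delete type_info;
        }
    }
};

using TypeInfoPtr = std::unique_ptr<const TypeInfo, TypeInfoDeleter>;

TypeInfoPtr wrap_scalar(const TypeInfo* cached) {
    return TypeInfoPtr(cached);  // destruction is a no-op
}

TypeInfoPtr create_array_type_info() {
    return TypeInfoPtr(new TypeInfo{/*is_scalar=*/false});  // destruction frees it
}

int main() {
    static const TypeInfo kIntInfo{/*is_scalar=*/true};
    TypeInfoPtr scalar = wrap_scalar(&kIntInfo);    // cached object, never freed
    TypeInfoPtr array = create_array_type_info();   // reclaimed by the deleter
    return (scalar && array) ? 0 : 1;
}
```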
Analyzing the callers of `get_type_info` shows that the following classes should hold a `TypeInfoPtr`:
1. `Field`
2. `ColumnReader`
3. `DefaultValueColumnIterator`
Other classes are either constructed by the classes above or hold one of them, so for performance they can use the raw `TypeInfo` pointer directly (see the ownership sketch after this list).
1. `ScalarColumnWriter` - holds `Field`
    1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types.
    2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses the `type_info` in `BitmapIndexWriter`; `BitmapIndexWriter` doesn't support `ArrayType`.
    3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types.
2. `IndexedColumnReader` - initializes `type_info` from the field type in the meta (scalar types only).
3. `ColumnVectorBatch`
    1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, which uses the `type_info` in `IndexedColumnReader`.
    2. `BitmapIndexReader` supports scalar types only; it creates `ColumnVectorBatch`, which uses the `type_info` in `BitmapIndexReader`.
    3. `BloomFilterIndexWriter` supports scalar types only; it creates `ColumnVectorBatch`, which uses the `type_info` in `BloomFilterIndexWriter`.
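A short, hypothetical sketch of the ownership split described above: `Field` owns its type info through `TypeInfoPtr`, while a writer constructed from it only borrows the raw pointer. The class shapes are simplified and do not mirror the real Doris classes; it reuses the `TypeInfo`/`TypeInfoPtr` stand-ins from the sketch above.

```cpp
#include <utility>
// (assumes the TypeInfo and TypeInfoPtr definitions from the previous sketch)

// Owner: keeps the TypeInfoPtr alive for as long as the field exists.
class Field {
public:
    explicit Field(TypeInfoPtr type_info) : _type_info(std::move(type_info)) {}
    const TypeInfo* type_info() const { return _type_info.get(); }

private:
    TypeInfoPtr _type_info;
};

// Borrower: constructed from a Field, stores only the raw pointer, so it must
// not outlive the Field that owns the type info.
class ScalarColumnWriter {
public:
    explicit ScalarColumnWriter(const Field& field) : _type_info(field.type_info()) {}

private:
    const TypeInfo* _type_info;
};
```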
1. Add a new config to regression-conf.groovy: `enableHdfs`, default `false`, to skip tests that require HDFS.
2. Fix a bug where an exception was thrown when a double-type column result is null.
- Add two new formats to stream load and broker load: **csv_with_names** and **csv_with_names_and_types**
- Add the same two formats to export: **csv_with_names** and **csv_with_names_and_types**
This reverts commit de7dce4df84fcbfbbaf715cbac151e802321f80f.
Reverts apache/incubator-doris#8976
That change caused a BE UT failure: `sh run-be-ut.sh --run --filter OlapTableSinkTest.*`
```
==62008==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7ffff36867c0 in thread T0
```
* [Forbidden](Vec) Switch to non-vec engine when outer join + not null column
Vectorized execution will core dump in the case of `outer join + not null column`; see issue #7901.
So we need to fall back from vectorized mode to non-vectorized mode when we encounter this situation.
If a null-side column of the outer join must be non-null, such as the output of count(*), there is no way to force that column to be nullable.
Vectorization cannot handle this case yet, so it is necessary to fall back to non-vectorized processing (a sketch of the decision rule appears after the example below).
For example:
```
Query: set enable_vectorized_engine=true
Query: select * from t1 left join (select k1, count(k2) as count_k2 from t2 group by k1) tmp on t1.k1=tmp.k1
Result: the query runs on the non-vectorized engine
```
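Purely as an illustration of the decision rule described above (the real check lives in the planner and differs in detail), a hedged sketch: fall back whenever a slot on the null-producing side of an outer join cannot be made nullable. `SlotDesc` and the function name are assumptions made for this sketch.

```cpp
#include <vector>

// Hypothetical slot descriptor: only the two properties the rule cares about.
struct SlotDesc {
    bool nullable;                     // can this column hold NULL?
    bool on_null_side_of_outer_join;   // produced by the null side of an outer join?
};

// Fall back to the non-vectorized engine if an outer join would need to emit
// NULL through a column that must stay non-nullable (e.g. a count(*) output).
bool must_fall_back_to_non_vectorized(const std::vector<SlotDesc>& slots) {
    for (const auto& slot : slots) {
        if (slot.on_null_side_of_outer_join && !slot.nullable) {
            return true;
        }
    }
    return false;
}
```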