Set FE `enable_new_load_scan_node` to true by default.
So that all load tasks(broker load, stream load, routine load, insert into) will use FileScanNode instead of BrokerScanNode
to read data
1. Support loading parquet file in stream load with new load scan node.
2. Fix bug that new parquet reader can not read column without logical or converted type.
3. Change jsonb parser function to "jsonb_parse_error_to_null"
So that if the input string is not a valid json string, it will return null for jsonb column in load task.
Add tpch 1g orc test case in hive docker
Refactor some suites dir of catalog test cases.
And "-internal" for dlf endpoint, to support access oss with aliyun vpc.
json reader DCHECK fail because of missing TYPE_STRING
fix bug that if no file is found, the tvf will throw NPE.
The predicate conjuncts can not be pushed down to parquet reader if this is a load task.
Because the predicate should be applied on column of dest table, not on column of source file.
Add a temp property "use_new_load_scan_node" of broker load to make regression test happy.
So that we can use new load scan node for a certain job and avoid setting global FE config.
Issue Number: close#12574
This pr adds `NewJsonReader` which implements GenericReader interface to support read json format file.
TODO:
1. modify `_scann_eof` later.
2. Rename `NewJsonReader` to `JsonReader` when `JsonReader` is deleted.
1. remove FE config `enable_array_type`
2. limit the nested depth of array in FE side.
3. Fix bug that when loading array from parquet, the decimal type is treated as bigint
4. Fix loading array from csv(vec-engine), handle null and "null"
5. Change the csv array loading behavior, if the array string format is invalid in csv, it will be converted to null.
6. Remove `check_array_format()`, because it's logic is wrong and meaningless
7. Add stream load csv test cases and more parquet broker load tests
1. Refactor the file reader creation in FileFactory, for simplicity.
Previously, FileFactory had too many `create_file_reader` interfaces.
Now unified into two categories: the interface used by the previous BrokerScanNode,
and the interface used by the new FileScanNode.
And separate the creation methods of readers that read `StreamLoadPipe` and other readers that read files.
2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode
3. Now for generic reader, the file reader will be created inside the reader, not passed from the outside.
4. Add some test cases for csv stream load, the behavior is same as the old broker scanner.
when config::enable_simdjson_parser=true in vec streamload, may lead to core dump when json input invalid format string like '{ "a', or all the fields is null like '{}', this may lead to simdjson lib throw some unhandled expection like `Objects and arrays can only be iterated when they are first encountered`.We should take care of these cases
Signed-off-by: eldenmoon <15605149486@163.com>