Stream load should read all the data completely before parsing the json.
And also add a new BE config streaming_load_max_batch_read_mb
to limit the data size when loading json data.
Fix the bug of loading empty json array []
Add doc to explain some certain case of loading json format data.
Fix: #4124
This CL mainly changes:
1. Reorganized the code logic to limit the supported json format to two, and the import behavior is more consistent.
2. Modified the statistical behavior of the number of error rows when loading in json format, so that the error rows can be counted correctly.
3. See `load-json-format.md` to get details of loading json format.
After checking, I found that broker load in 0.11 added num_of_columns_from_file parameter in thrift. This parameter does not consider compatibility in BE.
So broker load could cause BE crashed during the upgrade
Currently, we do not support parsing encoded/compressed columns in file path, eg: extract column k1 from file path /path/to/dir/k1=1/xxx.csv
This patch is able to parse columns from file path like in Spark(Partition Discovery).
This patch parse partition columns at BrokerScanNode.java and save parsing result of each file path as a property of TBrokerRangeDesc, then the broker reader of BE can read the value of specified partition column.
Currently, Doris only support UTF-8 encoded data. All data will be
shown to user in UTF-8 format. So if data loaded in Doris does not
UTF-8 encoded, user will see garbled data when querying.
I introduce a fast UTF-8 validator from
https://github.com/lemire/fastvalidate-utf-8
This validator is highly optimized that it only takes 0.7 CPU cycles
to validata a 64k string. And by testing 1GB data load to Doris, the
validator has no impact on performance.
This change adds a load property named strict_mode which is used to prohibit the incorrect data.
When it is set to false, the incorrect data will be loaded by NULL just like before.
When it is set to true, the incorrect data which belongs to a column without expr will be filtered.
The strict_mode is supported in broker load v2 now. It will be supported in stream load later.
1. Apache HDFS broker support HDFS HA and Hadoop kerberos authentication.
2. New Backup and Restore function. Use Fs Broker to backup your data to HDFS or restore them from HDFS.
3. Table-Level Privileges. Grant fine-grained privileges on table-level to specified user.
4. A lot of bugs fixed.
5. Performance improvement.