Transfer the RowBatch in the Protobuf request to the controller attachment
when the maximum RowBatch length in the Protobuf request is exceeded.
This avoids reaching the upper limit of the Protobuf request length (2GB),
and performance is also expected to improve.
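For illustration, here is a minimal C++ sketch of the idea, assuming brpc and a simplified stand-in for the generated protobuf request; the threshold, field names, and type name below are assumptions, not Doris's actual ones. When the serialized RowBatch is too large to embed in the proto, the bytes go into the controller attachment instead.
```
#include <brpc/controller.h>

#include <cstddef>
#include <string>

// Stand-in for the generated protobuf request type (illustrative only).
struct PTransmitDataRequestSketch {
    std::string row_batch;               // hypothetical embedded-bytes field
    bool transfer_by_attachment = false; // hypothetical flag read by the receiver
};

void fill_request(brpc::Controller* cntl,
                  PTransmitDataRequestSketch* request,
                  const std::string& serialized_batch) {
    // Assumed threshold; the real limit would come from configuration.
    constexpr size_t kMaxEmbeddedBytes = 64UL << 20;
    if (serialized_batch.size() > kMaxEmbeddedBytes) {
        // Attachment bytes travel alongside the RPC body, so they are not
        // counted against protobuf's 2GB message limit and skip proto encoding.
        cntl->request_attachment().append(serialized_batch.data(),
                                          serialized_batch.size());
        request->transfer_by_attachment = true;
    } else {
        request->row_batch = serialized_batch;
    }
}
```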
The broker scan node has two tuple descriptors:
one is the dest tuple and the other is the src tuple.
The src tuple is used to read the lines of the original file,
and the dest tuple is used to save the converted lines.
The preceding filter is executed on the src tuple, so the src tuple descriptor
should be used to initialize the filter expression.
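A minimal, self-contained sketch of why the binding matters follows; the types below are simplified stand-ins, not the broker scan node's real classes.
```
#include <string>
#include <vector>

// Simplified stand-in for a tuple descriptor: just the column layout it exposes.
struct TupleDescriptorSketch {
    std::vector<std::string> slot_names;
};

// Simplified stand-in for a filter expression over one raw column.
struct PreFilterSketch {
    std::string referenced_column;

    // Slot resolution only succeeds against the tuple that actually
    // contains the referenced column.
    bool bind(const TupleDescriptorSketch& tuple) const {
        for (const auto& name : tuple.slot_names) {
            if (name == referenced_column) {
                return true;
            }
        }
        return false;
    }
};

// The preceding filter runs on lines as they are read from the source file,
// so it must be initialized with the src tuple descriptor, not the dest one.
bool init_pre_filter(const PreFilterSketch& filter,
                     const TupleDescriptorSketch& src_tuple) {
    return filter.bind(src_tuple);
}
```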
Increase compatibility with MySQL
1. Added two system tables: files and partitions
2. Improved the MySQL error code return logic to make the error codes more compatible with MySQL
3. Added the lock/unlock tables statement and the show columns statement for compatibility with mysqldump
4. Compatible with the mysqldump tool: you can now use mysqldump to dump data and table structures from Doris
Currently, using mysqldump may print an error message like:
```
$ mysqldump -h127.0.0.1 -P9130 -uroot test_query_qa > a
mysqldump: Error: 'errCode = 2, detailMessage = select list expression not produced by aggregation output (missing from GROUP BY clause?): `EXTRA`' when trying to dump tablespaces
```
This error message does not affect the exported file; you can add `--no-tablespaces` to avoid it.
1. Replace all boost::shared_ptr with std::shared_ptr
2. Replace all boost::scoped_ptr with std::unique_ptr
3. Replace all boost::scoped_array with std::unique_ptr<T[]>
4. Replace all boost::thread with std::thread
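A minimal before/after sketch of these replacements, using generic code rather than the actual diff from this change:
```
#include <memory>
#include <thread>

struct Widget {
    int value = 0;
};

int main() {
    // boost::shared_ptr<Widget>  ->  std::shared_ptr<Widget>
    std::shared_ptr<Widget> shared = std::make_shared<Widget>();

    // boost::scoped_ptr<Widget>  ->  std::unique_ptr<Widget>
    std::unique_ptr<Widget> owned(new Widget());

    // boost::scoped_array<char>  ->  std::unique_ptr<char[]>
    std::unique_ptr<char[]> buffer(new char[128]);

    // boost::thread  ->  std::thread
    std::thread worker([] { /* do work */ });
    worker.join();
    return 0;
}
```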
Users can directly query data in Hive tables from Doris, and can use joins to perform complex queries without laboriously importing data from Hive.
The main changes are listed below:
FE:
Extend HiveScanNode from BrokerScanNode
HiveMetaStoreClientHelper communicates with Hive and HDFS.
BE:
Treat HiveScanNode as BrokerScanNode, and HiveTable as BrokerTable.
broker_scanner.cpp: support reading columns from the HDFS path.
orc_scanner.cpp: support reading HDFS files.
POM:
Add hive.version=2.3.7, hive-metastore and hive-exec
Add hadoop.version=2.8.0, hadoop-hdfs
Upgrade commons-lang to fix incompatibility with Java 9 and later.
Thrift:
Add THiveTable
Add read_by_column_def in TBrokerRangeDesc
In debug mode, insufficient query memory may cause BE to go down.
FE sets useStreamingPreagg to true, but the BE function CreateHashPartitions checks that is_streaming_preagg_ is false,
which then causes a core dump.
```
*** Check failure stack trace: ***
@ 0x2aa48ad google::LogMessage::Fail()
@ 0x2aa6734 google::LogMessage::SendToLog()
@ 0x2aa43d4 google::LogMessage::Flush()
@ 0x2aa7169 google::LogMessageFatal::~LogMessageFatal()
@ 0x24703be doris::PartitionedAggregationNode::CreateHashPartitions()
@ 0x2468fd6 doris::PartitionedAggregationNode::open()
@ 0x1e3b153 doris::PlanFragmentExecutor::open_internal()
@ 0x1e3af4b doris::PlanFragmentExecutor::open()
@ 0x1d81b92 doris::FragmentExecState::execute()
@ 0x1d840f7 doris::FragmentMgr::_exec_actual()
```
We should remove `DCHECK(!is_streaming_preagg_)`.
Added brpc stub cache check and reset APIs, used to test whether the brpc stub cache is available and to reset it.
Also added a config for automatically checking and resetting the brpc stub.
Add a use_path_style property for S3.
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support the path style property.
Fix some S3 URI bugs.
Add some logs for tracing the load process.
1. Optimize the error message when using batch delete
2. Rename session variable is_report_success to enable_profile
3. Add the table name to the OlapScanner profile
This CL mainly changes:
1. Add star schema benchmark tools in `tools/ssb-tools`, so users can easily load and test with the SSB data set.
2. Disable the segment cache for some read scenarios such as compaction and alter operations. (Fix #6924)
3. Fix a bug that `max_segment_num_per_rowset` doesn't work. (Fix #6926)
4. Enable `enable_batch_delete_by_default` by default.
1. Refactor the create method of the HDFS reader & writer. libhdfs3 does not support arm64, so the HDFS reader & writer should not be supported on arm64 (see the sketch after this list).
2. Add a macro for LowerUpperImpl.
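A rough sketch of the arm64 guard in the create method, using illustrative names (FileReaderSketch, create_hdfs_reader, StatusSketch) rather than the real factory signatures:
```
#include <memory>
#include <string>

enum class StatusSketch { OK, NOT_SUPPORTED };

struct FileReaderSketch {
    // A real reader would wrap a libhdfs3 handle here.
};

StatusSketch create_hdfs_reader(const std::string& path,
                                std::unique_ptr<FileReaderSketch>* reader) {
#if defined(__aarch64__)
    // libhdfs3 does not build on arm64, so refuse to create an HDFS reader/writer.
    (void)path;
    (void)reader;
    return StatusSketch::NOT_SUPPORTED;
#else
    // Normal construction path (details omitted).
    (void)path;
    reader->reset(new FileReaderSketch());
    return StatusSketch::OK;
#endif
}
```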
When a query has multiple where conditions, such as `where dt in (20210926,20210919) and hour<=13`,
an int * int product can overflow. The function extend_scan_key then calls
`range.convert_to_fixed_value()` mistakenly, and for a big `range[_low_value, _high_value)`,
a massive number of values is inserted into _fixed_values, finally resulting in OOM.
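For illustration, here is a minimal sketch of this class of overflow (not the actual extend_scan_key code; the counts and limit are made up): widening to int64_t before multiplying keeps the guard correct, while a wrapped 32-bit product can wrongly allow enumerating the whole range.
```
#include <cstdint>
#include <iostream>

int main() {
    int32_t in_list_size = 2;               // e.g. dt in (20210926, 20210919)
    int32_t range_cardinality = 1500000000; // e.g. values implied by the open range

    // Correct: widen before multiplying.
    int64_t product = static_cast<int64_t>(in_list_size) * range_cardinality;
    // What a plain 32-bit multiply would yield: the value wraps around.
    int32_t wrapped = static_cast<int32_t>(product);

    std::cout << "64-bit product: " << product
              << ", wrapped 32-bit: " << wrapped << "\n";

    const int64_t kMaxFixedValues = 1024;   // assumed limit for convert_to_fixed_value
    if (product > kMaxFixedValues) {
        // Keep the range as a range instead of expanding it into _fixed_values,
        // which is what avoids the OOM described above.
        std::cout << "range too large to enumerate as fixed values\n";
    }
    return 0;
}
```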
Remove part of the dynamic_cast usage to reduce the overhead caused by type conversion,
which probably reduces the CPU consumption of parquet file import by about 10%.
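A generic illustration of the change, with placeholder types rather than the parquet reader's real classes: when the concrete type is already known from context, static_cast avoids the per-call RTTI cost of dynamic_cast.
```
#include <cassert>

struct ColumnReader {
    virtual ~ColumnReader() = default;
};

struct Int32ColumnReader : ColumnReader {
    int read_value() const { return 42; }
};

int read_int32(ColumnReader* base) {
    // Before: auto* reader = dynamic_cast<Int32ColumnReader*>(base);
    // After: the caller created `base` from the schema and guarantees its type,
    // so a static_cast is safe and skips the RTTI lookup on this hot path.
    auto* reader = static_cast<Int32ColumnReader*>(base);
    return reader->read_value();
}

int main() {
    Int32ColumnReader reader;
    assert(read_int32(&reader) == 42);
    return 0;
}
```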
1. This bug was introduced by #6582.
2. Optimize the error log for the "Address already in use" error message.
3. Add some documentation about compilation.
1. Add a custom thirdparty download URL.
2. Add a custom com.alibaba maven jar package for DataX.
4. Fix a bug that BE crashes when closing the scan node, introduced by #6622.
1. Fix a bug of UNKNOWN Operation Type 91.
2. Support using the user's resource_tag property to limit BE usage.
3. Add a new FE config `disable_tablet_scheduler` to disable the tablet scheduler.
4. Add documents for resource tags.
5. Modify the default value of FE config `default_db_data_quota_bytes` to 1PB.
6. Add a new BE config `disable_compaction_trace_log` to disable the trace log of compaction time cost.
7. Modify the default value of BE config `remote_storage_read_buffer_mb` to 16MB.
8. Fix an error in `show backends` results.
9. Add a new BE config `external_table_connect_timeout_sec` to set the timeout when connecting to ODBC and MySQL tables.
10. Modify the issue template to enable blank issues, for release notes or other specific usage.
11. Fix a bug in the alpha_row_set split_range() function.
Support HDFS in the select outfile clause without a broker.
This PR implements an HDFS writer in BE which is used to write HDFS files directly without using a broker.
The HDFS outfile clause syntax check has also been added in FE.
The syntax:
```
select * from xx into outfile "hdfs://user/outfile_" format as csv
properties ("hdfs.fs.dafultFS" = "xxx", "hdfs.hdfs_user" = "xxx");
```
Note that all hdfs configurations need to carry a prefix `hdfs.`.
1. Fix a memory leak in `collect_iterator.cpp`. (Fix #6700)
2. Add a new BE config `max_segment_num_per_rowset` to limit the number of segments in a new rowset. (Fix #6701)
3. Make the error message of stream load more friendly.
1. Fix a potential BE coredump when sending batches during data loading. (Fix [Bug] BE crash when loading data #6656)
2. Fix a potential BE coredump when doing schema change. (Fix [Bug] BE crash when doing alter task #6657)
3. Optimize the metric of base_compaction_request_failed.
4. Add an Order column in the show tablet result. (Fix [Feature] Add order column in SHOW TABLET stmt result #6658)
5. Fix a bug that the tablet repair slot is not released. (Fix [Bug] Tablet scheduler stop working #6659)
6. Fix a bug that the REPLICA_MISSING error cannot be handled. (Fix [Bug] REPLICA_MISSING error can not be handled. #6660)
7. Modify the column name of SHOW PROC "/cluster_balance/cluster_load_stat".
8. Optimize the result of SHOW PROC "/statistic" to show COLOCATE_MISMATCH tablets. (Fix [Feature] the health status of colocate table's tablet is not shown in show proc statistic #6663)
9. Fix a bug that show load where state='pending' cannot be executed. (Fix [Bug] show load where state='pending' can not be executed. #6664)
1. Inserting a very large string value may cause a coredump.
2. Some analytic function and aggregate function results may be incorrect.
3. String comparison may cause a coredump when the string value is too large.
4. String type in a delete condition cannot be processed correctly.
5. Add text/blob as aliases of string for compatibility with MySQL.
6. Fix string type min/max aggregation that may be processed incorrectly.
This PR mainly supports:
1. Exporting query result sets concurrently
2. The S3 protocol for query result set export
There are several preconditions for concurrently exporting query result sets:
1. The concurrent export session variable is enabled
2. The query itself can be exported concurrently
(some queries containing sort nodes at the top level cannot be exported concurrently)
3. The export uses the S3 protocol instead of a broker
After exporting the result set concurrently,
the file prefix is changed to outfile_{query_instance_id}_filenumber.{file_format}
```
SELECT count(distinct products_id) FROM a_table as a WHERE 1=1 AND products_id in ( SELECT products_id from b_table );
```
Hash table construction errors may lead to unstable results for queries like the one above.
* [Optimize] Optimize the speed of converting integers to strings
* Use fmt and std::from_chars to make integer-to-string and string-to-integer conversion more efficient
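A small sketch of the technique (assuming the {fmt} library; not the actual Doris conversion code):
```
#include <charconv>
#include <cstdint>
#include <iostream>
#include <string>

#include <fmt/format.h>

int main() {
    // Integer -> string with fmt::format_int: stack buffer, no locale, no heap allocation.
    int64_t value = 1234567890123;
    fmt::format_int formatted(value);
    std::string s(formatted.data(), formatted.size());

    // String -> integer with std::from_chars: no locale, no exceptions.
    int64_t parsed = 0;
    auto result = std::from_chars(s.data(), s.data() + s.size(), parsed);
    if (result.ec == std::errc()) {
        std::cout << "round-trip ok: " << parsed << "\n";
    }
    return 0;
}
```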
Co-authored-by: caiconghui <caiconghui@xiaomi.com>