1. Fix a bug introduced by #6582.
2. Optimize the error log for the "Address already in use" error message.
3. Add documentation about compilation:
   1. how to use a custom third-party download URL;
   2. how to add a custom com.alibaba Maven JAR package for DataX.
4. Fix a bug where BE crashes when closing a scan node, introduced by #6622.
* [Bug]: fix NullPointerException thrown when data is null
* [Bug]: distinguish between null and empty string
* [Feature]: flink-connector supports stream load parameters (see the sketch after this list)
* [Fix]: code style
* [Fix]: support JSON format import and use HttpClient for stream load
* [Fix]: remove System.out calls
* [Fix]: upgrade HttpClient version
* [Doc]: add JSON format import doc
Co-authored-by: wudi <wud3@shuhaisc.com>
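A minimal Flink SQL sketch of passing stream load parameters through the connector; the `sink.properties.*` prefix and all connection values here are assumptions/placeholders, not taken from this changelog:

```
-- Hypothetical sink table: stream load parameters are passed via the
-- assumed "sink.properties." prefix, e.g. to import in JSON format.
CREATE TABLE doris_sink (
    id INT,
    name STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = 'fe_host:8030',                   -- placeholder FE address
    'table.identifier' = 'example_db.tbl',        -- placeholder table
    'username' = 'root',
    'password' = '',
    'sink.properties.format' = 'json',            -- stream load parameter
    'sink.properties.strip_outer_array' = 'true'  -- stream load parameter
);
```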
1. Fix the "UNKNOWN Operation Type 91" bug.
2. Support using the `resource_tag` property of a user to limit which BEs the user can use (see the sketch after this list).
3. Add a new FE config `disable_tablet_scheduler` to disable the tablet scheduler.
4. Add documentation for resource tags.
5. Modify the default value of FE config `default_db_data_quota_bytes` to 1PB.
6. Add a new BE config `disable_compaction_trace_log` to disable the trace log of compaction time cost.
7. Modify the default value of BE config `remote_storage_read_buffer_mb` to 16MB.
8. Fix incorrect results of `show backends`.
9. Add a new BE config `external_table_connect_timeout_sec` to set the timeout when connecting to ODBC and MySQL tables.
10. Modify the issue template to enable blank issues, for release notes or other specific usage.
11. Fix a bug in the alpha rowset `split_range()` function.
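A hedged sketch of items 2 and 3; the user name and tag value are placeholders:

```
-- Restrict user 'jack' (placeholder) to BEs tagged with
-- resource_tags.location = 'group_a' (placeholder tag value).
SET PROPERTY FOR 'jack' 'resource_tags.location' = 'group_a';

-- Item 3: temporarily stop the tablet scheduler via the new FE config.
ADMIN SET FRONTEND CONFIG ("disable_tablet_scheduler" = "true");
```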
This reverts commit dedb57f87e31305db3e2a13e374ba4fd58043fca.
Reverts #6252
That commit may cause tablets whose segments are all empty to never be compacted, resulting in a -235 error.
I will revert it, and the problem will be solved in #6671.
1. Fix the problem that the WITH clause is not printed when the SQL contains a `UNION`.
2. In the `toSql` method, convert a normal VIEW into its final statement.
3. Replace `selectStmt.originSql` with `selectStmt.toSql`.
When using 3 FE followers, restarting the FEs (regardless of order) may occasionally fail,
with bdb throwing a RollbackException.
In this scenario, the bdb documentation suggests catching the exception, simply closing all
ReplicatedEnvironment handles, and then reopening them.
So we catch the RollbackException and reopen the ReplicatedEnvironment.
1. Fix a bug where sync jobs are not cancelled after the database is deleted.
2. The MySQL and Doris tables should have a one-to-one correspondence;
if they do not, creating the task should fail.
3. When the cluster has multiple FEs, a non-master FE would core dump when replaying the creation of a sync job.
4. Fix inconsistent data when updating a key column.
5. Fix failure to synchronize data when there are multiple tables in a single sync job.
6. Fix failure to resume a paused sync job after restarting the master.
In some cases, the query plan thrift structure of a query may be very large
(for example, when there are many columns in SQL), resulting in a large number
of "send fragment timeout" errors.
This PR adds an FE config to control whether to transmit the query plan in a compressed format.
Using the compressed format can reduce the size by ~50%, but it may reduce
concurrency by ~10%. Therefore, in high-concurrency small-query scenarios,
you can choose to turn off compression.
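A hedged sketch of toggling the switch; the config name below is hypothetical, since this description does not name the actual FE config added by the PR:

```
-- Hypothetical config name (placeholder for the FE config added by this
-- PR): disable query plan compression for high-concurrency small queries.
ADMIN SET FRONTEND CONFIG ("compress_query_plan" = "false");
```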
This demo includes: reading HDFS files and writing to Doris via stream load; reading Kafka message queues and writing to Doris via stream load; and reading Doris tables through the Spark Doris Connector to build a DataFrame dataset.
Support HDFS in the `SELECT ... INTO OUTFILE` clause without a broker.
This PR implements an HDFS writer in BE that writes HDFS files directly without going through a broker.
A syntax check for the HDFS outfile clause has also been added in FE.
The syntax:
```
select * from xx into outfile "hdfs://user/outfile_" format as csv
properties ("hdfs.fs.dafultFS" = "xxx", "hdfs.hdfs_user" = "xxx");
```
Note that all HDFS configurations need to carry the prefix `hdfs.`.
1. Fix a memory leak in `collect_iterator.cpp` (fixes #6700).
2. Add a new BE config `max_segment_num_per_rowset` to limit the number of segments in a new rowset (fixes #6701).
3. Make the error messages of stream load more friendly.
1. Fix a potential BE coredump when sending batches during data load (fixes #6656).
2. Fix a potential BE coredump when doing schema change (fixes #6657).
3. Optimize the `base_compaction_request_failed` metric.
4. Add an Order column to the `SHOW TABLET` result (fixes #6658).
5. Fix a bug where the tablet repair slot is not released, which stops the tablet scheduler (fixes #6659).
6. Fix a bug where the REPLICA_MISSING error cannot be handled (fixes #6660).
7. Modify the column names of `SHOW PROC "/cluster_balance/cluster_load_stat"`.
8. Optimize the result of `SHOW PROC "/statistic"` to show COLOCATE_MISMATCH tablets, whose health status was previously not shown (fixes #6663).
9. Fix a bug where `show load where state='pending'` cannot be executed (fixes #6664); a sketch follows this list.
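For reference, minimal sketches of the statements touched by items 7-9, taken directly from the item text:

```
-- Items 7/8: inspect cluster balance and tablet health statistics.
SHOW PROC "/cluster_balance/cluster_load_stat";
SHOW PROC "/statistic";

-- Item 9: this filter previously failed to execute.
SHOW LOAD WHERE STATE = 'pending';
```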
This commit reduces the number of threads used by SyncJob:
1. Submit send tasks to a thread pool to send data.
2. Submit EOF tasks to a thread pool to block, then wake up the client to commit transactions.
3. Use a SerialExecutorService to ensure the correct order of sent data in every channel.
Besides, some bugs have been fixed in this commit:
1. Failure to resume a syncJob.
2. Failure to sync data when multiple tables are set in a syncJob (see the sketch below).
3. In a cluster with multiple FEs, the master may hang after creating a syncJob.
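A hedged sketch of a sync job with multiple tables, the case fixed in bug 2; all database, table, and canal connection values are placeholders:

```
-- Placeholder names throughout: one syncJob binding two MySQL tables
-- to two Doris tables through a canal binlog source.
CREATE SYNC `example_db`.`job_name`
(
    FROM `mysql_db`.`tbl1` INTO `doris_tbl1`,
    FROM `mysql_db`.`tbl2` INTO `doris_tbl2`
)
FROM BINLOG
(
    "type" = "canal",
    "canal.server.ip" = "127.0.0.1",
    "canal.server.port" = "11111",
    "canal.destination" = "example",
    "canal.username" = "",
    "canal.password" = ""
);
```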
The current SqlCache sql_key is generated by taking the MD5 of selectStmt.toSql(), but selectStmt.toSql() is spliced together from the operator tree, and sometimes specific parameters are not displayed. As a result, SQL statements with different parameters hit the same cache, and the query results are inconsistent with expectations.
For example, one of our users has a SQL of more than 300 lines containing many parameters, including partitions, but the result of selectStmt.toSql() is:
SELECT `tb`.`type` AS `type`, `tb`.`name` AS `name`, `tb`.`name1` AS `name1`, `tb`.`name2` AS `name2`, `tb`.`name3` AS `name3`
FROM (
SELECT 3 AS `type`, `cc`.`name` AS `name`, `cc`.`name1` AS `name1`
, coalesce(`bb`.`name`, '请联系您的品牌业务经理进行咨询。') AS `name2`, `bb`.`name1` AS `name3`
FROM `cc`
LEFT JOIN `bb` ON `cc`.`id` = `bb`.`id1`
UNION ALL
SELECT `dd`.`type` AS `type`, `dd`.`name` AS `name`, `dd`.`name1` AS `name1`, `dd`.`name2` AS `name2`, `dd`.`name3` AS `name3`
FROM `dd`
UNION ALL
SELECT `ee`.`type` AS `type`, `ee`.`name` AS `name`, `ee`.`name1` AS `name1`, `ee`.`name2` AS `name2`, `ee`.`name3` AS `name3`
FROM `ee`
) tb
LIMIT 10
In this way, when the user specified different partitions in the query, the same cache entry was hit, which was inconsistent with the expected result. Therefore, it is recommended to use originStmt instead of selectStmt.toSql() to generate the sql_key, as illustrated below.
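A minimal illustration of the collision (placeholder table and partition names): both statements below reduce to the same selectStmt.toSql() text once the partition clause is dropped, so they would share one MD5 sql_key, while keying on originStmt keeps them distinct:

```
-- Placeholder names: the two queries differ only in the partition scanned.
SELECT * FROM tb PARTITION (p20210901) LIMIT 10;
SELECT * FROM tb PARTITION (p20210902) LIMIT 10;
```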
1. Support the boolean data type for spark-doris-connector, since Doris already supports the boolean data type (see the sketch below).
2. Fix a Doris BE core dump when Spark requests data from BE.
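A hedged sketch of item 1; table and column names are placeholders:

```
-- Placeholder DDL: a Doris table with a BOOLEAN column that the
-- connector can now map when Spark reads it into a DataFrame.
CREATE TABLE example_db.tbl (
    id INT,
    flag BOOLEAN
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```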
* [BUG][Profile] Fix the problem that BE's profile could not add a child profile at the specified location.
The buggy call:
runtime_profile()->add_child(build_phase_profile, false, nullptr);
added the child profile at the second location instead of the specified one.
* Update runtime_profile.cpp