Add `sink.batch.size` and `sink.max-retries` options to the Doris Spark connector,
to be consistent with the Flink connector options. For example:
```scala
df.write
  .format("doris")
  // maximum number of rows in a single flush
  .option("sink.batch.size", 2048)
  // number of retries after a write failure
  .option("sink.max-retries", 3)
  .save()
```
Transfer the RowBatch in the Protobuf request to the controller attachment
when the RowBatch exceeds the maximum Protobuf request length.
This avoids reaching the upper limit on the Protobuf request size (2 GB),
and performance is also expected to improve.
1. The clang-format action will be triggered when a PR is submitted.
2. The SkyWalking Eyes action will be triggered when a PR is submitted and after merging into the master branch.
We found that many commit messages submitted at present carry ambiguous information.
Clear commit messages make pull requests more readable, easier for committers to merge,
and easier for the Release Manager to release.
Therefore, we have sorted out a commit format specification,
and we hope that subsequent contributors will organize their commit messages
according to it when submitting a pull request.
Now a minidump file will be created when the BE crashes,
and users can manually trigger a minidump by sending SIGUSR1 to the BE process.
More details can be found in the minidump.md document.
Add a new field `Lag` to the result of the `show routine load` statement.
`Lag: {"0":10, "1":0}` means Kafka partition 0 is 10 messages behind and partition 1 is up-to-date.
Users can directly query data in Hive tables from Doris, and can use joins to perform complex queries without laboriously importing the data from Hive first. A usage sketch follows the change list below.
Main changes:
FE:
- Extend HiveScanNode from BrokerScanNode.
- Add HiveMetaStoreClientHelper to communicate with Hive and HDFS.
BE:
- Treat HiveScanNode as BrokerScanNode, and treat HiveTable as BrokerTable.
- broker_scanner.cpp: support reading columns from the HDFS path.
- orc_scanner.cpp: support reading HDFS files.
POM:
- Add hive.version=2.3.7, hive-metastore and hive-exec.
- Add hadoop.version=2.8.0, hadoop-hdfs.
- Upgrade commons-lang to fix incompatibility with Java 9 and later.
Thrift:
- Add THiveTable.
- Add read_by_column_def in TBrokerRangeDesc.
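As a rough usage sketch, assuming the Hive external table syntax introduced by this change (the metastore URI, database, table, and column names are all placeholders):
```sql
-- Map an existing Hive table into Doris as an external table,
-- then query it directly without importing the data.
CREATE EXTERNAL TABLE hive_student (
    name   VARCHAR(255),
    gender VARCHAR(10),
    age    INT
) ENGINE = HIVE
PROPERTIES (
    "hive.metastore.uris" = "thrift://127.0.0.1:9083",
    "database" = "default",
    "table" = "student"
);

-- Join the Hive table with a local Doris table (doris_table is a placeholder).
SELECT h.name, h.age
FROM hive_student h
JOIN doris_table d ON h.name = d.name;
```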
The new session variable `close_join_reorder` is used to turn off all automatic join reorder algorithms.
If `close_join_reorder` is true, Doris will execute the query using the join order written in the original query.
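A minimal sketch of how this might be used (the table names are placeholders):
```sql
-- Disable automatic join reordering for the current session; the tables
-- are then joined in exactly the order they appear in the query.
SET close_join_reorder = true;

SELECT *
FROM t1
JOIN t2 ON t1.id = t2.id
JOIN t3 ON t2.id = t3.id;  -- executed as (t1 JOIN t2) JOIN t3
```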
1. Migrate some of the best-practice articles to the blog.
2. Rename the "performance tests and best practices" section to "performance tests and examples".
Main changes:
1. Fix [Bug] Colocate group cannot be redistributed after dropping a backend #7019.
2. Add a detailed message about why a colocate group is unstable.
3. Add more suggestions on upgrading a Doris cluster.
Add a blog-sharing function to the documentation site, including a blog list page and detail pages. A guide on how to share blogs has also been added to the developer guide.
Added a brpc stub cache check-and-reset API, used to test whether the brpc stub cache is available and to reset it.
Also added a config for automatically checking and resetting the brpc stub.
Doris should provide an HTTP API that returns the backends list, so that connectors can submit stream loads.
The API performs no privilege checking, which allows common users to use it.
By default, the Spark connector must write values for all fields of the `Doris` table.
With this feature, users can write only a subset of the fields, and can even specify the order in which those fields are written.
For example, given a table named `student` with three columns (name, gender, age), created with the following SQL:
```sql
CREATE TABLE student (
    name   VARCHAR(255),
    gender VARCHAR(10),
    age    INT
)
DUPLICATE KEY (name)
DISTRIBUTED BY HASH(name) BUCKETS 2;
```
Now, suppose we only want to write values to two of the columns, name and gender.
The code is as follows:
```scala
// Column names are set explicitly so they match the Doris columns.
val df = spark.createDataFrame(Seq(
  ("m", "zhangsan"),
  ("f", "lisi"),
  ("m", "wangwu")
)).toDF("gender", "name")

df.write
  .format("doris")
  .option("doris.fenodes", dorisFeNodes)
  .option("doris.table.identifier", dorisTable)
  .option("user", dorisUser)
  .option("password", dorisPwd)
  // specify which fields to write, and in which order
  .option("doris.write.field", "gender,name")
  .save()
```