bp #44959
If you set `enable_all_http_auth = true` in be.conf, restart BE, and keep issuing
`curl -u "xxxx:xxxx" http://127.0.0.1:8040/api/health` while BE is starting, you may
hit a situation where the request never returns.
Reason:
While BE is still starting, it has no information about the FE master yet. When an
API request arrives at BE's HTTP port, BE has to fetch authentication information
from FE, which makes it issue an RPC to a host with an empty IP and port 0. That RPC
call always fails (this is not the same as a password error). After receiving this
failure, BE does not `send_reply` to the API requester, so the API request never
returns.
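A minimal reproduction sketch, assuming the default BE HTTP port 8040 and placeholder credentials: restart BE with `enable_all_http_auth = true` and keep polling the health endpoint while it starts.
```
# Poll /api/health while BE is starting up. Before this fix, a request issued
# before BE learns the FE master address could hang instead of returning.
while true; do
  curl -u "xxxx:xxxx" --max-time 5 http://127.0.0.1:8040/api/health
  echo
  sleep 1
done
```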
pick: #38489
Usage:
1. `curl http://be_ip:be_port/api/compaction_score?top_n=10` returns a JSON array
containing the compaction scores of the top n tablets, where n = top_n.
```
[
{
"compaction_score": "5",
"tablet_id": "42595"
},
{
"compaction_score": "5",
"tablet_id": "42587"
},
{
"compaction_score": "5",
"tablet_id": "42593"
},
{
"compaction_score": "5",
"tablet_id": "42597"
},
{
"compaction_score": "5",
"tablet_id": "42589"
},
{
"compaction_score": "5",
"tablet_id": "42599"
},
{
"compaction_score": "5",
"tablet_id": "42601"
},
{
"compaction_score": "5",
"tablet_id": "42591"
},
{
"compaction_score": "5",
"tablet_id": "42585"
},
{
"compaction_score": "4",
"tablet_id": "10034"
}
]
```
If top_n is not specified, the compaction scores of all tablets are returned.
If top_n is invalid, an error is raised (see the example invocations after this list):
```
invalid argument: top_n=wrong
```
2. `curl http://be_ip:be_port/api/compaction_score?sync_meta=true`
`sync_meta` is only available in cloud mode; it syncs metadata from the meta
service and can be combined with top_n (see the example invocations after this list).
If the `sync_meta` parameter is passed in non-cloud mode, an error is raised:
```
sync meta is only available for cloud mode
```
3. In the future, this endpoint may be extended with other utilities, such as
fetching tablet compaction scores by table id.
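Example invocations for the cases above, shown as a minimal sketch (assuming a BE at 127.0.0.1:8040; the `sync_meta` call requires cloud mode):
```
# No top_n: return compaction scores for all tablets.
curl "http://127.0.0.1:8040/api/compaction_score"

# Invalid top_n: returns "invalid argument: top_n=wrong".
curl "http://127.0.0.1:8040/api/compaction_score?top_n=wrong"

# Cloud mode only: sync meta from the meta service, then return the top 5 scores.
curl "http://127.0.0.1:8040/api/compaction_score?sync_meta=true&top_n=5"
```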
## Proposed changes
Issue Number: close #xxx
* [fix](compaction test) show single replica compaction status and fix test (#33076)
* [improve](http action) add http interface to calculate the crc of all files in tablet (#34915)
* [Enhancement](full compaction) Add run status support for full compaction (#34043)
* The usage is `curl http://{ip}:{port}/api/compaction/run_status?tablet_id={tablet_id}`
e.g. `curl http://127.0.0.1:8040/api/compaction/run_status?tablet_id=10084`
If full compaction is running, the output will be
```
{
"status" : "Success",
"run_status" : true,
"msg" : "compaction task for this tablet is running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```
otherwise the output will be
```
{
"status" : "Success",
"run_status" : false,
"msg" : "compaction task for this tablet is not running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```
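For context, a full compaction can be started before checking its run status; the trigger call below is a sketch that assumes the existing Doris BE compaction run endpoint and its `compact_type` parameter (verify against your version):
```
# Trigger a full compaction on the tablet, then poll its run status with the API above.
curl -X POST "http://127.0.0.1:8040/api/compaction/run?tablet_id=10084&compact_type=full"
curl "http://127.0.0.1:8040/api/compaction/run_status?tablet_id=10084"
```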
* [Fix](partial update) Fix rowset not found error when doing partial update (#34112)
Cause: In the partial column update logic, the existing data columns are read first, and then the data is supplemented and written back. During the read, initialization fetches only the rowset IDs; the actual rowset objects are fetched later, only when needed. However, between fetching the rowset IDs and fetching the rowset objects, a compaction may occur and turn the old rowset into a stale rowset. If too much time passes, the stale rowset may be deleted outright, so when the rowset object is finally needed for the update, it cannot be found. Although an update with partial column logic should be able to read all keys and should not encounter new keys, once the rowset disappears the Backend (BE) treats those keys as missing and checks whether the other columns have default values or are nullable. If that check fails, the "rowset not found" error is thrown.
Solution: To avoid this during partial column updates, the initialization step now fetches both the rowset IDs and the shared pointers to the rowset objects at the same time, which guarantees that the rowsets can still be found during data retrieval.
* This PR also optimizes some of the logic related to group commit:
  1. Improved the error handling when there is insufficient WAL space during import.
  2. Accounted for cases where the content length is negative during import.
  3. Added missing error log printing in `group_commit_mgr.cpp`.
* When stream processing frameworks such as Flink are used with group commit mode enabled, the unpredictable size of the imported data makes that behavior problematic. Previously, to simplify the implementation, the error message for an excessive data volume during stream load was shared with the group commit one, so users doing Flink imports were confused when they hit a "data volume too large" error. To address this, the logic is adjusted: if a user runs stream processing imports such as Flink with group commit mode enabled, group commit mode is automatically disabled and the standard import mode is used instead. That is the essence of this PR.
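For illustration only, a group commit load is typically requested per stream load via a request header; the header name, value, and FE port below are assumptions based on common Doris stream load usage, not something introduced or verified by this PR.
```
# Hypothetical stream load that asks for group commit (assumed header and port).
# With this change, loads issued through stream frameworks such as Flink fall
# back to the standard import mode instead of using group commit.
curl --location-trusted -u "user:passwd" \
  -H "group_commit:async_mode" \
  -T data.csv \
  http://fe_host:8030/api/db_name/table_name/_stream_load
```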