add logs for partial update
the master PR is #35802
* [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788)
* **Problem:** For tables with auto-increment columns, updating partial columns can produce inconsistent data among replicas.
**Cause:** Previously, partial-column updates on tables with auto-increment columns generated the auto-increment values independently on each BE (Backend), so different BEs could generate different values for the same row.
**Solution:** Before distributing blocks, determine whether the load is a partial-column update on a table with an auto-increment column. If so, generate the auto-increment values up front and append them as the last column of the block. After distribution, each BE checks whether the key of each partial-update row already exists: if it does, the row keeps its previous auto-increment value; if not, the value from the block's last column is used. This guarantees that auto-increment values are consistent across BEs (see the sketch below).
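A minimal C++ sketch of this flow; `Row`, `prepare_block`, and `resolve_auto_inc` are illustrative stand-ins, not Doris's actual types:
```
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Sender side: generate auto-increment values once, before fan-out, so
// every replica receives identical values in the block's last column.
struct Row {
    std::string key;
    int64_t generated_auto_inc;  // filled by the sender before distribution
};

std::vector<Row> prepare_block(const std::vector<std::string>& keys,
                               int64_t& next_auto_inc) {
    std::vector<Row> block;
    for (const auto& key : keys) {
        block.push_back({key, next_auto_inc++});
    }
    return block;
}

// Receiver (BE) side: a key that already exists keeps its old value;
// a new key takes the value shipped in the block's last column.
int64_t resolve_auto_inc(const Row& row,
                         const std::unordered_map<std::string, int64_t>& existing) {
    auto it = existing.find(row.key);
    return it != existing.end() ? it->second : row.generated_auto_inc;
}

int main() {
    int64_t next_id = 100;
    auto block = prepare_block({"a", "b"}, next_id);
    std::unordered_map<std::string, int64_t> existing = {{"a", 7}};
    // "a" keeps its previous value 7; "b" uses the shipped value 101
    int64_t a = resolve_auto_inc(block[0], existing);
    int64_t b = resolve_auto_inc(block[1], existing);
    return (a == 7 && b == 101) ? 0 : 1;
}
```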
* [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)
* [Enhancement](full compaction) Add run status support for full compaction (#34043)
* The usage is `curl http://{ip}:{port}/api/compaction/run_status?tablet_id={tablet_id}`
e.g. `curl http://127.0.0.1:8040/api/compaction/run_status?tablet_id=10084`
If full compaction is running, the output will be
```
{
    "status" : "Success",
    "run_status" : true,
    "msg" : "compaction task for this tablet is running",
    "tablet_id" : 10084,
    "compact_type" : "full"
}
```
Otherwise, the output will be
```
{
    "status" : "Success",
    "run_status" : false,
    "msg" : "compaction task for this tablet is not running",
    "tablet_id" : 10084,
    "compact_type" : "full"
}
```
* [Fix](partial update) Fix rowset not found error when doing partial update (#34112)
Cause: The partial-column update path first reads the existing data columns, fills in the missing ones, and writes the result back. During initialization of the read, only the rowset IDs are fetched; the actual rowset objects are fetched later, on demand. Between those two steps, compaction may run and turn an old rowset into a stale rowset, and if enough time passes the stale rowset may be deleted outright. When the rowset object is finally needed for the update, it can no longer be found. A partial-column update should be able to read all of its keys and should never encounter new ones, but once the rowset is gone the Backend (BE) treats those keys as missing and checks whether the remaining columns have default values or are nullable. When that check fails, the error above is thrown.
Solution: To avoid this during partial-column updates, the initialization step now fetches both the rowset IDs and the shared pointers to the rowset objects at the same time (see the sketch below). Holding the shared pointers ensures the rowsets can always be found during data retrieval.
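A stripped-down C++ illustration of the fix; `Rowset` and `PartialUpdateReader` are hypothetical stand-ins for the real reader state:
```
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct Rowset {
    int64_t id;
};
using RowsetSharedPtr = std::shared_ptr<Rowset>;

// Holding shared_ptrs from initialization onward pins the rowsets: even
// if compaction turns them stale and the stale sweep drops the tablet's
// own references, the objects stay alive until this reader releases them.
struct PartialUpdateReader {
    std::vector<RowsetSharedPtr> pinned_rowsets;

    void init(const std::map<int64_t, RowsetSharedPtr>& tablet_rowsets) {
        // Before the fix, only the rowset IDs were recorded here and the
        // object lookup happened later -- by then GC may have removed it.
        for (const auto& [id, rs] : tablet_rowsets) {
            pinned_rowsets.push_back(rs);  // grab the object, not just the ID
        }
    }
};

int main() {
    std::map<int64_t, RowsetSharedPtr> tablet_rowsets;
    tablet_rowsets.emplace(1, std::make_shared<Rowset>(Rowset{1}));
    PartialUpdateReader reader;
    reader.init(tablet_rowsets);
    tablet_rowsets.clear();  // compaction GC drops the tablet's references
    return reader.pinned_rowsets[0]->id == 1 ? 0 : 1;  // still accessible
}
```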
* [Fix](partial-update) Fix partial update fail when the datetime default value is 'current_time' (#32926)
* Problem: Importing data for a partial-column update fails when a missing datetime column's default value is the current time.
Reason: Partial-column updates did not handle the logic for datetime default values.
Solution: During a partial-column update, when a column's default value is the current time, read the current time from the runtime state and write it into the data (see the sketch below).
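A hedged C++ sketch of the idea, assuming a simplified `RuntimeState` that captures the load's start time (names are illustrative, not Doris's actual API):
```
#include <ctime>
#include <iostream>
#include <string>

// Simplified stand-in for the runtime state: the load's start time is
// captured once, so every row filled in this load gets the same value.
struct RuntimeState {
    std::time_t load_start_time = std::time(nullptr);
};

// When a missing datetime column's default is the current time,
// materialize the value from the runtime state instead of failing.
std::string fill_current_time_default(const RuntimeState& state) {
    char buf[32];
    std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S",
                  std::localtime(&state.load_start_time));
    return buf;
}

int main() {
    RuntimeState state;
    std::cout << fill_current_time_default(state) << "\n";
}
```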
* [Enhancement](partial update) Add timezone case for partial update timestamp (#33177)
* [fix](partial update) Support partial update when the date default value is 'current_date'. This PR is an extension of PR #32926. (#33394)
* Problem:
Inconsistent behavior occurs when executing partial column update `UPDATE` statements and `INSERT` statements on merge-on-write tables with the Nereids optimizer enabled. The number of columns passed to BE differs; `UPDATE` operations incorrectly pass all columns, while `INSERT` operations correctly pass only the updated columns.
Reason:
The Nereids optimizer does not handle partial column update `UPDATE` statements properly. The processing logic for `UPDATE` statements rewrites them as equivalent `INSERT` statements, which are then processed according to the logic of `INSERT` statements. For example, assuming a MoW table structure with columns k1, k2, v1, v2, the correct rewrite should be:
* `UPDATE t1 SET v1 = v1 + 1 WHERE k1 = 1 AND k2 = 2`
* =>
* `INSERT INTO t1 (k1, k2, v1) SELECT k1, k2, v1 + 1 FROM t1 WHERE k1 = 1 AND k2 = 2`
However, the actual rewriting process does not consider the logic for partial column updates, so all columns are included in the `INSERT` statement, i.e., the result is:
* `INSERT INTO t1 (k1, k2, v1, v2) SELECT k1, k2, v1 + 1, v2 FROM t1 WHERE k1 = 1 AND k2 = 2`
This results in `UPDATE` operations incorrectly passing all columns to BE.
Solution:
Having analyzed the cause, the solution is straightforward: when rewriting partial column update `UPDATE` statements to `INSERT` statements, only retain the updated columns and all key columns (as partial column updates must include all key columns). Additionally, this PR includes error injection cases to verify the number of columns passed to BE is correct.
Problem:
When a partial-column update omits the auto-increment column and the imported data contains new keys, the load fails with an error that the auto-increment column could not be found.
Reason:
The partial-column update logic did not account for new keys on tables with auto-increment columns. Because the system can generate auto-increment values, the column may legitimately be omitted from the import. The partial-update path, however, treated it as a regular column that must be nullable or have a default value to be auto-filled, overlooking that an auto-increment column can also be auto-filled; hence the error.
Solution:
Add a check for auto-increment columns to the partial-column update logic, and generate the auto-increment values as part of filling in the missing columns (see the sketch below).
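A minimal C++ sketch of the added check, with a hypothetical `ColumnMeta` descriptor (not the actual Doris schema type):
```
#include <cstdint>
#include <optional>

// Hypothetical descriptor for the fill-in step of a partial update.
struct ColumnMeta {
    bool is_auto_increment = false;
    bool is_nullable = false;
    std::optional<int64_t> default_value;
};

// Before the fix, a missing column on a new key had to be nullable or
// carry a default; an auto-increment column satisfied neither, so the
// load failed. The fix adds the auto-increment branch.
bool can_fill_missing_column(const ColumnMeta& col) {
    if (col.is_auto_increment) return true;  // generate a value instead of erroring
    return col.is_nullable || col.default_value.has_value();
}

int main() {
    ColumnMeta auto_inc_col{true, false, std::nullopt};
    return can_fill_missing_column(auto_inc_col) ? 0 : 1;
}
```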
Refactor the write path using abstract base classes. Whether to use `StorageEngine` or `CloudStorageEngine` is determined at compile time instead of via the runtime `config::cloud_mode` flag, to avoid unexpected null-pointer or undefined-behavior issues caused by merging code.
Classes that depend on `StorageEngine` but are shared with cloud mode need an abstract base class: common code is extracted into the base class, while the code that depends on `StorageEngine` is implemented in a `StorageEngine` mix-in subclass of that base class (see the sketch below).
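A compact C++ sketch of this class layout; names like `BaseTabletWriter` are illustrative, not the actual Doris classes:
```
#include <iostream>

struct StorageEngine {};       // local-disk engine (stand-in)
struct CloudStorageEngine {};  // cloud engine (stand-in)

// Abstract base class: the write-path code shared by both modes.
class BaseTabletWriter {
public:
    virtual ~BaseTabletWriter() = default;
    void write() {
        // ... common logic lives here ...
        do_write();  // engine-specific part is deferred to the subclass
    }

protected:
    virtual void do_write() = 0;
};

// Mix-in bound to StorageEngine: the only place that may touch it.
class LocalTabletWriter final : public BaseTabletWriter {
public:
    explicit LocalTabletWriter(StorageEngine& engine) : _engine(engine) {}

protected:
    void do_write() override { std::cout << "local write\n"; }

private:
    StorageEngine& _engine;
};

// Counterpart bound to CloudStorageEngine.
class CloudTabletWriter final : public BaseTabletWriter {
public:
    explicit CloudTabletWriter(CloudStorageEngine& engine) : _engine(engine) {}

protected:
    void do_write() override { std::cout << "cloud write\n"; }

private:
    CloudStorageEngine& _engine;
};

int main() {
    StorageEngine engine;
    LocalTabletWriter writer(engine);
    writer.write();
}
```
Which concrete writer gets built is decided when the local or cloud binary is compiled, so no `config::cloud_mode` branch remains on the write path at runtime.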
1. Limit the column count to 2048 by default.
2. Fix `get_inverted_index` returning nullptr when a variant's unique id is -1 by using its parent's unique id instead (see the sketch after this list).
3. Avoid adding a subcolumn with the same path to the tablet schema more than once.
4. Make the extracted column's unique id -1.
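A toy C++ illustration of the fallback in fix 2; the function and types are hypothetical, not the real `get_inverted_index` signature:
```
#include <cstdint>
#include <map>

struct InvertedIndex {
    int64_t column_unique_id;
};

// An extracted variant subcolumn carries unique id -1, so the index is
// resolved through the parent column's unique id instead of returning
// nullptr for the -1 lookup.
const InvertedIndex* get_index_for_column(
        int64_t col_uid, int64_t parent_uid,
        const std::map<int64_t, InvertedIndex>& indexes) {
    int64_t lookup_uid = (col_uid == -1) ? parent_uid : col_uid;
    auto it = indexes.find(lookup_uid);
    return it == indexes.end() ? nullptr : &it->second;
}

int main() {
    std::map<int64_t, InvertedIndex> indexes = {{5, InvertedIndex{5}}};
    // subcolumn uid -1 falls back to its parent's uid 5
    return get_index_for_column(-1, 5, indexes) != nullptr ? 0 : 1;
}
```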
It should be ensured that the obtained versions are continuous when calculating the delete bitmap during publish.
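A small C++ sketch of the continuity check, assuming versions are inclusive `[first, second]` ranges (a simplified model of Doris's version ranges):
```
#include <cstddef>
#include <cstdint>
#include <vector>

struct Version {
    int64_t first;   // inclusive start of the version range
    int64_t second;  // inclusive end of the version range
};

// The versions used for the delete-bitmap calculation must form one
// gapless chain; a hole would mean computing the bitmap against an
// incomplete view of the data.
bool versions_are_continuous(const std::vector<Version>& versions) {
    for (std::size_t i = 1; i < versions.size(); ++i) {
        if (versions[i].first != versions[i - 1].second + 1) return false;
    }
    return true;
}

int main() {
    std::vector<Version> ok = {{0, 0}, {1, 5}, {6, 9}};  // gapless
    std::vector<Version> bad = {{0, 0}, {2, 5}};         // hole at version 1
    return (versions_are_continuous(ok) && !versions_are_continuous(bad)) ? 0 : 1;
}
```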
The remaining NOTREADY tablets left behind by a schema change failure should be dropped.
When a rowset is deleted, its delete bitmap cannot be deleted until no read request is still using the rowset.
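A C++ sketch of this lifetime rule using shared ownership; this is a simplified model, not the actual implementation:
```
#include <cassert>
#include <memory>

struct DeleteBitmap {};

// Hypothetical ownership model: the rowset owns its delete bitmap, and
// every in-flight read holds a shared_ptr to the rowset. The bitmap is
// therefore released only after the last reader lets go of the rowset.
struct Rowset {
    std::shared_ptr<DeleteBitmap> delete_bitmap = std::make_shared<DeleteBitmap>();
};

int main() {
    auto rowset = std::make_shared<Rowset>();
    std::weak_ptr<DeleteBitmap> bitmap_watch = rowset->delete_bitmap;
    {
        auto reader_ref = rowset;  // a read request using the rowset
        rowset.reset();            // rowset "deleted" (e.g. after compaction)
        assert(!bitmap_watch.expired());  // bitmap kept alive for the reader
    }
    assert(bitmap_watch.expired());  // last reader gone, bitmap released
}
```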