* [Fix] Fix bug that rowset meta is deleted after compaction
After compaction, the tablet rowset meta will be modified by
adding to new output rowsets and deleting the old input rowsets.
The output version may equals to the input version.
So we should delete the "input" version from _rs_version_map
before adding the "output" version to _rs_version_map. Otherwise,
the new "output" version will be lost in _rs_version_map.
Adding a new storage engine, we need to make an extensible tablet interface, so olap/StorageEngine can support and manage new tablet types.
To start, this commit creates a class BaseTablet and make Tablet and new MemTablet inherit
this base class, some common fields & methods are moved to BaseTablet class, which fields
and methods belong to base/old class is not finalized yet, it will change as the project evolves.
Fix#3384
There is no functional changes in this patch.
Key refactor points are:
- Remove meaningless return value of functions in class Tablet, and
also some related functions in other classes
- Allow RowsetGraph::capture_consistent_versions to pass a nullptr
to the output parameter
- Use CHECK instead of LOG(FATAL) to simplify code
This CL fixes#3270 by skipping recently added version when performing cumulative compaction. A new config named "cumulative_compaction_skip_window_seconds" is added to adjust the time window.
When calculating the cumulative point at first time, we should stop increasing
the cumulative point when we meet a rowset with overlap flag as OVERLAPPING,
even if it has only one segments.
This CL try to fix a potential bug describe in ISSUE: #3097. But I'm not sure this is the root cause.
Also remove lots of verbose log, and fix a memory leak.
1. Add some comments to make the code easier to understand;
2. Make the metric `create_tablet_requests_failed` to be accurate;
3. Some internal methods use naked pointers directly instead of `shared_ptr`;
4. The `using` in `.h` files are contagious when included by other files,
so we should only use it in `.cpp` files;
5. Some formatting changes: such as wrapping lines that are too long
6. Parameters that need to be modified, use pointers instead of references
No functional changes in this patch.
When using an iterator of _tablet_map.tablet_arr(`std::list`) to remove
a tablet, we should first remove tablet from _partition_map to avoid
the iterator becoming invalid.
* Improve comparison and printing of Version
There are two members in `Version`:` first` and `second`.
There are many places where we need to print one `Version` object and
compare two `Version` objects, but in the current code, these two members
are accessed directly, which makes the code very tedious.
This patch mainly do:
1. Adds overloaded methods for `operator<<()` for `Version`, so
we can directly print a Version object;
2. Adds the `cantains()` method to determine whether it is an containment
relationship;
3. Uses `operator==()` to determine if two `Version` objects are equal.
Because there are too many places need to be modified, there are still some
naked codes left, which will be modified later.
This patch also removes some necessary header file references.
No functional changes in this patch.
time(NULL) returns second-resolution timestamp, however all compaction related time in Tablet are in millis-resolution. Therefore should use UnixMillis() instead.
Support compaction operation to compact only one rowset.
After the modification, the last rowset of the tablet will
also be compacted.
At the same time, we added a `segments_overlap_pb` field to
the rowset meta. Used to describe whether the segment data
in the rowset overlaps. This field is set by `rowset_writer`.
Initially UNKNOWN for compatibility with existing data.
In addition, the version hash of the rowset generated after
compaction is directly set to the version hash of last rowset
participating in compaction, to ensure that the tablet's
version hash remains unchanged after compaction.
When there are to many segment in one rowset, which is larger than
BE config 'max_cumulative_compaction_num_singleton_deltas', the
cumulative compaction will not work and just increase the cumulative
point, because there is only once rowset being selected.
So when selecting rowset for cumulative compaction, we should meet 2
requirments before finishing the selection logic:
1. compaction score is larger than 'max_cumulative_compaction_num_singleton_deltas'
2. at least 2 rowsets are selected.
The current compaction selection strategy and cumulative point update logic
will cause the cumulative compaction to not work, and all compaction tasks
will be completed only by the base compaction. This can cause a large number
of data versions to pile up.
In the current cumulative point update logic, when a cumulative cannot select
enough number of rowsets, it will directly increase the cumulative point.
Therefore, when the data version generates the same speed as the cumulative
compaction polling, it will cause the cumulative point to continuously increase
without triggering the cumulative compaction.
The new strategy mainly modifies the update logic of cumulative point to ensure
that the above problems do not occur. At the same time, the new strategy also
takes into account the problem that compaction cannot be performed if cumulative
points stagnate for a long time. Cumulative points will be forced to increase
through threshold settings to ensure that compaction has a chance to execute.
Also add a new HTTP API to view the compaction status of specified tablet.
See `compaction-action.md` for details.
Add a flag in RowsetMeta to record whether it has been deleted from rowset meta.
Before this PR, 37156 rowsets only cost 1642 s.
With this PR, 37319 rowsets just cost 1 s.
[Metric] Add tablet compaction score metrics
Backend:
Add metric "tablet_max_compaction_score" to monitor the current max compaction
score of tablets on this Backend. This metric will be updated each time
the compaction thread picking tablets to compact.
Frontend:
Add metric "tablet_max_compaction_score" for each Backend. These metrics will
be updated when backends report tablet.
And also add a calculated metric "max_tablet_compaction_core" to monitor the
max compaction core of tablets on all Backends.
[Storage]
This PR implements thread-safe `Rowset::load()` for both AlphaRowset and BetaRowset. The main changes are
1. Introduce `DorisCallOnce<ReturnType>` to be the replacement for `DorisInitOnce` . It works for both Status and OLAPStatus.
2. `segment_v2::ColumnReader::init()` is now implemented by DorisCallOnce.
3. `segment_v2::Segment` is now created by a factory open() method. This guarantees all Segment instances are in opened state.
4. `segment_v2::Segment::_load_index()` is now implemented by DorisCallOnce.
5. Implement thread-safe load() for AlphaRowset and BetaRowset
When making snapshot for incremental clone, missed singleton versions may have been compacted.
So get_rowset_by_version() will not acquire any rowset.
This rowset should be acquired from _inc_rs_version_map.
In previous compaction, only rowsets will be taken into consideration.
Doing streaming load, the singleton rowset may is made up of many overlapping segments.
Scanning these overlapping segments will result in read amplification.
To address this problem, overlapping segments should be taken into consideration
when doing cumulative compaction to reduce read amplification.
1. Calculate cumulative point when loading tablet first time.
2. Simplify pick rowsets logic upon delete predicate.
3. Saving meta and modify rowsets only once after cumulative compaction.
NOTE: This patch would modify all Backend's data.
And this will cause a very long time to restart be.
So if you want to interferer your product environment,
you should upgrade backend one by one.
1. Refactoring be is to clarify the structure the codes.
2. Use unique id to indicate a rowset.
Nameing rowset with tablet_id and version will lead to
many conflicts among compaction, clone, restore.
3. Extract an rowset interface to encapsulate rowsets
with different format.