For the first, we need to make a parameter to discribe the data is local or remote.
At then, we need to support some basic function to support the operation for remote storage.
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
In version 0.13, we support a more efficient compaction logic.
This logic will maintain multiple version paths of the tablet.
This can avoid -230 errors and can also support incremental clone.
But the previous incremental clone uses the incremental rowset meta recorded in `incr_rs_meta`.
At present, the incremental rowset meta recorded in `incr_rs_meta` and the records
in `stale_rs_meta` are duplicated, and the current clone logic does not adapt to the
new multi-version path, resulting in many cases not triggering incremental clone.
This CL mainly modified:
1. Removed `incr_rs_meta` metadata
2. Modified the clone logic. When the clone is incremented, it will try to read the rowset in `stale_rs_meta`.
3. Delete a lot of code that was previously used for version compatibility.
ThreadSanitizer, aka TSAN, is a useful tool to detect multi-thread
problems, such as data race, mutex problems, etc.
We should detect TSAN problems for Doris BE, both unit tests and
server should pass through TSAN mode, to make Doris more robustness.
This is the very beginning patch to fix TSAN problems, and some
difficult problems are suppressed in file 'tsan_suppressions', you
can suppress these problems by setting:
export TSAN_OPTIONS="suppressions=tsan_suppressions"
before running:
`BUILD_TYPE=tsan ./run-be-ut.sh --run`
#4996
When BE is restarting and the older tablet have been added to the garbage collection queue but not deleted yet.
In this case, since the data_dirs are parallel loaded, a later loaded tablet may be older than previously loaded one, which should not be acknowledged as a failure.
It should be noted that the _add_tablet_unlocked() method will also be called when creating a new tablet. In that case, the changes in this pull request will not be accessed so there is no affect on the tablet creating process.
Currently, there are M threads to do base compaction and N threads to do cumulative compaction for each disk.
Too many compaction tasks may run out of memory, so the max concurrency of running compaction tasks
is limited by semaphore.
If the running threads cost too much memory, we can't defense it. In addition, reducing concurrency to avoid OOM
will lead to some compaction tasks can't be executed in time and we may encounter more heavy compaction.
Therefore, concurrency limitation is not enough.
The strategy proposed in #3624 may be effective to solve the OOM.
A CompactionPermitLimiter is used for compaction limitation, and use single-producer/multi-consumer model.
Producer will try to generate compaction tasks and acquire `permits` for each task.
The compaction task which can hold `permits` will be executed in thread pool and each finished task will
release its `permits`.
`permits` should be applied for before a compaction task can execute. When the sum of `permits`
held by executing compaction tasks reaches a threshold, subsequent compaction task will be no longer allowed,
until some `permits` are released. Tablet compaction score is used as `permits` of compaction task here.
To some extent, memory consumption can be limited by setting appropriate `permits` threshold.
Suppose that the current available byte in a disk is `_available_bytes`. After the data with size of `incoming_data_size` is written in the disk, the left space in the disk should be calculated as follow :
`int64_t left_bytes = _available_bytes - incoming_data_size;`
rather than:
`int64_t left_bytes = _disk_capacity_bytes - _available_bytes - incoming_data_size;`
After 0.12 version, doris remove the format convert functiion which can convert from hdr_ format
to tabletmeta_ format when loading metas, the commit link: 3bca253
When we update doris version and there are old format meta in storage,
BE will not read the old format tablet. It can lead to data loss.
So we add meta format detection function to prevent data loss.
When there are old format meta in olap_meta, BE can find and print log or exit.
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
Redesign metrics to 3 layers:
MetricRegistry - MetricEntity - Metrics
MetricRegistry : the register center
MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several
MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift
clients and servers, and so on.
Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity,
thrift_opened_clients on a thrift_client entity.
MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across
different MetricEntities.
StorageEngine::open just return a very vague status info when failed,
we have to check logs to find out the root reason, and it's not
convenient to check logs if we run unit tests in CI dockers.
It would be better to return more detailed failure info to point out
the root reason, for example, it may return error status with message
"file descriptors limit is too small".
main refactor points are:
- Use a single get_absolute_tablet_path function instead of 3
independent functions
- Remove meaningless return value of register_tablet and deregister_tablet
- Some typo and format
* Improve comparison and printing of Version
There are two members in `Version`:` first` and `second`.
There are many places where we need to print one `Version` object and
compare two `Version` objects, but in the current code, these two members
are accessed directly, which makes the code very tedious.
This patch mainly do:
1. Adds overloaded methods for `operator<<()` for `Version`, so
we can directly print a Version object;
2. Adds the `cantains()` method to determine whether it is an containment
relationship;
3. Uses `operator==()` to determine if two `Version` objects are equal.
Because there are too many places need to be modified, there are still some
naked codes left, which will be modified later.
This patch also removes some necessary header file references.
No functional changes in this patch.
[Bug][BetaRowset] fix beta rowset read slowly with limit
beta rowset do not update raw_rows_read in statistics and will read all
data in tablet when query with limit, which lead to long query time.
Now Env has unify all environment operation, such as file operation.
However some of our old functions don't leverage it. This change unify
FileUtils::scan_dir to use Env's function.