Commit Graph

53 Commits

Author SHA1 Message Date
dc9cd34047 [docs] Add user manual for hdfs load and transaction. (#7497) 2021-12-30 10:22:48 +08:00
07e2acb2f3 [feature] Suport national secret (national commercial password) algorithm SM3/SM4 (#7464)
SM3 is password hash algorithm
SM4 is a block cipher used to replace DES / AES and other international algorithms.
2021-12-28 10:39:54 +08:00
20ef8a6e21 [feature-wip](remote storage)(step1) use a struct instead of string for parameter path, add basic remote method (#7098)
For the first, we need to make a parameter to discribe the data is local or remote.
At then, we need to support some basic function to support the operation for remote storage.
2021-12-22 22:58:23 +08:00
6c6380969b [refactor] replace boost smart ptr with stl (#6856)
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
2021-11-17 10:18:35 +08:00
d641a26490 [Refactor] Remove boost filesystem (#5579)
* use std::filesystem instead of boost
Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2021-04-08 09:11:59 +08:00
a6e2c3e3f1 [Bug][Clone] Fix the bug that incremental clone is not triggered (#5230)
In version 0.13, we support a more efficient compaction logic. 
This logic will maintain multiple version paths of the tablet.
This can avoid -230 errors and can also support incremental clone.

But the previous incremental clone uses the incremental rowset meta recorded in `incr_rs_meta`.
At present, the incremental rowset meta recorded in `incr_rs_meta` and the records
in `stale_rs_meta` are duplicated, and the current clone logic does not adapt to the
new multi-version path, resulting in many cases not triggering incremental clone.

This CL mainly modified:

1. Removed `incr_rs_meta` metadata
2. Modified the clone logic. When the clone is incremented, it will try to read the rowset in `stale_rs_meta`.
3. Delete a lot of code that was previously used for version compatibility.
2021-02-06 22:04:48 +08:00
72ca5c5f3d [Optimize] Remove path check when start BE (#5268)
remove path check when start BE
2021-01-24 10:14:07 +08:00
58e58c94d8 [TSAN] Fix tsan bugs (part 1) (#5162)
ThreadSanitizer, aka TSAN, is a useful tool to detect multi-thread
problems, such as data race, mutex problems, etc.
We should detect TSAN problems for Doris BE, both unit tests and
server should pass through TSAN mode, to make Doris more robustness.
This is the very beginning patch to fix TSAN problems, and some
difficult problems are suppressed in file 'tsan_suppressions', you
can suppress these problems by setting:
export TSAN_OPTIONS="suppressions=tsan_suppressions"

before running:
`BUILD_TYPE=tsan ./run-be-ut.sh --run`
2021-01-15 09:45:11 +08:00
86e40dd3e5 Fix old tablet inserting bug (#5113)
#4996
When BE is restarting and the older tablet have been added to the garbage collection queue but not deleted yet.
In this case, since the data_dirs are parallel loaded, a later loaded tablet may be older than previously loaded one, which should not be acknowledged as a failure.

It should be noted that the _add_tablet_unlocked() method will also be called when creating a new tablet. In that case, the changes in this pull request will not be accessed so there is no affect on the tablet creating process.
2020-12-24 15:20:54 +08:00
f6881d2f7b [Bug] Fix coredump bug when create new tablets (#5089)
There is a bug may cause BE coredump when create tablet,
the accessing of tablet_set of a data dir should be protected by lock.
2020-12-17 00:34:31 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
09f97f8a05 [Refactor] Fixes some be typo part 2 (#4747) 2020-10-20 09:28:57 +08:00
3438a746ac [Typo] Fix typo in metrics macros (#4739)
Just fix typo.
Rename DEFINE_GAUGE_METRIC_PROTOTYPE_5ARG(name, unit) to DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit)
Rename DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit) witch define core metrics to DEFINE_GAUGE_CORE_METRIC_PROTOTYPE_2ARG(name, unit)
2020-10-15 19:56:43 +08:00
eba595583e [Optimize] Optimize the execution model of compaction to limit memory consumption (#4670)
Currently, there are M threads to do base compaction and N threads to do cumulative compaction for each disk. 
Too many compaction tasks may run out of memory, so the max concurrency of running compaction tasks
is limited by semaphore.
If the running threads cost too much memory, we can't defense it. In addition, reducing concurrency to avoid OOM
will lead to some compaction tasks can't be executed in time and we may encounter more heavy compaction. 
Therefore, concurrency limitation is not enough.

The strategy proposed in #3624  may be effective to solve the OOM. 

A CompactionPermitLimiter is used for compaction limitation, and use single-producer/multi-consumer model.
Producer will try to generate compaction tasks and acquire `permits` for each task. 
The compaction task which can hold `permits` will be executed in thread pool and each finished task will
release its `permits`.

`permits` should be applied for before a compaction task can execute. When the sum of `permits`
held by executing compaction tasks reaches a threshold, subsequent compaction task will be no longer allowed,
until some `permits` are released. Tablet compaction score is used as `permits` of compaction task here.

To some extent, memory consumption can be limited by setting appropriate `permits` threshold.
2020-10-11 11:39:25 +08:00
985ec98a9b [Bug] Fix the calculation of the variable "left_bytes" in data_dir.cpp (#4609)
Suppose that the current available byte in a disk is `_available_bytes`. After the data with size of `incoming_data_size` is written in the disk,  the left space in the disk should be calculated as follow :
  `int64_t left_bytes = _available_bytes - incoming_data_size;`
rather than:
  `int64_t left_bytes = _disk_capacity_bytes - _available_bytes - incoming_data_size;`
2020-09-20 20:54:13 +08:00
4571b09dd6 [storage][compatibility] Add meta format detection to prevent data loss. (#4539)
After 0.12 version, doris remove the format convert functiion which can convert from hdr_ format
to tabletmeta_ format when loading metas, the commit link: 3bca253

When we update doris version and there are old format meta in storage,
BE will not read the old format tablet. It can lead to data loss.

So we add meta format detection function to prevent data loss.
When there are old format meta in olap_meta, BE can find and print log or exit.
2020-09-13 11:58:22 +08:00
498b06fbe2 [Metrics] Support tablet level metrics (#4428)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. 
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-09-02 10:39:41 +08:00
b85bb0e2e9 [Bug-Fix] Some deleted tablets are not recycled on BE (#4401) 2020-08-27 12:09:19 +08:00
664e6a5898 [Storage] "align_tag_path" and ALIGN_TAG_PREFIX is needless (#4410) 2020-08-26 10:47:21 +08:00
4c571cb6f5 Revert "[Metrics] Support tablet level metrics (#4327)" (#4397)
This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-08-19 22:37:52 +08:00
56260a65c8 [Metrics] Support tablet level metrics (#4327)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-08-18 16:56:12 +08:00
e71152132c [metrics] Redesign metrics to 3 layers (#4115)
Redesign metrics to 3 layers:
    MetricRegistry - MetricEntity - Metrics
    MetricRegistry : the register center
    MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several 
        MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift 
        clients and servers, and so on. 
    Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity, 
        thrift_opened_clients on a thrift_client entity.
    MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across 
        different MetricEntities.
2020-08-08 11:23:01 +08:00
da921928d0 [Code]Fix some spell problem (#4066)
fix some spell problem
2020-07-13 20:54:31 +08:00
93a0b47d22 Revert "[Memory Engine] MemTablet creation and compatibility handling in BE (#3762)" (#3931)
This reverts commit ca96ea30560c9e9837c28cfd2cdd8ed24196f787.
2020-06-24 10:13:45 +08:00
ca96ea3056 [Memory Engine] MemTablet creation and compatibility handling in BE (#3762) 2020-06-18 09:56:07 +08:00
e4dc2ec440 [StorageEngine] Make StorageEngine::open return more detailed info (#3761)
StorageEngine::open just return a very vague status info when failed,
we have to check logs to find out the root reason, and it's not
convenient to check logs if we run unit tests in CI dockers.
It would be better to return more detailed failure info to point out
the root reason, for example, it may return error status with message
"file descriptors limit is too small".
2020-06-07 10:21:33 +08:00
ed9022a908 Ignore broken disk when BE starts up (#3741) 2020-06-05 10:26:07 +08:00
f6b5c8839b [Bug] Ignore loading DELETE status tablet error when restarting BE (#3641)
Fix: #3640 

Also add a `batch delete meta` feature for `meta tool`
Fix #3639
2020-05-21 19:08:28 +08:00
6be7a6232f [Config] Add ignore config to determine whether to continue to start be when load tablet from header failed. (#3632)
Add config ignore_load_tablet_failure to determine whether to continue to start be when load tablet from header failed.
2020-05-20 09:40:50 +08:00
f39c8b156d [refactor] A small refactor on class DataDir (#3276)
main refactor points are:
- Use a single get_absolute_tablet_path function instead of 3
  independent functions
- Remove meaningless return value of register_tablet and deregister_tablet
- Some typo and format
2020-04-10 00:32:22 +08:00
e32ed28bf4 [Storage] Use getmntent_r() for thread-safe (#3284) 2020-04-09 14:19:09 +08:00
5f9359d618 Use SleepFor() instead of usleep() (#3211) 2020-03-29 14:18:19 +08:00
64a06ea9d4 [UT] Fix some BE unit tests (#3110)
And also support graceful exit for StorageEngine to avoid hang too long
time in unit test.
2020-03-16 13:31:44 +08:00
f2d2e4bffd [Unused] Remove unused GC function in DataDir (#3019) 2020-02-28 21:47:41 +08:00
7c4149cf27 Improve comparison and printing of Version (#2796)
* Improve comparison and printing of Version

There are two members in `Version`:` first` and `second`.
There are many places where we need to print one `Version` object  and
compare two `Version` objects, but in the current code, these two members
are accessed directly, which makes the code very tedious.

This patch mainly do:
1. Adds overloaded methods for `operator<<()` for `Version`, so
   we can directly print a Version object;
2. Adds the `cantains()` method to determine whether it is an containment
   relationship;
3. Uses `operator==()` to determine if two `Version` objects are equal.

Because there are too many places need to be modified, there are still some
naked codes left, which will be modified later.

This patch also removes some necessary header file references.

No functional changes in this patch.
2020-01-19 18:04:28 +08:00
324f1b8f51 Unify the type of path_hash to size_t (#2324)
The type of path hash should be `size_t`(i.e. `uint32_t`),
but the current code mixes `int64_t`, ` int32_t` and `size_t`
2019-11-28 18:48:52 +08:00
fda46654a2 Support setting properties for storage_root_path (#2235)
We can specify the properties of storage_root_path by setting ':', seperate by ','
e.g.
storage_root_path = /home/disk1/palo,medium:ssd,capacity:50
2019-11-22 18:12:26 +08:00
d0316d158d Refactor and reorganize the file utils (#2089) 2019-11-11 20:25:41 +08:00
6b4ef34162 fix AlphaRowsetTest by remove StorageEngine #2078 (#2091) 2019-10-30 19:39:41 +08:00
3bca253fb3 Fix beta rowset read slow (#1994)
[Bug][BetaRowset] fix beta rowset read slowly with limit

beta rowset do not update raw_rows_read in statistics and will read all
data in tablet when query with limit, which lead to long query time.
2019-10-17 19:19:46 +08:00
f130bd3e7b Use Env function to operate directory (#1980)
Now Env has unify all environment operation, such as file operation.
However some of our old functions don't leverage it. This change unify
FileUtils::scan_dir to use Env's function.
2019-10-15 09:25:12 +08:00
948f497dd4 Fix some warning when compile type is debug using -Werror flag(because of use deprecated funcrtions) (#1953) 2019-10-11 17:21:41 +08:00
e4f3e8fda7 Remove redundant method in rowset meta manager (#1949) 2019-10-10 19:29:59 +08:00
b72a4a4bc6 Add tablet meta checkpoint mechanism (#1936) 2019-10-10 09:39:02 +08:00
9aa2045987 Refactor alter job (#1695) 2019-09-12 16:31:29 +08:00
f58a222da7 Fix bug that the calculation of disk usage percent is wrong (#1791)
This bug may cause unable to load data
2019-09-12 14:37:20 +08:00
981e0feb99 Check rowset is useful atomicly (#1750)
* Check rowset is useful atomicly

* Only release rowset id when it is added to unused rowset

* remove release rowset id when save rowset meta
2019-09-06 17:21:42 +08:00
85940a292b RowsetFactory as a single entry for Rowset creation (#1748) 2019-09-05 18:10:18 +08:00
a63989cc61 Use RowsetFactory to create and init RowsetWriter (#1740) 2019-09-04 17:02:43 +08:00