Commit Graph

403 Commits

Author SHA1 Message Date
6726c9bf2f [improvement](compaction) reduce tablet skip compaction time (#44273) (#44791)
pick master #44273

The tablet skip-compaction time is 120 seconds, which is too long.
In high-frequency import scenarios on merge-on-write (mow) tables, it
leads to a high compaction score. Therefore, the skip time is reduced
to 10 seconds.
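
A rough back-of-the-envelope sketch of why the window matters. It assumes, purely for illustration, that every load adds one rowset to the tablet and that nothing is compacted while the tablet is being skipped; `versions_accumulated` is a hypothetical helper, not a Doris function.

```python
def versions_accumulated(load_interval_s: float, skip_window_s: float) -> int:
    """Rowsets that pile up on one tablet while compaction skips it.

    Assumption: each load produces exactly one new rowset.
    """
    if load_interval_s <= 0:
        raise ValueError("load_interval_s must be positive")
    return int(skip_window_s // load_interval_s)


if __name__ == "__main__":
    # One load per second, old window (120 s) versus new window (10 s).
    for window in (120, 10):
        print(window, versions_accumulated(1.0, window))
```

Under these assumptions a tablet loaded once per second can gather on the order of 120 uncompacted rowsets in the old window, against about 10 in the new one.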
2024-12-02 10:07:17 +08:00
4031808f00 [enhancement](rowset-meta) Remove rowset meta from olap meta directly when rowsets deleted (#41716) (#43183)
pick: #41716
2024-11-05 00:40:12 +08:00
b489cdf840 [opt](merge-on-write) avoid to check delete bitmap while lookup rowkey in some situation to reduce CPU cost (#41480) (#41439)
cherry-pick #41480
2024-10-11 10:15:39 +08:00
0d38a9a36d [feature](restore) support atomic restore (#41107)
Cherry-pick #40353, #40734, #40817, #40876, #40921, #41017, #41083
2024-09-24 09:41:41 +08:00
d1d52ae68c [feature](compaction) Add an http action for visibility of compaction score on each tablet (#38489) (#40826)
pick: #38489 

Usage:
1. `curl http://be_ip:be_port/api/compaction_score?top_n=10` returns a
JSON array containing the compaction scores of the top n tablets, n=top_n.
```
[
    {
        "compaction_score": "5",
        "tablet_id": "42595"
    },
    {
        "compaction_score": "5",
        "tablet_id": "42587"
    },
    {
        "compaction_score": "5",
        "tablet_id": "42593"
    },
    {
        "compaction_score": "5",
        "tablet_id": "42597"
    },
    {
        "compaction_score": "5",
        "tablet_id": "42589"
    },
    {
        "compaction_score": "5",
        "tablet_id": "42599"
    },
    {
        "compaction_score": "5",
        "tablet_id": "42601"
    },
    {
        "compaction_score": "5",
        "tablet_id": "42591"
    },
    {
        "compaction_score": "5",
        "tablet_id": "42585"
    },
    {
        "compaction_score": "4",
        "tablet_id": "10034"
    }
]
```
If top_n is not specified, the compaction scores of all tablets are returned.
If top_n is invalid, an error is returned:
```
invalid argument: top_n=wrong
```

2. `curl http://be_ip:be_port/api/compaction_score?sync_meta=true`
`sync_meta` is only available in cloud mode; it syncs meta from the meta
service. It can be combined with top_n.
If `sync_meta` is passed in non-cloud mode, an error is returned:
```
sync meta is only available for cloud mode
```

3. In the future, this endpoint may be extended with other utilities, such as
fetching tablet compaction scores by table id.
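
A minimal sketch of consuming the reply on the client side, assuming the body has the shape shown in the usage example (an array of objects whose `compaction_score` and `tablet_id` values are strings). `parse_scores` and `tablets_over` are hypothetical helper names, and the sample body is a shortened copy of the output above.

```python
import json

# Shortened copy of the reply shown in the usage example.
SAMPLE = """[
    {"compaction_score": "5", "tablet_id": "42595"},
    {"compaction_score": "4", "tablet_id": "10034"},
    {"compaction_score": "5", "tablet_id": "42587"}
]"""


def parse_scores(body: str) -> list[tuple[int, int]]:
    """Return (tablet_id, score) pairs, highest score first."""
    rows = json.loads(body)
    pairs = [(int(r["tablet_id"]), int(r["compaction_score"])) for r in rows]
    # sorted() is stable, so tablets with equal scores keep their reply order.
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def tablets_over(body: str, threshold: int) -> list[int]:
    """Tablet ids whose compaction score is at least `threshold`."""
    return [tid for tid, score in parse_scores(body) if score >= threshold]


if __name__ == "__main__":
    print(parse_scores(SAMPLE))
    print(tablets_over(SAMPLE, 5))
```

The scores are converted to integers before sorting because the endpoint returns them as strings, and string comparison would order "10" before "5".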

2024-09-21 20:35:55 +08:00
fbed56639c [fix](compaction) catch exception in compaction (#40900) (#40930)
pick #40900 
```

terminate called after throwing an instance of 'doris::Exception'
  what():  [E6] Too large string size.

	0#  doris::Exception::Exception(int, std::basic_string_view<char, std::char_traits<char> > const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:173
	1#  doris::vectorized::read_string_binary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, doris::vectorized::BufferReadable&, unsigned long) at /root/doris_branch-2.1/doris/be/src/vec/io/io_helper.h:177
	2#  doris::vectorized::IAggregateFunctionHelper<doris::vectorized::AggregateFunctionNullUnaryInline<doris::vectorized::AggregateFunctionGroupConcat<doris::vectorized::AggregateFunctionGroupConcatImplStr>, true> >::deserialize_and_merge_from_column_range(char*, doris::vectorized::IColumn const&, unsigned long, unsigned long, doris::vectorized::Arena*) const at /root/doris_branch-2.1/doris/be/src/vec/aggregate_functions/aggregate_function.h:0
	3#  doris::vectorized::IAggregateFunctionHelper<doris::vectorized::AggregateStateUnion>::add_batch_range(unsigned long, unsigned long, char*, doris::vectorized::IColumn const**, doris::vectorized::Arena*, bool) at /root/doris_branch-2.1/doris/be/src/vec/aggregate_functions/aggregate_function.h:0
	4#  doris::vectorized::VerticalBlockReader::_update_agg_value(std::vector<COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>, std::allocator<COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> > >&, int, int, bool) at /root/doris_branch-2.1/doris/be/src/vec/olap/vertical_block_reader.cpp:326
	5#  doris::vectorized::VerticalBlockReader::_update_agg_data(std::vector<COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>, std::allocator<COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> > >&) at /root/doris_branch-2.1/doris/be/src/vec/olap/vertical_block_reader.cpp:308
	6#  doris::vectorized::VerticalBlockReader::_agg_key_next_block(doris::vectorized::Block*, bool*) at /root/doris_branch-2.1/doris/be/src/vec/olap/vertical_block_reader.cpp:0
	7#  doris::vectorized::VerticalBlockReader::next_block_with_aggregation(doris::vectorized::Block*, bool*) at /root/doris_branch-2.1/doris/be/src/common/status.h:491
	8#  doris::Merger::vertical_compact_one_group(std::shared_ptr<doris::Tablet>, doris::ReaderType, std::shared_ptr<doris::TabletSchema>, bool, std::vector<unsigned int, std::allocator<unsigned int> > const&, doris::vectorized::RowSourcesBuffer*, std::vector<std::shared_ptr<doris::RowsetReader>, std::allocator<std::shared_ptr<doris::RowsetReader> > > const&, doris::RowsetWriter*, long, doris::Merger::Statistics*, std::vector<unsigned int, std::allocator<unsigned int> >, long, doris::CompactionSampleInfo*) at /root/doris_branch-2.1/doris/be/src/olap/merger.cpp:0
	9#  doris::Merger::vertical_merge_rowsets(std::shared_ptr<doris::Tablet>, doris::ReaderType, std::shared_ptr<doris::TabletSchema>, std::vector<std::shared_ptr<doris::RowsetReader>, std::allocator<std::shared_ptr<doris::RowsetReader> > > const&, doris::RowsetWriter*, long, long, doris::Merger::Statistics*) at /root/doris_branch-2.1/doris/be/src/olap/merger.cpp:445
	10# doris::Compaction::do_compaction_impl(long) at /root/doris_branch-2.1/doris/be/src/olap/compaction.cpp:385
	11# doris::Compaction::do_compaction(long) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1291
	12# doris::CumulativeCompaction::execute_compact_impl() at /root/doris_branch-2.1/doris/be/src/common/status.h:491
	13# doris::Compaction::execute_compact() at /root/doris_branch-2.1/doris/be/src/common/status.h:491
	14# doris::Tablet::execute_compaction(doris::Compaction&) at /root/doris_branch-2.1/doris/be/src/common/status.h:491
	15# std::_Function_handler<void (), doris::StorageEngine::_submit_compaction_task(std::shared_ptr<doris::Tablet>, doris::CompactionType, bool)::$_0>::_M_invoke(std::_Any_data const&) at /root/doris_branch-2.1/doris/be/src/olap/olap_server.cpp:1018
	16# doris::ThreadPool::dispatch_thread() at /root/doris_branch-2.1/doris/be/src/util/threadpool.cpp:0
	17# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
	18# ?
	19# ?

*** Query id: 0-0 ***
*** is nereids: 0 ***
*** tablet id: 38266341 ***
*** Aborted at 1726615633 (unix time) try "date -d @1726615633" if you are using GNU date ***
*** Current BE git commitID: db06c678a3 ***
*** SIGABRT unknown detail explain (@0x26af02) received by PID 2535170 (TID 2536168 OR 0x7f48c81cd640) from PID 2535170; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris_branch-2.1/doris/be/src/common/signal_handler.h:421
 1# 0x00007F4AB18F8520 in /lib/x86_64-linux-gnu/libc.so.6
 2# pthread_kill at ./nptl/pthread_kill.c:89
 3# raise at ../sysdeps/posix/raise.c:27
 4# abort at ./stdlib/abort.c:81
 5# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
 6# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
 7# 0x000055C401EF7811 in /mnt/disk1/STRESS_ENV/be/lib/doris_be
 8# 0x000055C401EF7964 in /mnt/disk1/STRESS_ENV/be/lib/doris_be
 9# doris::vectorized::read_string_binary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, doris::vectorized::BufferReadable&, unsigned long) at /root/doris_branch-2.1/doris/be/src/vec/io/io_helper.h:180
10# doris::vectorized::IAggregateFunctionHelper<doris::vectorized::AggregateFunctionNullUnaryInline<doris::vectorized::AggregateFunctionGroupConcat<doris::vectorized::AggregateFunctionGroupConcatImplStr>, true> >::deserialize_and_merge_from_column_range(char*, doris::vectorized::IColumn const&, unsigned long, unsigned long, doris::vectorized::Arena*) const in /mnt/disk1/STRESS_ENV/be/lib/doris_be
11# doris::vectorized::IAggregateFunctionHelper<doris::vectorized::AggregateStateUnion>::add_batch_range(unsigned long, unsigned long, char*, doris::vectorized::IColumn const**, doris::vectorized::Arena*, bool) in /mnt/disk1/STRESS_ENV/be/lib/doris_be
12# doris::vectorized::VerticalBlockReader::_update_agg_value(std::vector<COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>, std::allocator<COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> > >&, int, int, bool) at /root/doris_branch-2.1/doris/be/src/vec/olap/vertical_block_reader.cpp:326
13# doris::vectorized::VerticalBlockReader::_update_agg_data(std::vector<COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>, std::allocator<COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> > >&) at /root/doris_branch-2.1/doris/be/src/vec/olap/vertical_block_reader.cpp:308
14# doris::vectorized::VerticalBlockReader::_agg_key_next_block(doris::vectorized::Block*, bool*) in /mnt/disk1/STRESS_ENV/be/lib/doris_be
15# doris::vectorized::VerticalBlockReader::next_block_with_aggregation(doris::vectorized::Block*, bool*) at /root/doris_branch-2.1/doris/be/src/vec/olap/vertical_block_reader.cpp:58
16# doris::Merger::vertical_compact_one_group(std::shared_ptr<doris::Tablet>, doris::ReaderType, std::shared_ptr<doris::TabletSchema>, bool, std::vector<unsigned int, std::allocator<unsigned int> > const&, doris::vectorized::RowSourcesBuffer*, std::vector<std::shared_ptr<doris::RowsetReader>, std::allocator<std::shared_ptr<doris::RowsetReader> > > const&, doris::RowsetWriter*, long, doris::Merger::Statistics*, std::vector<unsigned int, std::allocator<unsigned int> >, long, doris::CompactionSampleInfo*) in /mnt/disk1/STRESS_ENV/be/lib/doris_be
17# doris::Merger::vertical_merge_rowsets(std::shared_ptr<doris::Tablet>, doris::ReaderType, std::shared_ptr<doris::TabletSchema>, std::vector<std::shared_ptr<doris::RowsetReader>, std::allocator<std::shared_ptr<doris::RowsetReader> > > const&, doris::RowsetWriter*, long, long, doris::Merger::Statistics*) at /root/doris_branch-2.1/doris/be/src/olap/merger.cpp:445
18# doris::Compaction::do_compaction_impl(long) at /root/doris_branch-2.1/doris/be/src/olap/compaction.cpp:385
19# doris::Compaction::do_compaction(long) at /root/doris_branch-2.1/doris/be/src/olap/compaction.cpp:136
20# doris::CumulativeCompaction::execute_compact_impl() at /root/doris_branch-2.1/doris/be/src/olap/cumulative_compaction.cpp:79
21# doris::Compaction::execute_compact() at /root/doris_branch-2.1/doris/be/src/olap/compaction.cpp:118
22# doris::Tablet::execute_compaction(doris::Compaction&) at /root/doris_branch-2.1/doris/be/src/olap/tablet.cpp:2067
23# std::_Function_handler<void (), doris::StorageEngine::_submit_compaction_task(std::shared_ptr<doris::Tablet>, doris::CompactionType, bool)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
24# doris::ThreadPool::dispatch_thread() in /mnt/disk1/STRESS_ENV/be/lib/doris_be
25# doris::Thread::supervise_thread(void*) at /root/doris_branch-2.1/doris/be/src/util/thread.cpp:499
26# start_thread at ./nptl/pthread_create.c:442
27# 0x00007F4AB19DC850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```

2024-09-19 01:08:34 +08:00
341b2e0693 [enhancement](compaction) Abort compaction tasks when corresponding tablet states have been changed (#40271) (#40828)
pick: #40271

1. Change the criterion for cumulative compaction eligibility. Tablets in
the `running` or `not ready` state are eligible for cumulative compaction.
2. Abort a compaction task at the beginning when the tablet is no longer
eligible for compaction.
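
The two points above can be sketched as follows. This is an illustration only, not the BE code: the `TabletState` enum, the `SHUTDOWN` member, and the function names are hypothetical, and only the `running` / `not ready` rule comes from the description.

```python
from enum import Enum
from typing import Callable


class TabletState(Enum):
    NOT_READY = "not ready"
    RUNNING = "running"
    SHUTDOWN = "shutdown"  # hypothetical example of an ineligible state


# Point 1: states in which a tablet may do cumulative compaction.
ELIGIBLE_STATES = {TabletState.RUNNING, TabletState.NOT_READY}


def can_do_cumulative_compaction(state: TabletState) -> bool:
    return state in ELIGIBLE_STATES


def run_compaction_task(state_now: TabletState, compact: Callable[[], None]) -> str:
    """Point 2: re-check the state when the task starts and abort early."""
    if not can_do_cumulative_compaction(state_now):
        return "aborted"
    compact()
    return "done"


if __name__ == "__main__":
    print(run_compaction_task(TabletState.RUNNING, lambda: None))
    print(run_compaction_task(TabletState.SHUTDOWN, lambda: None))
```

The check is repeated at task start because the state observed when the task was submitted may have changed by the time a worker picks it up.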
2024-09-14 11:19:31 +08:00
bb709ad917 [branch-2.1] Picks "[Fix](merge-on-write) Fix duplicate key problem after adding sequence column for merge-on-write table #39958" (#40010)
picks https://github.com/apache/doris/pull/39958
2024-08-28 23:00:31 +08:00
beab6a81c1 [fix] (compaction) fix time series compaction policy (#39170) (#39228)
pick from master #39170
2024-08-13 10:33:04 +08:00
e2f45225d6 [branch-2.1] Picks "[opt](merge-on-write) eliminate reading the old values of non-key columns for delete stmt in publish phase #38703" (#39074)
picks https://github.com/apache/doris/pull/38703
2024-08-09 10:42:52 +08:00
7e95d7cbec [bugfix](backup)(cooldown) cancel backup properly when be backup failed (#38724) (#38993)
Co-authored-by: zhangyuan <ayuanzhang@tencent.com>
2024-08-07 15:58:11 +08:00
8ce30963cd [fix] (compaction) fix time series compaction policy (#38220) (#38917)
pick from #38220
2024-08-06 14:26:42 +08:00
64b69ed1ba [branch-2.1] Picks "[opt](merge-on-write) Skip the alignment process of some rowsets in partial update #38487" (#38682)
picks https://github.com/apache/doris/pull/38487
2024-08-02 20:05:31 +08:00
327069fdbc [branch-2.1](log) add tablet clear cache log (#38713) 2024-08-02 08:40:02 +08:00
4d980b8235 [feature](http action)Add http action to show nested inverted index file (#38272) (#38672)
backport #38272
2024-08-01 19:30:59 +08:00
359e50fc58 [fix](load) change tablet schema pointer to shared_ptr in memtable (#37927) (#37939)
backport #37927
2024-07-16 22:32:03 +08:00
7887f51e9b [fix](partial update) fix a mem leak issue (#37706) (#37730)
cherry-pick #37706
2024-07-13 09:20:01 +08:00
217eac790b [pick](Variant) pick some refactor and fix #34925 #36317 #36201 #36793 (#37526) 2024-07-11 21:25:34 +08:00
97945af947 [fix](merge-on-write) when full clone failed, duplicate key might occur (#37001) (#37229)
cherry-pick #37001
2024-07-03 19:48:10 +08:00
7443e8fcf2 [cherry-pick](branch-2.1) fix single compaction test p2 #34568 #36881 (#37075) 2024-07-02 15:22:04 +08:00
798d9d6fc6 [pick21][opt](mow) reduce memory usage for mow table compaction (#36865) (#36968)
cherry-pick https://github.com/apache/doris/pull/36865 to branch-2.1
2024-07-01 15:33:18 +08:00
22cb7b8fcb [improvement](compaction) be do not compact invisible version to avoid query error -230 #28082 (#36222)
cherry pick from #28082
2024-06-27 13:45:21 +08:00
6ec9a731e8 [branch-2.1](cherry-pick) partial update should not read old fields from rows with delete sign (#36210) (#36755)
cherry-pick #36210
2024-06-24 21:13:24 +08:00
5541fd11e9 [branch-2.1](partial update)add logs for partial update (#35416)
add logs for partial update

the master PR is #35802

2024-06-04 22:47:48 +08:00
72489a04c3 [cherry-pick](branch-2.1) remove some CHECKs in Tablet::revise_tablet_meta (#31268) (#34702)
cherry-pick #31268 

2024-06-02 00:15:31 +08:00
7b271f916d [branch-2.1](partial-update) duplicate key occurred when BE restart (#35678)
We should save the new delete bitmap in RocksDB when conflicts are handled in the publish phase, which was introduced by #30366
2024-05-31 09:38:06 +08:00
6e17dc1e87 (cherry-pick)[branch-2.1] add calc tablet file crc and fix single compaction test #33076 #34915 (#35215)
* [fix](compaction test) show single replica compaction status and fix test (#33076)
* [improve](http action) add http interface to calculate the crc of all files in tablet (#34915)
2024-05-26 17:15:09 +08:00
1e4a83e17b fix compile 2024-05-22 01:04:34 +08:00
b4a798240a [fix](inverted_index) do not use int32_t for index id to avoid overflow (#35062) 2024-05-21 12:58:38 +08:00
c22f42121b [fix](compaction test) show single replica compaction status and fix test (#33076) (#34285) (#34438) 2024-05-06 21:00:34 +08:00
b15fc2a906 [Cherry-pick](branch-2.1) Pick #34043 and #34112 (#34318)
* [Enhancement](full compaction) Add run status support for full compaction (#34043)

* The usage is `curl http://{ip}:{port}/api/compaction/run_status?tablet_id={tablet_id}`
e.g. `curl http://127.0.0.1:8040/api/compaction/run_status?tablet_id=10084`

If full compaction is running, the output will be
```
{
"status" : "Success",
"run_status" : true,
"msg" : "compaction task for this tablet is running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```
otherwise the output will be
```
{
"status" : "Success",
"run_status" : false,
"msg" : "compaction task for this tablet is not running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```
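
A small sketch of how a client might wait for a full compaction to finish using the replies shown above. It assumes the reply fields are exactly those in the two examples; `is_compaction_running` and `polls_until_idle` are hypothetical helpers, and `fetch` stands in for whatever issues the HTTP request.

```python
import json
from typing import Callable

# The two reply bodies shown above, condensed to one line each.
RUNNING = ('{"status": "Success", "run_status": true, '
           '"msg": "compaction task for this tablet is running", '
           '"tablet_id": 10084, "compact_type": "full"}')
IDLE = ('{"status": "Success", "run_status": false, '
        '"msg": "compaction task for this tablet is not running", '
        '"tablet_id": 10084, "compact_type": "full"}')


def is_compaction_running(body: str) -> bool:
    """Interpret one run_status reply; True while a task is running."""
    reply = json.loads(body)
    if reply.get("status") != "Success":
        raise RuntimeError(f"unexpected reply: {reply}")
    return bool(reply["run_status"])


def polls_until_idle(fetch: Callable[[], str], max_polls: int = 10) -> int:
    """Call fetch() until run_status is false; return how many polls it took."""
    for attempt in range(1, max_polls + 1):
        if not is_compaction_running(fetch()):
            return attempt
    raise TimeoutError("compaction still running after max_polls")


if __name__ == "__main__":
    replies = iter([RUNNING, RUNNING, IDLE])
    print(polls_until_idle(lambda: next(replies)))
```

A real caller would sleep between polls; the loop above omits the delay so the sketch stays deterministic.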

* [Fix](partial update) Fix rowset not found error when doing partial update (#34112)

Cause: In the logic of partial column updates, the existing data columns are read first, and then the data is supplemented and written back. During the reading process, initialization involves initially fetching rowset IDs, and the actual rowset object is fetched only when needed later. However, between fetching the rowset IDs and the rowset object, compaction may occur, turning the old rowset into a stale rowset. If too much time passes, the stale rowset might be directly deleted. Thus, when the rowset object is needed for an update, it cannot be found. Although the update operation with partial column logic should be able to read all keys and should not encounter new keys, if the rowset disappears, the Backend (BE) will consider these keys as missing. Consequently, it will check whether other columns have default values or are nullable. If this check fails, the aforementioned error is thrown.

Solution: To avoid such issues during partial column updates, the initialization step should involve fetching both the rowset IDs and the shared pointer to the rowset object simultaneously. This ensures that the rowset can always be found during data retrieval.
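
The race and the fix can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the BE implementation: `Rowset`, `Tablet`, and the helper functions are hypothetical, and a Python object reference plays the role of the shared pointer.

```python
class Rowset:
    def __init__(self, rowset_id: str, rows: dict[str, str]):
        self.rowset_id = rowset_id
        self.rows = rows


class Tablet:
    def __init__(self) -> None:
        self.rowsets: dict[str, Rowset] = {}

    def add(self, rowset: Rowset) -> None:
        self.rowsets[rowset.rowset_id] = rowset

    def compact_and_gc(self, old_ids: list[str], merged: Rowset) -> None:
        """Replace old rowsets with a merged one and drop the stale ones."""
        for rid in old_ids:
            del self.rowsets[rid]
        self.add(merged)


def read_by_ids(tablet: Tablet, rowset_ids: list[str], key: str):
    """Old behaviour: only ids were captured, objects are resolved late."""
    for rid in rowset_ids:
        rowset = tablet.rowsets.get(rid)
        if rowset is None:
            continue  # the rowset was removed after the ids were captured
        if key in rowset.rows:
            return rowset.rows[key]
    return None  # the key is wrongly treated as missing


def read_from_snapshot(snapshot: dict[str, Rowset], key: str):
    """Fixed behaviour: ids and object references were captured together."""
    for rowset in snapshot.values():
        if key in rowset.rows:
            return rowset.rows[key]
    return None


if __name__ == "__main__":
    tablet = Tablet()
    tablet.add(Rowset("rs1", {"k1": "v1"}))
    ids = list(tablet.rowsets)       # ids only
    snapshot = dict(tablet.rowsets)  # ids plus references
    tablet.compact_and_gc(["rs1"], Rowset("rs2", {"k1": "v1"}))
    print(read_by_ids(tablet, ids, "k1"))
    print(read_from_snapshot(snapshot, "k1"))
```

Holding the reference keeps the old rowset reachable for the duration of the update even after the tablet has dropped it, which is the property the shared pointer provides in the fix described above.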
2024-04-30 07:26:23 +08:00
5277a55791 (pick 34003) release fd for shutdown tablets (#34224) 2024-04-29 10:51:19 +08:00
7e91e69eb9 [fix](compaction) fix single compaction (#33907)
* [fix](compaction)Fix single compaction to get all local versions #33849

add test and comment

* remove single replica compaction prepare input rowsets

revised
2024-04-19 23:30:25 +08:00
46a258dc85 [improvement](binlog)Support inverted index format v2 in CCR (#33415) 2024-04-17 23:42:12 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
2add3bc13a [fix](partial update) compaction may cause update failure (#31551) (#32361) 2024-03-18 10:58:51 +08:00
5f125bbaaa [improvement](binlog)Support inverted index in CCR (#31743) (#32101) 2024-03-12 15:34:08 +08:00
ffa904c487 [enhance](Cooldown) Skip cooldown if the tablet is dropped (#32079) 2024-03-12 14:20:39 +08:00
779ca464a5 [Fix](Status) Handle returned overall Status correctly (#31692)
Handle returned overall Status correctly
2024-03-09 19:44:39 +08:00
65d45daf8a [Bug](coredump) fix regression test coredump in multi thread access map (#31664) 2024-03-03 19:30:55 +08:00
e86cc7e8e8 [chore](log) reduce a lot inject debug point log #31474 2024-02-28 13:07:47 +08:00
82fd3af54b [chore](log) change merge-on-write correctness check log to VLOG_NOTICE (#31414) (#31467) 2024-02-27 23:36:24 +08:00
5e4674ab66 [fix](partial update) mishandling of exceptions in the publish phase may result in data loss (#30366) 2024-01-27 09:09:02 +08:00
25c71e386b [minor](Cooldown) Log the old rowset id when uploading rowset to S3 #30339 2024-01-27 09:07:02 +08:00
ccde65b942 [fix](Cooldown) Enhance calculate logic of _has_data_to_cooldown (#30244) (#30299) 2024-01-25 13:25:34 +08:00
5213f941dd [improvement](cooldown) print the cooldown version when follow cooldown version (#30239) 2024-01-24 10:02:03 +08:00
d525f576e1 [improve] Use lru cache to count the number of column in tablet schema to control memory (#29668) 2024-01-12 13:58:19 +08:00
0d16ec7345 [improvement](cooldown) do not cooldown tablet without cold data (#29690) 2024-01-12 11:57:16 +08:00
7c7dbf15bc [feature](merge-cloud) Decouple Tablet/TabletManager/TxnManager from global StorageEngine instance (#29736) 2024-01-12 11:57:16 +08:00
eb2b22bff1 [improve](cooldown) skip empty tablet (#29620) 2024-01-07 18:57:06 +08:00