Commit Graph

105 Commits

Author SHA1 Message Date
92cbbd2b75 [fix](clone) Fix clone and alter tablet use same tablet path #34889 (#36858)
cherry pick from #34889
2024-06-30 20:40:54 +08:00
22cb7b8fcb [improvement](compaction) be do not compact invisible version to avoid query error -230 #28082 (#36222)
cherry pick from #28082
2024-06-27 13:45:21 +08:00
0dccc4e6e4 [cherry-pick](branch-2.1)fix http error when downloading varaint inverted index file #35668 (#36061)
pick from master[#35668](https://github.com/apache/doris/pull/35668)
2024-06-11 14:09:05 +08:00
72489a04c3 [cherry-pick](branch-2.1) remove some CHECKs in Tablet::revise_tablet_meta (#31268) (#34702)
## Proposed changes

Issue Number: close #xxx

cherry-pick #31268 

## Further comments

If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
2024-06-02 00:15:31 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
d60d804d9c [fix](memory) Fix task repeat attach task DCHECK failed #32784 (#33343)
[branch-2.1](memory) Fix CCR task repeat attach task DCHECK failed3 #33366
2024-04-08 16:15:04 +08:00
5f125bbaaa [improvement](binlog)Support inverted index in CCR (#31743) (#32101) 2024-03-12 15:34:08 +08:00
779ca464a5 [Fix](Status) Handle returned overall Status correctly (#31692)
Handle returned overall Status correctly
2024-03-09 19:44:39 +08:00
fc53c7210b [fix](chmod) change chmod to filesystem::permission to avoid race condition (#31032) 2024-02-18 11:50:16 +08:00
cc3c6d1479 [improvement](create tablet) backend create tablet round robin among … (#30530)
* [improvement](create tablet) backend create tablet round robin among … (#29818)

* [improvement](create tablet) be choose disk tolerate with little skew (#30354)

---------

Co-authored-by: yujun <yu.jun.reach@gmail.com>
2024-01-30 10:20:35 +08:00
7c7dbf15bc [feature](merge-cloud) Decouple Tablet/TabletManager/TxnManager from global StorageEngine instance (#29736) 2024-01-12 11:57:16 +08:00
797238cbb7 [fix](merge-on-write) fix schema change may result in delete bitmap incorrect (#29386) 2024-01-02 23:45:04 +08:00
1aa9ac4fe4 Prevent making snapshot on remote rowset in single replica compaction (#28716) 2023-12-27 23:43:43 +08:00
a8e6676640 [Bug](security) BE download_files function exists log print sensitive msg #28592 (#28594) 2023-12-26 21:59:47 +08:00
f374beaa4e [fix](log) regularise some BE error type and fix a load task check #28729 2023-12-25 10:45:19 +08:00
ebed055d2b [chore](clone) rename clone request field (#27591) 2023-12-08 11:53:57 +08:00
2f04873da9 [fix](clone) Fix engine_clone file exist (#27361)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-11-21 20:30:27 +08:00
d767804815 [feature](merge-cloud) Decouple rowset id generator and local rowsets gc implementation (#25921) 2023-11-10 10:07:02 +08:00
f31c1d858a [fix](merge-on-write) fix duplicate key in schema change (#25705)
It should be ensured that the obtained versions are continuous when calculate delete bitmap calculations in publish.
The remaining NOTREADY tablet in the schema change failure should be dropped.
When a rowset was deleted, the delete bitmap cannot be deleted until there are no read requests to use the rowset.
2023-10-25 05:59:48 -05:00
9c9fc84f39 [feature](merge-cloud) Abstract BaseTablet for CloudTablet (#24929) 2023-10-18 20:29:04 +08:00
73c3e3ab55 [Feature](x-load) support config min replica num for loading data (#21118) 2023-10-11 21:07:35 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
6502da8917 [bugfix](restore) add partition id into convert_rowset_ids() (#24834) 2023-09-25 20:07:24 +08:00
9d2fc78bd5 [fix](cooldown) Fix potential data loss when clone task's dst tablet is cooldown replica (#17644)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: Kang <kxiao.tiger@gmail.com>
2023-09-01 15:27:52 +08:00
91c5640cae [fix](tablet clone) fix clone backend chose wrong disk (#23729) 2023-09-01 15:12:35 +08:00
22cbf43b14 [Improvement](binlog) Add full/incr engine clone with binlog (#22678)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-08 10:03:11 +08:00
ba3a0922eb [fix](ipv6)Support IPV6 (#22219)
fe:Remove restrictions from IPv4
be: thrift server Specify binding address
be: Restore changed code of “be/src/olap/task/engine_clone_task.cpp”
2023-07-26 08:40:32 +08:00
Pxl
ca71048f7f [Chore](status) avoid empty error msg on status (#21454)
avoid empty error msg on status
2023-07-11 13:48:16 +08:00
691a988c97 [enhancement](merge-on-write) add async publish task when version is discontinuous for merge on write table when clone (#21025)
version discontinuity may occur when clone. To deal with this case, add async publish task when version is discontinuous.
2023-06-22 21:50:14 +08:00
accaff1026 [Feature](compaction) wip: single replica compaction (#19237)
Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica.

The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica.
The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool.
When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.
2023-05-30 21:12:48 +08:00
42239d635a [fix](tablet_manager_lock) fix create tablet timeout #20067 (#20069) 2023-05-28 23:05:13 +08:00
ae352997b4 [Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063) 2023-05-28 11:23:07 +08:00
93933308e6 [Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881) 2023-05-26 23:40:49 +08:00
cdfbfd1f6b [fix](replica) Fix inconsistent replica id between FE and BE (#18688) 2023-05-06 11:06:29 +08:00
52b1bd2c81 [clone](download) fix be clone action download tablet content length overflow (#18851) 2023-04-28 11:35:17 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
05db6e9b55 [refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009)
Follow #17586.
This PR mainly changes:

Remove env/
Remove FileUtils/FilesystemUtils
Some methods are moved to LocalFileSystem
Remove olap/file_cache
Add s3 client cache for s3 file system
In my test, the time of open s3 file can be reduced significantly
Fix cold/hot separation bug for s3 fs.
This is the last PR of #17764.
After this, all IO operation should be in io/fs.

Except for tests in #17586, I also tested some case related to fs io:

clone
concurrency query on local/s3/hdfs
load error log create and clean
disk metrics
2023-03-29 09:00:52 +08:00
f03598f214 [enhance](cooldown) no snapshot or migration action for cooldown tablet (#17658) 2023-03-27 13:35:32 +08:00
0801883604 [fix](merge-on-write) fix that delete bitmap is not calculated correctly when clone tablet (#17334) 2023-03-05 22:04:28 +08:00
5190a496ac [fix](rebalance) fix that the clone operation is not performed due to incorrect condition judgment (#17381) 2023-03-05 21:58:33 +08:00
26a46d8c3f [fix](cooldown) Handle full clone with cooldowned rowsets (#17069) 2023-02-28 11:04:01 +08:00
66ceab540a [fix](replica) Fix inconsistent replica id between BE and FE in corner case of tablet rebalance (#16889) 2023-02-22 16:21:11 +08:00
d013d529c8 [Feature](ipv6)Support IPV6 (#14063)
Support IPV6 in Apache Doris, the main changes are:
1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string
2. BRPC and HTTP support binding to IPV6 address
3. BRPC and HTTP support visiting IPV6 Services
2023-02-14 21:43:10 +08:00
7482b6bad2 [fix](cooldown) Add cold_compaction_lock to serialize any operations which may delete the input rowsets of cold data compaction (#16742)
Add cold_compaction_lock to serialize tablet clone, cold data compaction and follow cooldowned data
2023-02-14 21:38:33 +08:00
5014ad03e7 [feature](cooldown) Auto delete unused remote files (#16588) 2023-02-13 23:59:39 +08:00
8317c4a752 [Bug](cooldown) set new replica id when early exit in doing clone when no missed versions (#16644)
* set new replica id

* reduce lock

* reset when replica id is different
2023-02-13 14:39:03 +08:00
6a8fc35b78 [Bug](Cooldown) fix load balance causing no cooldown replica (#16641) 2023-02-12 16:47:38 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
2389a90cd0 [enhancement](snapshot) add missed version log when make_snapshot in engine clone task (#14284) 2022-11-24 14:51:28 +08:00
15eb07b829 [BugFix](file cache) don't clean clone dir when doing _gc_unused_file_caches (#14194)
* use another file_size overload for noexcept

* don't gc clone dir

* use better status
2022-11-14 11:35:08 +08:00