
225 Commits

Author SHA1 Message Date
1fbfb81b8a [branch-2.1] Picks "[Fix](partial update) Persist partial_update_info in RocksDB in case of BE restart after a partial update has committed #38331" (#39035)
picks https://github.com/apache/doris/pull/38331 and
https://github.com/apache/doris/pull/39066
2024-08-08 14:50:08 +08:00
9f4e7346fb [fix](compaction) fixing the inaccurate statistics of concurrent compaction tasks (#37318) (#37496) 2024-07-10 22:23:25 +08:00
afcc6170f6 [fix](txn_manager) Add ingested rowsets to unused rowsets when removing txn (#37417)
Generally speaking, as long as a rowset has a version, it can be
considered not to be in a pending state. However, if the rowset was
created through ingesting binlogs, it will have a version but should
still be considered in a pending state because the ingesting txn has not
yet been committed.

This PR updates the condition for determining the pending state. If a
rowset is COMMITTED, the txn should be allowed to roll back even if a
version exists.

Cherry-pick #36551
2024-07-10 14:25:44 +08:00
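The updated pending-state condition from #37417 can be sketched as below. This is a minimal illustration only. `RowsetState`, `RowsetInfo`, and both `is_pending_*` functions are hypothetical stand-ins, not Doris's `RowsetMeta` API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical rowset states; only COMMITTED matters for this sketch.
enum class RowsetState { PREPARED, COMMITTED, VISIBLE };

// Hypothetical, simplified rowset descriptor (not Doris's RowsetMeta).
struct RowsetInfo {
    std::optional<int64_t> version;  // empty until a version is assigned
    RowsetState state;
};

// Old rule: any rowset that has a version is treated as not pending.
bool is_pending_old(const RowsetInfo& rs) { return !rs.version.has_value(); }

// New rule: a COMMITTED rowset (e.g. one ingested from binlogs) is still
// pending even if it already carries a version, so its txn may roll back.
bool is_pending_new(const RowsetInfo& rs) {
    return !rs.version.has_value() || rs.state == RowsetState::COMMITTED;
}
```

With the old rule, an ingested rowset that already has a version is treated as not pending, so its txn cannot be rolled back. The new rule still reports such a rowset as pending.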
3337c1bbe3 [enhancement](compaction) adjust compaction concurrency based on compaction score and workload (#37491)
adjust compaction concurrency based on compaction score and workload
#36672
fix null pointer when retrieving CPU load average #37171
2024-07-09 09:56:35 +08:00
f5572ac732 [pick] reset memtable flush thread num (#37092)
## Proposed changes

pick #37028
2024-07-02 19:20:17 +08:00
92cbbd2b75 [fix](clone) Fix clone and alter tablet use same tablet path #34889 (#36858)
cherry pick from #34889
2024-06-30 20:40:54 +08:00
843c89f109 [fix] fix nullptr when clearing cache due to move (#34323) 2024-04-30 09:52:41 +08:00
b15fc2a906 [Cherry-pick](branch-2.1) Pick #34043 and #34112 (#34318)
* [Enhancement](full compaction) Add run status support for full compaction (#34043)

* The usage is `curl http://{ip}:{port}/api/compaction/run_status?tablet_id={tablet_id}`
e.g. `curl http://127.0.0.1:8040/api/compaction/run_status?tablet_id=10084`

If full compaction is running, the output will be
```
{
"status" : "Success",
"run_status" : true,
"msg" : "compaction task for this tablet is running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```
otherwise, the output will be
```
{
"status" : "Success",
"run_status" : false,
"msg" : "compaction task for this tablet is not running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```

* [Fix](partial update) Fix rowset not found error when doing partial update (#34112)

Cause: In the partial column update logic, the existing data columns are read first, and then the missing data is filled in and written back. During the read, initialization fetches only the rowset IDs. The actual rowset objects are fetched later, when they are needed. However, compaction may run between fetching the rowset IDs and fetching the rowset objects, which turns an old rowset into a stale rowset. If enough time passes, the stale rowset may be deleted outright, and the rowset object then cannot be found when the update needs it. A partial column update should be able to read all existing keys and should not encounter new keys. Once the rowset has disappeared, however, the Backend (BE) treats these keys as missing. It then checks whether the other columns have default values or are nullable, and if that check fails, the aforementioned error is thrown.

Solution: To avoid such issues during partial column updates, the initialization step should involve fetching both the rowset IDs and the shared pointer to the rowset object simultaneously. This ensures that the rowset can always be found during data retrieval.
2024-04-30 07:26:23 +08:00
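The #34112 fix relies on shared ownership. Holding a `shared_ptr` to a rowset keeps the object alive even after compaction or GC removes it from the tablet's rowset map, whereas an ID must be resolved again later. The minimal sketch below uses hypothetical `Rowset` and `TabletSketch` types, not Doris's real classes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; Doris's real Rowset/Tablet classes differ.
struct Rowset {
    std::string id;
};
using RowsetSharedPtr = std::shared_ptr<Rowset>;

// Simplified tablet: compaction/GC may erase stale rowsets from this map.
struct TabletSketch {
    std::map<std::string, RowsetSharedPtr> rowsets;

    // Old approach: remember only the IDs and resolve them later.
    std::vector<std::string> capture_ids() const {
        std::vector<std::string> ids;
        for (const auto& [id, rs] : rowsets) ids.push_back(id);
        return ids;
    }

    // Fixed approach: take shared pointers at init time, which keeps the
    // rowset objects alive even if they are later removed from the map.
    std::vector<RowsetSharedPtr> capture_rowsets() const {
        std::vector<RowsetSharedPtr> out;
        for (const auto& [id, rs] : rowsets) out.push_back(rs);
        return out;
    }

    // Resolving an ID after deletion fails, mirroring "rowset not found".
    RowsetSharedPtr lookup(const std::string& id) const {
        auto it = rowsets.find(id);
        return it == rowsets.end() ? nullptr : it->second;
    }
};
```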
5277a55791 (pick 34003) release fd for shutdown tablets (#34224) 2024-04-29 10:51:19 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
7b74b199a5 [fix](memory) Fix LRU cache deleter and memory tracking (#32080)
To add common code to the value deleter of the LRU cache, make all LRU cache values inherit from the LRUCacheValueBase class and track memory in its destructor.
2024-03-15 17:57:58 +08:00
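The pattern in #32080 can be sketched as follows. A common base class charges memory when a value is constructed and releases it in a virtual destructor, so one generic deleter serves every value type. Only the class name `LRUCacheValueBase` comes from the commit. Its body, `g_cache_tracked_bytes` (standing in for a MemTracker), `PageValue`, and `generic_deleter` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical process-wide counter standing in for a MemTracker.
inline std::atomic<int64_t> g_cache_tracked_bytes{0};

// Common base for cache values: charge memory on construction and
// release it in the destructor, so every value type gets it for free.
class LRUCacheValueBase {
public:
    explicit LRUCacheValueBase(int64_t bytes) : _bytes(bytes) { g_cache_tracked_bytes += _bytes; }
    virtual ~LRUCacheValueBase() { g_cache_tracked_bytes -= _bytes; }
    LRUCacheValueBase(const LRUCacheValueBase&) = delete;
    LRUCacheValueBase& operator=(const LRUCacheValueBase&) = delete;

private:
    int64_t _bytes;
};

// Hypothetical value type; it adds nothing beyond its payload size.
struct PageValue : LRUCacheValueBase {
    explicit PageValue(int64_t size) : LRUCacheValueBase(size) {}
};

// One deleter for all value types, thanks to the virtual destructor.
void generic_deleter(LRUCacheValueBase* value) { delete value; }
```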
1abe9d4384 [fix](memory) Fix LRU cache stale sweep (#31122)
Remove LRUCacheValueBase, put last_visit_time into LRUHandle, and automatically update last_visit_time to the current timestamp during cache insert and lookup.

Do not rely on external modification of last_visit_time, which is often forgotten.
2024-02-21 17:01:29 +08:00
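The idea behind #31122 is that the cache itself stamps `last_visit_time` on insert and lookup, so a stale sweep cannot be fooled by callers that forget to update it. The minimal sketch below assumes an explicit `now` argument in place of a real clock, and a plain hash map in place of Doris's LRU list. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Simplified handle: the cache owns last_visit_time.
struct LRUHandleSketch {
    std::string value;
    int64_t last_visit_time = 0;
};

class StaleSweepCache {
public:
    // Insert stamps the entry with the current time.
    void insert(const std::string& key, std::string value, int64_t now) {
        _map[key] = LRUHandleSketch{std::move(value), now};
    }

    // Lookup refreshes last_visit_time automatically on every hit.
    const std::string* lookup(const std::string& key, int64_t now) {
        auto it = _map.find(key);
        if (it == _map.end()) return nullptr;
        it->second.last_visit_time = now;
        return &it->second.value;
    }

    // Remove entries not visited within `stale_after` time units.
    std::size_t prune_stale(int64_t now, int64_t stale_after) {
        std::size_t removed = 0;
        for (auto it = _map.begin(); it != _map.end();) {
            if (now - it->second.last_visit_time > stale_after) {
                it = _map.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return _map.size(); }

private:
    std::unordered_map<std::string, LRUHandleSketch> _map;
};
```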
eaaab33f0a [Fix](Top-N opt) evict querying rowsets beforehand to correct use_count (#102) (#30904)
This addresses the scenario where a rowset cannot be removed.
2024-02-16 10:16:40 +08:00
041db03c94 [fix](gc) fix a core introduced by #30854 (#30932)
Introduced by #30854: if the iterator reaches the end of the map _unused_rowsets, the program will core dump.
2024-02-16 10:12:24 +08:00
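The faulty line from #30854 is not shown in this log. The usual safe way to erase entries from a `std::map` while iterating and logging them is sketched below. `erase()` returns the next valid iterator, which may be `end()`, and the loop condition is checked before any further dereference. `UnusedRowsets` and `sweep` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical: rowset id -> "can be removed" flag.
using UnusedRowsets = std::map<std::string, bool>;

// Sweep removable entries without ever dereferencing end().
std::vector<std::string> sweep(UnusedRowsets& unused) {
    std::vector<std::string> removed;
    for (auto it = unused.begin(); it != unused.end();) {
        if (it->second) {
            removed.push_back(it->first);  // log/record before erasing
            it = unused.erase(it);         // may now equal end()
        } else {
            ++it;
        }
    }
    return removed;
}
```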
5b343911e8 [log](gc) add log for unused rowsets gc (#30854) 2024-02-16 10:12:23 +08:00
cc3c6d1479 [improvement](create tablet) backend create tablet round robin among … (#30530)
* [improvement](create tablet) backend create tablet round robin among … (#29818)

* [improvement](create tablet) be choose disk tolerate with little skew (#30354)

---------

Co-authored-by: yujun <yu.jun.reach@gmail.com>
2024-01-30 10:20:35 +08:00
7c7dbf15bc [feature](merge-cloud) Decouple Tablet/TabletManager/TxnManager from global StorageEngine instance (#29736) 2024-01-12 11:57:16 +08:00
4581618b09 [improvement](disk) pick disk randomly when usage is less than 0.7 (#29368) 2024-01-02 14:08:09 +08:00
3661c316c9 Revert "[improvement](create tablet) backend create tablet round robin among disks (#23218)" (#29347)
This reverts commit df5b5ae0cb2f30f026ec104a64b4d9a5ce2904f3.
2023-12-31 12:51:21 +08:00
1aa9ac4fe4 Prevent making snapshot on remote rowset in single replica compaction (#28716) 2023-12-27 23:43:43 +08:00
f374beaa4e [fix](log) regularise some BE error type and fix a load task check #28729 2023-12-25 10:45:19 +08:00
34fd376f33 [fix](publish version) fix publish fail but return ok (#28425) 2023-12-21 11:10:08 +08:00
f9ddf8c7ef [improvement](be report) add be report http (#28424) 2023-12-19 10:39:19 +08:00
82a91380e6 [enhancement](compaction) Add support for limiting low priority compaction scheduling (#27648) 2023-12-14 18:31:23 +08:00
cd6d75e518 [fix](memory) TabletSchema and Schema no longer track memory, only track columns count. (#28149)
TabletSchema and Schema no longer track memory; they only track the column count, because their memory size cannot be tracked accurately.

The TabletMeta MemTracker now tracks the TabletSchema column count.

Segment::_meta_mem_usage overflows for an unknown reason, which makes the SegmentMeta MemTracker report values such as -2912341218700198079. It is therefore temporarily placed in the experimental-type tracker.
2023-12-13 15:06:46 +08:00
1afdbfe723 [enhance](BE) Refactor TaskWorkerPool (#27555) 2023-12-04 21:46:10 +08:00
Pxl
1188d88a10 [Chore](status) catch some error status on storage (#27132)
catch some error status on storage
2023-11-17 12:00:39 +08:00
c26f5a2bd2 [improvement](BE) Remove unnecessary error handling codes (#26760) 2023-11-12 00:02:51 +08:00
d767804815 [feature](merge-cloud) Decouple rowset id generator and local rowsets gc implementation (#25921) 2023-11-10 10:07:02 +08:00
f31c1d858a [fix](merge-on-write) fix duplicate key in schema change (#25705)
When calculating the delete bitmap during publish, ensure that the obtained versions are continuous.
NOTREADY tablets left over from a failed schema change should be dropped.
When a rowset is deleted, its delete bitmap must not be deleted until no read requests are still using the rowset.
2023-10-25 05:59:48 -05:00
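The "versions are continuous" requirement from #25705 can be expressed as a simple check over version ranges sorted by start. This is a minimal sketch under that sorting assumption. `VersionRange` and `versions_are_continuous` are hypothetical, not Doris's `Version` utilities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical [start, end] version range, inclusive on both sides.
struct VersionRange {
    int64_t start;
    int64_t end;
};

// True if the ranges (sorted by start) form one gap-free chain, e.g.
// [0-5] [6-6] [7-9]. A gap such as [0-5] [7-9] means a version is
// missing, so the delete bitmap should not be computed from this set.
bool versions_are_continuous(const std::vector<VersionRange>& versions) {
    for (std::size_t i = 1; i < versions.size(); ++i) {
        if (versions[i].start != versions[i - 1].end + 1) return false;
    }
    return true;
}
```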
6757d2f361 Revert "[Enhancement](show-backends-disks) Add show backends disks (#24229)" (#25389)
This reverts commit 21223e65c59c23cfcb9e8ab610ea321168bcb75a.
2023-10-13 14:08:45 +08:00
21223e65c5 [Enhancement](show-backends-disks) Add show backends disks (#24229)
* Add a statement to query the disk information for the data directories of a BE node

mysql> show backends disks;
+-----------+-------------+------------------------------+---------+-----------+---------------+--------------+-------------------+---------+
| BackendId | Host        | RootPath                     | DirType | DiskState | TotalCapacity | UsedCapacity | AvailableCapacity | UsedPct |
+-----------+-------------+------------------------------+---------+-----------+---------------+--------------+-------------------+---------+
| 10002     | 10.xx.xx.90 | /home/work/output/be/storage | STORAGE | ONLINE    | 7.049 TB      | 2.478 TB     | 4.571 TB          | 35.16 % |
| 10002     | 10.xx.xx.90 | /home/work/output/be         | DEPLOY  | ONLINE    | 7.049 TB      | 2.478 TB     | 4.571 TB          | 35.16 % |
| 10002     | 10.xx.xx.90 | /home/work/output/be/log     | LOG     | ONLINE    | 7.049 TB      | 2.478 TB     | 4.571 TB          | 35.16 % |
+-----------+-------------+------------------------------+---------+-----------+---------------+--------------+-------------------+---------+
2023-10-12 20:24:45 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
8eb14eec7c [enhancement](baddisk) record bad disk in be_custom.conf to handle (#24639) 2023-09-21 18:31:58 +08:00
cac089c7cd [fix](compile) fix mac compile sort failed #24453 2023-09-16 09:52:20 +08:00
df5b5ae0cb [improvement](create tablet) backend create tablet round robin among disks (#23218)
The backend chooses a disk by available bytes and tablet count. If both are equal, it round-robins among those disks.
2023-09-15 11:39:43 +08:00
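The disk choice described in #23218, which was later reverted by #29347, might look like the sketch below. The commit does not state which criterion takes priority. This sketch assumes more available bytes wins first and fewer tablets breaks the tie. `DiskInfo` and `DiskChooser` are hypothetical, not Doris's `DataDir`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-disk stats.
struct DiskInfo {
    int64_t available_bytes;
    int64_t tablet_num;
};

// Assumed ordering: more available bytes wins; on a tie, fewer tablets
// wins; disks tied on both are used in turn via a round-robin counter.
class DiskChooser {
public:
    std::size_t choose(const std::vector<DiskInfo>& disks) {
        assert(!disks.empty());
        std::vector<std::size_t> best;
        for (std::size_t i = 0; i < disks.size(); ++i) {
            if (best.empty()) {
                best.push_back(i);
                continue;
            }
            const DiskInfo& cur = disks[i];
            const DiskInfo& ref = disks[best.front()];
            if (cur.available_bytes > ref.available_bytes ||
                (cur.available_bytes == ref.available_bytes && cur.tablet_num < ref.tablet_num)) {
                best.assign(1, i);  // strictly better: restart the tie set
            } else if (cur.available_bytes == ref.available_bytes &&
                       cur.tablet_num == ref.tablet_num) {
                best.push_back(i);  // exact tie: join the round-robin set
            }
        }
        return best[_next++ % best.size()];
    }

private:
    std::size_t _next = 0;
};
```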
c7ae2a7d22 [Refactor & Bugfix](static variables) move some static variables to exec_env (#24029) 2023-09-13 09:27:03 +08:00
f8fd8a3d17 [fix](trash) fix clean trash not working (#23936)
When `admin clean trash` is executed while the backend daemon clean thread is already cleaning trash, the SQL command returns immediately. However, the daemon thread does not clean all trash; it cleans only expired trash.
Also, if there is a lot of trash, the daemon clean thread stays busy handling it for a long time.
2023-09-08 18:13:22 +08:00
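The actual change in #23936 is not shown here. One way to address both issues in that message is sketched below. The admin path waits for the daemon's sweep to finish instead of returning early, and it removes all trash rather than only expired entries. This is a sketch under those assumptions. `TrashEntry`, `TrashCleaner`, and its methods are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical trash entry: the time (in seconds) it was moved to trash.
struct TrashEntry {
    int64_t moved_at;
};

class TrashCleaner {
public:
    explicit TrashCleaner(std::vector<TrashEntry> entries) : _entries(std::move(entries)) {}

    // Daemon path: remove only entries older than `expire_secs`.
    std::size_t clean_expired(int64_t now, int64_t expire_secs) {
        std::lock_guard<std::mutex> lock(_mu);
        return remove_if_locked(
                [&](const TrashEntry& e) { return now - e.moved_at > expire_secs; });
    }

    // Admin path: block until any running sweep finishes (instead of
    // returning early), then remove every entry regardless of age.
    std::size_t clean_all() {
        std::lock_guard<std::mutex> lock(_mu);
        return remove_if_locked([](const TrashEntry&) { return true; });
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(_mu);
        return _entries.size();
    }

private:
    // Caller must hold _mu.
    template <typename Pred>
    std::size_t remove_if_locked(Pred pred) {
        auto it = std::remove_if(_entries.begin(), _entries.end(), pred);
        auto removed = static_cast<std::size_t>(_entries.end() - it);
        _entries.erase(it, _entries.end());
        return removed;
    }

    mutable std::mutex _mu;
    std::vector<TrashEntry> _entries;
};
```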
09bcedb116 [feature](merge-cloud) Remove deprecated old cache (#23881)
* Remove deprecated old cache
2023-09-06 08:07:05 +08:00
acbd8ca185 [improvement](show backends) show backends print trash used (#23792) 2023-09-03 20:30:58 +08:00
91c5640cae [fix](tablet clone) fix clone backend chose wrong disk (#23729) 2023-09-01 15:12:35 +08:00
25b6e4deb2 [fix](daemon) Fix incorrect initialization order of daemon services (#23578)
Current initialization dependency:

      Daemon ───┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo
                │
                │
BackendService ─┘
However, the original code incorrectly initialized Daemon before StorageEngine.
This PR also stops and joins the threads of the daemon services in their destructors. As a result, RAII makes the Daemon services release resources in the reverse order of initialization.
2023-08-31 19:46:38 +08:00
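The RAII ordering that #23578 relies on is a core C++ rule. Non-static data members are constructed in declaration order and destroyed in reverse order. The minimal sketch below uses hypothetical `Service` and `Process` types. The strings `ExecEnv`, `StorageEngine`, and `Daemon` only label the startup order shown in the diagram; they are not the real classes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records lifecycle events so the order can be inspected.
inline std::vector<std::string> g_events;

// Hypothetical service: "starts" in its constructor and "stops/joins" in
// its destructor (real threads omitted to keep the sketch short).
struct Service {
    explicit Service(std::string name) : _name(std::move(name)) {
        g_events.push_back("start " + _name);
    }
    ~Service() { g_events.push_back("stop " + _name); }
    std::string _name;
};

// Declaring dependencies first yields ExecEnv -> StorageEngine -> Daemon
// on startup and the mirror order on shutdown.
struct Process {
    Service exec_env{"ExecEnv"};
    Service storage_engine{"StorageEngine"};
    Service daemon{"Daemon"};
};
```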
da9eb79ac4 [Enhancement](Schema hash) Remove schema hash in tablet info (#23516) 2023-08-29 10:05:12 +08:00
35a1404bbe [fix](load) add error handle when load data dir (#23457) 2023-08-28 19:33:50 +08:00
81dd00f6e4 [Feature](Compaction) Support do full compaction by table id (#22010) 2023-08-21 11:54:51 +08:00
b9b9071c9b [improvement](create partition) create partition require quorum replicas succ (#22554) 2023-08-11 11:59:05 +08:00
94d563f04d [improvement](garbage sweep) garbage sweep sleep for a while to reduce io (#22762) 2023-08-10 12:11:50 +08:00
0d75a54d6c [fix](compaction) fix null pointer if single replica compaction gets rowset version from peer (#22717) 2023-08-09 20:55:24 +08:00
f2731185c9 [fix](memory) fix cache clean thread (#22472)
fix page cache update last visit time.
fix cache clean thread
2023-08-08 15:38:29 +08:00