575 Commits

Author SHA1 Message Date
c30c1d2436 [branch-2.1] Picks "[opt](delete) Delete job should retry for failure that is not DELETE_INVALID_XXX #37834" (#38032)
## Proposed changes

picks https://github.com/apache/doris/pull/37834 and
https://github.com/apache/doris/pull/38043
2024-07-18 14:50:30 +08:00
cf2fb6945a [branch-2.1](memory) Refactor LRU cache policy memory tracking (#37658)
Pick #36235 and #35965.
2024-07-11 21:04:01 +08:00
9f4e7346fb [fix](compaction) fixing the inaccurate statistics of concurrent compaction tasks (#37318) (#37496) 2024-07-10 22:23:25 +08:00
afcc6170f6 [fix](txn_manager) Add ingested rowsets to unused rowsets when removing txn (#37417)
Generally speaking, as long as a rowset has a version, it can be
considered not to be in a pending state. However, if the rowset was
created through ingesting binlogs, it will have a version but should
still be considered in a pending state because the ingesting txn has not
yet been committed.

This PR updates the condition for determining the pending state. If a
rowset is COMMITTED, the txn should be allowed to roll back even if a
version exists.

Cherry-pick #36551
2024-07-10 14:25:44 +08:00
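The pending-state condition described in #37417 can be sketched roughly as follows. This is a simplified Python model, not the Doris C++ code: the `Rowset` class, the `RowsetState` enum members, and both `is_pending_*` functions are hypothetical names chosen for illustration, based only on the commit message above.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class RowsetState(Enum):
    # Hypothetical state names used only for this sketch.
    PREPARED = auto()
    COMMITTED = auto()
    VISIBLE = auto()


@dataclass
class Rowset:
    state: RowsetState
    version: Optional[Tuple[int, int]]  # None until a version is assigned


def is_pending_old(rowset: Rowset) -> bool:
    # Old rule: any rowset that has a version is treated as not pending.
    return rowset.version is None


def is_pending_new(rowset: Rowset) -> bool:
    # New rule: a COMMITTED rowset is still pending even if it already
    # carries a version (the binlog-ingestion case), so its txn can roll back.
    return rowset.version is None or rowset.state is RowsetState.COMMITTED


ingested = Rowset(state=RowsetState.COMMITTED, version=(5, 5))
published = Rowset(state=RowsetState.VISIBLE, version=(5, 5))
```

Under these assumptions, `ingested` is not pending under the old rule but is pending under the new one, while `published` is not pending under either.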
b75533e72b [branch-2.1](beut) fix BE UT (#36147)
only for branch-2.1
2024-06-12 08:21:38 +08:00
596a9a16d3 [chore](Compile) Fix segment cache ut's compile error due to miss cherry-pick (#36099) 2024-06-11 17:12:42 +08:00
a0f3c1cd1e [chore](Compile) Fix S3 file writer ut's compile error due to miss cherry-pick (#36037)
The S3 file writer's unit test fails to compile; this PR fixes it.
2024-06-08 22:21:20 +08:00
af779f5cd8 Pick "[fix](gclog) Skip tablet dir without schema hash dir in path gc (#32793)" (#35978)
## Proposed changes
Pick "[fix](gclog) Skip tablet dir without schema hash dir in path gc
(#32793)"
2024-06-06 22:24:30 +08:00
f80b856405 [enhancement](oom) return error when bloom filter allocate memory failed (#35790)
## Proposed changes

1. Return an error when the bloom filter fails to allocate memory.
2. Return an error when deserializing a block, since it may need a lot of memory.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-06-03 18:22:11 +08:00
9c270e5cdf [fix](delete) Fix unrecognized column name delete handler (#32429) (#35742)
pick doris-master #32429
2024-05-31 20:41:22 +08:00
8fb28244d6 [improvement](page builder) avoid allocating big memory in ctor (#35493)
2024-05-29 15:03:54 +08:00
309503855e [Fix](bloom filter) Fix bloom filter memory leak (#34871)
Issue: Doris occasionally shows exceptionally high memory usage that does not decrease. The leaked memory is held by Bloom filters stored in memory.

Reason: The segment cache holds segment objects read from files. It is an LRU cache that evicts older segments when the number of segments exceeds the maximum, or when the total memory size of the cached segment objects exceeds the maximum usage. However, the code first reads the segment object into memory (size A) and places it into the cache, which records its size as A. It then reads the segment's Bloom filter from the file (size B) and assigns it to the segment's Bloom filter member variable, so the segment object now occupies A+B. The cache does not update the recorded size, so the actual size of each cached segment (A+B) is larger than the size the cache accounts for (A). As the number of cached segments grows, real memory usage surges, but the cache never sees its accounted size reach the eviction limit and therefore does not evict. The result is a memory leak.

Solution: Each segment object reads its Bloom filter only once, so the fix is to reorder the steps: read the segment, read the Bloom filter, and only then place the segment into the cache.
2024-05-24 16:23:58 +08:00
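The accounting gap described in #34871 can be reproduced with a toy model. The sketch below is a simplified Python illustration, not the Doris segment cache: `ChargedLRUCache`, `load_segments`, and the sizes used are all hypothetical. It only shows how charging A at insert time, while the object later grows to A+B, keeps the cache from ever reaching its eviction limit.

```python
from collections import OrderedDict


class ChargedLRUCache:
    """Toy LRU cache that evicts by accounted charge, not by real memory."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.usage = 0                 # sum of charges the cache knows about
        self._entries = OrderedDict()  # key -> charge, oldest first

    def insert(self, key, charge: int) -> None:
        if key in self._entries:
            self.usage -= self._entries.pop(key)
        self._entries[key] = charge
        self.usage += charge
        # Evict the oldest entries while the accounted usage exceeds capacity.
        while self.usage > self.capacity and len(self._entries) > 1:
            _, evicted_charge = self._entries.popitem(last=False)
            self.usage -= evicted_charge

    def __len__(self) -> int:
        return len(self._entries)


def load_segments(n, seg_size, bf_size, capacity, fixed):
    """Return (accounted usage, real memory held) after loading n segments."""
    cache = ChargedLRUCache(capacity)
    for seg_id in range(n):
        if fixed:
            # Fixed order: read the Bloom filter first, then insert with A+B.
            cache.insert(seg_id, seg_size + bf_size)
        else:
            # Buggy order: insert with A; the Bloom filter (B) is attached
            # afterwards and never reported to the cache.
            cache.insert(seg_id, seg_size)
    real_memory = len(cache) * (seg_size + bf_size)
    return cache.usage, real_memory


buggy = load_segments(10, seg_size=10, bf_size=90, capacity=200, fixed=False)
fixed = load_segments(10, seg_size=10, bf_size=90, capacity=200, fixed=True)
```

With these numbers the buggy order accounts for 100 units while actually holding 1000, and the fixed order keeps both values at the 200-unit capacity.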
95b05928fd [fix](compaction) fix time series compaction merge empty rowsets priority #34562 (#34765) 2024-05-14 09:10:09 +08:00
7e91e69eb9 [fix](compaction) fix single compaction (#33907)
* [fix](compaction)Fix single compaction to get all local versions #33849

add test and comment

* remove single replica compaction prepare input rowsets

revised
2024-04-19 23:30:25 +08:00
a4924dabb7 [enhancement](exception) enable exception logic in pipeline execute thread (#33437)
* [enhancement](exception) enable exception logic in pipeline execute thread

* f

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-12 15:09:25 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
Pxl
8fd6d4c41b [Chore](build) add -Wconversion and remove some unused code (#33127)
add -Wconversion and remove some unused code
2024-04-10 15:26:08 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
39fba884fb [fix](typo) typo fix for 'delete bimap' changing to 'delete bitmap' (#32341) 2024-04-10 11:34:30 +08:00
28e2d89ce3 [Improve](inverted_index) update clucene and improve array inverted index writer (#32436) 2024-04-10 11:34:29 +08:00
7b74b199a5 [fix](memory) Fix LRU cache deleter and memory tracking (#32080)
To share common code in the value deleter of the LRU cache, all LRU cache values now inherit from the LRUCacheValueBase class and track memory in the destructor.
2024-03-15 17:57:58 +08:00
0da010603e [Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141)
Both can be references to the corresponding fields in TabletColumn. Also use shared_ptr for TabletColumn in TabletSchema for later memory reuse.
2024-03-09 19:44:42 +08:00
eea9b56f69 [fix](group commit) handle group commit create plan error (#31757) 2024-03-06 13:07:59 +08:00
7d1db6cd1f [refactor](exception safe) Refactor delete handler and block column predicates to make sure exception safe (#31618) 2024-03-01 14:21:17 +08:00
90ab5ec2d9 [fix](invert index) fix the error issue in the unit test remove_element_only_in_table (#31238) 2024-02-22 13:01:49 +08:00
1abe9d4384 [fix](memory) Fix LRU cache stale sweep (#31122)
Remove LRUCacheValueBase, put last_visit_time into LRUHandle, and automatically update timestamp to last_visit_time during cache insert and lookup.

Do not rely on external modification of last_visit_time, which is often forgotten.
2024-02-21 17:01:29 +08:00
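The idea in #31122, keeping `last_visit_time` inside the handle and refreshing it on every insert and lookup, can be sketched as below. This is a minimal Python model under stated assumptions: `Handle`, `SweepingCache`, `prune_stale`, and the injectable `clock` parameter are hypothetical and exist only to make the behaviour easy to test.

```python
import time


class Handle:
    def __init__(self, value, now: float):
        self.value = value
        self.last_visit_time = now  # owned by the cache, not by the value


class SweepingCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._handles = {}

    def insert(self, key, value) -> None:
        # The timestamp is set by the cache itself on insert.
        self._handles[key] = Handle(value, self._clock())

    def lookup(self, key):
        handle = self._handles.get(key)
        if handle is None:
            return None
        # The timestamp is refreshed on every lookup, so callers cannot forget.
        handle.last_visit_time = self._clock()
        return handle.value

    def prune_stale(self, max_idle_seconds: float) -> int:
        """Remove entries not visited within max_idle_seconds; return the count."""
        now = self._clock()
        stale = [k for k, h in self._handles.items()
                 if now - h.last_visit_time > max_idle_seconds]
        for key in stale:
            del self._handles[key]
        return len(stale)


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
cache = SweepingCache(clock=lambda: fake_now[0])
cache.insert("a", "va")
cache.insert("b", "vb")
fake_now[0] = 50.0
cache.lookup("a")          # refreshes "a" only
fake_now[0] = 100.0
removed = cache.prune_stale(max_idle_seconds=60)
```

Here only `"b"` is swept, because `"a"` was looked up 50 seconds before the sweep.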
7a1bd6abb0 [improvement](group_commit) Refactor scan wal function (#30939)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2024-02-20 09:12:38 +08:00
b5012dc55a [Enhancement](group commit) optimize pre allocated calculation (#30893) 2024-02-18 11:50:17 +08:00
Pxl
0f47f7f389 [Feature](runtime filter) normalize ignore runtime filter (#30152)
normalize ignore runtime filter
2024-02-03 20:24:39 +08:00
e9c112b843 [Refact](inverted index) refact inverted index cache to decouple with reader (#30574) 2024-02-01 19:00:50 +08:00
ccde65b942 [fix](Cooldown) Enhance calculate logic of _has_data_to_cooldown (#30244) (#30299) 2024-01-25 13:25:34 +08:00
1a51d04cb8 [fix](move-memtable) fix schema use-after-free in delta writer v2 (#30254) 2024-01-24 10:00:25 +08:00
d525f576e1 [improve] Use lru cache to count the number of column in tablet schema to control memory (#29668) 2024-01-12 13:58:19 +08:00
81680383e6 [UT](wal) Add wal dirs info be ut (#29759) 2024-01-12 11:57:16 +08:00
0d16ec7345 [improvement](cooldown) do not cooldown tablet without cold data (#29690) 2024-01-12 11:57:16 +08:00
7c7dbf15bc [feature](merge-cloud) Decouple Tablet/TabletManager/TxnManager from global StorageEngine instance (#29736) 2024-01-12 11:57:16 +08:00
b0cac0014d [enhance](FS) Improve FS error code (#29432) 2024-01-06 21:17:22 +08:00
85dd606fd1 [fix](group_commit) Fix group_commit ut (#29587) 2024-01-06 18:11:13 +08:00
a0c3ddf902 [fix](memory) Fix LRUCacheType::NUMBER charge (#29588)
For LRUCacheType::NUMBER, handle_size is not added to the charge, because the charge is then no longer a memory size but an independent weight.
2024-01-06 10:37:56 +08:00
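The charge rule from #29588 can be illustrated with a short sketch. The Python below is an assumption-laden model rather than Doris code: the `total_charge` helper and the `HANDLE_SIZE` value are hypothetical, and `LRUCacheType` here merely mirrors the two cache types named in the commit messages.

```python
from enum import Enum


class LRUCacheType(Enum):
    SIZE = "size"      # capacity is a memory budget in bytes
    NUMBER = "number"  # capacity is a count or weight budget


HANDLE_SIZE = 64  # hypothetical per-entry bookkeeping overhead in bytes


def total_charge(cache_type: LRUCacheType, charge: int,
                 handle_size: int = HANDLE_SIZE) -> int:
    if cache_type is LRUCacheType.SIZE:
        # The charge is a memory size, so the handle overhead counts too.
        return charge + handle_size
    # The charge is an independent weight; adding bytes to it would be
    # mixing units and would fill the cache far too early.
    return charge
```

For example, a NUMBER cache with capacity 100 and a charge of 1 per entry should hold 100 entries; adding a 64-byte handle size to each charge would cap it at one.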
f40cce1406 [Fix](partition) Skip rowset partition id eq 0 smaller than config wh… (#29510) 2024-01-05 19:39:51 +08:00
706463781c [refactor](group commit) refactor group commit wal code (#29375) 2024-01-02 15:52:03 +08:00
03901b9a7a [enhancement](group_commit): refactor relay wal code (#29183) 2023-12-30 12:59:46 +08:00
82635d4b59 [opt](memory) All LRU Cache inherit from LRUCachePolicy (#28940)
Once every LRU cache inherits from LRUCachePolicy, all of them can prune stale entries, evict when memory exceeds the limit, and share common properties. The LRUCache constructor becomes private, so only LRUCachePolicy can construct it.

Implement DummyLRUCache: when the LRU cache capacity is 0, it no longer performs meaningless inserts and evictions.
2023-12-29 16:15:56 +08:00
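The DummyLRUCache idea from #28940 resembles a null-object pattern, which might look like the sketch below. This is a Python illustration under assumptions: `DummyCache`, `SimpleLRU`, and `make_cache` are hypothetical names and do not reflect the actual Doris class layout.

```python
from collections import OrderedDict


class SimpleLRU:
    """Plain LRU keyed by entry count."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def insert(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

    def lookup(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]


class DummyCache:
    """No-op cache for capacity 0: never stores, so it never has to evict."""

    def insert(self, key, value) -> None:
        pass

    def lookup(self, key):
        return None


def make_cache(capacity: int):
    # With capacity 0 a real LRU would insert and immediately evict every
    # entry; the dummy skips that wasted work.
    return DummyCache() if capacity == 0 else SimpleLRU(capacity)


disabled = make_cache(0)
disabled.insert("k", 1)

small = make_cache(2)
for name, number in (("a", 1), ("b", 2), ("c", 3)):
    small.insert(name, number)
```

Callers use the same `insert` and `lookup` interface either way, so disabling a cache needs no special-casing at call sites.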
a525d5c5a3 [refactor](decimal) change type name Decimal128 to Decimal128V2, Decimal128I to Decimal128V3 to avoid confusion (#29265)
change type name Decimal128 to Decimal128V2, Decimal128I to Decimal128V3 to avoid confusion
2023-12-29 10:11:44 +08:00
xy
fd90c3a6a6 [optimize](cooldown)Reduce the number of calls to the pick_cooldown_rowset (#27091)
Co-authored-by: xingying01 <xingying01@corp.netease.com>
2023-12-28 13:03:33 +08:00
9ff8bd2e9c [Enhancement](Wal)Support dynamic wal space limit (#27726) 2023-12-27 11:51:32 +08:00
0af6bd6390 [fix](group-commit) check if wal need recovery is abnormal (#28769) 2023-12-22 11:06:11 +08:00
aab859be56 [enhance](partition_id) check partition id before store meta (#28055) 2023-12-19 21:31:41 +08:00
e6e8632167 [improvement](merge-on-write) Optimize publish when there are missing versions (#28012)
1. When there are too many missing versions, do not retry publishing on the BE;
just add the task to the async publish tasks.
2. To reduce memory consumption, clean up the tasks when there are too many
async publish tasks.
2023-12-13 16:59:25 +08:00
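The two rules in #28012 can be sketched as a small decision function. Everything below is an illustrative assumption: the thresholds, the `handle_publish` name, and in particular the choice to clean up by dropping the oldest queued tasks are hypothetical, since the commit message does not say how tasks are cleaned up.

```python
from collections import deque

MAX_MISSING_VERSIONS = 3  # hypothetical threshold for "too many missing versions"
MAX_ASYNC_TASKS = 2       # hypothetical cap on queued async publish tasks


def handle_publish(missing_versions: int, task, async_tasks: deque) -> str:
    """Decide how to handle a publish task that found missing versions."""
    if missing_versions > MAX_MISSING_VERSIONS:
        # Rule 1: too many gaps, so hand the task to the async path
        # instead of retrying.
        async_tasks.append(task)
        # Rule 2: bound memory by trimming the queue (assumed policy:
        # drop the oldest tasks first).
        while len(async_tasks) > MAX_ASYNC_TASKS:
            async_tasks.popleft()
        return "async"
    return "retry"


queue = deque()
first = handle_publish(1, "t1", queue)    # few gaps: retry
second = handle_publish(10, "t2", queue)  # many gaps: queued
handle_publish(10, "t3", queue)
handle_publish(10, "t4", queue)           # queue trimmed to the cap
```

After these calls the queue holds only `"t3"` and `"t4"`.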
a719d7a222 [fix](memory) Fix LRU Cache of type NUMBER charge (#28175) 2023-12-13 11:15:57 +08:00