-
Notifications
You must be signed in to change notification settings - Fork 429
[store] Add disk eviction feature #1028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[store] Add disk eviction feature #1028
Conversation
Summary of ChangesHello @Vincent-Bo-ali, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the resilience of the StorageBackend by introducing an automated disk eviction mechanism. This feature ensures that the storage system can gracefully handle situations where disk space is limited, proactively clearing space by removing the oldest files to accommodate new writes. This prevents service disruptions caused by out-of-space errors and reduces the need for manual intervention, making the storage system more robust and self-managing. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable disk eviction feature to the storage backend, which is a great step towards a more resilient system. The implementation correctly uses a FIFO strategy and handles space management across various file operations. However, there are several areas that require attention before this can be considered production-ready. Most critically, there's a reliance on sleep_for for synchronization, which is a significant anti-pattern and can lead to race conditions. There are also opportunities to improve maintainability by reducing code duplication, particularly in the StoreObject methods and space-updating logic. Additionally, I've identified a bug in the eviction logic that could prevent space from being reclaimed correctly, along with some minor issues regarding magic numbers, logging levels, and type consistency. Addressing these points will make the implementation more robust and easier to maintain.
| std::this_thread::sleep_for( | ||
| std::chrono::microseconds(50)); // sleep for 50 us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using std::this_thread::sleep_for for synchronization is unreliable and is a sign of a race condition. The TODO comment acknowledges this is to allow a write thread time to create a file. This approach can lead to flaky behavior where the remove operation might still fail if the sleep duration is not long enough, or it can introduce unnecessary delays. A more robust synchronization mechanism should be implemented to ensure RemoveFile only proceeds after any pending write operations for the same path are complete. For example, since writes are submitted to a thread pool, you could maintain a map of paths to futures for pending writes.
| std::unique_lock<std::shared_mutex> lock(space_mutex_); | ||
| if (file_size <= used_space_) { | ||
| used_space_ -= file_size; | ||
| available_space_ += file_size; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The space reclamation logic here is incomplete and inconsistent with RemoveFile. If file_size > used_space_, no space is reclaimed, which is a bug. This can happen if the internal accounting of used_space_ becomes incorrect. This will cause EvictFile to not free up space, potentially leading to a failure to store a new object even when disk space could have been made available. The logic should be consistent with RemoveFile to handle this edge case correctly by resetting used_space_.
std::unique_lock<std::shared_mutex> lock(space_mutex_);
if (file_size <= used_space_) {
used_space_ -= file_size;
available_space_ += file_size;
} else {
available_space_ += used_space_;
used_space_ = 0;
}| size_t attempts = 0; | ||
| const size_t max_attempts = 1000; | ||
|
|
||
| bool space_reserved = CheckDiskSpace(total_size); | ||
|
|
||
| while (!space_reserved && attempts < max_attempts) { | ||
| std::string evicted_file = EvictFile(); | ||
| if (evicted_file.empty()) { | ||
| LOG(ERROR) << "Failed to evict file to make space."; | ||
| return tl::make_unexpected(ErrorCode::FILE_WRITE_FAIL); | ||
| } | ||
|
|
||
| attempts++; | ||
| space_reserved = CheckDiskSpace(total_size); | ||
| } | ||
|
|
||
| if (!space_reserved) { | ||
| LOG(ERROR) << "Still insufficient disk space after evicting files."; | ||
| return tl::make_unexpected(ErrorCode::FILE_WRITE_FAIL); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block of code containing the eviction loop is nearly identical to the one in the StoreObject overload for std::span (lines 157-176). Duplicating this logic increases maintenance overhead and the risk of introducing inconsistencies. Please consider refactoring this into a private helper method that attempts to secure the required disk space, performing evictions as needed.
| } | ||
|
|
||
| size_t attempts = 0; | ||
| const size_t max_attempts = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The maximum number of eviction attempts is hardcoded to 1000. This magic number makes the code harder to understand and configure. If evicting 1000 small files is not enough to free up the required space, the write operation will fail. This limit should be defined as a named constant with a descriptive name (e.g., kMaxEvictionAttempts), and ideally, it should be configurable.
| const size_t max_attempts = 1000; | |
| const size_t kMaxEvictionAttempts = 1000; |
| int64_t slices_total_size = 0; | ||
| for (const auto& slice : slices) { | ||
| iovec io{slice.ptr, slice.size}; | ||
| iovs.push_back(io); | ||
| slices_total_size += slice.size; | ||
| slices_total_size += static_cast<int64_t>(slice.size); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable slices_total_size is of type int64_t, but it's used to accumulate sizes from slice.size which are of type size_t. Later, it's cast back to size_t for comparison. This mix of signed and unsigned types for sizes can be confusing and is a potential source of bugs. It would be cleaner and safer to use size_t or uint64_t consistently for all size-related calculations.
| int64_t slices_total_size = 0; | |
| for (const auto& slice : slices) { | |
| iovec io{slice.ptr, slice.size}; | |
| iovs.push_back(io); | |
| slices_total_size += slice.size; | |
| slices_total_size += static_cast<int64_t>(slice.size); | |
| } | |
| size_t slices_total_size = 0; | |
| for (const auto& slice : slices) { | |
| iovec io{slice.ptr, slice.size}; | |
| iovs.push_back(io); | |
| slices_total_size += slice.size; | |
| } |
| if (file_size > 0) { | ||
| std::unique_lock<std::shared_mutex> lock(space_mutex_); | ||
| if (file_size <= used_space_) { | ||
| used_space_ -= file_size; | ||
| available_space_ += file_size; | ||
| } else { | ||
| available_space_ += used_space_; | ||
| used_space_ = 0; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for updating space accounting when a file is removed is duplicated in RemoveFile (here), RemoveByRegex (lines 418-427), RemoveAll (lines 473-486), and EvictFile (lines 582-592). This duplication makes the code harder to maintain and has already led to an inconsistency in EvictFile. This logic should be extracted into a private helper function, for example ReleaseSpace(uint64_t file_size), to ensure consistency and improve maintainability.
| std::string path_str = entry.path().string(); | ||
|
|
||
| if (std::regex_search(filename, pattern)) { | ||
| paths_to_remove.push_back(entry.path()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| bool StorageBackend::CheckDiskSpace(size_t required_size) { | ||
| std::unique_lock<std::shared_mutex> lock(space_mutex_); | ||
|
|
||
| if (!initialized_.load(std::memory_order_relaxed)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using std::memory_order_relaxed for checking the initialized_ flag is subtle. While it might be correct in the current call chain from StoreObject (which uses an acquire load), it makes CheckDiskSpace less safe if it's ever called from a different context. For improved clarity and to prevent potential future race conditions, it's safer to use std::memory_order_acquire. The performance impact of using a stronger memory order here is negligible.
| if (!initialized_.load(std::memory_order_relaxed)) { | |
| if (!initialized_.load(std::memory_order_acquire)) { |
| std::string StorageBackend::SelectFileToEvictByFIFO() { | ||
| std::unique_lock<std::shared_mutex> lock(file_queue_mutex_); | ||
| if (file_write_queue_.empty()) { | ||
| LOG(ERROR) << "Queue is empty, cannot select file to evict"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging an ERROR when the eviction queue is empty might be too severe, as this can be a normal state (e.g., when the storage is empty). This can create unnecessary noise in error logs. Consider changing the log level to WARNING or INFO, especially since the calling function EvictFile already logs a WARNING in this case.
LOG(WARNING) << "Queue is empty, cannot select file to evict";There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the lint issue using clang-format
| { | ||
| std::unique_lock<std::shared_mutex> lock(space_mutex_); | ||
|
|
||
| total_space_ = space_info.available; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
total_space_ should be a configurable quota
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
total_space_ = min(space_info.available, user defined space)
|
|
||
| total_space_ = space_info.available; | ||
| available_space_ = space_info.available; | ||
| used_space_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded used_space_ may be a wrong value when service restarts from a crash.
| namespace fs = std::filesystem; | ||
| uint64_t file_size = 0; | ||
| std::error_code ec; | ||
| file_size = fs::file_size(file_to_evict, ec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this size can be maintained by yourself for performance
| while (it != file_write_queue_.end()) { | ||
| std::string file_to_evict = *it; | ||
|
|
||
| if (std::filesystem::exists(file_to_evict)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same above
| * @param path Path of the file to remove from queue | ||
| */ | ||
| void RemoveFileFromWriteQueue(const std::string& path); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three func should be private
| if (actual_fsdir.rfind("moon_", 0) == 0) { | ||
| actual_fsdir = actual_fsdir.substr(5); | ||
| } | ||
| fs::path storage_root = fs::path(root_dir_) / actual_fsdir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These block is same as the code in StorageBackend::Init. Make it a helper
|
|
||
| void StorageBackend::AddFileToWriteQueue(const std::string& path) { | ||
|
|
||
| std::unique_lock<std::shared_mutex> lock(file_queue_mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is necessary. the stdc++ have guargnteed for us ?
Signed-off-by: Vincent Gao <vincentbo@linux.alibaba.com>
dbe0554 to
f8c9714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request adds an automated disk eviction mechanism to the StorageBackend to handle disk space exhaustion by automatically removing older files when space is needed. The implementation uses a FIFO (First-In-First-Out) eviction strategy and introduces space quota tracking to prevent out-of-space errors.
Key changes:
- Added
Init()method to initialize space tracking based on available disk space - Implemented FIFO-based file eviction with queue management for tracking write order
- Modified
StoreObject()methods to reserve space before writes and release on failures
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| mooncake-store/include/storage_backend.h | Added new public API Init(), eviction methods, and private members for space tracking (mutexes, space counters, file queue) |
| mooncake-store/src/storage_backend.cpp | Implemented disk eviction logic including space reservation/release, FIFO file selection, and updated file removal methods to track freed space |
| mooncake-store/src/client.cpp | Added call to StorageBackend::Init() in PrepareStorageBackend() to initialize the space tracking system |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| file->vector_write(iovs.data(), static_cast<int>(iovs.size()), 0); | ||
| if (!write_result) { | ||
| LOG(ERROR) << "vector_write failed for: " << path | ||
| LOG(ERROR) << "Vector_write failed for: " << path |
Copilot
AI
Nov 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalization inconsistency: The log message uses "Vector_write" with a capital 'V', which is inconsistent with standard function naming conventions and the original error message "vector_write" (which would match the actual function name). This makes the error message look like a typo.
Consider keeping it lowercase: "vector_write failed for: "
| LOG(ERROR) << "Vector_write failed for: " << path | |
| LOG(ERROR) << "vector_write failed for: " << path |
| if (has_enough_space) { | ||
| used_space_ += required_size; | ||
| available_space_ -= required_size; |
Copilot
AI
Nov 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Potential integer underflow: The function checks available_space_ >= required_size before subtracting, which is good. However, there's no check to ensure used_space_ + required_size won't overflow. If required_size is very large, adding it to used_space_ could overflow a uint64_t.
Consider adding overflow protection:
if (used_space_ > UINT64_MAX - required_size) {
LOG(ERROR) << "Space tracking overflow detected";
return false;
}| /** | ||
| * @brief Evicts a file based on FIFO order (earliest written first out) | ||
| * @return Path of the evicted file, or empty string if no file was evicted | ||
| */ | ||
| std::string EvictFile(); | ||
|
|
||
| /** | ||
| * @brief Add file to write queue for FIFO tracking | ||
| * @param path Path of the file to add to queue | ||
| */ | ||
| void AddFileToWriteQueue(const std::string& path); | ||
|
|
||
| /** | ||
| * @brief Remove file from write queue | ||
| * @param path Path of the file to remove from queue | ||
| */ | ||
| void RemoveFileFromWriteQueue(const std::string& path); |
Copilot
AI
Nov 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] API design issue: EvictFile(), AddFileToWriteQueue(), and RemoveFileFromWriteQueue() are declared as public methods, but they are internal implementation details of the disk eviction mechanism. Exposing these methods publicly could lead to:
- External code incorrectly manipulating the eviction queue
- Breaking internal invariants (e.g., calling EvictFile without updating space tracking)
- Confusion about the intended API surface
Consider making these methods private, as they are only used internally by EnsureDiskSpace(), StoreObject(), and RemoveFile().
| * performing evictions if necessary. | ||
| * | ||
| * @param required_size The amount of disk space, in bytes, required for the | ||
| * upcoming write operation. |
Copilot
AI
Nov 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing documentation: The EnsureDiskSpace method's documentation doesn't specify what error code is returned on failure or under what conditions it might fail (e.g., max eviction attempts reached, no files available to evict). This makes it difficult for callers to understand and handle different failure scenarios.
Consider documenting the return value and failure conditions:
* @return tl::expected<void, ErrorCode> Returns void on success, or
* ErrorCode::FILE_WRITE_FAIL if insufficient space remains after
* attempting evictions up to the maximum attempt limit.| * performing evictions if necessary. | |
| * | |
| * @param required_size The amount of disk space, in bytes, required for the | |
| * upcoming write operation. | |
| * performing evictions if necessary. | |
| * | |
| * @param required_size The amount of disk space, in bytes, required for the | |
| * upcoming write operation. | |
| * @return tl::expected<void, ErrorCode> Returns void on success, or | |
| * ErrorCode::FILE_WRITE_FAIL if insufficient space remains after | |
| * attempting evictions up to the maximum attempt limit, or if no | |
| * files are available to evict. | |
| * | |
| * Failure conditions: | |
| * - Max eviction attempts reached and still insufficient space. | |
| * - No files available to evict to free up space. |
| ReleaseSpace(file_size); | ||
| return file_to_evict; | ||
| } else { | ||
| if (!ec || ec == std::errc::no_such_file_or_directory) { |
Copilot
AI
Nov 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic error in condition: The condition !ec || ec == std::errc::no_such_file_or_directory will always be true when ec == std::errc::no_such_file_or_directory because if ec represents no_such_file_or_directory, then !ec will be false, but the OR condition with the explicit check will still be true. However, when fs::remove returns false, ec should be set to an error. The condition should be !ec || ec == std::errc::no_such_file_or_directory but !ec means "no error", which contradicts fs::remove returning false.
The correct logic should check: ec == std::errc::no_such_file_or_directory (file didn't exist, which is acceptable) rather than also checking !ec (which would mean no error occurred, contradicting the false return from remove).
| if (!ec || ec == std::errc::no_such_file_or_directory) { | |
| if (ec == std::errc::no_such_file_or_directory) { |
| std::unique_lock<std::shared_mutex> lock(file_queue_mutex_); | ||
| file_write_queue_.clear(); | ||
| file_queue_map_.clear(); | ||
| } | ||
|
|
||
| // Iterate through the root directory and remove all files | ||
| uint64_t total_freed_space = 0; | ||
| for (const auto& entry : | ||
| fs::recursive_directory_iterator(storage_root)) { | ||
| if (fs::is_regular_file(entry.status())) { | ||
| uint64_t file_size = 0; | ||
| try { | ||
| file_size = fs::file_size(entry.path()); | ||
| } catch (const fs::filesystem_error& e) { | ||
| LOG(WARNING) << "Failed to get file size: " << e.what(); | ||
| } | ||
|
|
||
| std::error_code ec; | ||
| fs::remove(entry.path(), ec); | ||
| if (ec) { | ||
| LOG(ERROR) << "Failed to delete file: " << entry.path() | ||
| << ", error: " << ec.message(); | ||
| } else { | ||
| total_freed_space += file_size; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| ReleaseSpace(total_freed_space); |
Copilot
AI
Nov 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential race condition: The file queue is cleared (lines 377-378) before the actual files are removed (lines 383-402). If a concurrent StoreObject operation adds a file to the queue after the queue is cleared but before the files are physically removed, that file will remain in the queue even though it will be deleted shortly after. This could lead to inconsistent state where the queue references files that don't exist.
Consider holding the file_queue_mutex_ for the entire duration of the RemoveAll operation, or clearing the queue after all files are removed.
| tl::expected<void, ErrorCode> StorageBackend::Init() { | ||
| if (initialized_.load(std::memory_order_acquire)) { | ||
| LOG(WARNING) << "StorageBackend is already initialized. Skipping."; | ||
| return {}; | ||
| } | ||
|
|
||
| namespace fs = std::filesystem; | ||
| std::string actual_fsdir = fsdir_; | ||
| if (actual_fsdir.rfind("moon_", 0) == 0) { | ||
| actual_fsdir = actual_fsdir.substr(5); | ||
| } | ||
| fs::path storage_root = fs::path(root_dir_) / actual_fsdir; | ||
|
|
||
| std::error_code ec; | ||
| if (!fs::exists(storage_root)) { | ||
| fs::create_directories(storage_root, ec); | ||
| if (ec) { | ||
| LOG(ERROR) << "Failed to create storage root directory: " | ||
| << storage_root; | ||
| return tl::make_unexpected(ErrorCode::INTERNAL_ERROR); | ||
| } | ||
| } | ||
| const auto space_info = fs::space(storage_root, ec); | ||
| if (ec) { | ||
| LOG(ERROR) << "Init: Failed to get disk space info: " << ec.message(); | ||
| return tl::make_unexpected(ErrorCode::INTERNAL_ERROR); | ||
| } | ||
| { | ||
| std::unique_lock<std::shared_mutex> lock(space_mutex_); | ||
|
|
||
| total_space_ = space_info.available; | ||
| available_space_ = space_info.available; | ||
| used_space_ = 0; | ||
|
|
||
| LOG(INFO) << "Init: Space tracking initialized. " | ||
| << "Initial Quota (based on current free space): " | ||
| << total_space_ << ", Available Space: " << available_space_ | ||
| << ", Used This Session: " << used_space_; | ||
| } | ||
|
|
||
| initialized_.store(true, std::memory_order_release); | ||
| return {}; |
Copilot
AI
Nov 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Init() method initializes space tracking but doesn't scan the storage directory for existing files. This means:
- Existing files won't be added to the eviction queue (
file_write_queue_), so they can never be evicted - The space tracking assumes
used_space_ = 0even if files already exist in the directory, leading to incorrect available space calculations
Consider scanning the directory during initialization to populate the file queue and accurately track used space, or document that Init() should only be called on an empty directory.
| * This method scans the storage directory to build its internal state. | ||
| * It must be called after construction and before any other operations. |
Copilot
AI
Nov 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing documentation: The Init() method documentation doesn't explain:
- Whether it's idempotent (calling it multiple times)
- What happens if there are existing files in the directory
- Thread-safety guarantees
- Whether it must be called before other operations
Consider adding these details to help users understand how to properly initialize the backend.
| * This method scans the storage directory to build its internal state. | |
| * It must be called after construction and before any other operations. | |
| * | |
| * This method scans the storage directory to build its internal state. | |
| * | |
| * Idempotency: This method is idempotent; calling it multiple times has the same effect as calling it once. | |
| * | |
| * Existing files: If there are existing files in the storage directory, they will be scanned and incorporated into the internal state. | |
| * No files are deleted or overwritten during initialization. | |
| * | |
| * Thread-safety: This method is not thread-safe and should not be called concurrently from multiple threads. | |
| * It is recommended to call Init() from a single thread before performing any other operations. | |
| * | |
| * Initialization requirement: Init() must be called after construction and before any other operations. | |
| * Using other methods before successful initialization may result in undefined behavior. | |
| * |
| used_space_ = 0; | ||
| available_space_ += used_space_; |
Copilot
AI
Nov 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical bug in the else branch logic: when size_to_release > used_space_, the code sets used_space_ to 0 and then adds used_space_ (which is now 0) to available_space_. This means the space is never properly released.
The correct logic should be:
} else {
available_space_ += used_space_;
used_space_ = 0;
}This ensures the old used_space_ value is added to available_space_ before resetting it to 0.
| used_space_ = 0; | |
| available_space_ += used_space_; | |
| available_space_ += used_space_; | |
| used_space_ = 0; |
|
Thanks for this great work! However, I have a few concerns:
Regarding the storage backend, @zhuxinjie-nz and I are also working on related changes. I’ll post an RFC about this soon for reference. |
This pull request introduces a disk eviction mechanism to the StorageBackend. This feature ensures that the storage system can gracefully handle situations where disk space is exhausted by automatically removing older files to make space for new writes, preventing out-of-space errors.
Currently, when the allocated disk quota is fully utilized, any new write operations (StoreObject) will fail with an out-of-space or insufficient space error. This can lead to service disruptions and requires manual intervention to clear up space.
To build a more resilient and self-managing storage system, we need an automated mechanism that can reclaim space when the storage backend is under pressure. This PR addresses this need by implementing a disk eviction strategy.