Skip to content

Conversation

@Vincent-Bo-ali
Copy link

This pull request introduces a disk eviction mechanism to the StorageBackend. This feature ensures that the storage system can gracefully handle situations where disk space is exhausted by automatically removing older files to make space for new writes, preventing out-of-space errors.

Currently, when the allocated disk quota is fully utilized, any new write operations (StoreObject) will fail with an out-of-space or insufficient space error. This can lead to service disruptions and requires manual intervention to clear up space.
To build a more resilient and self-managing storage system, we need an automated mechanism that can reclaim space when the storage backend is under pressure. This PR addresses this need by implementing a disk eviction strategy.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Vincent-Bo-ali, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the resilience of the StorageBackend by introducing an automated disk eviction mechanism. This feature ensures that the storage system can gracefully handle situations where disk space is limited, proactively clearing space by removing the oldest files to accommodate new writes. This prevents service disruptions caused by out-of-space errors and reduces the need for manual intervention, making the storage system more robust and self-managing.

Highlights

  • Disk Eviction Mechanism: Introduced a disk eviction feature to the StorageBackend, enabling automatic removal of older files to free up space when the disk quota is exhausted, preventing out-of-space errors during write operations.
  • FIFO Eviction Strategy: Implemented a First-In, First-Out (FIFO) eviction strategy, ensuring that the oldest written files are removed first to make room for new data.
  • Space Tracking and Management: Added robust disk space tracking (total, used, available) and management, including an initialization method to scan the storage directory and update internal state, and mechanisms to reserve and release space during file operations.
  • Integration with Store Operations: Integrated the disk space check and eviction logic directly into the StoreObject methods, allowing the system to automatically attempt eviction multiple times if initial write attempts fail due to insufficient space.
  • Thread Safety: Incorporated std::shared_mutex for protecting space tracking variables and the file write queue, ensuring thread-safe access and modification of these critical resources.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable disk eviction feature to the storage backend, which is a great step towards a more resilient system. The implementation correctly uses a FIFO strategy and handles space management across various file operations. However, there are several areas that require attention before this can be considered production-ready. Most critically, there's a reliance on sleep_for for synchronization, which is a significant anti-pattern and can lead to race conditions. There are also opportunities to improve maintainability by reducing code duplication, particularly in the StoreObject methods and space-updating logic. Additionally, I've identified a bug in the eviction logic that could prevent space from being reclaimed correctly, along with some minor issues regarding magic numbers, logging levels, and type consistency. Addressing these points will make the implementation more robust and easier to maintain.

Comment on lines 330 to 283
std::this_thread::sleep_for(
std::chrono::microseconds(50)); // sleep for 50 us
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using std::this_thread::sleep_for for synchronization is unreliable and is a sign of a race condition. The TODO comment acknowledges this is to allow a write thread time to create a file. This approach can lead to flaky behavior where the remove operation might still fail if the sleep duration is not long enough, or it can introduce unnecessary delays. A more robust synchronization mechanism should be implemented to ensure RemoveFile only proceeds after any pending write operations for the same path are complete. For example, since writes are submitted to a thread pool, you could maintain a map of paths to futures for pending writes.

Comment on lines 584 to 588
std::unique_lock<std::shared_mutex> lock(space_mutex_);
if (file_size <= used_space_) {
used_space_ -= file_size;
available_space_ += file_size;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The space reclamation logic here is incomplete and inconsistent with RemoveFile. If file_size > used_space_, no space is reclaimed, which is a bug. This can happen if the internal accounting of used_space_ becomes incorrect. This will cause EvictFile to not free up space, potentially leading to a failure to store a new object even when disk space could have been made available. The logic should be consistent with RemoveFile to handle this edge case correctly by resetting used_space_.

                std::unique_lock<std::shared_mutex> lock(space_mutex_);
                if (file_size <= used_space_) {
                    used_space_ -= file_size;
                    available_space_ += file_size;
                } else {
                    available_space_ += used_space_;
                    used_space_ = 0;
                }

Comment on lines 71 to 90
size_t attempts = 0;
const size_t max_attempts = 1000;

bool space_reserved = CheckDiskSpace(total_size);

while (!space_reserved && attempts < max_attempts) {
std::string evicted_file = EvictFile();
if (evicted_file.empty()) {
LOG(ERROR) << "Failed to evict file to make space.";
return tl::make_unexpected(ErrorCode::FILE_WRITE_FAIL);
}

attempts++;
space_reserved = CheckDiskSpace(total_size);
}

if (!space_reserved) {
LOG(ERROR) << "Still insufficient disk space after evicting files.";
return tl::make_unexpected(ErrorCode::FILE_WRITE_FAIL);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block of code containing the eviction loop is nearly identical to the one in the StoreObject overload for std::span (lines 157-176). Duplicating this logic increases maintenance overhead and the risk of introducing inconsistencies. Please consider refactoring this into a private helper method that attempts to secure the required disk space, performing evictions as needed.

}

size_t attempts = 0;
const size_t max_attempts = 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The maximum number of eviction attempts is hardcoded to 1000. This magic number makes the code harder to understand and configure. If evicting 1000 small files is not enough to free up the required space, the write operation will fail. This limit should be defined as a named constant with a descriptive name (e.g., kMaxEvictionAttempts), and ideally, it should be configurable.

Suggested change
const size_t max_attempts = 1000;
const size_t kMaxEvictionAttempts = 1000;

Comment on lines 106 to 111
int64_t slices_total_size = 0;
for (const auto& slice : slices) {
iovec io{slice.ptr, slice.size};
iovs.push_back(io);
slices_total_size += slice.size;
slices_total_size += static_cast<int64_t>(slice.size);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The variable slices_total_size is of type int64_t, but it's used to accumulate sizes from slice.size which are of type size_t. Later, it's cast back to size_t for comparison. This mix of signed and unsigned types for sizes can be confusing and is a potential source of bugs. It would be cleaner and safer to use size_t or uint64_t consistently for all size-related calculations.

Suggested change
int64_t slices_total_size = 0;
for (const auto& slice : slices) {
iovec io{slice.ptr, slice.size};
iovs.push_back(io);
slices_total_size += slice.size;
slices_total_size += static_cast<int64_t>(slice.size);
}
size_t slices_total_size = 0;
for (const auto& slice : slices) {
iovec io{slice.ptr, slice.size};
iovs.push_back(io);
slices_total_size += slice.size;
}

Comment on lines 352 to 361
if (file_size > 0) {
std::unique_lock<std::shared_mutex> lock(space_mutex_);
if (file_size <= used_space_) {
used_space_ -= file_size;
available_space_ += file_size;
} else {
available_space_ += used_space_;
used_space_ = 0;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for updating space accounting when a file is removed is duplicated in RemoveFile (here), RemoveByRegex (lines 418-427), RemoveAll (lines 473-486), and EvictFile (lines 582-592). This duplication makes the code harder to maintain and has already led to an inconsistency in EvictFile. This logic should be extracted into a private helper function, for example ReleaseSpace(uint64_t file_size), to ensure consistency and improve maintainability.

std::string path_str = entry.path().string();

if (std::regex_search(filename, pattern)) {
paths_to_remove.push_back(entry.path());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The variable paths_to_remove is populated here but is never used later in the function. This line and the variable's declaration on line 383 should be removed to improve code clarity.

bool StorageBackend::CheckDiskSpace(size_t required_size) {
std::unique_lock<std::shared_mutex> lock(space_mutex_);

if (!initialized_.load(std::memory_order_relaxed)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using std::memory_order_relaxed for checking the initialized_ flag is subtle. While it might be correct in the current call chain from StoreObject (which uses an acquire load), it makes CheckDiskSpace less safe if it's ever called from a different context. For improved clarity and to prevent potential future race conditions, it's safer to use std::memory_order_acquire. The performance impact of using a stronger memory order here is negligible.

Suggested change
if (!initialized_.load(std::memory_order_relaxed)) {
if (!initialized_.load(std::memory_order_acquire)) {

std::string StorageBackend::SelectFileToEvictByFIFO() {
std::unique_lock<std::shared_mutex> lock(file_queue_mutex_);
if (file_write_queue_.empty()) {
LOG(ERROR) << "Queue is empty, cannot select file to evict";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Logging an ERROR when the eviction queue is empty might be too severe, as this can be a normal state (e.g., when the storage is empty). This can create unnecessary noise in error logs. Consider changing the log level to WARNING or INFO, especially since the calling function EvictFile already logs a WARNING in this case.

        LOG(WARNING) << "Queue is empty, cannot select file to evict";

@XucSh XucSh requested a review from stmatengss November 6, 2025 09:23
@stmatengss stmatengss requested a review from XucSh November 6, 2025 10:42
@XucSh XucSh self-assigned this Nov 6, 2025
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the lint issue using clang-format

{
std::unique_lock<std::shared_mutex> lock(space_mutex_);

total_space_ = space_info.available;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total_space_ should be a configurable quota

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total_space_ = min(space_info.available, user defined space)


total_space_ = space_info.available;
available_space_ = space_info.available;
used_space_ = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded used_space_ may be a wrong value when service restarts from a crash.

namespace fs = std::filesystem;
uint64_t file_size = 0;
std::error_code ec;
file_size = fs::file_size(file_to_evict, ec);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this size can be maintained by yourself for performance

while (it != file_write_queue_.end()) {
std::string file_to_evict = *it;

if (std::filesystem::exists(file_to_evict)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same above

* @param path Path of the file to remove from queue
*/
void RemoveFileFromWriteQueue(const std::string& path);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three func should be private

if (actual_fsdir.rfind("moon_", 0) == 0) {
actual_fsdir = actual_fsdir.substr(5);
}
fs::path storage_root = fs::path(root_dir_) / actual_fsdir;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These block is same as the code in StorageBackend::Init. Make it a helper


void StorageBackend::AddFileToWriteQueue(const std::string& path) {

std::unique_lock<std::shared_mutex> lock(file_queue_mutex_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is necessary. the stdc++ have guargnteed for us ?

Signed-off-by: Vincent Gao <vincentbo@linux.alibaba.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request adds an automated disk eviction mechanism to the StorageBackend to handle disk space exhaustion by automatically removing older files when space is needed. The implementation uses a FIFO (First-In-First-Out) eviction strategy and introduces space quota tracking to prevent out-of-space errors.

Key changes:

  • Added Init() method to initialize space tracking based on available disk space
  • Implemented FIFO-based file eviction with queue management for tracking write order
  • Modified StoreObject() methods to reserve space before writes and release on failures

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File Description
mooncake-store/include/storage_backend.h Added new public API Init(), eviction methods, and private members for space tracking (mutexes, space counters, file queue)
mooncake-store/src/storage_backend.cpp Implemented disk eviction logic including space reservation/release, FIFO file selection, and updated file removal methods to track freed space
mooncake-store/src/client.cpp Added call to StorageBackend::Init() in PrepareStorageBackend() to initialize the space tracking system

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

file->vector_write(iovs.data(), static_cast<int>(iovs.size()), 0);
if (!write_result) {
LOG(ERROR) << "vector_write failed for: " << path
LOG(ERROR) << "Vector_write failed for: " << path
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalization inconsistency: The log message uses "Vector_write" with a capital 'V', which is inconsistent with standard function naming conventions and the original error message "vector_write" (which would match the actual function name). This makes the error message look like a typo.

Consider keeping it lowercase: "vector_write failed for: "

Suggested change
LOG(ERROR) << "Vector_write failed for: " << path
LOG(ERROR) << "vector_write failed for: " << path

Copilot uses AI. Check for mistakes.
Comment on lines +470 to +472
if (has_enough_space) {
used_space_ += required_size;
available_space_ -= required_size;
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Potential integer underflow: The function checks available_space_ >= required_size before subtracting, which is good. However, there's no check to ensure used_space_ + required_size won't overflow. If required_size is very large, adding it to used_space_ could overflow a uint64_t.

Consider adding overflow protection:

if (used_space_ > UINT64_MAX - required_size) {
    LOG(ERROR) << "Space tracking overflow detected";
    return false;
}

Copilot uses AI. Check for mistakes.
Comment on lines +197 to +213
/**
* @brief Evicts a file based on FIFO order (earliest written first out)
* @return Path of the evicted file, or empty string if no file was evicted
*/
std::string EvictFile();

/**
* @brief Add file to write queue for FIFO tracking
* @param path Path of the file to add to queue
*/
void AddFileToWriteQueue(const std::string& path);

/**
* @brief Remove file from write queue
* @param path Path of the file to remove from queue
*/
void RemoveFileFromWriteQueue(const std::string& path);
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] API design issue: EvictFile(), AddFileToWriteQueue(), and RemoveFileFromWriteQueue() are declared as public methods, but they are internal implementation details of the disk eviction mechanism. Exposing these methods publicly could lead to:

  1. External code incorrectly manipulating the eviction queue
  2. Breaking internal invariants (e.g., calling EvictFile without updating space tracking)
  3. Confusion about the intended API surface

Consider making these methods private, as they are only used internally by EnsureDiskSpace(), StoreObject(), and RemoveFile().

Copilot uses AI. Check for mistakes.
Comment on lines +273 to +276
* performing evictions if necessary.
*
* @param required_size The amount of disk space, in bytes, required for the
* upcoming write operation.
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation: The EnsureDiskSpace method's documentation doesn't specify what error code is returned on failure or under what conditions it might fail (e.g., max eviction attempts reached, no files available to evict). This makes it difficult for callers to understand and handle different failure scenarios.

Consider documenting the return value and failure conditions:

 * @return tl::expected<void, ErrorCode> Returns void on success, or 
 *         ErrorCode::FILE_WRITE_FAIL if insufficient space remains after 
 *         attempting evictions up to the maximum attempt limit.
Suggested change
* performing evictions if necessary.
*
* @param required_size The amount of disk space, in bytes, required for the
* upcoming write operation.
* performing evictions if necessary.
*
* @param required_size The amount of disk space, in bytes, required for the
* upcoming write operation.
* @return tl::expected<void, ErrorCode> Returns void on success, or
* ErrorCode::FILE_WRITE_FAIL if insufficient space remains after
* attempting evictions up to the maximum attempt limit, or if no
* files are available to evict.
*
* Failure conditions:
* - Max eviction attempts reached and still insufficient space.
* - No files available to evict to free up space.

Copilot uses AI. Check for mistakes.
ReleaseSpace(file_size);
return file_to_evict;
} else {
if (!ec || ec == std::errc::no_such_file_or_directory) {
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic error in condition: The condition !ec || ec == std::errc::no_such_file_or_directory will always be true when ec == std::errc::no_such_file_or_directory because if ec represents no_such_file_or_directory, then !ec will be false, but the OR condition with the explicit check will still be true. However, when fs::remove returns false, ec should be set to an error. The condition should be !ec || ec == std::errc::no_such_file_or_directory but !ec means "no error", which contradicts fs::remove returning false.

The correct logic should check: ec == std::errc::no_such_file_or_directory (file didn't exist, which is acceptable) rather than also checking !ec (which would mean no error occurred, contradicting the false return from remove).

Suggested change
if (!ec || ec == std::errc::no_such_file_or_directory) {
if (ec == std::errc::no_such_file_or_directory) {

Copilot uses AI. Check for mistakes.
Comment on lines +376 to +404
std::unique_lock<std::shared_mutex> lock(file_queue_mutex_);
file_write_queue_.clear();
file_queue_map_.clear();
}

// Iterate through the root directory and remove all files
uint64_t total_freed_space = 0;
for (const auto& entry :
fs::recursive_directory_iterator(storage_root)) {
if (fs::is_regular_file(entry.status())) {
uint64_t file_size = 0;
try {
file_size = fs::file_size(entry.path());
} catch (const fs::filesystem_error& e) {
LOG(WARNING) << "Failed to get file size: " << e.what();
}

std::error_code ec;
fs::remove(entry.path(), ec);
if (ec) {
LOG(ERROR) << "Failed to delete file: " << entry.path()
<< ", error: " << ec.message();
} else {
total_freed_space += file_size;
}
}
}

ReleaseSpace(total_freed_space);
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential race condition: The file queue is cleared (lines 377-378) before the actual files are removed (lines 383-402). If a concurrent StoreObject operation adds a file to the queue after the queue is cleared but before the files are physically removed, that file will remain in the queue even though it will be deleted shortly after. This could lead to inconsistent state where the queue references files that don't exist.

Consider holding the file_queue_mutex_ for the entire duration of the RemoveAll operation, or clearing the queue after all files are removed.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +57
tl::expected<void, ErrorCode> StorageBackend::Init() {
if (initialized_.load(std::memory_order_acquire)) {
LOG(WARNING) << "StorageBackend is already initialized. Skipping.";
return {};
}

namespace fs = std::filesystem;
std::string actual_fsdir = fsdir_;
if (actual_fsdir.rfind("moon_", 0) == 0) {
actual_fsdir = actual_fsdir.substr(5);
}
fs::path storage_root = fs::path(root_dir_) / actual_fsdir;

std::error_code ec;
if (!fs::exists(storage_root)) {
fs::create_directories(storage_root, ec);
if (ec) {
LOG(ERROR) << "Failed to create storage root directory: "
<< storage_root;
return tl::make_unexpected(ErrorCode::INTERNAL_ERROR);
}
}
const auto space_info = fs::space(storage_root, ec);
if (ec) {
LOG(ERROR) << "Init: Failed to get disk space info: " << ec.message();
return tl::make_unexpected(ErrorCode::INTERNAL_ERROR);
}
{
std::unique_lock<std::shared_mutex> lock(space_mutex_);

total_space_ = space_info.available;
available_space_ = space_info.available;
used_space_ = 0;

LOG(INFO) << "Init: Space tracking initialized. "
<< "Initial Quota (based on current free space): "
<< total_space_ << ", Available Space: " << available_space_
<< ", Used This Session: " << used_space_;
}

initialized_.store(true, std::memory_order_release);
return {};
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Init() method initializes space tracking but doesn't scan the storage directory for existing files. This means:

  1. Existing files won't be added to the eviction queue (file_write_queue_), so they can never be evicted
  2. The space tracking assumes used_space_ = 0 even if files already exist in the directory, leading to incorrect available space calculations

Consider scanning the directory during initialization to populate the file queue and accurately track used space, or document that Init() should only be called on an empty directory.

Copilot uses AI. Check for mistakes.
Comment on lines +121 to +122
* This method scans the storage directory to build its internal state.
* It must be called after construction and before any other operations.
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation: The Init() method documentation doesn't explain:

  1. Whether it's idempotent (calling it multiple times)
  2. What happens if there are existing files in the directory
  3. Thread-safety guarantees
  4. Whether it must be called before other operations

Consider adding these details to help users understand how to properly initialize the backend.

Suggested change
* This method scans the storage directory to build its internal state.
* It must be called after construction and before any other operations.
*
* This method scans the storage directory to build its internal state.
*
* Idempotency: This method is idempotent; calling it multiple times has the same effect as calling it once.
*
* Existing files: If there are existing files in the storage directory, they will be scanned and incorporated into the internal state.
* No files are deleted or overwritten during initialization.
*
* Thread-safety: This method is not thread-safe and should not be called concurrently from multiple threads.
* It is recommended to call Init() from a single thread before performing any other operations.
*
* Initialization requirement: Init() must be called after construction and before any other operations.
* Using other methods before successful initialization may result in undefined behavior.
*

Copilot uses AI. Check for mistakes.
Comment on lines +601 to +602
used_space_ = 0;
available_space_ += used_space_;
Copy link

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical bug in the else branch logic: when size_to_release > used_space_, the code sets used_space_ to 0 and then adds used_space_ (which is now 0) to available_space_. This means the space is never properly released.

The correct logic should be:

} else {
    available_space_ += used_space_;
    used_space_ = 0;
}

This ensures the old used_space_ value is added to available_space_ before resetting it to 0.

Suggested change
used_space_ = 0;
available_space_ += used_space_;
available_space_ += used_space_;
used_space_ = 0;

Copilot uses AI. Check for mistakes.
@ykwd
Copy link
Collaborator

ykwd commented Nov 13, 2025

Thanks for this great work! However, I have a few concerns:

  1. Before removing a file, the master should be notified; otherwise, the master will still think the object exists and retains its replica information. (We plan to address this in a future PR.)
  2. The StorageBackend class is not only used for local storage — it’s also used for 3FS. In distributed deployments, multiple clients may share the same directory, and this PR doesn’t seem to account for that scenario. We might need to either handle it as a special case or separate the eviction logic into its own component.

Regarding the storage backend, @zhuxinjie-nz and I are also working on related changes. I’ll post an RFC about this soon for reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants