
[RFC] Contribute Local Storage to Distributed Pool #1054

@ykwd

Description

Changes proposed

This RFC proposes a new feature: contributing local SSD storage to the distributed store pool. The design introduces a unified storage backend interface with multiple backend implementations, file-storage Get/Put/Eviction workflows, and the corresponding Master-side coordination.

Motivation

Currently, Mooncake relies heavily on in-memory storage for fast access. However, memory capacity is limited and expensive. Adding support for local SSD-based storage enables:

  • Larger total storage capacity.
  • Tiered data access.
  • Improved data persistence and recovery.
  • Flexible resource allocation across heterogeneous clients.

Design Overview

1. Storage Backend: Unified Interface

All backends will implement a unified storage interface to simplify integration and extension.
We plan to support three backend implementations:

  1. File-per-key Backend (Already supported)

    • Each key-value pair is stored in a separate file.
    • Simple but inefficient for large-scale data.
  2. Bucket Backend @zhuxinjie-nz

    • Multiple key-value pairs are grouped into a single file (“bucket”).
    • Write and eviction are performed at the bucket level, improving I/O efficiency.
  3. OffsetAllocator Backend

    • A large pre-allocated file acts as a storage arena.
    • Space is allocated and released using an OffsetAllocator.
    • Enables efficient space management and fewer file operations.
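The three backends above can share one interface. The sketch below is illustrative only: the class and method names (`StorageBackend`, `put`/`get`/`evict`, `OffsetAllocatorBackend`) are assumptions for this RFC discussion, not Mooncake's actual API, and a Python `bytearray` stands in for the pre-allocated file arena.

```python
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """Hypothetical unified interface shared by all three backends."""
    @abstractmethod
    def put(self, key, value): ...
    @abstractmethod
    def get(self, key): ...
    @abstractmethod
    def evict(self, key): ...

class OffsetAllocatorBackend(StorageBackend):
    """Sketch of the arena-style backend: one pre-allocated region,
    with space handed out by a simple first-fit offset allocator."""
    def __init__(self, capacity):
        self.arena = bytearray(capacity)   # stands in for the pre-allocated file
        self.free = [(0, capacity)]        # list of (offset, length) holes
        self.index = {}                    # key -> (offset, length)

    def put(self, key, value):
        need = len(value)
        for i, (off, length) in enumerate(self.free):
            if length >= need:             # first hole large enough wins
                self.arena[off:off + need] = value
                self.index[key] = (off, need)
                rest = (off + need, length - need)
                self.free[i:i + 1] = [rest] if rest[1] else []
                return
        raise MemoryError("arena full")

    def get(self, key):
        loc = self.index.get(key)
        if loc is None:
            return None
        off, length = loc
        return bytes(self.arena[off:off + length])

    def evict(self, key):
        off, length = self.index.pop(key)
        self.free.append((off, length))    # a real allocator would coalesce holes
```

Because allocation and release touch only the allocator's bookkeeping, eviction needs no file deletion, which is the "fewer file operations" benefit noted above.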

2. Get Workflow @zhuxinjie-nz

  1. Client A issues a Get request.

  2. Master returns the replica information.

  3. Client A attempts to read from a memory replica first.

  4. If only remote file replica exists (located on Client B), then:

    • A sends an RPC to B.
    • B reads the requested data from local SSD into its local buffer memory.
    • B returns the buffer address to A, and guarantees that the buffer will not be overwritten with other data until a timeout expires (e.g. 5s).
    • A performs an RDMA read to fetch the data directly.
    • If the read completes before timeout → A returns OK.
    • If it times out → A returns Error.
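Step 4 of the Get workflow can be sketched as below. This is a minimal model of the control flow only: `rpc_to_b` and `rdma_read` are hypothetical stand-ins for the real RPC and RDMA primitives, and the 5-second lease is the example value from the text.

```python
import time

LEASE_SECONDS = 5  # example lease from the RFC: B pins the buffer for e.g. 5s

def get_from_remote_file(rpc_to_b, rdma_read, key, timeout=LEASE_SECONDS):
    """Client A's path when only a remote file replica exists on B."""
    # A asks B to stage the value; B reads SSD -> local buffer and returns
    # the buffer address, which stays valid until the lease expires.
    buf_addr = rpc_to_b(key)
    deadline = time.monotonic() + timeout
    data = rdma_read(buf_addr)          # A fetches the bytes directly from B
    if time.monotonic() <= deadline:
        return "OK", data               # read completed within the lease
    return "ERROR", None                # lease expired; buffer may be reused
```

The key design point is that A, not B, decides success: as long as the RDMA read finishes inside B's pinning window, the data is valid.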

3. Put Workflow @zhuxinjie-nz

  1. Client A issues a Put request.
  2. Master assigns the target replica to Client B.
  3. A performs an RDMA write to B.
  4. Once the write succeeds, A sends a PutEnd notification to Master.
  5. Upon receiving PutEnd, Master adds the key to B’s persistence queue.
  6. B periodically requests pending persistence tasks from Master.
  7. B obtains the persistence request, performs a BatchGetReplica to acquire the lease, and writes the data to local SSD.
  8. Once successfully persisted, B notifies Master with another PutEnd message.
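Steps 5-8 hinge on a Master-side persistence queue per client. The sketch below models just that bookkeeping; the class and method names (`Master`, `put_end`, `poll_persistence_tasks`) are illustrative assumptions, and the real flow would also carry replica metadata and lease handling.

```python
from collections import defaultdict, deque

class Master:
    """Sketch of the Master-side persistence queue (steps 5-8)."""
    def __init__(self):
        self.queues = defaultdict(deque)   # client_id -> keys awaiting persistence
        self.persisted = set()             # keys confirmed on SSD

    def put_end(self, client_id, key, on_ssd=False):
        if on_ssd:
            self.persisted.add(key)        # step 8: B confirms SSD persistence
        else:
            self.queues[client_id].append(key)  # step 5: enqueue for B

    def poll_persistence_tasks(self, client_id, limit=8):
        """Step 6: B periodically pulls a batch of pending tasks."""
        q = self.queues[client_id]
        return [q.popleft() for _ in range(min(limit, len(q)))]
```

Having B pull tasks (rather than Master pushing) lets each client persist at its own SSD bandwidth.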

Load Balancing Considerations

Clients may have heterogeneous resource configurations:

  • Client B: large memory, limited or no SSD.
  • Client C: minimal memory, large SSD.

To handle this heterogeneity, future versions will extend the Master's scheduling logic to assign each persistence task to the most suitable client. The rest of the flow remains unchanged.
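One plausible placement heuristic is sketched below: route each persistence task to the eligible client with the most free SSD space. Both the heuristic and the `free_ssd` field are assumptions for illustration; the actual scheduling policy is left open by this RFC.

```python
def pick_persistence_target(clients):
    """Illustrative heuristic for heterogeneous clients: prefer the
    client with the largest free SSD capacity; skip SSD-less clients."""
    eligible = [c for c in clients if c["free_ssd"] > 0]
    if not eligible:
        return None                       # no client can persist this key
    return max(eligible, key=lambda c: c["free_ssd"])["id"]
```

Under this policy, a memory-heavy client like B (no SSD) serves hot reads while an SSD-heavy client like C absorbs persistence work.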

4. Eviction Workflow

  1. Each client manages its local SSD storage usage.

  2. When nearing capacity, the client initiates Eviction:

    • The client sends a Remove request to Master.
    • Upon successful confirmation, the client deletes the local data.
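The eviction loop can be sketched as below. The high-watermark threshold and the key ordering are illustrative choices not fixed by this RFC; `remove_rpc` and `delete_local` stand in for the real Remove RPC and local file deletion.

```python
def maybe_evict(local_keys, used, capacity, remove_rpc, delete_local,
                high_watermark=0.9):
    """Evict local SSD entries until usage falls below the watermark.
    local_keys maps key -> size in bytes."""
    evicted = []
    for key, size in list(local_keys.items()):
        if used <= capacity * high_watermark:
            break                          # back under the watermark; stop
        if remove_rpc(key):                # Master must confirm removal first
            delete_local(key)              # only then drop the local data
            used -= size
            evicted.append(key)
    return evicted, used
```

Deleting only after Master confirms keeps Master's replica metadata authoritative: a reader never gets routed to a file the client already removed.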

5. Initialization Workflow

Upon startup, each client:

  • Reads local file metadata.
  • Validates file integrity.
  • Reports valid replicas back to Master via Put requests.

This ensures that valid persisted data is re-registered after restarts or failures.
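The startup path can be sketched as below, assuming (purely for illustration) that local metadata records a per-file checksum; `put_rpc` stands in for the Put request that re-registers a replica with Master.

```python
import hashlib

def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()

def reregister_on_startup(local_files, put_rpc):
    """Validate each local file against its recorded checksum and
    re-announce valid replicas via Put. local_files maps
    key -> (data, expected_checksum); the layout is an assumption."""
    registered = []
    for key, (data, expected) in local_files.items():
        if sha256_hex(data) == expected:   # integrity check
            put_rpc(key)                   # report the valid replica to Master
            registered.append(key)
        # corrupted entries are skipped; they could also be deleted here
    return registered
```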

6. Master Modifications

The Master component requires several extensions:

  1. Associate SSD segment information with each client.
  2. If a client has an SSD, associate it with a persistence queue.
  3. Extend the replica metadata structure to record the IP address and replica size, giving clients the information they need to issue remote file reads.
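The extended records might look as follows. All field and class names here are illustrative assumptions, not Mooncake's actual structures; they only show what the three extensions above need to carry.

```python
from dataclasses import dataclass, field

@dataclass
class ReplicaInfo:
    """Extended replica metadata: IP address and size are exactly what a
    client needs to issue the remote file read described in the Get flow."""
    key: str
    ip: str              # client holding the file replica
    size: int            # bytes to read
    is_file: bool = True # distinguishes file replicas from memory replicas

@dataclass
class ClientInfo:
    """Master-side per-client record: SSD segment plus persistence queue."""
    client_id: str
    ssd_capacity: int = 0
    persistence_queue: list = field(default_factory=list)

    @property
    def has_ssd(self):
        return self.ssd_capacity > 0     # only SSD clients get a queue
```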

This feature is under active development. Suggestions and contributions are appreciated.

Related PR: #968 #1028 #1031
