Milestone 1: Core Architecture Refactor & Decoupling
This milestone focuses on foundational architectural changes to improve modularity, flexibility, and prepare for future scaling.
- (TE/Store Separation): Decouple the TE (Transfer Engine) and Store components into separate, independent packages.
- (Client/Worker Decoupling): Decouple the dummy client from the worker to remove strong dependencies.
- (Flexible Deployment): Update the Store to support various flexible deployment models, such as client-only, client + master, etc.
- (Tensor-native APIs): Provide Put/Get Tensor APIs that carry TP rank and model info.
Milestone 2: Master Service Enhancements
This milestone enhances the Master component to support new storage architectures and routing logic.
- (Key-based Routing): Implement new key-based routing capabilities in the Master service.
- (Metadata Adaptation - Storage): Adapt the Master's metadata management to support the new multi-level storage architecture.
- (Recovery): Persist KV metadata so that it can be recovered.
- (KVCache Awareness Interface): Expose hit ratios for different layers.
- (Metadata Adaptation - HA): Upgrade metadata schema and logic to meet new High Availability (HA) requirements.
- (Multi-tenant): Support multi-tenancy with different models, users, and auth keys.
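To make the key-based routing item above concrete, here is a minimal sketch of one possible approach, rendezvous (highest-random-weight) hashing. This is an assumption for discussion, not the Master's actual design: the function name `route_key` and the worker identifiers are hypothetical.

```python
import hashlib
from typing import Sequence


def route_key(key: str, workers: Sequence[str]) -> str:
    """Pick a worker for `key` with rendezvous hashing (hypothetical helper)."""
    if not workers:
        raise ValueError("no workers available")

    def score(worker: str) -> int:
        # Hash the (worker, key) pair; the worker with the highest score wins.
        digest = hashlib.sha256(f"{worker}|{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(workers, key=score)
```

With this scheme, removing a worker only remaps the keys that were routed to that worker; every other key keeps its placement.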
Milestone 3: Worker: Multi-Level Storage Architecture
This is a major epic to build the next-generation multi-level storage system within the Worker.
3.1: Abstraction & Caching
- (Storage Abstraction Layer): Design and implement the core abstraction layer for multi-level storage.
- (Cache Scheduling Interface): Design the abstract interface for cache scheduling logic.
- (Eviction Logic): Implement basic data eviction logic within the new storage architecture. [store] Add disk eviction feature #1028
- (LRU Cache): Implement an LRU (Least Recently Used) policy as the default cache scheduling strategy.
- (Local Client Cache): Keep a local cache for better performance. [RFC]: Add Local Cache Mechanism for Mooncake Store Client #1062
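As a reference point for the LRU item above, a minimal sketch of an LRU policy with eviction might look like the following. The class name `LRUCache` and its `get`/`put` methods are hypothetical and do not reflect the Store's actual interfaces; a real implementation would also need to account for object sizes and concurrency.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Insert or update `key`; return the evicted (key, value) pair or None."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            return self._items.popitem(last=False)  # drop the oldest entry
        return None
```

Returning the evicted pair from `put` is one way to let a caller demote the entry to a lower storage level instead of discarding it.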
3.2: Storage Backend Implementation
- (DRAM Adaptation): Adapt the storage layer for DRAM, including support for NUMA affinity.
- (SSD Adaptation): Adapt the storage layer for SSDs, enabling local external storage read/write capabilities. [RFC] Contribute Local Storage to Distributed Pool #1054
- (VRAM Adaptation): Adapt the storage layer to utilize VRAM.
- (Huawei NPU Adaptation): Implement support for Huawei NPUs (H2D).
3.3: Elastic KVCache Storage
Milestone 4: Worker: Networking & Elasticity
This milestone focuses on refactoring worker communication and enabling resource elasticity.
- (RPC Refactor): [Phase 1] Refactor the worker's read/write logic to replace RDMA with RPC-based communication.
- (Barex Transport Support): Support Alibaba barex transport in TE for Mooncake Store. feat[accl-barex]: add barex_transport by build with USE_BAREX #1045
- (Resource Elasticity): Implement single-worker resource elasticity.
- (Event-driven completion): Provide an option for the worker to use event-driven completion notification instead of busy-polling. [Performance]: High CPU usage due to busy-wait in TransferEngineOperationState::wait_for_completion #1033 [Store|TransferEngine]: use condition-variable based completion instead of busy-polling #1053
- (IPv6 Support): Support IPv6 in client, master and metadata server. [Bug]: Error in IPv6 support for mooncake 0.3.6 on vLLM 0.11.0 #1043 [WIP]TCP transport support ipv6 #1067
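The event-driven completion item above contrasts busy-polling with condition-variable based waiting. A minimal sketch of the idea, written in Python for brevity, is shown here. The class `CompletionState` is hypothetical; only the method name `wait_for_completion` is borrowed from the linked issue, and this is not the Transfer Engine's actual implementation.

```python
import threading


class CompletionState:
    """Lets a waiter sleep until another thread signals completion."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = False
        self._result = None

    def complete(self, result):
        # Called by the thread that finishes the operation.
        with self._cond:
            self._done = True
            self._result = result
            self._cond.notify_all()

    def wait_for_completion(self, timeout=None):
        # Blocks without spinning; wakes up when complete() notifies.
        with self._cond:
            if not self._cond.wait_for(lambda: self._done, timeout=timeout):
                raise TimeoutError("operation did not complete in time")
            return self._result
```

The waiting thread consumes no CPU while blocked, at the cost of a wake-up latency that a busy-polling loop would avoid, which is why offering this as an option rather than the only mode may be reasonable.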
Milestone 5: Deployment & Operations
This milestone covers K8s integration (i.e., RBG, https://github.com/sgl-project/rbg) and build process improvements.
- (K8s Autoscaling): Implement support for Kubernetes-based autoscaling of worker and dummy client instances.
- (Scenario-based Builds): Implement a build system capable of producing different worker binaries optimized for different scenarios.
- (Integration With AI Configurator): Use AI Configurator to better estimate worker resources and other configurations.
- (Deployment Documentation & Guides): Create comprehensive, up-to-date deployment documentation and step-by-step setup guides to simplify installation and configuration for all environments.
Milestone 6: CI & CD Enhancement
- (End-to-end CI tests): For SGLang, support HiCache, PD, Elastic EP, and checkpoint engine tests.
Milestone 7: Performance & Benchmarks
- (Store Master Benchmark): Design and integrate a dedicated benchmark for the Mooncake store master module to evaluate throughput, latency, and scalability.
Thanks for being a part of the Mooncake community! You are welcome to discuss and contribute!
If you have any ideas, just leave a comment below and help shape the Roadmap.