Production-ready Mamba 1 & Mamba 2 implementation optimized for Apple Silicon with official pre-trained models
- Mamba 1 & 2 Support - Inference of both architectures with pretrained models from Hugging Face
- Text Generation - Coherent, contextual text generation
- Apple Silicon Support - MPS acceleration for M1/M2/M3/M4
- Dependency Management - Works without CUDA/Triton requirements
- Error Handling - Robust error handling and fallbacks for both architectures
- Multiple Interfaces - CLI, Python API, interactive demos

```bash
# 1. Clone and install
git clone https://github.com/purohit10saurabh/mamba-ssm-macos.git
cd mamba-ssm-macos
pip install -r requirements.txt
# 2. Download models
python -m scripts.download_models mamba1 # Mamba 1 (493MB)
python -m scripts.download_models mamba2 # Mamba 2 (493MB)
# 3. Generate text immediately
make run-mamba1 # Quick Mamba 1 demo
make run-mamba2 # Quick Mamba 2 demo
python -m examples.01_demo    # Interactive showcase
```

- Architecture Comparison
- Installation
- Usage Examples
- Performance
- Generated Examples
- Repository Structure
- Advanced Usage
- Troubleshooting
- References
- Contributing
| Feature | Mamba 1 | Mamba 2 |
|---|---|---|
| Architecture | SSM (Selective State Space) | SSD (State Space Dual) |
| Training Speed | Standard | ~2x faster |
| State Dimension | 16 | 128 (8x larger) |
| Multi-head | No | Yes (via ngroups) |
| Memory Efficiency | Good | Better |
| Generation Quality | High | Higher |
| Model Size | 129M params | 129M params |
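
To make the comparison concrete, here is a minimal side-by-side sketch that generates from both models with the Python helpers documented under Usage Examples below; the prompt and sampling values are illustrative only:

```python
# Illustrative comparison of the two architectures on the same prompt,
# using the package helpers shown later in Usage Examples.
from mamba_macos import get_device, load_and_prepare_model, generate_text_with_model

device = get_device()
for name in ("mamba1", "mamba2"):
    ok, model, tokenizer = load_and_prepare_model(name, "./models", device)
    if ok:
        text = generate_text_with_model(
            model, tokenizer, "The future of AI", device, max_length=30, temperature=0.7
        )
        print(f"{name}: {text}")
```
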
- macOS 12.3+ with Apple Silicon (M1/M2/M3/M4)
- Python 3.8+
- 8GB+ RAM recommended

```bash
# Clone repository
git clone https://github.com/purohit10saurabh/mamba-ssm-macos.git
cd mamba-ssm-macos
# Install dependencies (includes PyTorch with MPS support)
pip install -r requirements.txt
# Verify MPS support
python -c "import torch; print('MPS Available:', torch.backends.mps.is_available())"make download-models # Downloads both Mamba 1 & 2python -m scripts.download_models mamba1 # Mamba 1 (original)
python -m scripts.download_models mamba2 # Mamba 2 (latest)python -m examples.01_demo --interactive # Try both models
python -m examples.01_demo --show-structure # See organizationmake run-mamba1 # Quick Mamba 1 demo
make run-mamba2 # Quick Mamba 2 demo
make test-quick # Fast integration test
make show-structure    # Show directory layout
```

```bash
# Basic generation
python -m scripts.run_models mamba1 --prompt "The future of AI" --max-length 50
python -m scripts.run_models mamba2 --prompt "The future of AI" --max-length 30
# Custom parameters
python -m scripts.run_models mamba1 --prompt "Once upon a time" --temperature 0.8# New organized import structure
from mamba_macos import get_device, load_and_prepare_model, generate_text_with_model
# Load any model
device = get_device() # Automatically detects MPS/CPU
success, model, tokenizer = load_and_prepare_model("mamba1", "./models", device)
if success:
    text = generate_text_with_model(
        model, tokenizer, "The future of AI", device, max_length=50, temperature=0.7
    )
    print(text)
```

```bash
python -m examples.01_demo    # Interactive demo
python -m examples.02_basic   # Basic API usage
```

| Model | Loading | Generation | Memory | Quality |
|---|---|---|---|---|
| Mamba 1 | ~1.0s | 3-8 tok/s | ~2GB | Good |
| Mamba 2 | ~1.0s | 3-6 tok/s | ~2GB | Better |

```bash
# Test performance
make test-quick
```

Mamba 2 Advantages:
- Similar loading speed
- Better context understanding (d_state=128 vs 16)
- Higher quality output (SSD architecture)
- More efficient training (~2x faster during training)
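
For a rough tokens-per-second figure on your own hardware, a small timing sketch like the following can be used; it reuses the package helpers shown above, and the prompt, length, and temperature are arbitrary:

```python
import time
from mamba_macos import get_device, load_and_prepare_model, generate_text_with_model

device = get_device()
ok, model, tokenizer = load_and_prepare_model("mamba2", "./models", device)
if ok:
    start = time.perf_counter()
    text = generate_text_with_model(
        model, tokenizer, "The future of AI", device, max_length=50, temperature=0.7
    )
    elapsed = time.perf_counter() - start
    # Approximate throughput: tokens in the full output divided by wall-clock time
    n_tokens = len(tokenizer(text).input_ids)
    print(f"{n_tokens / elapsed:.1f} tok/s")
```
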
"The future of artificial intelligence is a big topic in the field of artificial intelligence."
"Once upon a time, there was a man named John."
"Python is a programming language that is used to create and manipulate objects."
"The capital of France is a city of the French, and the"
"The future of AI is not in limited solipsistic computing, but in densely-connected
and much richer data. In the next decade, we may be able to take advantage..."
"Once upon a time, in a land far away, there lived one lonely woman, who was
much respected among wolves. She resided at a rendezvous called Buguqrach..."

```
mamba-ssm-macos/
├── 📦 src/mamba_macos/ # 🆕 Core library (clean imports)
│ ├── __init__.py # Package exports & version
│ ├── utils.py # Device, tokenizer, generation
│ └── models.py # Model loading & preparation
│
├── 🔧 scripts/ # 🆕 Utility scripts
│ ├── download_models.py # Download both models
│ └── run_models.py # Run models with arguments
│
├── 🧪 tests/ # 🆕 Organized test suite
│ ├── unit/ # Component-level tests
│ │ ├── test_mamba_macos.py # Mamba 1 unit tests
│ │ ├── test_mamba2_macos.py # Mamba 2 unit tests
│ │ └── test_generation_macos.py # Generation tests
│ └── integration/ # End-to-end tests
│ └── test_unified_system.py # Complete workflow tests
│
├── 📚 examples/ # 🆕 Curated examples
│ ├── 01_demo.py # 🎯 START HERE - Production demo
│ └── 02_basic.py # Basic forward pass
│ └── README.md # Examples guide
│
├── ⚙️ config/ # 🆕 Configuration files
│ ├── pyproject.toml # Python project config
│ └── setup.py # Package setup
│
├── 🛠️ tools/ # 🆕 Development tools
│ └── run_all_tests.py # Test runner
│
├── 🤖 models/ # Downloaded models
│ ├── mamba1/ # Mamba 1 files
│ └── mamba2/ # Mamba 2 files
│
├── mamba_ssm/ # Core implementation
│ ├── models/ & modules/ # Model architectures
│ └── ... # (Unchanged)
│
├── 📋 Makefile # 🆕 Development commands
├── 📋 requirements.txt # 🆕 Dependencies
├── 📋 PROJECT_STRUCTURE.md # 🆕 Structure documentation
└── 📖 README.md # This file
```

```python
# Mamba 2 custom config
config = MambaConfig(
    d_model=768,
    n_layer=24,
    d_state=128,      # Larger state space
    headdim=64,       # Head dimension
    expand=2,         # Expansion factor
    ssm_cfg={"layer": "Mamba2", "d_state": 128},
    vocab_size=50288
)

# Mamba 1 custom config
config = MambaConfig(
    d_model=768,
    n_layer=24,
    d_state=16,       # Smaller state space
    ssm_cfg={"layer": "Mamba1"},
    vocab_size=50280
)
```
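
To instantiate a model from such a config, here is a sketch assuming the repository keeps the upstream `MambaLMHeadModel` entry point (the repository tree lists `mamba_ssm/` as unchanged); this creates randomly initialized weights, so use `load_and_prepare_model`, shown earlier, when you want the downloaded checkpoints:

```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "mps" if torch.backends.mps.is_available() else "cpu"
model = MambaLMHeadModel(config, device=device)  # `config` from the block above; random weights
model.eval()
```
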
```python
prompts = ["Prompt 1", "Prompt 2", "Prompt 3"]
for prompt in prompts:
    # Process each prompt
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs.input_ids.to(device)          # move inputs to the model's device
    outputs = model.generate(input_ids, max_length=50)
    print(tokenizer.decode(outputs[0]))
```

```python
# Prepare for fine-tuning
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Your training loop here
for batch in dataloader:
    optimizer.zero_grad()
    outputs = model(batch['input_ids'], labels=batch['labels'])
    loss = outputs.loss
    loss.backward()
    optimizer.step()
```
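
The loop above assumes a `dataloader` yielding dicts with `input_ids` and `labels`. A toy sketch of what that could look like is below; for causal LM fine-tuning, labels are typically a copy of the input ids, and whether the model's forward pass accepts a `labels` argument depends on the wrapper you load, so treat this as illustrative:

```python
from torch.utils.data import DataLoader

# Hypothetical toy dataset: one short text per example
texts = ["Mamba models run on Apple Silicon.", "State space models scale linearly."]

def make_example(text):
    ids = tokenizer(text, return_tensors="pt").input_ids.squeeze(0)
    return {"input_ids": ids, "labels": ids.clone()}  # labels mirror input_ids

dataloader = DataLoader([make_example(t) for t in texts], batch_size=1)
```
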

```bash
# Download models using new structure
make download-models                        # Both models
python -m scripts.download_models mamba1 # Mamba 1 only
python -m scripts.download_models mamba2    # Mamba 2 only
```

```bash
# Check MPS support
python -c "import torch; print(torch.backends.mps.is_available())"
# If false, the model will automatically use CPU
```
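
The package's `get_device` helper wraps this fallback; a minimal sketch of the same logic (the actual implementation in `src/mamba_macos/utils.py` may differ) looks like:

```python
import torch

def pick_device() -> torch.device:
    """Prefer Apple's Metal backend (MPS) when available, otherwise fall back to CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())  # mps on supported Apple Silicon Macs, cpu otherwise
```
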
```bash
# Use new module structure
python -m examples.01_demo
```

```python
# Or run with clean imports
from mamba_macos import get_device, load_and_prepare_model
```

- ✅ First run is slower (model loading + compilation)
- ✅ Use shorter prompts for testing
- ✅ Close other apps to free memory
- ✅ Check Activity Monitor for memory usage

```
UserWarning: selective_scan_cuda module is not available
UserWarning: Triton is not available
```

These are expected - we use optimized PyTorch fallbacks.
- 📖 Read the docs: Check `PROJECT_STRUCTURE.md` for organization details
- 🧪 Run tests: `make test-quick` or `make test`
- 🔍 Check examples: `python -m examples.01_demo --show-structure`
- 🐛 Report issues: Create a GitHub issue with error details

```bash
# 1. Download models
make download-models
# 2. Test basic functionality
make run-mamba1
# 3. Explore interactively
python -m examples.01_demo
```

```bash
# Use Python API
python -m examples.02_basic
# Custom generation
python -m scripts.run_models mamba1 --prompt "Your text"- State Space Dual (SSD) architecture from official state-spaces/mamba
- Stable cumulative scan for numerical stability
- Multi-head processing via the ngroups design with headdim=64 (head count sketched after this list)
- Larger state space (d_state=128) for better memory
- Einsum operations for efficient tensor computations
- MPS optimization for Apple Silicon acceleration
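
As a back-of-the-envelope check of the multi-head sizing, assuming the standard upstream Mamba 2 relations d_inner = expand * d_model and nheads = d_inner / headdim:

```python
d_model, expand, headdim = 768, 2, 64  # values from the Mamba 2 config above
d_inner = expand * d_model             # 1536 channels inside each block
nheads = d_inner // headdim            # 24 heads, grouped via the ngroups setting
print(d_inner, nheads)                 # 1536 24
```
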
Mamba 1 implementation:
- Selective State Space Model (SSM) architecture
- Triton-free operation with PyTorch fallbacks
- Graceful degradation when optimizations unavailable
- Memory-efficient selective scan implementation (see the reference sketch after this list)
- Compatible with original mamba-130m weights
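
For intuition about what the selective scan computes, here is a deliberately naive sequential reference in plain PyTorch. The tensor shapes are assumptions for illustration, and the repository's actual fallback is vectorized and far more memory efficient than this loop:

```python
import torch

def selective_scan_reference(u, delta, A, B, C, D):
    """Naive sequential selective scan, for illustration only.
    Assumed shapes: u, delta: (batch, length, d_inner); A: (d_inner, d_state);
    B, C: (batch, length, d_state); D: (d_inner,)."""
    batch, length, d_inner = u.shape
    x = u.new_zeros(batch, d_inner, A.shape[-1])              # hidden SSM state
    outputs = []
    for t in range(length):
        dt = delta[:, t].unsqueeze(-1)                        # (batch, d_inner, 1)
        dA = torch.exp(dt * A)                                # discretized state transition
        dBu = dt * B[:, t].unsqueeze(1) * u[:, t].unsqueeze(-1)  # discretized input term
        x = dA * x + dBu                                      # selective recurrence
        y = (x * C[:, t].unsqueeze(1)).sum(dim=-1)            # project state to output
        outputs.append(y + D * u[:, t])                       # skip connection
    return torch.stack(outputs, dim=1)                        # (batch, length, d_inner)
```

The per-timestep delta is what makes the scan selective: both the state transition and the input contribution depend on the current input.
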
Next steps:
- Download a model: Choose Mamba 1 or 2
- Test functionality: Run example scripts
- Try your prompts: Experiment with generation
- Read examples: Learn from provided demos
- Fine-tune models: Train on your data
- Build applications: Use as text generation backend
- Contribute: Improve implementation or docs
- Research: Experiment with architectures

```bibtex
@article{gu2023mamba,
title={Mamba: Linear-Time Sequence Modeling with Selective State Spaces},
author={Gu, Albert and Dao, Tri},
journal={arXiv preprint arXiv:2312.00752},
year={2023},
url={https://arxiv.org/abs/2312.00752}
}
```

```bibtex
@article{dao2024transformers,
title={Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality},
author={Dao, Tri and Gu, Albert},
journal={arXiv preprint arXiv:2405.21060},
year={2024},
url={https://arxiv.org/abs/2405.21060}
}
```

- 🔗 Mamba 1 & 2: state-spaces/mamba - Original PyTorch implementation
- 🤗 Mamba 2 Model: state-spaces/mamba2-130m - Pre-trained weights
- 🔬 State Space Models: state-spaces/s4 - Foundational SSM research
- Selective State Spaces: Gu et al., 2022 - S4 foundation
- Hungry Hungry Hippos: Fu et al., 2023 - H3 architecture
- Apple Silicon: PyTorch MPS Guide - Metal Performance Shaders
We welcome contributions! Areas for improvement:
- 🐛 Bug fixes: Report and fix issues
- 📚 Documentation: Improve guides and examples
- ⚡ Performance: Optimize for specific hardware
- 🆕 Features: Add new capabilities
- 🧪 Testing: Expand test coverage

```bash
git clone https://github.com/purohit10saurabh/mamba-ssm-macos.git
cd mamba-ssm-macos
pip install -e ".[dev]"
pytest tests/
```

Apache 2.0 License - see LICENSE file.
Optimized for Apple Silicon • Pure Python • Production Ready
Start with `python -m examples.01_demo` and explore from there!