Switch between multiple Claude accounts, GLM, and Kimi instantly.
Stop hitting rate limits. Keep working continuously.
Languages: English | Tiếng Việt | 日本語
npm Package (Recommended)
macOS / Linux / Windows
npm install -g @kaitranntt/ccs
All major package managers are supported:
# yarn
yarn global add @kaitranntt/ccs
# pnpm (70% less disk space)
pnpm add -g @kaitranntt/ccs
# bun (30x faster)
bun add -g @kaitranntt/ccs
Alternative: Direct Install (Traditional)
macOS / Linux
curl -fsSL ccs.kaitran.ca/install | bash
Windows PowerShell
irm ccs.kaitran.ca/install | iex
Note: Traditional installs bypass Node.js routing for faster startup, but npm remains the recommended method because it simplifies deployment automation.
CCS automatically creates configuration during installation (via npm postinstall script).
~/.ccs/config.json:
{
"profiles": {
"glm": "~/.ccs/glm.settings.json",
"glmt": "~/.ccs/glmt.settings.json",
"kimi": "~/.ccs/kimi.settings.json",
"default": "~/.claude/settings.json"
}
}
If Claude CLI is installed in a non-standard location (D drive, custom directory), set CCS_CLAUDE_PATH:
export CCS_CLAUDE_PATH="/path/to/claude" # Unix
$env:CCS_CLAUDE_PATH = "D:\Tools\Claude\claude.exe" # Windows
See also: Troubleshooting Guide for detailed setup instructions.
Windows users: Enable Developer Mode for true symlinks (better performance, instant sync):
- Open Settings → Privacy & Security → For developers
- Enable Developer Mode
- Reinstall CCS:
npm install -g @kaitranntt/ccs
Warning: Without Developer Mode, CCS automatically falls back to copying directories (works but no instant sync across profiles).
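If you're unsure whether Developer Mode is already on, you can check the registry flag it sets (a standard Windows location, not CCS-specific):

```powershell
# Developer Mode toggles this registry value; 1 means enabled
Get-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\AppModelUnlock" |
  Select-Object AllowDevelopmentWithoutDevLicense
```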
Important
Before using alternative models, update API keys in settings files:
- GLM: Edit ~/.ccs/glm.settings.json and add your Z.AI Coding Plan API key
- GLMT: Edit ~/.ccs/glmt.settings.json and add your Z.AI Coding Plan API key
- Kimi: Edit ~/.ccs/kimi.settings.json and add your Kimi API key
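As a sketch of what the key entry looks like, assuming glm.settings.json uses the same env-block shape as the GLMT example later in this README (exact fields may differ by version):

```json
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your-z-ai-api-key"
  }
}
```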
Parallel Workflow: Planning + Execution
# Terminal 1 - Planning (Claude Sonnet)
ccs "Plan a REST API with authentication and rate limiting"
# Terminal 2 - Execution (GLM, cost-optimized)
ccs glm "Implement the user authentication endpoints from the plan"Thinking Models (Kimi & GLMT)
# Kimi - Stable thinking support
ccs kimi "Design a caching strategy with trade-off analysis"
# GLMT - Experimental (see full disclaimer below)
ccs glmt "Debug complex algorithm with reasoning steps"Note: GLMT is experimental and unstable. See GLM with Thinking (GLMT) section below for full details.
You're deep in implementation. Context loaded. Solution crystallizing.
Then: 🔴 "You've reached your usage limit."
Momentum gone. Context lost. Productivity crater.
❌ OLD WAY: Switch When You Hit Limits (Reactive)
graph LR
A[2pm: Building features<br/>In the zone] --> B[3pm: Usage limit hit<br/>BLOCKED]
B --> C[3:05pm: Stop work<br/>Edit settings.json]
C --> D[3:15pm: Switch accounts<br/>Context lost]
D --> E[3:30pm: Restart<br/>Trying to focus]
E --> F[4pm: Finally productive<br/>Back in flow]
style A fill:#d4edda,stroke:#333,color:#000
style B fill:#f8d7da,stroke:#333,color:#000
style C fill:#fff3cd,stroke:#333,color:#000
style D fill:#f8d7da,stroke:#333,color:#000
style E fill:#fff3cd,stroke:#333,color:#000
style F fill:#d4edda,stroke:#333,color:#000
Result: 1 hour lost, momentum destroyed, frustration builds
✨ NEW WAY: Run Parallel From Start (Proactive) - RECOMMENDED
graph LR
A[2pm: Start work] --> B[Terminal 1: Claude Pro<br/>Strategic planning]
A --> C[Terminal 2: GLM<br/>Code execution]
B --> D[3pm: Still shipping<br/>No interruptions]
C --> D
D --> E[4pm: Flow state<br/>Productivity peak]
E --> F[5pm: Features shipped<br/>Context maintained]
style A fill:#e7f3ff,stroke:#333,color:#000
style B fill:#cfe2ff,stroke:#333,color:#000
style C fill:#cfe2ff,stroke:#333,color:#000
style D fill:#d4edda,stroke:#333,color:#000
style E fill:#d4edda,stroke:#333,color:#000
style F fill:#d4edda,stroke:#333,color:#000
Result: Zero downtime, continuous productivity, less frustration
- Setup: Your existing Claude Pro + GLM Lite (cost-effective add-on)
- Value: Save 1 hour/day × 20 workdays = 20 hours/month recovered
- ROI: Your development time is worth more than the setup cost
- Reality: You ship faster than the setup overhead costs you
Budget-Focused: GLM Only
- Best for: Cost-conscious development, basic code generation
- Usage: Just use ccs glm directly for cost-effective AI assistance
- Reality: No Claude access, but capable for many coding tasks
- Setup: GLM API key only, very affordable
✨ Recommended for Daily Development: 1 Claude Pro + 1 GLM Lite
- Best for: Daily code delivery, serious development work
- Usage: ccs for planning + ccs glm for execution (parallel workflow)
- Reality: Perfect balance of capability and cost for most developers
- Value: Never hit session limits, continuous productivity
Power User: Multiple Claude Pro + GLM Pro
- Best for: Heavy workloads, concurrent projects, solo dev
- Unlocks: Never exhaust session or weekly limits
- Workflow: 3+ terminals running specialized tasks simultaneously
Privacy-Focused: Work/Personal Isolation
- When needed: Strict separation of work and personal AI contexts
- Setup: ccs auth create work + ccs auth create personal
- Note: Advanced feature - most users don't need this
CCS isn't about "switching when you hit limits at 3pm."
| Manual Switching | CCS Orchestration |
|---|---|
| 🔴 Hit limits → Stop work → Edit config files → Restart | ✅ Multiple terminals running different models from the start |
| 😰 Context loss and flow state interruption | 😌 Continuous productivity with preserved context |
| 📝 Sequential task handling | ⚡ Parallel workflows (planning + execution simultaneously) |
| 🛠️ Reactive problem solving when blocked | 🎯 Proactive workflow design prevents blocks |
- Zero Context Switching: Keep your flow state without interruption
- Parallel Productivity: Strategic planning in one terminal, code execution in another
- Instant Account Management: One command switches, no config file editing
- Work-Life Separation: Isolate contexts without logging out
- Cross-Platform Consistency: Same smooth experience on macOS, Linux, Windows
Manual context switching breaks workflow. CCS orchestrates seamlessly.
Settings-based: GLM, GLMT, Kimi, default
- Uses the --settings flag pointing to config files
- GLMT: Embedded proxy for thinking mode support
Account-based: work, personal, team
- Uses CLAUDE_CONFIG_DIR for isolated instances
- Create with ccs auth create <profile>
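Roughly, the two profile types reduce to the following Claude CLI invocations (a sketch based on the mechanisms above; CCS adds its own handling around them):

```bash
# Settings-based profile: ccs glm "prompt" is approximately
claude --settings ~/.ccs/glm.settings.json "prompt"

# Account-based profile: ccs work "prompt" is approximately
CLAUDE_CONFIG_DIR=~/.ccs/instances/work claude "prompt"
```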
CCS items (v4.1): Commands and skills symlinked from ~/.ccs/.claude/ to ~/.claude/ - single source of truth with auto-propagation.
Profile access: ~/.ccs/shared/ symlinks to ~/.claude/ - no duplication across profiles.
~/.ccs/
├── .claude/ # CCS items (ships with package, v4.1)
│ ├── commands/ccs/ # Delegation commands (/ccs:glm, /ccs:kimi)
│ ├── skills/ccs-delegation/ # AI decision framework
│ └── agents/ccs-delegator.md # Proactive delegation agent
├── shared/ # Symlinks to ~/.claude/ (for profiles)
│ ├── agents@ → ~/.claude/agents/
│ ├── commands@ → ~/.claude/commands/
│ └── skills@ → ~/.claude/skills/
├── instances/ # Profile-specific data
│ └── work/
│ ├── agents@ → shared/agents/
│ ├── commands@ → shared/commands/
│ ├── skills@ → shared/skills/
│ ├── settings.json # API keys, credentials
│ ├── sessions/ # Conversation history
│ └── ...
~/.claude/ # User's Claude directory
├── commands/ccs@ → ~/.ccs/.claude/commands/ccs/ # Selective symlink
├── skills/ccs-delegation@ → ~/.ccs/.claude/skills/ccs-delegation/
└── agents/ccs-delegator.md@ → ~/.ccs/.claude/agents/ccs-delegator.md
Symlink Chain: work profile → ~/.ccs/shared/ → ~/.claude/ → ~/.ccs/.claude/ (CCS items)
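You can trace this chain yourself with standard tools (paths taken from the layout above; your profile names may differ):

```bash
# Each hop of the symlink chain, following the layout diagram
ls -l ~/.claude/commands/ccs    # -> ~/.ccs/.claude/commands/ccs/
ls -l ~/.ccs/shared/            # -> ~/.claude/{agents,commands,skills}
ls -l ~/.ccs/instances/work/    # profile symlinks point into shared/
```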
| Type | Files |
|---|---|
| CCS items | ~/.ccs/.claude/ (ships with package, selective symlinks to ~/.claude/) |
| Shared | ~/.ccs/shared/ (symlinks to ~/.claude/) |
| Profile-specific | settings.json, sessions/, todolists/, logs/ |
Note
Windows: Symlink support requires Developer Mode (v4.2 will add copy fallback)
ccs # Claude subscription (default)
ccs glm # GLM (cost-optimized)
ccs kimi # Kimi (with thinking support)
# Create accounts
ccs auth create work
ccs auth create personal
Run concurrently in separate terminals:
# Terminal 1 - Work
ccs work "implement feature"
# Terminal 2 - Personal (concurrent)
ccs personal "review code"
ccs --version # Show version
ccs --help # Show all commands and options
Tip
New in v4.0: Delegate tasks to cost-optimized models (GLM, Kimi) directly from your main Claude session. Save 81% on simple tasks with real-time visibility.
CCS Delegation lets you send tasks to alternative models (glm, kimi) from your main Claude session using the -p flag or slash commands (/ccs:glm, /ccs:kimi).
Why use it?
- Token efficiency: Simple tasks cost 81% less on GLM vs main Claude session
- Context preservation: Main session stays clean, no pollution from mechanical tasks
- Real-time visibility: See tool usage as tasks execute ([Tool] Write: index.html)
- Multi-turn support: Resume sessions with :continue for iterative work
Direct CLI:
# Delegate simple task to GLM (cost-optimized)
ccs glm -p "add tests for UserService"
# Delegate long-context task to Kimi
ccs kimi -p "analyze all files in src/ and document architecture"
# Continue previous session
ccs glm:continue -p "run the tests and fix any failures"
Via Slash Commands (inside Claude sessions):
# In your main Claude session:
/ccs:glm "refactor auth.js to use async/await"
/ccs:kimi "find all deprecated API usages across codebase"
/ccs:glm:continue "also update the README examples"
Via Natural Language (Claude auto-delegates):
# Claude detects delegation patterns and auto-executes:
"Use ccs glm to add tests for all *.service.js files"
"Delegate to kimi: analyze project structure"See exactly what's happening as tasks execute:
$ ccs glm -p "/cook create a landing page"
[i] Delegating to GLM-4.6...
[Tool] Write: /home/user/project/index.html
[Tool] Write: /home/user/project/styles.css
[Tool] Write: /home/user/project/script.js
[Tool] Edit: /home/user/project/styles.css
[i] Execution completed in 45.2s
╔══════════════════════════════════════════════════════╗
║ Working Directory: /home/user/project ║
║ Model: GLM-4.6 ║
║ Duration: 45.2s ║
║ Exit Code: 0 ║
║ Session ID: 3a4f8c21 ║
║ Total Cost: $0.0015 ║
║ Turns: 3 ║
╚══════════════════════════════════════════════════════╝
Slash Command Support: Delegation preserves custom slash commands in prompts:
ccs glm -p "/cook create responsive landing page"
# Executes /cook command in delegated GLM session
Signal Handling: Ctrl+C or Esc properly kills delegated processes (no orphans):
# Hit Ctrl+C during delegation
[!] Parent process terminating, killing delegated session...
Time-Based Limits:
10-minute default timeout with graceful termination (supports :continue):
ccs glm -p "complex task" # Auto-terminates after 10min if needed
ccs glm:continue -p "pick up where we left off"
Traditional (Main Session):
Context load: 2000 tokens
Discussion: 1500 tokens
Code gen: 4500 tokens
─────────────────────────
Total: 8000 tokens → $0.032
Delegation (GLM):
3x tasks via GLM: 1500 tokens → $0.0045
─────────────────────────────────────────
Savings: $0.0275 (86% reduction)
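(Derivation: $0.032 - $0.0045 = $0.0275, and $0.0275 / $0.032 ≈ 86%. Savings vary by task, which is why this example lands above the 81% figure quoted earlier.)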
- Workflow Diagrams: See docs/ccs-delegation-diagrams.md for visual architecture
- Skill Reference: .claude/skills/ccs-delegation/ for AI decision framework
- Agent Docs: .claude/agents/ccs-delegator.md for orchestration patterns
Caution
GLMT is experimental and requires extensive debugging:
- Streaming and tool support still under active development
- May experience unexpected errors, timeouts, or incomplete responses
- Requires frequent debugging and manual intervention
- Not recommended for critical workflows or production use
Alternative for GLM thinking: Consider routing through Claude Code Router (CCR) with Bedolla's ZaiTransformer for a more stable implementation.
Important
GLMT requires the npm installation (npm install -g @kaitranntt/ccs). It is not available via the traditional direct install, because the embedded proxy requires a Node.js HTTP server.
Note
CCS's GLMT implementation owes its existence to the groundbreaking work of @Bedolla, who created ZaiTransformer - the first integration to bridge Claude Code Router (CCR) with Z.AI's reasoning capabilities.
Why this matters: Before ZaiTransformer, no one had successfully integrated Z.AI's thinking mode with Claude Code's workflow. Bedolla's work wasn't just helpful - it was foundational. His implementation of:
- Request/response transformation architecture - The conceptual blueprint for how to bridge Anthropic and OpenAI formats
- Thinking mode control mechanisms - The patterns for managing reasoning_content delivery
- Embedded proxy design - The architecture that CCS's GLMT proxy is built upon
These contributions directly inspired and enabled GLMT's design. Without ZaiTransformer's pioneering work, GLMT wouldn't exist in its current form. The technical patterns, transformation logic, and proxy architecture implemented in CCS are a direct evolution of the concepts Bedolla first proved viable.
Recognition: If you benefit from GLMT's thinking capabilities, you're benefiting from Bedolla's vision and engineering. Please consider starring ZaiTransformer to support pioneering work in the Claude Code ecosystem.
| Feature | GLM (ccs glm) | GLMT (ccs glmt) |
|---|---|---|
| Endpoint | Anthropic-compatible | OpenAI-compatible |
| Thinking | No | Experimental (reasoning_content) |
| Tool Support | Basic | Unstable (v3.5+) |
| MCP Tools | Limited | Buggy (v3.5+) |
| Streaming | Stable | Experimental (v3.4+) |
| TTFB | <500ms | <500ms (sometimes), 2-10s+ (often) |
| Use Case | Reliable work | Debugging experiments only |
GLMT attempts MCP tools and function calling:
- Bidirectional Transformation: Anthropic tools ↔ OpenAI format (unstable)
- MCP Integration: MCP tools sometimes execute (often output XML garbage)
- Streaming Tool Calls: Real-time tool calls (when not crashing)
- Backward Compatible: May break existing thinking support
- Configuration Required: Frequent manual debugging needed
GLMT attempts real-time streaming with incremental reasoning content delivery:
- Default: Streaming enabled (TTFB <500ms when it works)
- Auto-fallback: Frequently switches to buffered mode due to errors
- Thinking parameter: Claude CLI thinking parameter sometimes works
  - May ignore thinking.type and budget_tokens
  - Precedence: CLI parameter > message tags > default (when not broken)
Status: Z.AI (tested, tool calls frequently break, requires constant debugging)
- CCS spawns embedded HTTP proxy on localhost (if not crashing)
- Proxy attempts to convert Anthropic format → OpenAI format (often fails)
- Tries to transform Anthropic tools → OpenAI function calling format (buggy)
- Forwards to Z.AI with reasoning parameters and tools (when not timing out)
- Attempts to convert reasoning_content → thinking blocks (partial or broken)
- Attempts to convert OpenAI tool_calls → Anthropic tool_use blocks (XML garbage common)
- Thinking and tool calls sometimes appear in Claude Code UI (when not broken)
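To make the response leg concrete, here is a hedged before/after sketch: the reasoning_content field is the one the jq command in the debugging section reads, and the thinking-block shape follows the standard Anthropic content-block format; actual Z.AI payloads may differ. The proxy receives an OpenAI-style message:

```json
{"choices": [{"message": {
  "reasoning_content": "Step 1: profile the hot path...",
  "content": "Use an LRU cache keyed by request hash."
}}]}
```

and emits the Anthropic-style equivalent to Claude Code:

```json
{"content": [
  {"type": "thinking", "thinking": "Step 1: profile the hot path..."},
  {"type": "text", "text": "Use an LRU cache keyed by request hash."}
]}
```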
Control Tags:
- <Thinking:On|Off> - Enable/disable reasoning blocks (default: On)
- <Effort:Low|Medium|High> - Control reasoning depth (deprecated - Z.AI only supports binary thinking)
Thinking Keywords (inconsistent activation):
- think - Sometimes enables reasoning (low effort)
- think hard - Sometimes enables reasoning (medium effort)
- think harder - Sometimes enables reasoning (high effort)
- ultrathink - Attempts maximum reasoning depth (often breaks)
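For example, combining the tag and keyword mechanisms documented above in a prompt (whether reasoning actually activates is as inconsistent as the list warns):

```bash
# Explicit tag: request reasoning blocks on
ccs glmt "<Thinking:On> Explain the trade-offs in this caching design"

# Keyword trigger: ask for deeper reasoning
ccs glmt "think hard about why this recursion overflows the stack"
```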
GLMT features (all experimental):
- Forced English output enforcement (sometimes works)
- Random thinking mode activation (unpredictable)
- Attempted streaming with frequent fallback to buffered mode
General:
- CCS_DEBUG_LOG=1 - Enable debug file logging
- CCS_CLAUDE_PATH=/path/to/claude - Custom Claude CLI path
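Both variables can be combined in one session, for example:

```bash
# Log to ~/.ccs/logs/ and point CCS at a non-standard Claude binary
export CCS_DEBUG_LOG=1
export CCS_CLAUDE_PATH="/opt/claude/claude"   # hypothetical path
ccs glm "quick sanity check"
```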
# Edit GLMT settings
nano ~/.ccs/glmt.settings.json
Set Z.AI API key (requires coding plan):
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "your-z-ai-api-key"
}
}
v3.4 Protection Limits:
| Limit | Value | Purpose |
|---|---|---|
| SSE buffer | 1MB max per event | Prevent buffer overflow |
| Content buffer | 10MB max per block | Limit thinking/text blocks |
| Content blocks | 100 max per message | Prevent DoS attacks |
| Request timeout | 120s | Both streaming and buffered |
Enable verbose logging:
ccs glmt --verbose "your prompt"
Enable debug file logging:
export CCS_DEBUG_LOG=1
ccs glmt --verbose "your prompt"
# Logs: ~/.ccs/logs/
GLMT debugging:
# Verbose logging shows streaming status and reasoning details
ccs glmt --verbose "test"Check reasoning content:
cat ~/.ccs/logs/*response-openai.json | jq '.choices[0].message.reasoning_content'
Troubleshooting:
- If absent: Z.AI API issue (verify key, account status)
- If present: Transformation issue (check response-anthropic.json)
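If the transformation side is suspect, the companion log can be inspected the same way (a sketch, assuming the Anthropic-side log mirrors the standard content-block format):

```bash
# Look for thinking blocks in the transformed Anthropic-format response
cat ~/.ccs/logs/*response-anthropic.json | jq '.content[] | select(.type == "thinking")'
```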
Run diagnostics to verify your CCS installation:
ccs doctor
Checks performed:
- ✓ Claude CLI availability
- ✓ Configuration files (config.json, profiles)
- ✓ CCS symlinks to ~/.claude/
- ✓ Delegation system
- ✓ File permissions
Output:
[?] Checking Claude CLI... [OK]
[?] Checking ~/.ccs/ directory... [OK]
[?] Checking config.json... [OK]
[?] Checking CCS symlinks... [OK]
...
Status: Installation healthy
If you modify CCS items or need to re-install symlinks:
ccs sync
What it does:
- Re-creates selective symlinks from ~/.ccs/.claude/ to ~/.claude/
- Backs up existing files before replacing
- Safe to run multiple times (idempotent)
When to use:
- After manual modifications to ~/.claude/
- If ccs doctor reports symlink issues
- After upgrading CCS to a new version
# npm
npm uninstall -g @kaitranntt/ccs
# yarn
yarn global remove @kaitranntt/ccs
# pnpm
pnpm remove -g @kaitranntt/ccs
# bun
bun remove -g @kaitranntt/ccs
# macOS / Linux
curl -fsSL ccs.kaitran.ca/uninstall | bash
# Windows PowerShell
irm ccs.kaitran.ca/uninstall | iex
- YAGNI: No features "just in case"
- KISS: Simple bash, no complexity
- DRY: One source of truth (config)
Complete documentation in docs/:
- Installation Guide
- Configuration
- Usage Examples
- System Architecture
- GLMT Control Mechanisms
- Troubleshooting
- Contributing
We welcome contributions! Please see our Contributing Guide for details.
CCS is licensed under the MIT License.
Made with ❤️ for developers who hit rate limits too often
