Intelligent Model Routing Guide
Overview
CCProxy includes an intelligent routing system that automatically selects the appropriate AI provider and model based on request characteristics. The router is designed to work seamlessly with Claude Code, which always sends Anthropic model names.
How Routing Works
The router evaluates requests in a strict priority order:
1. Explicit Provider Selection: When you specify the `"provider,model"` format
2. Direct Model Routes: Exact matches for Anthropic model names in routes
3. Long Context Routing: Token count > 60,000 triggers the `longContext` route
4. Background Routing: Models starting with `"claude-3-5-haiku"` use the `background` route
5. Thinking Routing: A boolean `thinking: true` parameter triggers the `think` route
6. Default Route: Fallback for all unmatched requests
Routing Configuration
Basic Configuration Structure
{
"routes": {
"default": {
"provider": "anthropic",
"model": "claude-3-sonnet-20240229"
},
"longContext": {
"provider": "anthropic",
"model": "claude-3-opus-20240229"
},
"background": {
"provider": "openai",
"model": "gpt-3.5-turbo"
},
"think": {
"provider": "deepseek",
"model": "deepseek-reasoner"
}
}
}

Routing Priority Examples
1. Explicit Provider Selection
Override routing by specifying provider and model:
{
"model": "anthropic,claude-3-opus-20240229"
}

This bypasses all routing logic and uses the specified provider/model.
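Detecting such an override can be sketched in Python (an illustrative sketch only; `parse_explicit` is a hypothetical name, not a CCProxy function):

```python
def parse_explicit(model: str):
    """Return (provider, model) if the string uses the explicit
    "provider,model" override format, otherwise None."""
    if "," in model:
        provider, _, name = model.partition(",")
        return provider, name
    return None
```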
2. Direct Model Routes
Map specific Anthropic models to any provider:
{
"routes": {
"claude-3-opus-20240229": {
"provider": "openai",
"model": "gpt-4-turbo"
}
}
}

3. Automatic Long Context Routing
Requests exceeding 60,000 tokens automatically use the `longContext` route:
{
"messages": [...], // If total tokens > 60K
"model": "claude-3-sonnet-20240229" // Will use longContext route
}

4. Background Task Routing
Models whose names start with "claude-3-5-haiku" automatically use the `background` route:
{
"model": "claude-3-5-haiku-20241022" // Uses background route
}

5. Thinking Parameter Routing
Requests with `thinking: true` use the `think` route:
{
"model": "claude-3-sonnet-20240229",
"thinking": true // Boolean parameter
}

6. Default Route
All other requests use the default route.
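The six rules above can be sketched as a single resolution function (an illustrative sketch of the priority order, not CCProxy's actual implementation; the `select_route` name and the request/route shapes here are assumptions):

```python
def select_route(request, routes, token_count):
    """Resolve a route in the strict priority order described above."""
    model = request.get("model", "")

    # 1. Explicit "provider,model" selection bypasses all routing.
    if "," in model:
        provider, _, name = model.partition(",")
        return {"provider": provider, "model": name}

    # 2. Direct model routes: exact match on the Anthropic model name.
    if model in routes:
        return routes[model]

    # 3. Long context: token count above the 60K threshold.
    if token_count > 60_000:
        return routes["longContext"]

    # 4. Background: models starting with "claude-3-5-haiku".
    if model.startswith("claude-3-5-haiku"):
        return routes["background"]

    # 5. Thinking: boolean thinking parameter set to true.
    if request.get("thinking") is True:
        return routes["think"]

    # 6. Fallback for everything else.
    return routes["default"]
```

Note that the order matters: for example, a request that both exceeds the token threshold and sets `thinking: true` lands on `longContext`, because rule 3 is checked before rule 5.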
Recommended Routes by Use Case
General Purpose (Default)
Best for everyday tasks, balanced between performance and cost.
Recommended Models:
- Anthropic: claude-3-5-sonnet-20241022
- OpenAI: gpt-4-turbo-2024-04-09
- Google: gemini-1.5-flash
- DeepSeek: deepseek-chat (Note: 8192 token limit)
Configuration Example:
{
"providers": {
"anthropic": {
"models": {
"claude-3-5-sonnet-20241022": {
"route": "default"
}
}
}
},
"router": {
"routes": {
"default": "anthropic,claude-3-5-sonnet-20241022"
}
}
}

Long Context (>60K tokens)
For processing large documents, extensive conversations, or code analysis.
Recommended Models:
- Anthropic: claude-3-opus-20240229 (200K context)
- OpenAI: gpt-4-32k (32K context)
- Google: gemini-1.5-pro (1M context)
Configuration Example:
{
"router": {
"routes": {
"longContext": "google,gemini-1.5-pro"
},
"rules": {
"tokenThreshold": 60000
}
}
}

Fast/Background Tasks
For quick responses, simple tasks, or high-volume processing.
Recommended Models:
- Anthropic: claude-3-haiku-20240307
- OpenAI: gpt-3.5-turbo
- Google: gemini-1.5-flash
- DeepSeek: deepseek-chat (Note: 8192 token limit)
Configuration Example:
{
"router": {
"routes": {
"fast": "anthropic,claude-3-haiku-20240307",
"background": "openai,gpt-3.5-turbo"
},
"modelPatterns": {
"haiku": "fast",
"turbo": "fast"
}
}
}

Complex Reasoning (Think Mode)
For tasks requiring deep analysis, multi-step reasoning, or complex problem solving. The router checks for a boolean `thinking` field in the request.
Recommended Models:
- Anthropic: claude-3-opus-20240229
- OpenAI: o1-preview
- Google: gemini-1.5-pro
- DeepSeek: deepseek-coder
Configuration Example:
{
"router": {
"routes": {
"think": "anthropic,claude-3-opus-20240229"
}
}
}

Note: When using Claude Code, only providers that support function calling will work properly. This includes Anthropic, OpenAI, and Google Gemini. DeepSeek and some other providers may have limited or no function calling support.
Advanced Configuration Examples
Multi-Provider Fallback
{
"router": {
"routes": {
"default": "anthropic,claude-3-5-sonnet-20241022"
},
"fallbacks": {
"anthropic": ["openai,gpt-4-turbo", "google,gemini-1.5-flash"],
"openai": ["anthropic,claude-3-5-sonnet-20241022"],
"google": ["anthropic,claude-3-5-sonnet-20241022"]
}
}
}

Cost-Optimized Routing
{
"router": {
"routes": {
"default": "google,gemini-1.5-flash",
"fast": "anthropic,claude-3-haiku-20240307"
}
}
}

Direct Model Routes
{
"router": {
"routes": {
"gpt-4-turbo": "openai,gpt-4-turbo-2024-04-09",
"claude-3-5-sonnet-20241022": "anthropic,claude-3-5-sonnet-20241022",
"gemini-1.5-pro": "google,gemini-1.5-pro"
}
}
}

Multiple Provider Configuration
{
"router": {
"routes": {
"default": "anthropic,claude-3-5-sonnet-20241022",
"longContext": "google,gemini-1.5-pro",
"fast": "openai,gpt-3.5-turbo",
"think": "anthropic,claude-3-opus-20240229"
}
}
}

Best Practices
1. Token Count Awareness
- Monitor your typical token usage
- Set appropriate thresholds for long-context routing
- Consider chunking large requests
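To anticipate whether a request will cross the 60,000-token threshold before sending it, a rough character-based estimate can help (the ~4 characters per token figure is a common heuristic for English text and an assumption here; it is not how CCProxy actually counts tokens):

```python
def estimate_tokens(messages):
    """Very rough token estimate: ~4 characters per token.
    A heuristic only; real tokenizers will differ."""
    chars = sum(len(m.get("content", "")) for m in messages)
    return chars // 4

def likely_long_context(messages, threshold=60_000):
    """True if the estimate suggests the longContext route would trigger."""
    return estimate_tokens(messages) > threshold
```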
2. Cost Management
- Use fast models for simple tasks
- Reserve premium models for complex reasoning
- Implement usage quotas per model
3. Latency Optimization
- Route time-sensitive requests to fast models
- Use geographic routing for global deployments
- Implement timeout-based fallbacks
4. Error Handling
- Configure fallback routes for each provider
- Set up circuit breakers for failing models
- Monitor provider availability
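A fallback chain like the one in the Multi-Provider Fallback example can be applied with a simple loop that tries each candidate in order (a sketch of the idea only; `call_provider` is a placeholder callable, not a CCProxy API):

```python
def send_with_fallbacks(request, primary, fallbacks, call_provider):
    """Try the primary "provider,model" target, then each fallback
    in order; raise if every candidate fails."""
    last_error = None
    for target in [primary, *fallbacks]:
        provider, _, model = target.partition(",")
        try:
            return call_provider(provider, model, request)
        except Exception as exc:  # placeholder for provider errors
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```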
5. Testing Routes
# Test specific routing
curl -X POST http://localhost:3456/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: your-key" \
-d '{
"model": "anthropic,claude-3-5-sonnet-20241022",
"messages": [{"role": "user", "content": "Test"}]
}'
# Test automatic routing (long context)
curl -X POST http://localhost:3456/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: your-key" \
-d '{
"messages": [{"role": "user", "content": "Very long content..."}]
}'

Monitoring and Debugging
View Current Routes
curl http://localhost:3456/health \
-H "x-api-key: your-key" | jq .router

Route Decision Logging
Enable debug logging to see routing decisions:
{
"logging": {
"level": "debug",
"routingDecisions": true
}
}

Metrics Collection
Monitor routing patterns:
- Route hit rates
- Token distribution
- Provider utilization
- Fallback frequency
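Route hit rates, for instance, can be tracked with a simple counter (an illustrative sketch, not a built-in CCProxy feature):

```python
from collections import Counter

class RouteMetrics:
    """Track how often each route is selected."""

    def __init__(self):
        self.hits = Counter()

    def record(self, route_name):
        self.hits[route_name] += 1

    def hit_rates(self):
        """Fraction of requests served by each route."""
        total = sum(self.hits.values()) or 1
        return {name: count / total for name, count in self.hits.items()}
```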
Common Routing Scenarios
Scenario 1: Development vs Production
{
"environments": {
"development": {
"router": {
"routes": {
"default": "openai,gpt-3.5-turbo"
}
}
},
"production": {
"router": {
"routes": {
"default": "anthropic,claude-3-5-sonnet-20241022"
}
}
}
}
}

Scenario 2: Model Pattern Routing
{
"router": {
"routes": {
"default": "anthropic,claude-3-5-sonnet-20241022",
"fast": "anthropic,claude-3-haiku-20240307"
},
"modelPatterns": {
"haiku": "fast"
}
}
}

Troubleshooting
Route Not Working
- Check provider configuration
- Verify API keys are set
- Ensure model names are correct
- Review routing priority
Unexpected Model Selection
- Enable debug logging
- Check token count
- Verify routing rules
- Test with explicit routing
Performance Issues
- Monitor route latency
- Check provider availability
- Review fallback configuration
- Optimize route selection
Limitations
- Function Calling: When using Claude Code, only providers with function calling support will work properly (Anthropic, OpenAI, Google Gemini)
- DeepSeek: Limited to 8192 tokens per request
- Routing Rules: Currently supports token-based, model pattern, and thinking parameter routing
- Configuration: Routes must be explicitly configured in the config file