# Performance Tuning
Optimize CCProxy for maximum performance and efficiency. This guide covers configuration tuning, monitoring, and best practices.
## Performance Overview
CCProxy is designed for high performance with:
- Efficient request routing
- Connection pooling
- Concurrent request handling
- Minimal overhead processing
- Optimized memory usage
## Configuration Tuning
### Server Settings
Optimize server configuration for your workload:
```yaml
# config.yaml
server:
  port: 8080
  host: "0.0.0.0"
  read_timeout: "30s"          # Adjust based on provider response times
  write_timeout: "30s"         # Should be >= read_timeout
  idle_timeout: "120s"         # Keep connections alive
  max_request_size: 10485760   # 10MB - adjust for your needs

performance:
  worker_pool_size: 100        # Number of concurrent workers
  request_timeout: "300s"      # Maximum request duration
  connection_pool:
    max_idle_conns: 100
    max_idle_conns_per_host: 10
    idle_conn_timeout: "90s"
```

### Provider Configuration
Optimize provider settings:
```yaml
providers:
  - name: primary
    type: anthropic
    timeout: "60s"      # Provider-specific timeout
    max_retries: 3      # Retry on failure
    retry_delay: "1s"   # Base delay for exponential backoff
    # Connection pooling per provider
    max_connections: 50
    connections_per_host: 10
```
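The `max_retries` and `retry_delay` settings above describe a retry-with-exponential-backoff policy. As a rough illustration of what such a policy does (a minimal sketch, not CCProxy's actual retry code; `callProvider` is a hypothetical stand-in for the upstream request):

```go
package main

import (
    "errors"
    "fmt"
    "time"
)

// retryWithBackoff retries fn up to maxRetries times, doubling the
// delay after each failure: with retry_delay "1s" the waits are
// 1s, 2s, 4s, and so on.
func retryWithBackoff(maxRetries int, baseDelay time.Duration, fn func() error) error {
    var err error
    for attempt := 0; attempt <= maxRetries; attempt++ {
        if err = fn(); err == nil {
            return nil
        }
        if attempt < maxRetries {
            time.Sleep(baseDelay << attempt) // 1s, 2s, 4s, ...
        }
    }
    return fmt.Errorf("request failed after %d retries: %w", maxRetries, err)
}

func main() {
    callProvider := func() error { return errors.New("provider unavailable") }
    fmt.Println(retryWithBackoff(3, time.Second, callProvider))
}
```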
## Resource Management

### Memory Optimization
Monitor and control memory usage:
```bash
# Set GOGC to tune garbage collection
export GOGC=100   # Default; adjust based on the memory/CPU trade-off

# For low-memory environments
export GOGC=50    # More frequent GC, less memory

# For high-performance environments
export GOGC=200   # Less frequent GC, more memory
```

### CPU Optimization
```bash
# Set GOMAXPROCS to limit the CPU cores used
export GOMAXPROCS=8   # Use 8 CPU cores

# Or let Go detect automatically (recommended);
# GOMAXPROCS defaults to runtime.NumCPU()
```
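Both knobs also have programmatic equivalents inside a Go process: `runtime.GOMAXPROCS` and `runtime/debug.SetGCPercent` from the standard library. A short sketch:

```go
package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

func main() {
    // Equivalent to GOGC=50: run GC when the heap grows 50% past live data.
    oldGC := debug.SetGCPercent(50)
    fmt.Println("previous GOGC:", oldGC)

    // Equivalent to GOMAXPROCS=8; without this, Go uses runtime.NumCPU().
    oldProcs := runtime.GOMAXPROCS(8)
    fmt.Println("previous GOMAXPROCS:", oldProcs)
}
```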
## Connection Pooling

### HTTP Client Configuration
```go
import (
    "net/http"
    "time"
)

// Optimal HTTP client settings
var client = &http.Client{
    Timeout: 60 * time.Second,
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 10,
        MaxConnsPerHost:     25,
        IdleConnTimeout:     90 * time.Second,
        TLSHandshakeTimeout: 10 * time.Second,
        DisableKeepAlives:   false, // Keep connections alive
        DisableCompression:  false, // Enable compression
    },
}
```

### Database Connections (if applicable)
```yaml
database:
  max_open_conns: 25
  max_idle_conns: 5
  conn_max_lifetime: "5m"
  conn_max_idle_time: "90s"
```

## Monitoring Performance

### Built-in Metrics

CCProxy exposes metrics at `/metrics`:
```bash
# Get current metrics
curl http://localhost:8080/metrics
```

Example response:

```json
{
  "requests": {
    "total": 10000,
    "active": 5,
    "success": 9950,
    "failed": 50
  },
  "latency": {
    "p50": 45.5,
    "p90": 120.3,
    "p95": 150.7,
    "p99": 200.1
  },
  "providers": {
    "anthropic": {
      "requests": 8000,
      "errors": 10,
      "avg_latency": 50.2
    }
  }
}
```
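The endpoint can also be consumed programmatically. The sketch below decodes the response shape shown above; the field names are taken from that example rather than a formal schema, so treat them as assumptions:

```go
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

// Metrics mirrors a subset of the example /metrics response above;
// the exact schema is an assumption based on that sample.
type Metrics struct {
    Requests struct {
        Total  int `json:"total"`
        Active int `json:"active"`
        Failed int `json:"failed"`
    } `json:"requests"`
    Latency struct {
        P50 float64 `json:"p50"`
        P99 float64 `json:"p99"`
    } `json:"latency"`
}

func main() {
    resp, err := http.Get("http://localhost:8080/metrics")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var m Metrics
    if err := json.NewDecoder(resp.Body).Decode(&m); err != nil {
        panic(err)
    }
    fmt.Printf("p99 latency: %.1fms, failed requests: %d\n", m.Latency.P99, m.Requests.Failed)
}
```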
### Performance Monitoring

Use the monitoring endpoints:
```bash
# Health check (lightweight)
curl http://localhost:8080/health

# Detailed status
curl http://localhost:8080/status

# Prometheus metrics (if enabled)
curl http://localhost:8080/metrics
```

## Benchmarking

### Running Benchmarks
```bash
# Run all benchmarks
make test-benchmark

# Run a specific benchmark
go test -bench=BenchmarkTokenCounting -benchmem ./tests/benchmark/

# Run with CPU profiling
go test -bench=. -cpuprofile=cpu.prof ./tests/benchmark/
go tool pprof cpu.prof
```

### Benchmark Results
Example benchmark output:
```text
BenchmarkTokenCounting/Small-8     50000      23456 ns/op     2048 B/op     10 allocs/op
BenchmarkTokenCounting/Large-8       500    2345678 ns/op   204800 B/op    100 allocs/op
BenchmarkRouting-8               1000000       1045 ns/op      256 B/op      4 allocs/op
BenchmarkTransform-8              100000      15234 ns/op     4096 B/op     25 allocs/op
```

### Load Testing
Run load tests to find limits:
```bash
# Basic load test
./scripts/test.sh load

# Custom load test
cat > load-test.js << 'EOF'
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  stages: [
    { duration: '30s', target: 100 },  // Ramp up
    { duration: '1m', target: 100 },   // Stay at 100
    { duration: '30s', target: 0 },    // Ramp down
  ],
};

export default function () {
  let response = http.post(
    'http://localhost:8080/v1/messages',
    JSON.stringify({
      model: 'claude-3-sonnet',
      messages: [{ role: 'user', content: 'Hello' }],
      max_tokens: 100,
    }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(response, {
    'status is 200': (r) => r.status === 200,
    'latency < 500ms': (r) => r.timings.duration < 500,
  });
}
EOF

k6 run load-test.js
```

## Optimization Strategies
### 1. Circuit Breaker
Prevent cascading failures:
```yaml
performance:
  circuit_breaker:
    enabled: true
    failure_threshold: 5      # Failures before opening
    recovery_timeout: "60s"   # Time before half-open
    half_open_requests: 3     # Trial requests in the half-open state
```
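To make these settings concrete, here is a minimal sketch of the state machine they control: the breaker opens after `failure_threshold` consecutive failures, fails fast while open, and lets trial requests through again once `recovery_timeout` has elapsed. This illustrates the concept, not CCProxy's internal implementation:

```go
package breaker

import (
    "errors"
    "sync"
    "time"
)

var ErrCircuitOpen = errors.New("circuit breaker open")

// Breaker opens after failureThreshold consecutive failures and
// allows trial requests again after recoveryTimeout.
type Breaker struct {
    mu               sync.Mutex
    failures         int
    failureThreshold int
    recoveryTimeout  time.Duration
    openedAt         time.Time
}

func (b *Breaker) Call(fn func() error) error {
    b.mu.Lock()
    if b.failures >= b.failureThreshold && time.Since(b.openedAt) < b.recoveryTimeout {
        b.mu.Unlock()
        return ErrCircuitOpen // open: fail fast without calling the provider
    }
    b.mu.Unlock()

    err := fn()

    b.mu.Lock()
    defer b.mu.Unlock()
    if err != nil {
        b.failures++
        if b.failures >= b.failureThreshold {
            b.openedAt = time.Now() // (re)open the circuit
        }
        return err
    }
    b.failures = 0 // a success closes the circuit again
    return nil
}
```

A caller would wrap each provider call, e.g. `b := &Breaker{failureThreshold: 5, recoveryTimeout: 60 * time.Second}` followed by `b.Call(sendRequest)`.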
### 2. Request Caching

Cache frequently used responses:
```yaml
performance:
  caching:
    enabled: true
    ttl: "3600s"          # 1 hour cache
    max_size: 1000        # Maximum cached items
    cache_key_fields:     # Fields that form the cache key
      - model
      - messages
      - temperature
```
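`cache_key_fields` controls which parts of the request form the cache key, so two requests that differ only in ignored fields share an entry. A hedged sketch of deriving such a key (the request types are illustrative, and this is not necessarily CCProxy's hashing scheme):

```go
package cache

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

// Message and Request mirror the chat request shape used earlier in
// this guide; they are illustrative, not CCProxy's real types.
type Message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

type Request struct {
    Model       string    `json:"model"`
    Messages    []Message `json:"messages"`
    Temperature float64   `json:"temperature"`
}

// CacheKey hashes only the configured fields (model, messages,
// temperature), so requests differing elsewhere share an entry.
func CacheKey(r Request) string {
    payload, _ := json.Marshal(struct {
        Model       string    `json:"model"`
        Messages    []Message `json:"messages"`
        Temperature float64   `json:"temperature"`
    }{r.Model, r.Messages, r.Temperature})
    sum := sha256.Sum256(payload)
    return hex.EncodeToString(sum[:])
}
```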
### 3. Rate Limiting

Protect against overload:
```yaml
security:
  rate_limiting:
    enabled: true
    global:
      requests_per_minute: 6000
      burst: 100
    per_ip:
      requests_per_minute: 60
      burst: 10
```
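The per-IP budget above (60 requests/minute with a burst of 10) is a classic token bucket. In Go it could be implemented with `golang.org/x/time/rate`, as in this sketch; CCProxy's own limiter may differ, and a production version would strip the port from `RemoteAddr` and evict idle entries:

```go
package ratelimit

import (
    "net/http"
    "sync"

    "golang.org/x/time/rate"
)

var (
    mu       sync.Mutex
    limiters = map[string]*rate.Limiter{}
)

// limiterFor returns the token bucket for an IP: 60 requests per
// minute (1 per second) with a burst of 10, matching the config above.
func limiterFor(ip string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    l, ok := limiters[ip]
    if !ok {
        l = rate.NewLimiter(rate.Limit(1), 10)
        limiters[ip] = l
    }
    return l
}

// Middleware rejects requests that exceed the per-IP budget.
func Middleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !limiterFor(r.RemoteAddr).Allow() {
            http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}
```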
### 4. Compression

Enable response compression:
```yaml
server:
  compression:
    enabled: true
    level: 6          # 1-9; higher = more compression, more CPU
    min_size: 1024    # Minimum response size (bytes) to compress
    types:
      - application/json
      - text/plain
```
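For reference, gzip compression at a configurable level is a thin middleware in Go. This sketch applies level 6 to every response and omits the `min_size` and content-type checks, which would require buffering; it illustrates the mechanism rather than CCProxy's implementation:

```go
package compress

import (
    "compress/gzip"
    "net/http"
    "strings"
)

// Gzip compresses responses at level 6 for clients that accept it.
// A production version would also honor min_size and content types.
func Gzip(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
            next.ServeHTTP(w, r)
            return
        }
        gz, _ := gzip.NewWriterLevel(w, 6) // level 6 is always valid
        defer gz.Close()
        w.Header().Set("Content-Encoding", "gzip")
        next.ServeHTTP(gzipWriter{ResponseWriter: w, Writer: gz}, r)
    })
}

// gzipWriter routes response bodies through the gzip encoder.
type gzipWriter struct {
    http.ResponseWriter
    Writer *gzip.Writer
}

func (g gzipWriter) Write(b []byte) (int, error) { return g.Writer.Write(b) }
```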
## Performance Best Practices

### 1. Provider Selection

Choose providers based on their performance characteristics:
```yaml
routing:
  # Use the fastest provider for time-sensitive requests
  performance_routing:
    enabled: true
    latency_threshold: "100ms"
  # Route by model performance
  model_performance:
    "claude-3-haiku": "fast-provider"
    "claude-3-opus": "powerful-provider"
```
### 2. Timeout Configuration

Set appropriate timeouts:
```yaml
# Provider-specific timeouts
providers:
  - name: fast-provider
    timeout: "30s"    # Quick responses
  - name: slow-provider
    timeout: "120s"   # Complex requests
```

### 3. Concurrent Processing
Optimize concurrency:
```go
// Use a buffered channel as a semaphore to bound concurrency.
var workerPool = make(chan struct{}, 100) // Limit concurrent workers

func processRequest(req Request) {
    workerPool <- struct{}{}        // Acquire a worker slot
    defer func() { <-workerPool }() // Release the slot
    // Process the request...
}
```

### 4. Memory Management
Reduce allocations:
```go
// Reuse buffers instead of allocating one per request
var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

func processData(data []byte) {
    buf := bufferPool.Get().([]byte)
    defer bufferPool.Put(buf)
    // Use the buffer...
}
```

## Scaling Strategies

### Horizontal Scaling
Run multiple instances:
```yaml
# docker-compose.yml
version: '3.8'
services:
  ccproxy:
    image: ccproxy:latest
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 2G
    ports:
      - "8080-8082:8080"
```

### Load Balancing
Use a load balancer:
```nginx
upstream ccproxy {
    least_conn;
    server ccproxy-1:8080 weight=3;
    server ccproxy-2:8080 weight=2;
    server ccproxy-3:8080 weight=1;
}

server {
    location / {
        proxy_pass http://ccproxy;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```

## Monitoring and Alerting

### Prometheus Integration
```yaml
monitoring:
  metrics:
    enabled: true
    port: 9090
    path: /metrics
  # Key metrics to monitor
  alerts:
    - name: high_latency
      threshold: 500ms
    - name: error_rate
      threshold: 5%
    - name: memory_usage
      threshold: 80%
```

### Grafana Dashboard
Import the CCProxy dashboard for visualization:
- Request rates
- Latency percentiles
- Error rates
- Provider health
- Resource usage
## Troubleshooting Performance
### High Latency

- Check provider response times (see the timing sketch below)
- Verify network connectivity
- Review timeout settings
- Check CPU/memory usage
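To tell network latency apart from provider processing time, `net/http/httptrace` from the standard library can split a request into connect and time-to-first-byte phases. A small diagnostic sketch against the local health endpoint:

```go
package main

import (
    "fmt"
    "net/http"
    "net/http/httptrace"
    "time"
)

func main() {
    start := time.Now()
    var connected, firstByte time.Time

    // Record when the connection is ready and when the first
    // response byte arrives.
    trace := &httptrace.ClientTrace{
        GotConn:              func(httptrace.GotConnInfo) { connected = time.Now() },
        GotFirstResponseByte: func() { firstByte = time.Now() },
    }

    req, _ := http.NewRequest("GET", "http://localhost:8080/health", nil)
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

    resp, err := http.DefaultTransport.RoundTrip(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    fmt.Printf("connect: %v, first byte: %v\n",
        connected.Sub(start), firstByte.Sub(start))
}
```

A large gap between connect time and first byte points at the provider; a slow connect points at the network.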
### Memory Issues

```bash
# Profile memory usage
go tool pprof http://localhost:8080/debug/pprof/heap

# Check for goroutine leaks
go tool pprof http://localhost:8080/debug/pprof/goroutine
```

### CPU Bottlenecks
```bash
# Profile CPU usage for 30 seconds
go tool pprof 'http://localhost:8080/debug/pprof/profile?seconds=30'

# Inside pprof, inspect hot paths
(pprof) top10
(pprof) list functionName
```

## Performance Checklist
- [ ] Configure appropriate timeouts
- [ ] Enable connection pooling
- [ ] Set up monitoring
- [ ] Configure circuit breakers
- [ ] Enable compression
- [ ] Optimize worker pool size
- [ ] Configure rate limiting
- [ ] Set up caching (if appropriate)
- [ ] Monitor resource usage
- [ ] Run load tests
## Next Steps
- Monitoring Guide - Set up comprehensive monitoring
- Security Best Practices - Secure your deployment
- Production Deployment - Deploy to production