Performance Tuning

Optimize CCProxy for maximum performance and efficiency. This guide covers configuration tuning, monitoring, and best practices.

Performance Overview

CCProxy is designed for high performance with:

  • Efficient request routing
  • Connection pooling
  • Concurrent request handling
  • Minimal overhead processing
  • Optimized memory usage

Configuration Tuning

Server Settings

Optimize server configuration for your workload:

yaml
# config.yaml
server:
  port: 8080
  host: "0.0.0.0"
  read_timeout: "30s"      # Adjust based on provider response times
  write_timeout: "30s"     # Should be >= read_timeout
  idle_timeout: "120s"     # Keep connections alive
  max_request_size: 10485760  # 10MB - adjust for your needs

performance:
  worker_pool_size: 100    # Number of concurrent workers
  request_timeout: "300s"  # Maximum request duration
  
  connection_pool:
    max_idle_conns: 100
    max_idle_conns_per_host: 10
    idle_conn_timeout: "90s"
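
These timeouts correspond directly to the standard library's http.Server fields, which is helpful when reasoning about what each one actually bounds. The sketch below is only an illustration of that mapping, not CCProxy's real server setup.

go
// Illustrative only: how the YAML timeouts above map onto net/http.
package main

import (
    "net/http"
    "time"
)

func main() {
    srv := &http.Server{
        Addr:         "0.0.0.0:8080",
        ReadTimeout:  30 * time.Second,  // read_timeout
        WriteTimeout: 30 * time.Second,  // write_timeout
        IdleTimeout:  120 * time.Second, // idle_timeout
        // MaxBytesHandler caps request bodies, similar to max_request_size (10MB).
        Handler: http.MaxBytesHandler(http.DefaultServeMux, 10<<20),
    }
    srv.ListenAndServe()
}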

Provider Configuration

Optimize provider settings:

yaml
providers:
  - name: primary
    type: anthropic
    timeout: "60s"         # Provider-specific timeout
    max_retries: 3         # Retry on failure
    retry_delay: "1s"      # Initial delay; backs off exponentially on each retry
    
    # Connection pooling per provider
    max_connections: 50
    connections_per_host: 10
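
The max_retries and retry_delay settings imply a retry loop whose delay grows after every failure. A minimal sketch of that pattern (not CCProxy's actual retry code, and callProvider is a stand-in) looks like this:

go
package main

import (
    "errors"
    "fmt"
    "time"
)

// callProvider is a placeholder for a real provider request.
func callProvider() error { return errors.New("upstream unavailable") }

// retryWithBackoff retries up to maxRetries times, doubling the delay
// after each failure (1s, 2s, 4s, ...), mirroring retry_delay above.
func retryWithBackoff(maxRetries int, initialDelay time.Duration) error {
    delay := initialDelay
    var err error
    for attempt := 0; attempt <= maxRetries; attempt++ {
        if err = callProvider(); err == nil {
            return nil
        }
        if attempt < maxRetries {
            time.Sleep(delay)
            delay *= 2
        }
    }
    return fmt.Errorf("all %d retries failed: %w", maxRetries, err)
}

func main() {
    fmt.Println(retryWithBackoff(3, time.Second))
}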

Resource Management

Memory Optimization

Monitor and control memory usage:

bash
# Set GOGC to tune garbage collection
export GOGC=100  # Default; adjust based on the memory/CPU trade-off

# For low-memory environments
export GOGC=50   # More frequent GC, less memory

# For high-performance environments
export GOGC=200  # Less frequent GC, more memory
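
The same knob can be adjusted at runtime from Go code via runtime/debug, which can be convenient when the trade-off depends on deployment size. This is a generic standard-library illustration, not a CCProxy API.

go
package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Equivalent to GOGC=50: more frequent GC, lower memory footprint.
    previous := debug.SetGCPercent(50)
    fmt.Printf("GC target changed from %d%% to 50%%\n", previous)
}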

CPU Optimization

bash
# Set GOMAXPROCS for CPU cores
export GOMAXPROCS=8  # Use 8 CPU cores

# Or let Go detect automatically (recommended)
# GOMAXPROCS defaults to runtime.NumCPU()
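
To confirm what the runtime actually picked up, you can log the effective values at startup; this is plain standard-library Go, not a CCProxy feature.

go
package main

import (
    "fmt"
    "runtime"
)

func main() {
    // GOMAXPROCS(0) reports the current setting without changing it.
    fmt.Printf("GOMAXPROCS=%d, NumCPU=%d\n", runtime.GOMAXPROCS(0), runtime.NumCPU())
}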

Connection Pooling

HTTP Client Configuration

go
// Optimal HTTP client settings
client := &http.Client{
    Timeout: 60 * time.Second,
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 10,
        MaxConnsPerHost:     25,
        IdleConnTimeout:     90 * time.Second,
        TLSHandshakeTimeout: 10 * time.Second,
        DisableKeepAlives:   false, // Keep connections alive
        DisableCompression:  false, // Enable compression
    },
}

Database Connections (if applicable)

yaml
database:
  max_open_conns: 25
  max_idle_conns: 5
  conn_max_lifetime: "5m"
  conn_max_idle_time: "90s"
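
These settings map one-to-one onto Go's database/sql pool controls. A hedged sketch, assuming a Postgres DSN and the lib/pq driver purely for illustration:

go
package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/lib/pq" // illustrative driver choice
)

func main() {
    db, err := sql.Open("postgres", "postgres://user:pass@localhost/ccproxy?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    // Mirrors the YAML settings above.
    db.SetMaxOpenConns(25)
    db.SetMaxIdleConns(5)
    db.SetConnMaxLifetime(5 * time.Minute)
    db.SetConnMaxIdleTime(90 * time.Second)
}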

Monitoring Performance

Built-in Metrics

CCProxy exposes metrics at /metrics:

bash
# Get current metrics
curl http://localhost:8080/metrics

# Response includes:
{
  "requests": {
    "total": 10000,
    "active": 5,
    "success": 9950,
    "failed": 50
  },
  "latency": {
    "p50": 45.5,
    "p90": 120.3,
    "p95": 150.7,
    "p99": 200.1
  },
  "providers": {
    "anthropic": {
      "requests": 8000,
      "errors": 10,
      "avg_latency": 50.2
    }
  }
}

Performance Monitoring

Use the monitoring endpoints:

bash
# Health check (lightweight)
curl http://localhost:8080/health

# Detailed status
curl http://localhost:8080/status

# Prometheus metrics (if enabled)
curl http://localhost:8080/metrics

Benchmarking

Running Benchmarks

bash
# Run all benchmarks
make test-benchmark

# Run specific benchmark
go test -bench=BenchmarkTokenCounting -benchmem ./tests/benchmark/

# Run with CPU profiling
go test -bench=. -cpuprofile=cpu.prof ./tests/benchmark/
go tool pprof cpu.prof

Benchmark Results

Example benchmark output:

BenchmarkTokenCounting/Small-8     50000    23456 ns/op    2048 B/op    10 allocs/op
BenchmarkTokenCounting/Large-8       500  2345678 ns/op  204800 B/op   100 allocs/op
BenchmarkRouting-8               1000000     1045 ns/op     256 B/op     4 allocs/op
BenchmarkTransform-8              100000    15234 ns/op    4096 B/op    25 allocs/op

Load Testing

Run load tests to find limits:

bash
# Basic load test
./scripts/test.sh load

# Custom load test
cat > load-test.js << 'EOF'
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  stages: [
    { duration: '30s', target: 100 },  // Ramp up
    { duration: '1m', target: 100 },   // Stay at 100
    { duration: '30s', target: 0 },    // Ramp down
  ],
};

export default function() {
  let response = http.post('http://localhost:8080/v1/messages', 
    JSON.stringify({
      model: "claude-3-sonnet",
      messages: [{role: "user", content: "Hello"}],
      max_tokens: 100
    }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  
  check(response, {
    'status is 200': (r) => r.status === 200,
    'latency < 500ms': (r) => r.timings.duration < 500,
  });
}
EOF

k6 run load-test.js

Optimization Strategies

1. Circuit Breaker

Prevent cascading failures:

yaml
performance:
  circuit_breaker:
    enabled: true
    failure_threshold: 5      # Failures before opening
    recovery_timeout: "60s"   # Time before half-open
    half_open_requests: 3     # Requests in half-open state
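
Conceptually, the breaker counts failures, opens after failure_threshold, waits recovery_timeout, then lets probe requests through before closing again. The sketch below is a simplified state machine illustrating those settings (it admits a single half-open probe rather than half_open_requests of them) and is not CCProxy's implementation.

go
package main

import (
    "errors"
    "sync"
    "time"
)

type breakerState int

const (
    stateClosed breakerState = iota
    stateOpen
    stateHalfOpen
)

type breaker struct {
    mu               sync.Mutex
    state            breakerState
    failures         int
    failureThreshold int           // failure_threshold
    recoveryTimeout  time.Duration // recovery_timeout
    openedAt         time.Time
}

var errCircuitOpen = errors.New("circuit open: failing fast")

func (b *breaker) Call(fn func() error) error {
    b.mu.Lock()
    if b.state == stateOpen {
        if time.Since(b.openedAt) < b.recoveryTimeout {
            b.mu.Unlock()
            return errCircuitOpen
        }
        b.state = stateHalfOpen // recovery_timeout elapsed: allow a probe
    }
    b.mu.Unlock()

    err := fn()

    b.mu.Lock()
    defer b.mu.Unlock()
    if err != nil {
        b.failures++
        if b.state == stateHalfOpen || b.failures >= b.failureThreshold {
            b.state = stateOpen
            b.openedAt = time.Now()
        }
        return err
    }
    // Success closes the breaker and resets the failure count.
    b.state = stateClosed
    b.failures = 0
    return nil
}

func main() {
    b := &breaker{failureThreshold: 5, recoveryTimeout: 60 * time.Second}
    _ = b.Call(func() error { return errors.New("provider timeout") })
}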

2. Request Caching

Cache frequently used responses:

yaml
performance:
  caching:
    enabled: true
    ttl: "3600s"           # 1 hour cache
    max_size: 1000         # Maximum cached items
    cache_key_fields:      # Fields for cache key
      - model
      - messages
      - temperature
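
The cache_key_fields list implies deriving a deterministic key from only those request fields, so otherwise-identical requests share a cache entry. One way to build such a key is sketched below; the types and field names are illustrative, not CCProxy's internals.

go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
)

// message is an illustrative shape, not CCProxy's internal type.
type message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

// cacheKey hashes only the configured cache_key_fields, so requests that
// differ in other fields still hit the same entry. json.Marshal sorts map
// keys, which keeps the key deterministic.
func cacheKey(model string, messages []message, temperature float64) string {
    raw, _ := json.Marshal(map[string]any{
        "model":       model,
        "messages":    messages,
        "temperature": temperature,
    })
    sum := sha256.Sum256(raw)
    return hex.EncodeToString(sum[:])
}

func main() {
    key := cacheKey("claude-3-sonnet", []message{{Role: "user", Content: "Hello"}}, 0.7)
    fmt.Println(key)
}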

3. Rate Limiting

Protect against overload:

yaml
security:
  rate_limiting:
    enabled: true
    global:
      requests_per_minute: 6000
      burst: 100
    per_ip:
      requests_per_minute: 60
      burst: 10
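
Per-IP limiting of this kind is typically built on a token bucket. The sketch below uses golang.org/x/time/rate to enforce roughly 60 requests/minute with a burst of 10; it is an illustration of the mechanism rather than CCProxy's actual middleware.

go
package main

import (
    "net/http"
    "sync"

    "golang.org/x/time/rate"
)

var (
    mu       sync.Mutex
    limiters = map[string]*rate.Limiter{}
)

// limiterFor returns the token bucket for a client IP. The map is unbounded
// here; a real implementation would evict idle entries.
func limiterFor(ip string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    l, ok := limiters[ip]
    if !ok {
        // per_ip: 60 requests/minute = 1 token/second, burst of 10.
        l = rate.NewLimiter(rate.Limit(1), 10)
        limiters[ip] = l
    }
    return l
}

func rateLimit(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Note: r.RemoteAddr includes the port; strip it before keying in real code.
        if !limiterFor(r.RemoteAddr).Allow() {
            http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    http.ListenAndServe(":8080", rateLimit(http.DefaultServeMux))
}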

4. Compression

Enable response compression:

yaml
server:
  compression:
    enabled: true
    level: 6              # 1-9, higher = more compression
    min_size: 1024        # Minimum size to compress
    types:
      - application/json
      - text/plain
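
If you want to see what this setting does under the hood, gzip response compression in Go amounts to a small middleware that wraps the response writer. The sketch below is generic (it assumes clients send Accept-Encoding: gzip), not CCProxy's compression layer.

go
package main

import (
    "compress/gzip"
    "io"
    "net/http"
    "strings"
)

// gzipWriter forwards writes through a gzip stream while keeping the
// original ResponseWriter for headers and status codes.
type gzipWriter struct {
    http.ResponseWriter
    zw io.Writer
}

func (g gzipWriter) Write(p []byte) (int, error) { return g.zw.Write(p) }

func gzipMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
            next.ServeHTTP(w, r) // client did not ask for gzip
            return
        }
        w.Header().Set("Content-Encoding", "gzip")
        zw, _ := gzip.NewWriterLevel(w, 6) // matches "level: 6" above
        defer zw.Close()
        next.ServeHTTP(gzipWriter{ResponseWriter: w, zw: zw}, r)
    })
}

func main() {
    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        w.Write([]byte(`{"ok": true}`))
    })
    http.ListenAndServe(":8080", gzipMiddleware(handler))
}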

Performance Best Practices

1. Provider Selection

Choose providers based on performance:

yaml
routing:
  # Use fastest provider for time-sensitive requests
  performance_routing:
    enabled: true
    latency_threshold: "100ms"
    
  # Route by model performance
  model_performance:
    "claude-3-haiku": "fast-provider"
    "claude-3-opus": "powerful-provider"

2. Timeout Configuration

Set appropriate timeouts:

yaml
# Provider-specific timeouts
providers:
  - name: fast-provider
    timeout: "30s"      # Quick responses
    
  - name: slow-provider
    timeout: "120s"     # Complex requests

3. Concurrent Processing

Optimize concurrency:

go
// Use worker pools for parallel processing
var workerPool = make(chan struct{}, 100) // Limit concurrent workers

func processRequest(req Request) {
    workerPool <- struct{}{}        // Acquire worker
    defer func() { <-workerPool }() // Release worker
    
    // Process request
}

4. Memory Management

Reduce allocations:

go
// Reuse buffers
var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

func processData(data []byte) {
    buf := bufferPool.Get().([]byte)
    defer bufferPool.Put(buf)
    
    // Use buffer
}

Scaling Strategies

Horizontal Scaling

Run multiple instances:

yaml
# docker-compose.yml
version: '3.8'

services:
  ccproxy:
    image: ccproxy:latest
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 2G
    ports:
      - "8080-8082:8080"

Load Balancing

Use a load balancer:

nginx
upstream ccproxy {
    least_conn;
    server ccproxy-1:8080 weight=3;
    server ccproxy-2:8080 weight=2;
    server ccproxy-3:8080 weight=1;
}

server {
    location / {
        proxy_pass http://ccproxy;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Monitoring and Alerting

Prometheus Integration

yaml
monitoring:
  metrics:
    enabled: true
    port: 9090
    path: /metrics
    
  # Key metrics to monitor
  alerts:
    - name: high_latency
      threshold: 500ms
      
    - name: error_rate
      threshold: 5%
      
    - name: memory_usage
      threshold: 80%
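
On the Go side, exposing Prometheus metrics usually just means mounting promhttp on the configured port. The snippet assumes the prometheus/client_golang library and a hypothetical counter name; CCProxy's real metric names will differ.

go
package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestsTotal is a hypothetical counter for illustration only.
var requestsTotal = promauto.NewCounterVec(
    prometheus.CounterOpts{Name: "ccproxy_requests_total", Help: "Proxied requests by provider."},
    []string{"provider"},
)

func main() {
    requestsTotal.WithLabelValues("anthropic").Inc()
    // Matches monitoring.metrics.port and .path above.
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":9090", nil)
}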

Grafana Dashboard

Import the CCProxy dashboard for visualization:

  • Request rates
  • Latency percentiles
  • Error rates
  • Provider health
  • Resource usage

Troubleshooting Performance

High Latency

  1. Check provider response times (see the latency probe sketch after this list)
  2. Verify network connectivity
  3. Review timeout settings
  4. Check CPU/memory usage
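
To separate provider latency from connection setup, a quick probe with net/http/httptrace splits a single request into connect time and time-to-first-byte. The endpoint below is a placeholder; point it at whatever you are diagnosing.

go
package main

import (
    "fmt"
    "net/http"
    "net/http/httptrace"
    "time"
)

func main() {
    start := time.Now()
    var connected, firstByte time.Time

    req, _ := http.NewRequest("GET", "http://localhost:8080/health", nil)
    trace := &httptrace.ClientTrace{
        GotConn:              func(httptrace.GotConnInfo) { connected = time.Now() },
        GotFirstResponseByte: func() { firstByte = time.Now() },
    }
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

    resp, err := http.DefaultTransport.RoundTrip(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()

    // A large gap between connect and first byte points at the upstream,
    // not the network.
    fmt.Printf("connect: %v, first byte: %v\n", connected.Sub(start), firstByte.Sub(start))
}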

Memory Issues

bash
# Profile memory usage
go tool pprof http://localhost:8080/debug/pprof/heap

# Check goroutine leaks
go tool pprof http://localhost:8080/debug/pprof/goroutine

CPU Bottlenecks

bash
# Profile CPU usage
go tool pprof http://localhost:8080/debug/pprof/profile?seconds=30

# Check hot paths
(pprof) top10
(pprof) list functionName

Performance Checklist

  • [ ] Configure appropriate timeouts
  • [ ] Enable connection pooling
  • [ ] Set up monitoring
  • [ ] Configure circuit breakers
  • [ ] Enable compression
  • [ ] Optimize worker pool size
  • [ ] Configure rate limiting
  • [ ] Set up caching (if appropriate)
  • [ ] Monitor resource usage
  • [ ] Run load tests

Next Steps