Performance Tuning

Optimize CCProxy for maximum performance and efficiency. This guide covers configuration tuning, monitoring, and best practices.

Performance Overview

CCProxy is designed for high performance with:

  • Efficient request routing
  • Connection pooling
  • Concurrent request handling
  • Minimal overhead processing
  • Optimized memory usage

Configuration Tuning

Server Settings

Optimize server configuration for your workload:

yaml
# config.yaml
server:
  port: 8080
  host: "0.0.0.0"
  read_timeout: "30s"      # Adjust based on provider response times
  write_timeout: "30s"     # Should be >= read_timeout
  idle_timeout: "120s"     # Keep connections alive
  max_request_size: 10485760  # 10MB - adjust for your needs

performance:
  worker_pool_size: 100    # Number of concurrent workers
  request_timeout: "300s"  # Maximum request duration
  
  connection_pool:
    max_idle_conns: 100
    max_idle_conns_per_host: 10
    idle_conn_timeout: "90s"
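
These timeouts correspond directly to the standard library's http.Server fields, which is helpful when reasoning about what each one actually bounds. The sketch below is only an illustration of that mapping, not CCProxy's real server setup.

go
// Illustrative only: how the YAML timeouts above map onto net/http.
package main

import (
    "net/http"
    "time"
)

func main() {
    srv := &http.Server{
        Addr:         "0.0.0.0:8080",
        ReadTimeout:  30 * time.Second,  // read_timeout
        WriteTimeout: 30 * time.Second,  // write_timeout
        IdleTimeout:  120 * time.Second, // idle_timeout
        // MaxBytesHandler caps request bodies, similar to max_request_size (10MB).
        Handler: http.MaxBytesHandler(http.DefaultServeMux, 10<<20),
    }
    srv.ListenAndServe()
}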

Provider Configuration

Optimize provider settings:

yaml
providers:
  - name: primary
    type: anthropic
    timeout: "60s"         # Provider-specific timeout
    max_retries: 3         # Retry on failure
    retry_delay: "1s"      # Initial delay; backs off exponentially on each retry
    
    # Connection pooling per provider
    max_connections: 50
    connections_per_host: 10
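
The max_retries and retry_delay settings imply a retry loop whose delay grows after every failure. A minimal sketch of that pattern (not CCProxy's actual retry code, and callProvider is a stand-in) looks like this:

go
package main

import (
    "errors"
    "fmt"
    "time"
)

// callProvider is a placeholder for a real provider request.
func callProvider() error { return errors.New("upstream unavailable") }

// retryWithBackoff retries up to maxRetries times, doubling the delay
// after each failure (1s, 2s, 4s, ...), mirroring retry_delay above.
func retryWithBackoff(maxRetries int, initialDelay time.Duration) error {
    delay := initialDelay
    var err error
    for attempt := 0; attempt <= maxRetries; attempt++ {
        if err = callProvider(); err == nil {
            return nil
        }
        if attempt < maxRetries {
            time.Sleep(delay)
            delay *= 2
        }
    }
    return fmt.Errorf("all %d retries failed: %w", maxRetries, err)
}

func main() {
    fmt.Println(retryWithBackoff(3, time.Second))
}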

Resource Management

Memory Optimization

Monitor and control memory usage:

bash
# Set GOGC to tune garbage collection
export GOGC=100  # Default; adjust based on the memory/CPU trade-off

# For low-memory environments
export GOGC=50   # More frequent GC, less memory

# For high-performance environments
export GOGC=200  # Less frequent GC, more memory
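
The same knob can be adjusted at runtime from Go code via runtime/debug, which can be convenient when the trade-off depends on deployment size. This is a generic standard-library illustration, not a CCProxy API.

go
package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Equivalent to GOGC=50: more frequent GC, lower memory footprint.
    previous := debug.SetGCPercent(50)
    fmt.Printf("GC target changed from %d%% to 50%%\n", previous)
}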

CPU Optimization

bash
# Set GOMAXPROCS for CPU cores
export GOMAXPROCS=8  # Use 8 CPU cores

# Or let Go detect automatically (recommended)
# GOMAXPROCS defaults to runtime.NumCPU()
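
To confirm what the runtime actually picked up, you can log the effective values at startup; this is plain standard-library Go, not a CCProxy feature.

go
package main

import (
    "fmt"
    "runtime"
)

func main() {
    // GOMAXPROCS(0) reports the current setting without changing it.
    fmt.Printf("GOMAXPROCS=%d, NumCPU=%d\n", runtime.GOMAXPROCS(0), runtime.NumCPU())
}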

Connection Pooling

HTTP Client Configuration

go
// Optimal HTTP client settings
client := &http.Client{
    Timeout: 60 * time.Second,
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 10,
        MaxConnsPerHost:     25,
        IdleConnTimeout:     90 * time.Second,
        TLSHandshakeTimeout: 10 * time.Second,
        DisableKeepAlives:   false, // Keep connections alive
        DisableCompression:  false, // Enable compression
    },
}

Database Connections (if applicable)

yaml
database:
  max_open_conns: 25
  max_idle_conns: 5
  conn_max_lifetime: "5m"
  conn_max_idle_time: "90s"
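
These settings map one-to-one onto Go's database/sql pool controls. A hedged sketch, assuming a Postgres DSN and the lib/pq driver purely for illustration:

go
package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/lib/pq" // illustrative driver choice
)

func main() {
    db, err := sql.Open("postgres", "postgres://user:pass@localhost/ccproxy?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    // Mirrors the YAML settings above.
    db.SetMaxOpenConns(25)
    db.SetMaxIdleConns(5)
    db.SetConnMaxLifetime(5 * time.Minute)
    db.SetConnMaxIdleTime(90 * time.Second)
}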

Monitoring Performance

Built-in Metrics

CCProxy exposes metrics at /metrics:

bash
# Get current metrics
curl http://localhost:8080/metrics

# Response includes:
{
  "requests": {
    "total": 10000,
    "active": 5,
    "success": 9950,
    "failed": 50
  },
  "latency": {
    "p50": 45.5,
    "p90": 120.3,
    "p95": 150.7,
    "p99": 200.1
  },
  "providers": {
    "anthropic": {
      "requests": 8000,
      "errors": 10,
      "avg_latency": 50.2
    }
  }
}

Performance Monitoring

Use the monitoring endpoints:

bash
# Health check (lightweight)
curl http://localhost:8080/health

# Detailed status
curl http://localhost:8080/status

# Prometheus metrics (if enabled)
curl http://localhost:8080/metrics

Benchmarking

Running Benchmarks

bash
# Run all benchmarks
make test-benchmark

# Run specific benchmark
go test -bench=BenchmarkTokenCounting -benchmem ./tests/benchmark/

# Run with CPU profiling
go test -bench=. -cpuprofile=cpu.prof ./tests/benchmark/
go tool pprof cpu.prof

Benchmark Results

Example benchmark output:

BenchmarkTokenCounting/Small-8     50000    23456 ns/op    2048 B/op    10 allocs/op
BenchmarkTokenCounting/Large-8       500  2345678 ns/op  204800 B/op   100 allocs/op
BenchmarkRouting-8               1000000     1045 ns/op     256 B/op     4 allocs/op
BenchmarkTransform-8              100000    15234 ns/op    4096 B/op    25 allocs/op

Load Testing

Run load tests to find limits:

bash
# Basic load test
./scripts/test.sh load

# Custom load test
cat > load-test.js << 'EOF'
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  stages: [
    { duration: '30s', target: 100 },  // Ramp up
    { duration: '1m', target: 100 },   // Stay at 100
    { duration: '30s', target: 0 },    // Ramp down
  ],
};

export default function() {
  let response = http.post('http://localhost:8080/v1/messages', 
    JSON.stringify({
      model: "claude-3-sonnet",
      messages: [{role: "user", content: "Hello"}],
      max_tokens: 100
    }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  
  check(response, {
    'status is 200': (r) => r.status === 200,
    'latency < 500ms': (r) => r.timings.duration < 500,
  });
}
EOF

k6 run load-test.js

Optimization Strategies

1. Circuit Breaker

Prevent cascading failures:

yaml
performance:
  circuit_breaker:
    enabled: true
    failure_threshold: 5      # Failures before opening
    recovery_timeout: "60s"   # Time before half-open
    half_open_requests: 3     # Requests in half-open state
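
Conceptually, the breaker counts failures, opens after failure_threshold, waits recovery_timeout, then lets probe requests through before closing again. The sketch below is a simplified state machine illustrating those settings (it admits a single half-open probe rather than half_open_requests of them) and is not CCProxy's implementation.

go
package main

import (
    "errors"
    "sync"
    "time"
)

type breakerState int

const (
    stateClosed breakerState = iota
    stateOpen
    stateHalfOpen
)

type breaker struct {
    mu               sync.Mutex
    state            breakerState
    failures         int
    failureThreshold int           // failure_threshold
    recoveryTimeout  time.Duration // recovery_timeout
    openedAt         time.Time
}

var errCircuitOpen = errors.New("circuit open: failing fast")

func (b *breaker) Call(fn func() error) error {
    b.mu.Lock()
    if b.state == stateOpen {
        if time.Since(b.openedAt) < b.recoveryTimeout {
            b.mu.Unlock()
            return errCircuitOpen
        }
        b.state = stateHalfOpen // recovery_timeout elapsed: allow a probe
    }
    b.mu.Unlock()

    err := fn()

    b.mu.Lock()
    defer b.mu.Unlock()
    if err != nil {
        b.failures++
        if b.state == stateHalfOpen || b.failures >= b.failureThreshold {
            b.state = stateOpen
            b.openedAt = time.Now()
        }
        return err
    }
    // Success closes the breaker and resets the failure count.
    b.state = stateClosed
    b.failures = 0
    return nil
}

func main() {
    b := &breaker{failureThreshold: 5, recoveryTimeout: 60 * time.Second}
    _ = b.Call(func() error { return errors.New("provider timeout") })
}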

2. Request Caching

Cache frequently used responses:

yaml
performance:
  caching:
    enabled: true
    ttl: "3600s"           # 1 hour cache
    max_size: 1000         # Maximum cached items
    cache_key_fields:      # Fields for cache key
      - model
      - messages
      - temperature
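
The cache_key_fields list implies deriving a deterministic key from only those request fields, so otherwise-identical requests share a cache entry. One way to build such a key is sketched below; the types and field names are illustrative, not CCProxy's internals.

go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
)

// message is an illustrative shape, not CCProxy's internal type.
type message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

// cacheKey hashes only the configured cache_key_fields, so requests that
// differ in other fields still hit the same entry. json.Marshal sorts map
// keys, which keeps the key deterministic.
func cacheKey(model string, messages []message, temperature float64) string {
    raw, _ := json.Marshal(map[string]any{
        "model":       model,
        "messages":    messages,
        "temperature": temperature,
    })
    sum := sha256.Sum256(raw)
    return hex.EncodeToString(sum[:])
}

func main() {
    key := cacheKey("claude-3-sonnet", []message{{Role: "user", Content: "Hello"}}, 0.7)
    fmt.Println(key)
}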

3. Rate Limiting

Protect against overload:

yaml
security:
  rate_limiting:
    enabled: true
    global:
      requests_per_minute: 6000
      burst: 100
    per_ip:
      requests_per_minute: 60
      burst: 10
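
Per-IP limiting of this kind is typically built on a token bucket. The sketch below uses golang.org/x/time/rate to enforce roughly 60 requests/minute with a burst of 10; it is an illustration of the mechanism rather than CCProxy's actual middleware.

go
package main

import (
    "net/http"
    "sync"

    "golang.org/x/time/rate"
)

var (
    mu       sync.Mutex
    limiters = map[string]*rate.Limiter{}
)

// limiterFor returns the token bucket for a client IP. The map is unbounded
// here; a real implementation would evict idle entries.
func limiterFor(ip string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    l, ok := limiters[ip]
    if !ok {
        // per_ip: 60 requests/minute = 1 token/second, burst of 10.
        l = rate.NewLimiter(rate.Limit(1), 10)
        limiters[ip] = l
    }
    return l
}

func rateLimit(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Note: r.RemoteAddr includes the port; strip it before keying in real code.
        if !limiterFor(r.RemoteAddr).Allow() {
            http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    http.ListenAndServe(":8080", rateLimit(http.DefaultServeMux))
}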

4. Compression

Enable response compression:

yaml
server:
  compression:
    enabled: true
    level: 6              # 1-9, higher = more compression
    min_size: 1024        # Minimum size to compress
    types:
      - application/json
      - text/plain
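
If you want to see what this setting does under the hood, gzip response compression in Go amounts to a small middleware that wraps the response writer. The sketch below is generic (it assumes clients send Accept-Encoding: gzip), not CCProxy's compression layer.

go
package main

import (
    "compress/gzip"
    "io"
    "net/http"
    "strings"
)

// gzipWriter forwards writes through a gzip stream while keeping the
// original ResponseWriter for headers and status codes.
type gzipWriter struct {
    http.ResponseWriter
    zw io.Writer
}

func (g gzipWriter) Write(p []byte) (int, error) { return g.zw.Write(p) }

func gzipMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
            next.ServeHTTP(w, r) // client did not ask for gzip
            return
        }
        w.Header().Set("Content-Encoding", "gzip")
        zw, _ := gzip.NewWriterLevel(w, 6) // matches "level: 6" above
        defer zw.Close()
        next.ServeHTTP(gzipWriter{ResponseWriter: w, zw: zw}, r)
    })
}

func main() {
    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        w.Write([]byte(`{"ok": true}`))
    })
    http.ListenAndServe(":8080", gzipMiddleware(handler))
}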

Performance Best Practices

1. Provider Selection

Choose providers based on performance:

yaml
routing:
  # Use fastest provider for time-sensitive requests
  performance_routing:
    enabled: true
    latency_threshold: "100ms"
    
  # Route by model performance
  model_performance:
    "claude-3-haiku": "fast-provider"
    "claude-3-opus": "powerful-provider"

2. Timeout Configuration

Set appropriate timeouts:

yaml
# Provider-specific timeouts
providers:
  - name: fast-provider
    timeout: "30s"      # Quick responses
    
  - name: slow-provider
    timeout: "120s"     # Complex requests

3. Concurrent Processing

Optimize concurrency:

go
// Use worker pools for parallel processing
var workerPool = make(chan struct{}, 100) // Limit concurrent workers

func processRequest(req Request) {
    workerPool <- struct{}{}        // Acquire worker
    defer func() { <-workerPool }() // Release worker
    
    // Process request
}

4. Memory Management

Reduce allocations:

go
// Reuse buffers
var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

func processData(data []byte) {
    buf := bufferPool.Get().([]byte)
    defer bufferPool.Put(buf)
    
    // Use buffer
}

Scaling Strategies

Horizontal Scaling

Run multiple instances:

yaml
# docker-compose.yml
version: '3.8'

services:
  ccproxy:
    image: ccproxy:latest
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 2G
    ports:
      - "8080-8082:8080"

Load Balancing

Use a load balancer:

nginx
upstream ccproxy {
    least_conn;
    server ccproxy-1:8080 weight=3;
    server ccproxy-2:8080 weight=2;
    server ccproxy-3:8080 weight=1;
}

server {
    location / {
        proxy_pass http://ccproxy;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Monitoring and Alerting

Prometheus Integration

yaml
monitoring:
  metrics:
    enabled: true
    port: 9090
    path: /metrics
    
  # Key metrics to monitor
  alerts:
    - name: high_latency
      threshold: 500ms
      
    - name: error_rate
      threshold: 5%
      
    - name: memory_usage
      threshold: 80%
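
On the Go side, exposing Prometheus metrics usually just means mounting promhttp on the configured port. The snippet assumes the prometheus/client_golang library and a hypothetical counter name; CCProxy's real metric names will differ.

go
package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestsTotal is a hypothetical counter for illustration only.
var requestsTotal = promauto.NewCounterVec(
    prometheus.CounterOpts{Name: "ccproxy_requests_total", Help: "Proxied requests by provider."},
    []string{"provider"},
)

func main() {
    requestsTotal.WithLabelValues("anthropic").Inc()
    // Matches monitoring.metrics.port and .path above.
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":9090", nil)
}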

Grafana Dashboard

Import the CCProxy dashboard for visualization:

  • Request rates
  • Latency percentiles
  • Error rates
  • Provider health
  • Resource usage

Troubleshooting Performance

High Latency

  1. Check provider response times (see the latency probe sketch after this list)
  2. Verify network connectivity
  3. Review timeout settings
  4. Check CPU/memory usage
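
To separate provider latency from connection setup, a quick probe with net/http/httptrace splits a single request into connect time and time-to-first-byte. The endpoint below is a placeholder; point it at whatever you are diagnosing.

go
package main

import (
    "fmt"
    "net/http"
    "net/http/httptrace"
    "time"
)

func main() {
    start := time.Now()
    var connected, firstByte time.Time

    req, _ := http.NewRequest("GET", "http://localhost:8080/health", nil)
    trace := &httptrace.ClientTrace{
        GotConn:              func(httptrace.GotConnInfo) { connected = time.Now() },
        GotFirstResponseByte: func() { firstByte = time.Now() },
    }
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

    resp, err := http.DefaultTransport.RoundTrip(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()

    // A large gap between connect and first byte points at the upstream,
    // not the network.
    fmt.Printf("connect: %v, first byte: %v\n", connected.Sub(start), firstByte.Sub(start))
}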

Memory Issues

bash
# Profile memory usage
go tool pprof http://localhost:8080/debug/pprof/heap

# Check goroutine leaks
go tool pprof http://localhost:8080/debug/pprof/goroutine

CPU Bottlenecks

bash
# Profile CPU usage
go tool pprof http://localhost:8080/debug/pprof/profile?seconds=30

# Check hot paths
(pprof) top10
(pprof) list functionName

Performance Checklist

  • [ ] Configure appropriate timeouts
  • [ ] Enable connection pooling
  • [ ] Set up monitoring
  • [ ] Configure circuit breakers
  • [ ] Enable compression
  • [ ] Optimize worker pool size
  • [ ] Configure rate limiting
  • [ ] Set up caching (if appropriate)
  • [ ] Monitor resource usage
  • [ ] Run load tests

Next Steps