
Monitoring

This page covers how to monitor CCProxy performance, health, and usage using metrics, logs, and alerting.

Key Metrics to Monitor

Service Health

  • Service uptime: Monitor /health endpoint
  • Response time: Track API response latency
  • Error rate: Monitor failed requests
  • Request volume: Track requests per second
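
A quick manual probe of the first two items can be done with curl; this is a minimal sketch, assuming the /health endpoint returns HTTP 200 when the service is up:

bash
# One-off check of availability and response latency
curl -s -o /dev/null \
  -w "status: %{http_code}  latency: %{time_total}s\n" \
  http://localhost:3456/health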

Provider Performance

  • Provider latency: Time taken for provider API calls
  • Provider errors: Failed provider API calls
  • Token usage: Monitor token consumption per provider
  • Rate limiting: Track rate limit hits

System Resources

  • Memory usage: Monitor container/process memory
  • CPU usage: Track CPU utilization
  • Network I/O: Monitor network bandwidth
  • Disk usage: Track log file sizes
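
A lightweight snapshot of these resources for a locally running instance might look like the following (assumes the process is named ccproxy and logs live under /var/log/ccproxy):

bash
# Memory (RSS) and CPU usage of the ccproxy process
ps -o pid,rss,pcpu,etime,comm -C ccproxy

# Size of the log directory
du -sh /var/log/ccproxy/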

Monitoring Stack

Prometheus + Grafana

Prometheus Configuration

yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'ccproxy'
    static_configs:
      - targets: ['localhost:3456']
    metrics_path: /health  # switch to /metrics if your CCProxy build exposes a Prometheus exposition endpoint there
    scrape_interval: 30s
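
Before reloading Prometheus, the scrape configuration can be validated with promtool (shipped with Prometheus):

bash
# Validate the scrape configuration
promtool check config prometheus.yml

# Confirm the ccproxy target is being scraped
curl -s http://localhost:9090/api/v1/targets | \
  jq '.data.activeTargets[] | {job: .labels.job, health: .health}'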

Grafana Dashboard

json
{
  "dashboard": {
    "title": "CCProxy Dashboard",
    "panels": [
      {
        "title": "Service Health",
        "type": "stat",
        "targets": [
          {
            "expr": "up{job=\"ccproxy\"}",
            "legendFormat": "Service Status"
          }
        ]
      },
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "Requests/sec"
          }
        ]
      }
    ]
  }
}
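
The dashboard JSON above can also be imported through Grafana's HTTP API instead of the UI; a sketch, assuming the JSON is saved as dashboard.json and GRAFANA_TOKEN holds an API token with dashboard write access:

bash
# Import the dashboard via the Grafana API
curl -s -X POST http://localhost:3000/api/dashboards/db \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -H "Content-Type: application/json" \
  -d @dashboard.json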

Log-Based Monitoring

ELK Stack Integration

yaml
# logstash.conf
input {
  file {
    path => "/var/log/ccproxy/app.log"
    codec => "json"
    type => "ccproxy"
  }
}

filter {
  if [type] == "ccproxy" {
    if [action] == "anthropic_request" {
      mutate {
        add_field => { "metric_type" => "request" }
      }
    }
    if [action] == "anthropic_response" {
      mutate {
        add_field => { "metric_type" => "response" }
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "ccproxy-%{+YYYY.MM.dd}"
  }
}

Kibana Visualizations

  1. Request Volume: Line chart showing requests over time
  2. Error Rate: Pie chart showing error distribution
  3. Response Time: Histogram of API response times
  4. Provider Usage: Bar chart showing provider utilization
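
Before building these visualizations, it helps to confirm documents are actually reaching the index; the index and field names below follow the Logstash config above:

bash
# Count error-level log entries across the ccproxy indices
curl -s 'http://elasticsearch:9200/ccproxy-*/_count' \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"level": "error"}}}'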

Health Check Monitoring

Uptime Monitoring

bash
#!/bin/bash
# health-check.sh
ENDPOINT="http://localhost:3456/health"
TIMEOUT=10

while true; do
  if curl -f -s --max-time "$TIMEOUT" "$ENDPOINT" > /dev/null; then
    echo "$(date): Service is healthy"
  else
    echo "$(date): Service is unhealthy!"
    # Send alert
  fi
  sleep 30
done
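
The # Send alert placeholder can be filled with whatever notification channel is available; a minimal sketch using a Slack incoming webhook (the URL is a placeholder):

bash
# Post a message to a Slack incoming webhook when the check fails
curl -s -X POST "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK" \
  -H "Content-Type: application/json" \
  -d '{"text": "CCProxy health check failed on '"$(hostname)"'"}'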

Kubernetes Monitoring

yaml
apiVersion: v1
kind: Service
metadata:
  name: ccproxy-monitoring
  labels:
    app: ccproxy
spec:
  selector:
    app: ccproxy
  ports:
    - name: http
      port: 3456
      targetPort: 3456

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ccproxy
spec:
  selector:
    matchLabels:
      app: ccproxy
  endpoints:
    - port: http
      path: /health
      interval: 30s
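
After applying the manifests, a couple of kubectl commands confirm that the Service has endpoints and that the ServiceMonitor was created (the ServiceMonitor resource requires the Prometheus Operator):

bash
# Verify the Service resolves to running pods
kubectl get endpoints ccproxy-monitoring

# Verify the ServiceMonitor exists
kubectl get servicemonitor ccproxy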

Alerting

Prometheus Alerts

yaml
# alerts.yml
groups:
  - name: ccproxy
    rules:
      - alert: CCProxyDown
        expr: up{job="ccproxy"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "CCProxy service is down"
          description: "CCProxy has been down for more than 1 minute"

      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second"

      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "95th percentile latency is {{ $value }} seconds"

Notification Channels

yaml
# alertmanager.yml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#alerts'
        title: 'CCProxy Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
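
amtool (bundled with Alertmanager) can validate this file, and a synthetic alert can be pushed to the Alertmanager API to confirm the Slack route works end to end:

bash
# Validate the Alertmanager configuration
amtool check-config alertmanager.yml

# Fire a test alert to exercise the notification path
curl -s -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{"labels": {"alertname": "TestAlert", "severity": "warning"}}]'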

Custom Monitoring Scripts

Provider Health Check

bash
#!/bin/bash
# provider-health.sh
PROVIDERS=("groq" "openai" "gemini" "mistral" "xai" "ollama")

for provider in "${PROVIDERS[@]}"; do
  echo "Checking $provider provider..."
  
  # Send a minimal request through the proxy. Note: "model": "test" is a
  # placeholder; substitute a model that CCProxy routes to $provider so each
  # iteration actually exercises that provider.
  response=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "Content-Type: application/json" \
    -X POST http://localhost:3456/v1/messages \
    -d '{
      "model": "test",
      "messages": [{"role": "user", "content": "test"}],
      "max_tokens": 10
    }')
  
  if [ "$response" -eq 200 ]; then
    echo "$provider: OK"
  else
    echo "$provider: ERROR (HTTP $response)"
  fi
done

Performance Metrics

bash
#!/bin/bash
# performance-metrics.sh
LOG_FILE="/var/log/ccproxy/app.log"

# Request count in the last hour (uses GNU date; adjust for BSD/macOS)
echo "Requests in last hour:"
grep "$(date -d '1 hour ago' '+%Y-%m-%dT%H')" "$LOG_FILE" | \
  grep '"action":"anthropic_request"' | wc -l

# Average response time across all logged requests
echo "Average response time:"
grep '"duration_ms"' "$LOG_FILE" | \
  jq '.duration_ms' | \
  awk '{sum+=$1; count++} END {if (count > 0) print sum/count "ms"}'

# Error rate as a percentage of all log lines that carry a level field
echo "Error rate (%):"
total=$(grep -c '"level":"' "$LOG_FILE")
errors=$(grep -c '"level":"error"' "$LOG_FILE")
[ "$total" -gt 0 ] && echo "scale=2; $errors * 100 / $total" | bc -l

Integration with External Services

Datadog

Ship CCProxy logs to Datadog by adding a log collection config under the Agent's conf.d directory (log collection also needs logs_enabled: true in the main datadog.yaml):

yaml
# conf.d/ccproxy.d/conf.yaml
logs:
  - type: file
    path: /var/log/ccproxy/app.log
    service: ccproxy
    source: go
    sourcecategory: sourcecode

New Relic

Note that the Go agent only takes effect once it is initialized in the application code; when present, it reads its license key and application name from the environment variables below (for example via newrelic.ConfigFromEnvironment()).

bash
# Install New Relic Go agent
go get github.com/newrelic/go-agent/v3/newrelic

# Configuration
export NEW_RELIC_LICENSE_KEY=your_license_key
export NEW_RELIC_APP_NAME=CCProxy

Best Practices

  1. Monitor key metrics: Focus on SLIs (Service Level Indicators)
  2. Set up proactive alerting: Don't wait for users to report issues
  3. Use dashboards: Visualize metrics for quick understanding
  4. Regular review: Analyze trends and adjust thresholds
  5. Test alerts: Ensure notifications work correctly
  6. Document runbooks: Create incident response procedures

Troubleshooting

Common Monitoring Issues

  1. No data in dashboards: Check Prometheus scraping configuration
  2. False alerts: Adjust alert thresholds and timing
  3. Missing metrics: Verify log format and parsing
  4. High monitoring overhead: Optimize scraping intervals

Performance Impact

Monitor the monitoring system itself:

bash
# Check Prometheus memory usage
ps aux | grep '[p]rometheus'

# Monitor log file sizes
du -h /var/log/ccproxy/

# Check disk space
df -h