Ollama Provider
Run AI models locally on your own hardware with complete privacy. Ollama provides an OpenAI-compatible API that works seamlessly with CCProxy, allowing you to use Claude Code without sending data to external servers.
Why Choose Ollama?
- 🔒 Complete Privacy: All processing happens locally - your data never leaves your machine
- 💰 Free to Use: No API costs, just your electricity
- 🚀 Fast Response: No network latency for model calls
- 🌐 Works Offline: After initial model download, no internet required
- 🛠️ Easy Setup: Simple installation and model management
Setup
1. Install Ollama
macOS/Linux:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Windows: Download from ollama.com/download
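To confirm the install before moving on, you can check the CLI and the local API server (a quick sketch, assuming the default port 11434; on most systems the installer starts the server automatically):

```bash
# Check the CLI is installed
ollama --version

# Check the local API server is reachable (default port 11434)
curl http://localhost:11434/api/tags
```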
2. Download a Model
Choose a model based on your needs:
```bash
# For general use (8B parameters, ~5GB)
ollama pull llama3.1

# For coding tasks (7B parameters, ~4GB)
ollama pull codellama

# For smaller systems (3B parameters, ~2GB)
ollama pull phi3

# For fast responses (7B parameters, ~4GB)
ollama pull mistral
```

3. Configure CCProxy
Create or update your CCProxy configuration:
```json
{
  "providers": [
    {
      "name": "openai",
      "api_base_url": "http://localhost:11434/v1",
      "api_key": "ollama",
      "models": ["llama3.1", "codellama", "mistral"],
      "enabled": true
    }
  ],
  "routes": {
    "default": {
      "provider": "openai",
      "model": "llama3.1"
    }
  }
}
```

Important: Use "openai" as the provider name, since Ollama exposes an OpenAI-compatible API.
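Before starting CCProxy, you can sanity-check Ollama's OpenAI-compatible endpoint directly (a minimal sketch, assuming llama3.1 has already been pulled):

```bash
# Send a minimal chat completion request straight to Ollama
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```

If this returns a JSON completion, the same base URL will work for CCProxy.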
4. Start CCProxy
```bash
ccproxy start
ccproxy code
```

Available Models
General Purpose
- llama3.1 (8B/70B) - Latest Llama model with tool calling support
- llama3 (8B/70B) - Previous generation, still excellent
- mistral (7B) - Fast and efficient
Coding Specialized
- codellama (7B/13B/34B) - Optimized for code generation
- deepseek-coder (1.3B/6.7B) - Efficient coding model
- qwen2.5-coder (Various sizes) - Strong coding capabilities
Small Models (Low Resource)
- phi3 (3B) - Microsoft's efficient model
- gemma (2B/7B) - Google's lightweight models
- tinyllama (1.1B) - Tiny but capable
Configuration Examples
Basic Setup
Minimal configuration using Llama 3.1:
```json
{
  "providers": [
    {
      "name": "openai",
      "api_base_url": "http://localhost:11434/v1",
      "api_key": "ollama",
      "enabled": true
    }
  ],
  "routes": {
    "default": {
      "provider": "openai",
      "model": "llama3.1"
    }
  }
}
```

Multi-Model Setup
Different models for different tasks:
```json
{
  "providers": [
    {
      "name": "openai",
      "api_base_url": "http://localhost:11434/v1",
      "api_key": "ollama",
      "models": ["llama3.1", "codellama", "phi3"],
      "enabled": true
    }
  ],
  "routes": {
    "default": {
      "provider": "openai",
      "model": "llama3.1"
    },
    "background": {
      "provider": "openai",
      "model": "phi3"
    },
    "think": {
      "provider": "openai",
      "model": "llama3.1:70b"
    }
  }
}
```

Remote Ollama Server
If running Ollama on another machine:
```json
{
  "providers": [
    {
      "name": "openai",
      "api_base_url": "http://192.168.1.100:11434/v1",
      "api_key": "ollama",
      "enabled": true
    }
  ]
}
```

Function Calling Support
Ollama supports function calling with compatible models:
Models with Tool Support:
- ✅ llama3.1 - Full tool calling support
- ✅ mistral - Basic tool support
- ❌ codellama - No tool support (code generation only)
- ❌ phi3 - No tool support
For Claude Code usage, we recommend llama3.1 as it has the best tool calling compatibility.
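To confirm tool calling works on your setup, you can send a request with a tools array to Ollama's OpenAI-compatible endpoint (a sketch; get_weather is a made-up example tool, not part of CCProxy or Ollama):

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "What is the weather in Toronto?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

A model with tool support should respond with a tool_calls entry rather than plain text.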
Performance Optimization
Model Selection by Hardware
High-End (32GB+ RAM, GPU):
- llama3.1:70b
- codellama:34b
Mid-Range (16GB RAM):
- llama3.1:8b
- codellama:13b
- mistral:7b
Low-End (8GB RAM):
- phi3:3b
- gemma:2b
- tinyllama:1.1b
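To use the variant that matches your hardware, pull it with its size tag (exact tags depend on what the Ollama library publishes for each model):

```bash
# The part after the colon selects the parameter count
ollama pull llama3.1:8b
ollama pull llama3.1:70b   # only worthwhile with 32GB+ RAM or a large GPU
```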
Ollama Performance Settings
Set environment variables before starting Ollama:
```bash
# Number of parallel requests
export OLLAMA_NUM_PARALLEL=2

# GPU layers (if you have a GPU)
export OLLAMA_NUM_GPU=999

# CPU threads
export OLLAMA_NUM_THREAD=8
```

Troubleshooting
Ollama Not Responding
Check if Ollama is running:
```bash
ollama list
```

Start the Ollama service:

```bash
ollama serve
```

Verify the API endpoint:

```bash
curl http://localhost:11434/v1/models
```
Model Download Issues
```bash
# Check available models
ollama list

# Remove and re-download
ollama rm llama3.1
ollama pull llama3.1
```

Memory Issues
For large models on limited RAM:
- Use smaller models (phi3, gemma)
- Reduce context size in Ollama
- Close other applications
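One way to reduce the context size mentioned above is to build a lower-context variant with a Modelfile (a sketch; the name llama3.1-4k is just an example):

```bash
# Define a variant of llama3.1 with a 4096-token context window
cat > Modelfile <<'EOF'
FROM llama3.1
PARAMETER num_ctx 4096
EOF

# Register it with Ollama under a new name
ollama create llama3.1-4k -f Modelfile
```

You can then point a CCProxy route at llama3.1-4k instead of llama3.1.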
CCProxy Connection Issues
Ensure your configuration uses:
- Provider name: "openai" (not "ollama")
- Correct base URL with the /v1 suffix
- API key set to "ollama"
Best Practices
- Model Selection: Start with smaller models and upgrade based on performance needs
- Privacy: Disable Ollama telemetry for complete privacy:
  ```bash
  export OLLAMA_NO_ANALYTICS=true
  ```
- Updates: Keep models updated:
  ```bash
  ollama pull llama3.1
  ```
- Resource Management: Monitor system resources when using large models
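If you keep several models installed, a small loop extends the update tip above to all of them (a sketch; it assumes `ollama list` prints one model name per line after a header row):

```bash
# Re-pull every locally installed model to pick up updates
ollama list | awk 'NR>1 {print $1}' | xargs -n1 ollama pull
```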
Security Considerations
- Ollama binds to localhost by default (secure)
- To expose Ollama to the network, use:
  ```bash
  # ⚠️ Only do this on trusted networks
  OLLAMA_HOST=0.0.0.0 ollama serve
  ```
Next Steps
- Explore different models for your use case
- Fine-tune models with Ollama's Modelfile feature
- Set up GPU acceleration if available
- Consider running Ollama on a dedicated server
For more information, visit ollama.com.