# Model Gateway
Unified interface to 12+ AI providers with automatic routing and fallback.
## Overview
The Model Gateway provides:
- **Unified API** - Same interface for all providers
- **Provider abstraction** - Swap providers without code changes
- **Auto-selection** - Automatically choose the best available provider
- **Fallback support** - Graceful degradation when providers fail
- **Tool translation** - Convert tool schemas between formats
## Supported Providers
### Cloud Providers
| Provider | Models | Tool Support |
|---|---|---|
| Anthropic | Claude Sonnet 4, Opus, Haiku | Full |
| OpenAI | GPT-4o, GPT-4 Turbo | Full |
| Google | Gemini 2.0, Gemini 1.5 | Full |
| Mistral | Large, Codestral | Full |
| Groq | Llama 3.3, Mixtral | Partial |
| Together | Llama, Mixtral, others | Partial |
| Fireworks | Llama, Code models | Partial |
| AI21 | Jamba 1.5 | Limited |
| HuggingFace | Open models | Limited |
### Gateway Providers
| Provider | Description | Tool Support |
|---|---|---|
| OpenRouter | Access any model | Varies by model |
| LiteLLM | Self-hosted proxy | Full |
### Local Providers
| Provider | Description | Tool Support |
|---|---|---|
| Ollama | Local model serving | Partial |
## Provider Priority
When using auto-selection, the gateway tries providers in this order (a selection sketch follows the list):
1. **Anthropic** - Best tool use and coding
2. **OpenAI** - Strong general capabilities
3. **Google** - Good multimodal
4. **Mistral** - European alternative
5. **OpenRouter** - Access to many models
6. **Together** - Open-source models
7. **Groq** - Fast inference
8. **Ollama** - Local fallback
9. **LiteLLM** - Proxy fallback
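A minimal sketch of how this priority walk could be implemented. The `PROVIDER_PRIORITY` table, environment-variable pairings, and `selectProvider` helper below are illustrative assumptions, not the gateway's actual internals:

```typescript
// Illustrative priority list: each entry pairs a provider with the
// environment variable that marks it as configured.
const PROVIDER_PRIORITY: Array<[provider: string, envVar: string]> = [
  ['anthropic', 'ANTHROPIC_API_KEY'],
  ['openai', 'OPENAI_API_KEY'],
  ['google', 'GOOGLE_API_KEY'],
  ['mistral', 'MISTRAL_API_KEY'],
  ['openrouter', 'OPENROUTER_API_KEY'],
  ['together', 'TOGETHER_API_KEY'],
  ['groq', 'GROQ_API_KEY'],
  ['ollama', 'OLLAMA_BASE_URL'],
  ['litellm', 'LITELLM_BASE_URL'],
];

// Pick the first provider whose key (or base URL) is set.
function selectProvider(): string {
  for (const [provider, envVar] of PROVIDER_PRIORITY) {
    if (process.env[envVar]) return provider;
  }
  throw new Error('No provider configured');
}
```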
## Configuration

### API Keys
Set via environment variables:
```bash
# Primary providers
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=...

# Additional providers
export MISTRAL_API_KEY=...
export GROQ_API_KEY=...
export TOGETHER_API_KEY=...
export OPENROUTER_API_KEY=...

# Local providers
export OLLAMA_BASE_URL=http://localhost:11434
export LITELLM_BASE_URL=http://localhost:4000
```
### Default Models

Each provider has a default model, used when no model is passed (a lookup sketch follows the table):
| Provider | Default Model |
|---|---|
| Anthropic | claude-sonnet-4-20250514 |
| OpenAI | gpt-4o |
| Google | gemini-2.0-flash |
| Mistral | mistral-large-latest |
| Groq | llama-3.3-70b-versatile |
| Together | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| OpenRouter | anthropic/claude-sonnet-4 |
| Ollama | llama3.3 |
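Internally, defaults like these can be modeled as a plain lookup. A sketch (the `DEFAULT_MODELS` and `resolveModel` names are assumptions; the real mapping lives in the gateway's provider definitions):

```typescript
// Illustrative per-provider defaults, mirroring the table above.
const DEFAULT_MODELS: Record<string, string> = {
  anthropic: 'claude-sonnet-4-20250514',
  openai: 'gpt-4o',
  google: 'gemini-2.0-flash',
  mistral: 'mistral-large-latest',
  groq: 'llama-3.3-70b-versatile',
  together: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  openrouter: 'anthropic/claude-sonnet-4',
  ollama: 'llama3.3',
};

// Fall back to the provider default when the caller passes no model.
const resolveModel = (provider: string, model?: string): string =>
  model ?? DEFAULT_MODELS[provider];
```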
## Usage
### Basic Chat
```typescript
import { chat } from './providers';

const response = await chat(
  'anthropic',       // Provider
  messages,          // Conversation
  tools,             // Available tools
  'claude-sonnet-4'  // Specific model (optional)
);
```

### Auto Selection
```typescript
const response = await chat(
  'auto',  // Let gateway choose
  messages,
  tools
);
```
### Response Format

All providers return a unified response:
```typescript
interface LLMResponse {
  content: string;         // Text response
  toolCalls?: ToolCall[];  // Tool invocations
  finishReason: 'stop' | 'tool_use' | 'length' | 'error';
  usage?: {
    inputTokens: number;
    outputTokens: number;
  };
}
```
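A typical caller branches on `finishReason`. A minimal sketch, assuming a `ToolCall` carrying an `id`, a tool-role message shape, and a hypothetical `executeTool` dispatcher:

```typescript
const response = await chat('auto', messages, tools);

if (response.finishReason === 'tool_use' && response.toolCalls) {
  for (const call of response.toolCalls) {
    // executeTool is a hypothetical dispatcher over your tool registry.
    const result = await executeTool(call);
    messages.push({ role: 'tool', content: result, toolCallId: call.id });
  }
} else {
  console.log(response.content);
}
```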
## Tool Translation

### Unified Tool Format
Define tools once:
```typescript
const tool = {
  name: 'shell',
  description: 'Execute a shell command',
  parameters: {
    type: 'object',
    properties: {
      command: {
        type: 'string',
        description: 'The command to run',
      },
    },
    required: ['command'],
  },
};
```
### Provider-Specific Translation

The gateway translates to each provider's format:
**Anthropic:**
```json
{
  "name": "shell",
  "description": "Execute a shell command",
  "input_schema": { ... }
}
```

**OpenAI:**
```json
{
  "type": "function",
  "function": {
    "name": "shell",
    "description": "Execute a shell command",
    "parameters": { ... }
  }
}
```
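The translation itself is mostly field renaming. A sketch that follows the two schemas above (the `UnifiedTool` type and function names are assumptions):

```typescript
// Assumed unified shape, matching the example tool above.
interface UnifiedTool {
  name: string;
  description: string;
  parameters: object; // JSON Schema
}

// Anthropic expects the schema under `input_schema`.
const toAnthropic = (t: UnifiedTool) => ({
  name: t.name,
  description: t.description,
  input_schema: t.parameters,
});

// OpenAI wraps the definition in a `function` object.
const toOpenAI = (t: UnifiedTool) => ({
  type: 'function' as const,
  function: {
    name: t.name,
    description: t.description,
    parameters: t.parameters,
  },
});
```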
## Error Handling

### Provider Errors
Common error scenarios:
| Error | Cause | Gateway Behavior |
|---|---|---|
| API key invalid | Wrong or expired key | Return error |
| Rate limited | Too many requests | Retry with backoff |
| Model unavailable | Model not accessible | Suggest alternatives |
| Empty response | Provider issue | Return error |
### Automatic Fallback
With multiple providers configured, the gateway can fall back:
```typescript
// If Anthropic fails, try OpenAI
const response = await chatWithFallback(
  ['anthropic', 'openai'],
  messages,
  tools
);
```
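Internally, `chatWithFallback` can be as simple as trying each provider in order. A sketch (the `Message` and `Tool` types are assumed, as is the error handling detail):

```typescript
// Try providers left to right; return the first successful response.
async function chatWithFallback(
  providers: string[],
  messages: Message[],
  tools: Tool[],
): Promise<LLMResponse> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await chat(provider, messages, tools);
    } catch (error) {
      lastError = error; // remember the failure, move to the next provider
    }
  }
  throw lastError ?? new Error('No providers available');
}
```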
## Provider Details

### Anthropic
Best for:
- Coding and tool use
- Long-form reasoning
- Constitutional AI safety
```typescript
// Uses native Anthropic SDK
const client = new Anthropic({ apiKey });
const response = await client.messages.create({
  model: 'claude-sonnet-4',
  max_tokens: 8192,
  messages: anthropicMessages,
  tools: anthropicTools,
});
```
### OpenAI

Best for:
- General tasks
- Multimodal (images)
- Wide model selection
```typescript
// Uses native OpenAI SDK
const client = new OpenAI({ apiKey });
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: openaiMessages,
  tools: openaiTools,
});
```
### Google Gemini

Best for:
- Multimodal tasks
- Very long context (1M tokens)
- Cost efficiency
```typescript
// Uses Google Generative AI SDK
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });
// geminiContents holds the conversation translated to Gemini's format
const result = await model.generateContent({ contents: geminiContents });
```
### OpenAI-Compatible

Several providers use OpenAI-compatible APIs:
- OpenRouter
- Together
- Groq
- Fireworks
- Mistral
- AI21
- HuggingFace
- Ollama (v1 API)
- LiteLLM
```typescript
// Uses OpenAI SDK with custom base URL
const client = new OpenAI({
  apiKey,
  baseURL: 'https://api.together.xyz/v1',
});
```
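In practice the per-provider difference is mostly the base URL. A sketch; the URLs below (other than Together's, shown above) are illustrative assumptions to verify against each provider's documentation:

```typescript
import OpenAI from 'openai';

// Illustrative base URLs (check each provider's docs before relying on these).
const BASE_URLS: Record<string, string> = {
  together: 'https://api.together.xyz/v1',
  groq: 'https://api.groq.com/openai/v1',
  mistral: 'https://api.mistral.ai/v1',
  openrouter: 'https://openrouter.ai/api/v1',
  ollama: 'http://localhost:11434/v1',
};

// One constructor covers every OpenAI-compatible provider.
const makeClient = (provider: string, apiKey: string) =>
  new OpenAI({ apiKey, baseURL: BASE_URLS[provider] });
```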
## Model Information

### Getting Available Models
```typescript
const models = await gateway.listModels('openrouter');
// Returns model list with context lengths, pricing, etc.
```
### Model Metadata
The gateway can retrieve model information:
```typescript
interface ModelInfo {
  id: string;
  name: string;
  context_length?: number;
  pricing?: {
    prompt: string;      // per 1K tokens
    completion: string;  // per 1K tokens
  };
}
```
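Since `pricing` is per 1K tokens and stored as strings, a rough per-request cost estimate needs a parse and a scale. A sketch (the `estimateCost` helper is hypothetical):

```typescript
// Estimate request cost in dollars from token usage and per-1K pricing.
function estimateCost(
  usage: { inputTokens: number; outputTokens: number },
  pricing: { prompt: string; completion: string },
): number {
  const promptRate = parseFloat(pricing.prompt);         // $ per 1K input tokens
  const completionRate = parseFloat(pricing.completion); // $ per 1K output tokens
  return (usage.inputTokens / 1000) * promptRate +
         (usage.outputTokens / 1000) * completionRate;
}
```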
## Best Practices

### 1. Configure Multiple Providers
Have fallback options:
```bash
export ANTHROPIC_API_KEY=...   # Primary
export OPENAI_API_KEY=...      # Backup
export OLLAMA_BASE_URL=...     # Offline
```
### 2. Use Appropriate Models

Match model to task:
| Task | Recommended |
|---|---|
| Complex coding | Claude Sonnet 4 |
| Quick queries | GPT-4o-mini, Gemini Flash |
| Large context | Gemini 2.0 (1M tokens) |
| Local/offline | Ollama |
### 3. Monitor Usage
Track token usage for cost management:
```typescript
const response = await chat(...);
console.log('Tokens:', response.usage);
```
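For longer sessions, a running total is more useful than per-call logging. A small sketch around the `usage` field:

```typescript
// Accumulate token counts across a session.
const totals = { inputTokens: 0, outputTokens: 0 };

const response = await chat('auto', messages, tools);
if (response.usage) {
  totals.inputTokens += response.usage.inputTokens;
  totals.outputTokens += response.usage.outputTokens;
}
console.log('Session totals:', totals);
```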
### 4. Handle Rate Limits

Implement backoff for high-volume usage:
```typescript
// Hypothetical helper; not part of the gateway API
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

try {
  return await chat(provider, messages, tools);
} catch (error: any) {
  if (error.status === 429) {  // rate limited
    await delay(1000);         // wait before a single retry
    return await chat(provider, messages, tools);
  }
  throw error;
}
```
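For sustained traffic, a retry loop with exponential backoff is more robust than a single retry. A sketch (the `chatWithBackoff` name, attempt count, and delays are arbitrary choices, not gateway API):

```typescript
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry on 429 with exponentially growing waits: 1s, 2s, 4s, ...
async function chatWithBackoff(
  provider: string,
  messages: Message[],
  tools: Tool[],
  maxAttempts = 5,
): Promise<LLMResponse> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await chat(provider, messages, tools);
    } catch (error: any) {
      if (error.status !== 429 || attempt === maxAttempts - 1) throw error;
      await delay(1000 * 2 ** attempt);
    }
  }
  throw new Error('unreachable'); // loop always returns or throws first
}
```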