Model Gateway

Unified interface to 12+ AI providers with automatic routing and fallback.

Overview

The Model Gateway provides:

  • Unified API - Same interface for all providers
  • Provider abstraction - Swap providers without code changes
  • Auto-selection - Automatically choose the best available provider
  • Fallback support - Graceful degradation when providers fail
  • Tool translation - Convert tool schemas between formats
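
For example, the unified API makes switching providers a one-argument change. A minimal sketch using the chat helper documented under Usage below:

// Same call shape regardless of provider; only the provider name changes.
const fromClaude = await chat('anthropic', messages, tools);
const fromGpt = await chat('openai', messages, tools);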

Supported Providers

Cloud Providers

Provider       Models                          Tool Support
Anthropic      Claude Sonnet 4, Opus, Haiku    Full
OpenAI         GPT-4o, GPT-4 Turbo             Full
Google         Gemini 2.0, Gemini 1.5          Full
Mistral        Large, Codestral                Full
Groq           Llama 3.3, Mixtral              Partial
Together       Llama, Mixtral, others          Partial
Fireworks      Llama, Code models              Partial
AI21           Jamba 1.5                       Limited
HuggingFace    Open models                     Limited

Gateway Providers

Provider      Description          Tool Support
OpenRouter    Access any model     Varies by model
LiteLLM       Self-hosted proxy    Full

Local Providers

Provider    Description            Tool Support
Ollama      Local model serving    Partial

Provider Priority

When using auto-selection, providers are tried in this priority order (a resolution sketch follows the list):

1. Anthropic    - Best tool use and coding
2. OpenAI       - Strong general capabilities
3. Google       - Good multimodal
4. Mistral      - European alternative
5. OpenRouter   - Access to many models
6. Together     - Open-source models
7. Groq         - Fast inference
8. Ollama       - Local fallback
9. LiteLLM      - Proxy fallback
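
A minimal sketch of how this resolution might work, assuming availability is determined by which credentials are configured (names here are illustrative, not the gateway's actual internals):

// Illustrative: pick the first priority provider with credentials configured.
const PRIORITY = [
  'anthropic', 'openai', 'google', 'mistral', 'openrouter',
  'together', 'groq', 'ollama', 'litellm',
];

function resolveAuto(configured: Set<string>): string {
  for (const provider of PRIORITY) {
    if (configured.has(provider)) return provider;
  }
  throw new Error('No provider configured');
}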

Configuration

API Keys

Set via environment variables:

# Primary providers
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=...

# Additional providers
export MISTRAL_API_KEY=...
export GROQ_API_KEY=...
export TOGETHER_API_KEY=...
export OPENROUTER_API_KEY=...

# Local providers
export OLLAMA_BASE_URL=http://localhost:11434
export LITELLM_BASE_URL=http://localhost:4000
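
A configured-provider set like the one resolveAuto consumes above could be derived from these variables (a sketch; the exact mapping is illustrative):

// Illustrative: a provider counts as configured if its env var is set.
const ENV_VARS: Record<string, string> = {
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
  google: 'GOOGLE_API_KEY',
  mistral: 'MISTRAL_API_KEY',
  ollama: 'OLLAMA_BASE_URL',
};

const configured = new Set(
  Object.keys(ENV_VARS).filter((p) => !!process.env[ENV_VARS[p]])
);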

Default Models

Each provider has a default model:

Provider      Default Model
Anthropic     claude-sonnet-4-20250514
OpenAI        gpt-4o
Google        gemini-2.0-flash
Mistral       mistral-large-latest
Groq          llama-3.3-70b-versatile
Together      meta-llama/Llama-3.3-70B-Instruct-Turbo
OpenRouter    anthropic/claude-sonnet-4
Ollama        llama3.3
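
These defaults could be encoded as a simple lookup map (a sketch; the gateway's actual table may differ):

const DEFAULT_MODELS: Record<string, string> = {
  anthropic: 'claude-sonnet-4-20250514',
  openai: 'gpt-4o',
  google: 'gemini-2.0-flash',
  mistral: 'mistral-large-latest',
  groq: 'llama-3.3-70b-versatile',
  together: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  openrouter: 'anthropic/claude-sonnet-4',
  ollama: 'llama3.3',
};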

Usage

Basic Chat

import { chat } from './providers';

const response = await chat(
  'anthropic',  // Provider
  messages,     // Conversation
  tools,        // Available tools
  'claude-sonnet-4'  // Specific model (optional)
);

Auto Selection

const response = await chat(
  'auto',    // Let gateway choose
  messages,
  tools
);

Response Format

All providers return a unified response:

interface LLMResponse {
  content: string;           // Text response
  toolCalls?: ToolCall[];    // Tool invocations
  finishReason: 'stop' | 'tool_use' | 'length' | 'error';
  usage?: {
    inputTokens: number;
    outputTokens: number;
  };
}
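
A typical consumer branches on finishReason. A sketch, where executeTool is a hypothetical dispatcher to your own tool runner:

const response = await chat('auto', messages, tools);

if (response.finishReason === 'tool_use' && response.toolCalls) {
  for (const call of response.toolCalls) {
    // executeTool is hypothetical; wire this to your tool implementations.
    const result = await executeTool(call);
    console.log('tool result:', result);
  }
} else {
  console.log(response.content);
}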

Tool Translation

Unified Tool Format

Define tools once:

const tool = {
  name: 'shell',
  description: 'Execute a shell command',
  parameters: {
    type: 'object',
    properties: {
      command: {
        type: 'string',
        description: 'The command to run',
      },
    },
    required: ['command'],
  },
};

Provider-Specific Translation

The gateway translates to each provider’s format:

Anthropic:

{
  "name": "shell",
  "description": "Execute a shell command",
  "input_schema": { ... }
}

OpenAI:

{
  "type": "function",
  "function": {
    "name": "shell",
    "description": "Execute a shell command",
    "parameters": { ... }
  }
}
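
A sketch of this translation, assuming the unified shape shown earlier:

type UnifiedTool = {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema
};

function toAnthropic(tool: UnifiedTool) {
  return {
    name: tool.name,
    description: tool.description,
    input_schema: tool.parameters,
  };
}

function toOpenAI(tool: UnifiedTool) {
  return {
    type: 'function' as const,
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.parameters,
    },
  };
}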

Error Handling

Provider Errors

Common error scenarios:

Error                Cause                   Gateway Behavior
API key invalid      Wrong or expired key    Return error
Rate limited         Too many requests       Retry with backoff
Model unavailable    Model not accessible    Suggest alternatives
Empty response       Provider issue          Return error

Automatic Fallback

With multiple providers configured, the gateway can fall back:

// If Anthropic fails, try OpenAI
const response = await chatWithFallback(
  ['anthropic', 'openai'],
  messages,
  tools
);
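
One way chatWithFallback might be implemented (an illustrative sketch, not the gateway's actual internals):

async function chatWithFallback(providers, messages, tools) {
  let lastError;
  for (const provider of providers) {
    try {
      return await chat(provider, messages, tools);
    } catch (error) {
      lastError = error; // remember the failure and try the next provider
    }
  }
  throw lastError;
}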

Provider Details

Anthropic

Best for:

  • Coding and tool use
  • Long-form reasoning
  • Constitutional AI safety

// Uses the native Anthropic SDK
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey });
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 8192,
  messages: anthropicMessages,
  tools: anthropicTools,
});

OpenAI

Best for:

  • General tasks
  • Multimodal (images)
  • Wide model selection

// Uses the native OpenAI SDK
import OpenAI from 'openai';

const client = new OpenAI({ apiKey });
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: openaiMessages,
  tools: openaiTools,
});

Google Gemini

Best for:

  • Multimodal tasks
  • Very long context (1M tokens)
  • Cost efficiency

// Uses the Google Generative AI SDK
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });
const result = await model.generateContent(prompt);

OpenAI-Compatible

Several providers use OpenAI-compatible APIs:

  • OpenRouter
  • Together
  • Groq
  • Fireworks
  • Mistral
  • AI21
  • HuggingFace
  • Ollama (v1 API)
  • LiteLLM

// Uses the OpenAI SDK with a custom base URL
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey,
  baseURL: 'https://api.together.xyz/v1',
});
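
The same pattern covers the other OpenAI-compatible providers by swapping the base URL. The endpoints below are the providers' documented values at the time of writing; verify against each provider's docs before relying on them:

// Swap baseURL to target a different OpenAI-compatible provider.
const BASE_URLS: Record<string, string> = {
  together: 'https://api.together.xyz/v1',
  groq: 'https://api.groq.com/openai/v1',
  openrouter: 'https://openrouter.ai/api/v1',
  mistral: 'https://api.mistral.ai/v1',
  ollama: 'http://localhost:11434/v1',
};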

Model Information

Getting Available Models

const models = await gateway.listModels('openrouter');
// Returns model list with context lengths, pricing, etc.

Model Metadata

The gateway can retrieve model information:

interface ModelInfo {
  id: string;
  name: string;
  context_length?: number;
  pricing?: {
    prompt: string;     // per 1K tokens
    completion: string; // per 1K tokens
  };
}
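
For example, picking the largest-context model from a listing (a sketch):

const models: ModelInfo[] = await gateway.listModels('openrouter');

const largest = models
  .filter((m) => m.context_length !== undefined)
  .sort((a, b) => (b.context_length ?? 0) - (a.context_length ?? 0))[0];

console.log(`Largest context: ${largest.id} (${largest.context_length} tokens)`);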

Best Practices

1. Configure Multiple Providers

Have fallback options:

export ANTHROPIC_API_KEY=...  # Primary
export OPENAI_API_KEY=...     # Backup
export OLLAMA_BASE_URL=...    # Offline

2. Use Appropriate Models

Match model to task:

Task              Recommended
Complex coding    Claude Sonnet 4
Quick queries     GPT-4o-mini, Gemini Flash
Large context     Gemini 2.0 (1M tokens)
Local/offline     Ollama
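
That mapping could be wired up as a simple lookup (illustrative; model IDs follow the defaults table above):

type Task = 'coding' | 'quick' | 'long-context' | 'offline';

const MODEL_FOR_TASK: Record<Task, { provider: string; model: string }> = {
  coding: { provider: 'anthropic', model: 'claude-sonnet-4-20250514' },
  quick: { provider: 'openai', model: 'gpt-4o-mini' },
  'long-context': { provider: 'google', model: 'gemini-2.0-flash' },
  offline: { provider: 'ollama', model: 'llama3.3' },
};

const { provider, model } = MODEL_FOR_TASK['coding'];
const response = await chat(provider, messages, tools, model);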

3. Monitor Usage

Track token usage for cost management:

const response = await chat(...);
console.log('Tokens:', response.usage);

4. Handle Rate Limits

Implement exponential backoff for high-volume usage:

const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function chatWithRetry(provider, messages, tools, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await chat(provider, messages, tools);
    } catch (error) {
      if (error.status !== 429 || attempt >= maxRetries) throw error;
      await delay(1000 * 2 ** attempt); // exponential backoff: 1s, 2s, 4s
    }
  }
}