Model Gateway

Unified interface to 12+ AI providers with automatic routing and fallback.

Overview

The Model Gateway provides:

  • Unified API - Same interface for all providers
  • Provider abstraction - Swap providers without code changes
  • Auto-selection - Automatically choose the best available provider
  • Fallback support - Graceful degradation when providers fail
  • Tool translation - Convert tool schemas between formats
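
For example, the unified API makes switching providers a one-argument change. A minimal sketch using the chat helper documented under Usage below:

// Same call shape regardless of provider; only the provider name changes.
const fromClaude = await chat('anthropic', messages, tools);
const fromGpt = await chat('openai', messages, tools);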

Supported Providers

Cloud Providers

Provider       Models                          Tool Support
Anthropic      Claude Sonnet 4, Opus, Haiku    Full
OpenAI         GPT-4o, GPT-4 Turbo             Full
Google         Gemini 2.0, Gemini 1.5          Full
Mistral        Large, Codestral                Full
Groq           Llama 3.3, Mixtral              Partial
Together       Llama, Mixtral, others          Partial
Fireworks      Llama, Code models              Partial
AI21           Jamba 1.5                       Limited
HuggingFace    Open models                     Limited

Gateway Providers

Provider      Description          Tool Support
OpenRouter    Access any model     Varies by model
LiteLLM       Self-hosted proxy    Full

Local Providers

Provider    Description            Tool Support
Ollama      Local model serving    Partial

Provider Priority

When using auto-selection, providers are tried in this priority order (a resolution sketch follows the list):

1. Anthropic    - Best tool use and coding
2. OpenAI       - Strong general capabilities
3. Google       - Good multimodal
4. Mistral      - European alternative
5. OpenRouter   - Access to many models
6. Together     - Open-source models
7. Groq         - Fast inference
8. Ollama       - Local fallback
9. LiteLLM      - Proxy fallback
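
A minimal sketch of how this resolution might work, assuming availability is determined by which credentials are configured (names here are illustrative, not the gateway's actual internals):

// Illustrative: pick the first priority provider with credentials configured.
const PRIORITY = [
  'anthropic', 'openai', 'google', 'mistral', 'openrouter',
  'together', 'groq', 'ollama', 'litellm',
];

function resolveAuto(configured: Set<string>): string {
  for (const provider of PRIORITY) {
    if (configured.has(provider)) return provider;
  }
  throw new Error('No provider configured');
}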

Configuration

API Keys

Set via environment variables:

# Primary providers
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=...

# Additional providers
export MISTRAL_API_KEY=...
export GROQ_API_KEY=...
export TOGETHER_API_KEY=...
export OPENROUTER_API_KEY=...

# Local providers
export OLLAMA_BASE_URL=http://localhost:11434
export LITELLM_BASE_URL=http://localhost:4000
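
A configured-provider set like the one resolveAuto consumes above could be derived from these variables (a sketch; the exact mapping is illustrative):

// Illustrative: a provider counts as configured if its env var is set.
const ENV_VARS: Record<string, string> = {
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
  google: 'GOOGLE_API_KEY',
  mistral: 'MISTRAL_API_KEY',
  ollama: 'OLLAMA_BASE_URL',
};

const configured = new Set(
  Object.keys(ENV_VARS).filter((p) => !!process.env[ENV_VARS[p]])
);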

Default Models

Each provider has a default model:

Provider      Default Model
Anthropic     claude-sonnet-4-20250514
OpenAI        gpt-4o
Google        gemini-2.0-flash
Mistral       mistral-large-latest
Groq          llama-3.3-70b-versatile
Together      meta-llama/Llama-3.3-70B-Instruct-Turbo
OpenRouter    anthropic/claude-sonnet-4
Ollama        llama3.3
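
These defaults could be encoded as a simple lookup map (a sketch; the gateway's actual table may differ):

const DEFAULT_MODELS: Record<string, string> = {
  anthropic: 'claude-sonnet-4-20250514',
  openai: 'gpt-4o',
  google: 'gemini-2.0-flash',
  mistral: 'mistral-large-latest',
  groq: 'llama-3.3-70b-versatile',
  together: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  openrouter: 'anthropic/claude-sonnet-4',
  ollama: 'llama3.3',
};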

Usage

Basic Chat

import { chat } from './providers';

const response = await chat(
  'anthropic',  // Provider
  messages,     // Conversation
  tools,        // Available tools
  'claude-sonnet-4'  // Specific model (optional)
);

Auto Selection

const response = await chat(
  'auto',    // Let gateway choose
  messages,
  tools
);

Response Format

All providers return a unified response:

interface LLMResponse {
  content: string;           // Text response
  toolCalls?: ToolCall[];    // Tool invocations
  finishReason: 'stop' | 'tool_use' | 'length' | 'error';
  usage?: {
    inputTokens: number;
    outputTokens: number;
  };
}
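
A typical consumer branches on finishReason. A sketch, where executeTool is a hypothetical dispatcher to your own tool runner:

const response = await chat('auto', messages, tools);

if (response.finishReason === 'tool_use' && response.toolCalls) {
  for (const call of response.toolCalls) {
    // executeTool is hypothetical; wire this to your tool implementations.
    const result = await executeTool(call);
    console.log('tool result:', result);
  }
} else {
  console.log(response.content);
}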

Tool Translation

Unified Tool Format

Define tools once:

const tool = {
  name: 'shell',
  description: 'Execute a shell command',
  parameters: {
    type: 'object',
    properties: {
      command: {
        type: 'string',
        description: 'The command to run',
      },
    },
    required: ['command'],
  },
};

Provider-Specific Translation

The gateway translates to each provider’s format:

Anthropic:

{
  "name": "shell",
  "description": "Execute a shell command",
  "input_schema": { ... }
}

OpenAI:

{
  "type": "function",
  "function": {
    "name": "shell",
    "description": "Execute a shell command",
    "parameters": { ... }
  }
}
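
A sketch of this translation, assuming the unified shape shown earlier:

type UnifiedTool = {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema
};

function toAnthropic(tool: UnifiedTool) {
  return {
    name: tool.name,
    description: tool.description,
    input_schema: tool.parameters,
  };
}

function toOpenAI(tool: UnifiedTool) {
  return {
    type: 'function' as const,
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.parameters,
    },
  };
}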

Error Handling

Provider Errors

Common error scenarios:

Error                Cause                   Gateway Behavior
API key invalid      Wrong or expired key    Return error
Rate limited         Too many requests       Retry with backoff
Model unavailable    Model not accessible    Suggest alternatives
Empty response       Provider issue          Return error

Automatic Fallback

With multiple providers configured, the gateway can fall back:

// If Anthropic fails, try OpenAI
const response = await chatWithFallback(
  ['anthropic', 'openai'],
  messages,
  tools
);
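
One way chatWithFallback might be implemented (an illustrative sketch, not the gateway's actual internals):

async function chatWithFallback(providers, messages, tools) {
  let lastError;
  for (const provider of providers) {
    try {
      return await chat(provider, messages, tools);
    } catch (error) {
      lastError = error; // remember the failure and try the next provider
    }
  }
  throw lastError;
}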

Provider Details

Anthropic

Best for:

  • Coding and tool use
  • Long-form reasoning
  • Constitutional AI safety

// Uses the native Anthropic SDK
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey });
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 8192,
  messages: anthropicMessages,
  tools: anthropicTools,
});

OpenAI

Best for:

  • General tasks
  • Multimodal (images)
  • Wide model selection

// Uses the native OpenAI SDK
import OpenAI from 'openai';

const client = new OpenAI({ apiKey });
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: openaiMessages,
  tools: openaiTools,
});

Google Gemini

Best for:

  • Multimodal tasks
  • Very long context (1M tokens)
  • Cost efficiency

// Uses the Google Generative AI SDK
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });
const result = await model.generateContent(prompt);

OpenAI-Compatible

Several providers use OpenAI-compatible APIs:

  • OpenRouter
  • Together
  • Groq
  • Fireworks
  • Mistral
  • AI21
  • HuggingFace
  • Ollama (v1 API)
  • LiteLLM

// Uses the OpenAI SDK with a custom base URL
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey,
  baseURL: 'https://api.together.xyz/v1',
});
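
The same pattern covers the other OpenAI-compatible providers by swapping the base URL. The endpoints below are the providers' documented values at the time of writing; verify against each provider's docs before relying on them:

// Swap baseURL to target a different OpenAI-compatible provider.
const BASE_URLS: Record<string, string> = {
  together: 'https://api.together.xyz/v1',
  groq: 'https://api.groq.com/openai/v1',
  openrouter: 'https://openrouter.ai/api/v1',
  mistral: 'https://api.mistral.ai/v1',
  ollama: 'http://localhost:11434/v1',
};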

Model Information

Getting Available Models

const models = await gateway.listModels('openrouter');
// Returns model list with context lengths, pricing, etc.

Model Metadata

The gateway can retrieve model information:

interface ModelInfo {
  id: string;
  name: string;
  context_length?: number;
  pricing?: {
    prompt: string;     // per 1K tokens
    completion: string; // per 1K tokens
  };
}
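
For example, picking the largest-context model from a listing (a sketch):

const models: ModelInfo[] = await gateway.listModels('openrouter');

const largest = models
  .filter((m) => m.context_length !== undefined)
  .sort((a, b) => (b.context_length ?? 0) - (a.context_length ?? 0))[0];

console.log(`Largest context: ${largest.id} (${largest.context_length} tokens)`);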

Best Practices

1. Configure Multiple Providers

Have fallback options:

export ANTHROPIC_API_KEY=...  # Primary
export OPENAI_API_KEY=...     # Backup
export OLLAMA_BASE_URL=...    # Offline

2. Use Appropriate Models

Match model to task:

Task              Recommended
Complex coding    Claude Sonnet 4
Quick queries     GPT-4o-mini, Gemini Flash
Large context     Gemini 2.0 (1M tokens)
Local/offline     Ollama
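
That mapping could be wired up as a simple lookup (illustrative; model IDs follow the defaults table above):

type Task = 'coding' | 'quick' | 'long-context' | 'offline';

const MODEL_FOR_TASK: Record<Task, { provider: string; model: string }> = {
  coding: { provider: 'anthropic', model: 'claude-sonnet-4-20250514' },
  quick: { provider: 'openai', model: 'gpt-4o-mini' },
  'long-context': { provider: 'google', model: 'gemini-2.0-flash' },
  offline: { provider: 'ollama', model: 'llama3.3' },
};

const { provider, model } = MODEL_FOR_TASK['coding'];
const response = await chat(provider, messages, tools, model);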

3. Monitor Usage

Track token usage for cost management:

const response = await chat(...);
console.log('Tokens:', response.usage);

4. Handle Rate Limits

Implement exponential backoff for high-volume usage:

const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function chatWithRetry(provider, messages, tools, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await chat(provider, messages, tools);
    } catch (error) {
      if (error.status !== 429 || attempt >= maxRetries) throw error;
      await delay(1000 * 2 ** attempt); // exponential backoff: 1s, 2s, 4s
    }
  }
}