Context Engine

Intelligent context assembly with token budgeting and priority-based pruning.

Overview

The Context Engine assembles context for AI conversations, ensuring that:

  • Token limits are respected
  • Important context is preserved
  • Less critical content is pruned when necessary
  • Context is properly formatted for each model

How It Works

Assembly Process

1. System Prompt (highest priority, never pruned)
   ↓
2. Tool Definitions
   ↓
3. Explicit Files (user-requested)
   ↓
4. Folder Structures
   ↓
5. Search Results
   ↓
6. Conversation History
   ↓
7. User's Current Prompt (never pruned)
   ↓
   Budget Check → Prune if necessary
   ↓
   Assembled Context
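
For instance, a single assemble call can exercise every stage of this pipeline. The request fields below mirror the examples under Loading Content; sessionId is an assumed field for history lookup, not documented above:

const context = await contextEngine.assemble({
  files: ['/path/to/main.ts'],     // 3. explicit files
  folders: ['/path/to/src'],       // 4. folder structures
  searchQuery: 'authentication',   // 5. search results
  sessionId: 'session-1',          // 6. conversation history (assumed field)
  prompt: 'How does auth work?',   // 7. user prompt
});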

Priority Levels

Content is assigned priority levels that determine pruning order:

Priority      Content Type        Prunable
1 (Highest)   System prompt       No
2             User’s prompt       No
3             Explicit files      Yes
4             Tool definitions    Yes
5             Recent history      Yes
6             Folder structures   Yes
7             Search results      Yes
8 (Lowest)    Old history         Yes
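
One way to encode this table is a static priority map. This is a sketch; the engine's actual constant names may differ:

// Lower number = higher priority; non-prunable levels never leave context.
const SLICE_PRIORITY = {
  system: 1,         // never pruned
  userPrompt: 2,     // never pruned
  file: 3,
  tools: 4,
  recentHistory: 5,
  folder: 6,
  search: 7,
  oldHistory: 8,     // pruned first
} as const;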

Token Budgeting

Each model has a token budget:

interface TokenBudget {
  maxTokens: number;            // Total context limit
  reservedForResponse: number;  // Reserved for AI response
}

// Example: Claude with 200K context
{
  maxTokens: 200000,
  reservedForResponse: 8192
}

Available tokens = maxTokens - reservedForResponse
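
Expressed as a helper against the TokenBudget interface above (a trivial sketch):

function availableTokens(budget: TokenBudget): number {
  // What the assembled context may consume once response room is reserved.
  return budget.maxTokens - budget.reservedForResponse;
}

// Claude example above: 200000 - 8192 = 191808 tokens available
availableTokens({ maxTokens: 200000, reservedForResponse: 8192 });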

Context Slices

Context is assembled from “slices”, discrete chunks of content:

Slice Types

Type           Description
system         System prompt
tools          Tool definitions
file           Individual file contents
folder         Directory structure listing
search         Search results
conversation   Chat history
custom         Custom context (user prompt)

Slice Properties

Each slice has:

interface ContextSlice {
  id: string;           // Unique identifier
  type: SliceType;      // Content type
  content: string;      // Actual content
  priority: number;     // Pruning priority
  tokenCount: number;   // Estimated tokens
  prunable: boolean;    // Can be removed?
  source?: string;      // Source file/location
  metadata?: Record<string, unknown>;
}
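
For example, an explicit file might arrive as a slice like this (values illustrative):

const slice: ContextSlice = {
  id: 'file:/path/to/main.ts',
  type: 'file',                  // one of the slice types above
  content: '/* contents of main.ts */',
  priority: 3,                   // explicit files (see priority table)
  tokenCount: 3000,              // estimated, not exact
  prunable: true,
  source: '/path/to/main.ts',
  metadata: { size: 12288 },
};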

Loading Content

Loading Files

Add specific files to context:

const context = await contextEngine.assemble({
  files: [
    '/path/to/main.ts',
    '/path/to/config.json',
  ],
  prompt: 'Review this code',
});

Files are loaded with the following rules, as sketched after this list:

  • Size limit (default 100KB)
  • Content included in context
  • Metadata (filename, size)
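
A minimal sketch of that loading step, assuming Node's fs/promises and a rough four-characters-per-token estimate; the engine's real tokenizer and helpers may differ:

import { readFile, stat } from 'node:fs/promises';
import { basename } from 'node:path';

const MAX_FILE_SIZE = 100 * 1024; // default 100KB limit

async function loadFileSlice(path: string): Promise<ContextSlice | null> {
  const info = await stat(path);
  if (info.size > MAX_FILE_SIZE) return null; // over the size limit: skip

  const content = await readFile(path, 'utf8');
  return {
    id: `file:${path}`,
    type: 'file',
    content,
    priority: 3,                               // explicit files
    tokenCount: Math.ceil(content.length / 4), // crude token estimate
    prunable: true,
    source: path,
    metadata: { filename: basename(path), size: info.size },
  };
}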

Loading Folders

Add directory structure:

const context = await contextEngine.assemble({
  folders: ['/path/to/src'],
  prompt: 'What files are in this project?',
});

Folder loading, as sketched after this list:

  • Lists files (up to 100)
  • Excludes patterns (node_modules, .git, etc.)
  • Shows structure without content
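
A sketch of that folder step, assuming Node's recursive readdir and simple substring excludes; the real exclude matching uses the glob patterns from the configuration below:

import { readdir } from 'node:fs/promises';

const MAX_ENTRIES = 100;
const EXCLUDED = ['node_modules', '.git', 'dist'];

async function listFolder(path: string): Promise<string> {
  const entries = await readdir(path, { recursive: true });
  return entries
    .filter((e) => !EXCLUDED.some((pattern) => e.includes(pattern)))
    .slice(0, MAX_ENTRIES) // cap the listing at 100 files
    .join('\n');           // structure only, no file contents
}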

Search Integration

Add search results:

const context = await contextEngine.assemble({
  searchQuery: 'authentication',
  prompt: 'How does auth work?',
});

Pruning Strategy

When assembled context exceeds the budget:

1. Sort by Priority

Slices are sorted by priority (highest first).

2. Include High Priority

Non-prunable slices (system prompt, user prompt) are always included.

3. Fill Remaining Budget

Add prunable slices in priority order until budget is exhausted.

4. Report Pruned Content

Return a list of what was pruned, for transparency.
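
Put together, the strategy is a greedy fill over the sorted slices. A sketch, assuming the ContextSlice shape defined earlier:

function prune(slices: ContextSlice[], budget: number) {
  const included: ContextSlice[] = [];
  const pruned: ContextSlice[] = [];
  let used = 0;

  // Lower number = higher priority, so ascending sort puts highest first.
  for (const slice of [...slices].sort((a, b) => a.priority - b.priority)) {
    if (!slice.prunable || used + slice.tokenCount <= budget) {
      included.push(slice); // non-prunable slices go in regardless
      used += slice.tokenCount;
    } else {
      pruned.push(slice);   // reported back for transparency
    }
  }
  return { included, pruned, used };
}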

Example

Budget: 8,500 tokens

Included:
- System prompt: 500 tokens
- User prompt: 100 tokens
- Tool definitions: 2,000 tokens
- main.ts: 3,000 tokens
- Recent history (5 msgs): 2,500 tokens
Total: 8,100 tokens

Pruned:
- config.ts: 5,000 tokens (would have exceeded budget)
- Old history: 12,000 tokens

Configuration

Default Configuration

const DEFAULT_CONTEXT_CONFIG = {
  defaultBudget: {
    maxTokens: 100000,
    reservedForResponse: 8192,
  },
  modelBudgets: {
    'claude-sonnet-4': { maxTokens: 200000, reservedForResponse: 8192 },
    'gpt-4o': { maxTokens: 128000, reservedForResponse: 4096 },
  },
  maxFileSize: 100 * 1024,  // 100KB
  excludePatterns: [
    '**/node_modules/**',
    '**/.git/**',
    '**/dist/**',
    '**/*.min.js',
  ],
};
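
To adjust one setting while keeping the rest, spread the defaults (how the resulting object is handed to the engine is not shown here):

const config = {
  ...DEFAULT_CONTEXT_CONFIG,
  maxFileSize: 200 * 1024, // raise the per-file limit to 200KB
};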

Per-Model Budgets

Different models have different context limits:

Model             Max Tokens    Reserved
Claude Sonnet 4   200,000       8,192
Claude Opus       200,000       8,192
GPT-4o            128,000       4,096
Gemini 2.0        1,000,000     8,192
Mistral Large     128,000       4,096
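
Resolving a model's budget then falls back to the default when the model is not listed. A sketch against DEFAULT_CONTEXT_CONFIG above:

const budgets: Record<string, TokenBudget> = DEFAULT_CONTEXT_CONFIG.modelBudgets;

function budgetFor(model: string): TokenBudget {
  return budgets[model] ?? DEFAULT_CONTEXT_CONFIG.defaultBudget;
}

budgetFor('gpt-4o');         // { maxTokens: 128000, reservedForResponse: 4096 }
budgetFor('some-new-model'); // falls back to the 100K default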

Conversation History

History Management

History is tracked per session:

// Add to history
contextEngine.addToHistory(sessionId, 'user', 'Hello');
contextEngine.addToHistory(sessionId, 'assistant', 'Hi there!');

// Clear history
contextEngine.clearHistory(sessionId);

History Limits

  • Last 50 messages retained (see the sketch after this list)
  • Older messages can be compressed (see Memory Management)
  • History is prunable if budget is tight
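
A sketch of the retention rule, assuming per-session message arrays; the message shape here is an assumption:

interface HistoryMessage {
  role: 'user' | 'assistant';
  content: string;
}

const MAX_HISTORY = 50;
const histories = new Map<string, HistoryMessage[]>();

function addToHistory(sessionId: string, role: HistoryMessage['role'], content: string): void {
  const messages = histories.get(sessionId) ?? [];
  messages.push({ role, content });
  // Keep only the most recent 50 messages; older ones are dropped here
  // (or handed to compression, per Memory Management).
  histories.set(sessionId, messages.slice(-MAX_HISTORY));
}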

Output Format

Assembled Context Structure

interface AssembledContext {
  slices: ContextSlice[];      // Included slices
  totalTokens: number;         // Total tokens used
  remainingTokens: number;     // Budget remaining
  prunedSlices: ContextSlice[]; // What was cut
  metadata: {
    assembledAt: Date;
    budgetUsed: number;
    budgetTotal: number;
  };
}

Formatting for LLM

Convert to string for the model:

const formatted = contextEngine.formatForLLM(assembled);
// Returns:
// --- file: src/main.ts ---
// [file contents]
//
// --- conversation ---
// user: Hello
// assistant: Hi there!

Best Practices

1. Be Selective with Files

Only add files that are directly relevant:

// Good: specific files
files: ['src/api/users.ts', 'src/models/User.ts']

// Avoid: entire directories
files: ['src/**/*.ts']  // May blow budget

2. Use Search for Discovery

Let search find relevant files rather than loading everything:

{
  searchQuery: 'user authentication',
  prompt: 'How is user auth implemented?',
}

3. Monitor Pruning

Check what was pruned to understand context limits:

const context = await contextEngine.assemble(request);
if (context.prunedSlices.length > 0) {
  console.log('Pruned:', context.prunedSlices.map(s => s.source));
}

4. Use Appropriate Models

For large codebases, use models with larger context:

Task                  Recommended Model
Small file review     Any model
Multi-file analysis   Claude (200K)
Large codebase        Gemini (1M context)