Context Engine

Intelligent context assembly with token budgeting and priority-based pruning.

Overview

The Context Engine assembles context for AI conversations, ensuring that:

  • Token limits are respected
  • Important context is preserved
  • Less critical content is pruned when necessary
  • Context is properly formatted for each model

How It Works

Assembly Process

1. System Prompt (highest priority, never pruned)
   ↓
2. Tool Definitions
   ↓
3. Explicit Files (user-requested)
   ↓
4. Folder Structures
   ↓
5. Search Results
   ↓
6. Conversation History
   ↓
7. User's Current Prompt (never pruned)
   ↓
   Budget Check → Prune if necessary
   ↓
   Assembled Context
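
For instance, a single assemble call can exercise every stage of this pipeline. The request fields below mirror the examples under Loading Content; sessionId is an assumed field for history lookup, not documented above:

const context = await contextEngine.assemble({
  files: ['/path/to/main.ts'],     // 3. explicit files
  folders: ['/path/to/src'],       // 4. folder structures
  searchQuery: 'authentication',   // 5. search results
  sessionId: 'session-1',          // 6. conversation history (assumed field)
  prompt: 'How does auth work?',   // 7. user prompt
});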

Priority Levels

Content is assigned priority levels that determine pruning order:

Priority      Content Type        Prunable
1 (Highest)   System prompt       No
2             User’s prompt       No
3             Explicit files      Yes
4             Tool definitions    Yes
5             Recent history      Yes
6             Folder structures   Yes
7             Search results      Yes
8 (Lowest)    Old history         Yes
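
One way to encode this table is a static priority map. This is a sketch; the engine's actual constant names may differ:

// Lower number = higher priority; non-prunable levels never leave context.
const SLICE_PRIORITY = {
  system: 1,         // never pruned
  userPrompt: 2,     // never pruned
  file: 3,
  tools: 4,
  recentHistory: 5,
  folder: 6,
  search: 7,
  oldHistory: 8,     // pruned first
} as const;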

Token Budgeting

Each model has a token budget:

interface TokenBudget {
  maxTokens: number;            // Total context limit
  reservedForResponse: number;  // Reserved for AI response
}

// Example: Claude with 200K context
{
  maxTokens: 200000,
  reservedForResponse: 8192
}

Available tokens = maxTokens - reservedForResponse
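
Expressed as a helper against the TokenBudget interface above (a trivial sketch):

function availableTokens(budget: TokenBudget): number {
  // What the assembled context may consume once response room is reserved.
  return budget.maxTokens - budget.reservedForResponse;
}

// Claude example above: 200000 - 8192 = 191808 tokens available
availableTokens({ maxTokens: 200000, reservedForResponse: 8192 });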

Context Slices

Context is assembled from “slices”, discrete chunks of content:

Slice Types

Type           Description
system         System prompt
tools          Tool definitions
file           Individual file contents
folder         Directory structure listing
search         Search results
conversation   Chat history
custom         Custom context (user prompt)

Slice Properties

Each slice has:

interface ContextSlice {
  id: string;           // Unique identifier
  type: SliceType;      // Content type
  content: string;      // Actual content
  priority: number;     // Pruning priority
  tokenCount: number;   // Estimated tokens
  prunable: boolean;    // Can be removed?
  source?: string;      // Source file/location
  metadata?: Record<string, unknown>;
}
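
For example, an explicit file might arrive as a slice like this (values illustrative):

const slice: ContextSlice = {
  id: 'file:/path/to/main.ts',
  type: 'file',                  // one of the slice types above
  content: '/* contents of main.ts */',
  priority: 3,                   // explicit files (see priority table)
  tokenCount: 3000,              // estimated, not exact
  prunable: true,
  source: '/path/to/main.ts',
  metadata: { size: 12288 },
};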

Loading Content

Loading Files

Add specific files to context:

const context = await contextEngine.assemble({
  files: [
    '/path/to/main.ts',
    '/path/to/config.json',
  ],
  prompt: 'Review this code',
});

Files are loaded with the following rules, as sketched after this list:

  • Size limit (default 100KB)
  • Content included in context
  • Metadata (filename, size)
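
A minimal sketch of that loading step, assuming Node's fs/promises and a rough four-characters-per-token estimate; the engine's real tokenizer and helpers may differ:

import { readFile, stat } from 'node:fs/promises';
import { basename } from 'node:path';

const MAX_FILE_SIZE = 100 * 1024; // default 100KB limit

async function loadFileSlice(path: string): Promise<ContextSlice | null> {
  const info = await stat(path);
  if (info.size > MAX_FILE_SIZE) return null; // over the size limit: skip

  const content = await readFile(path, 'utf8');
  return {
    id: `file:${path}`,
    type: 'file',
    content,
    priority: 3,                               // explicit files
    tokenCount: Math.ceil(content.length / 4), // crude token estimate
    prunable: true,
    source: path,
    metadata: { filename: basename(path), size: info.size },
  };
}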

Loading Folders

Add directory structure:

const context = await contextEngine.assemble({
  folders: ['/path/to/src'],
  prompt: 'What files are in this project?',
});

Folder loading, as sketched after this list:

  • Lists files (up to 100)
  • Excludes patterns (node_modules, .git, etc.)
  • Shows structure without content
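
A sketch of that folder step, assuming Node's recursive readdir and simple substring excludes; the real exclude matching uses the glob patterns from the configuration below:

import { readdir } from 'node:fs/promises';

const MAX_ENTRIES = 100;
const EXCLUDED = ['node_modules', '.git', 'dist'];

async function listFolder(path: string): Promise<string> {
  const entries = await readdir(path, { recursive: true });
  return entries
    .filter((e) => !EXCLUDED.some((pattern) => e.includes(pattern)))
    .slice(0, MAX_ENTRIES) // cap the listing at 100 files
    .join('\n');           // structure only, no file contents
}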

Search Integration

Add search results:

const context = await contextEngine.assemble({
  searchQuery: 'authentication',
  prompt: 'How does auth work?',
});

Pruning Strategy

When assembled context exceeds the budget:

1. Sort by Priority

Slices are sorted by priority (highest first).

2. Include High Priority

Non-prunable slices (system prompt, user prompt) are always included.

3. Fill Remaining Budget

Add prunable slices in priority order until budget is exhausted.

4. Report Pruned Content

Return a list of what was pruned, for transparency.
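
Put together, the strategy is a greedy fill over the sorted slices. A sketch, assuming the ContextSlice shape defined earlier:

function prune(slices: ContextSlice[], budget: number) {
  const included: ContextSlice[] = [];
  const pruned: ContextSlice[] = [];
  let used = 0;

  // Lower number = higher priority, so ascending sort puts highest first.
  for (const slice of [...slices].sort((a, b) => a.priority - b.priority)) {
    if (!slice.prunable || used + slice.tokenCount <= budget) {
      included.push(slice); // non-prunable slices go in regardless
      used += slice.tokenCount;
    } else {
      pruned.push(slice);   // reported back for transparency
    }
  }
  return { included, pruned, used };
}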

Example

Budget: 8,500 tokens

Included:
- System prompt: 500 tokens
- User prompt: 100 tokens
- Tool definitions: 2,000 tokens
- main.ts: 3,000 tokens
- Recent history (5 msgs): 2,500 tokens
Total: 8,100 tokens

Pruned:
- config.ts: 5,000 tokens (would have exceeded budget)
- Old history: 12,000 tokens

Configuration

Default Configuration

const DEFAULT_CONTEXT_CONFIG = {
  defaultBudget: {
    maxTokens: 100000,
    reservedForResponse: 8192,
  },
  modelBudgets: {
    'claude-sonnet-4': { maxTokens: 200000, reservedForResponse: 8192 },
    'gpt-4o': { maxTokens: 128000, reservedForResponse: 4096 },
  },
  maxFileSize: 100 * 1024,  // 100KB
  excludePatterns: [
    '**/node_modules/**',
    '**/.git/**',
    '**/dist/**',
    '**/*.min.js',
  ],
};
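
To adjust one setting while keeping the rest, spread the defaults (how the resulting object is handed to the engine is not shown here):

const config = {
  ...DEFAULT_CONTEXT_CONFIG,
  maxFileSize: 200 * 1024, // raise the per-file limit to 200KB
};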

Per-Model Budgets

Different models have different context limits:

Model             Max Tokens    Reserved
Claude Sonnet 4   200,000       8,192
Claude Opus       200,000       8,192
GPT-4o            128,000       4,096
Gemini 2.0        1,000,000     8,192
Mistral Large     128,000       4,096
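
Resolving a model's budget then falls back to the default when the model is not listed. A sketch against DEFAULT_CONTEXT_CONFIG above:

const budgets: Record<string, TokenBudget> = DEFAULT_CONTEXT_CONFIG.modelBudgets;

function budgetFor(model: string): TokenBudget {
  return budgets[model] ?? DEFAULT_CONTEXT_CONFIG.defaultBudget;
}

budgetFor('gpt-4o');         // { maxTokens: 128000, reservedForResponse: 4096 }
budgetFor('some-new-model'); // falls back to the 100K default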

Conversation History

History Management

History is tracked per session:

// Add to history
contextEngine.addToHistory(sessionId, 'user', 'Hello');
contextEngine.addToHistory(sessionId, 'assistant', 'Hi there!');

// Clear history
contextEngine.clearHistory(sessionId);

History Limits

  • Last 50 messages retained (see the sketch after this list)
  • Older messages can be compressed (see Memory Management)
  • History is prunable if budget is tight
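
A sketch of the retention rule, assuming per-session message arrays; the message shape here is an assumption:

interface HistoryMessage {
  role: 'user' | 'assistant';
  content: string;
}

const MAX_HISTORY = 50;
const histories = new Map<string, HistoryMessage[]>();

function addToHistory(sessionId: string, role: HistoryMessage['role'], content: string): void {
  const messages = histories.get(sessionId) ?? [];
  messages.push({ role, content });
  // Keep only the most recent 50 messages; older ones are dropped here
  // (or handed to compression, per Memory Management).
  histories.set(sessionId, messages.slice(-MAX_HISTORY));
}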

Output Format

Assembled Context Structure

interface AssembledContext {
  slices: ContextSlice[];      // Included slices
  totalTokens: number;         // Total tokens used
  remainingTokens: number;     // Budget remaining
  prunedSlices: ContextSlice[]; // What was cut
  metadata: {
    assembledAt: Date;
    budgetUsed: number;
    budgetTotal: number;
  };
}

Formatting for LLM

Convert to string for the model:

const formatted = contextEngine.formatForLLM(assembled);
// Returns:
// --- file: src/main.ts ---
// [file contents]
//
// --- conversation ---
// user: Hello
// assistant: Hi there!

Best Practices

1. Be Selective with Files

Only add files that are directly relevant:

// Good: specific files
files: ['src/api/users.ts', 'src/models/User.ts']

// Avoid: entire directories
files: ['src/**/*.ts']  // May blow budget

2. Use Search for Discovery

Let search find relevant files rather than loading everything:

{
  searchQuery: 'user authentication',
  prompt: 'How is user auth implemented?',
}

3. Monitor Pruning

Check what was pruned to understand context limits:

const context = await contextEngine.assemble(request);
if (context.prunedSlices.length > 0) {
  console.log('Pruned:', context.prunedSlices.map(s => s.source));
}

4. Use Appropriate Models

For large codebases, use models with larger context:

Task                  Recommended Model
Small file review     Any model
Multi-file analysis   Claude (200K)
Large codebase        Gemini (1M context)