# Context Engine

Intelligent context assembly with token budgeting and priority-based pruning.
## Overview
The Context Engine assembles context for AI conversations, ensuring that:
- Token limits are respected
- Important context is preserved
- Less critical content is pruned when necessary
- Context is properly formatted for each model
## How It Works

### Assembly Process
```
1. System Prompt (highest priority, never pruned)
   ↓
2. Tool Definitions
   ↓
3. Explicit Files (user-requested)
   ↓
4. Folder Structures
   ↓
5. Search Results
   ↓
6. Conversation History
   ↓
7. User's Current Prompt (never pruned)
   ↓
Budget Check → Prune if necessary
   ↓
Assembled Context
```

### Priority Levels
Content is assigned priority levels that determine pruning order:
| Priority | Content Type | Prunable |
|---|---|---|
| 1 (Highest) | System prompt | No |
| 2 | User’s prompt | No |
| 3 | Explicit files | Yes |
| 4 | Tool definitions | Yes |
| 5 | Recent history | Yes |
| 6 | Folder structures | Yes |
| 7 | Search results | Yes |
| 8 (Lowest) | Old history | Yes |
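The table above suggests a direct mapping from content type to priority. A minimal sketch of such a mapping (the constant name and key names here are illustrative, not taken from the engine's source):

```typescript
// Hypothetical priority map mirroring the table above; names are illustrative.
const SLICE_PRIORITY: Record<string, { priority: number; prunable: boolean }> = {
  system:        { priority: 1, prunable: false },
  userPrompt:    { priority: 2, prunable: false },
  file:          { priority: 3, prunable: true },
  tools:         { priority: 4, prunable: true },
  recentHistory: { priority: 5, prunable: true },
  folder:        { priority: 6, prunable: true },
  search:        { priority: 7, prunable: true },
  oldHistory:    { priority: 8, prunable: true },
};
```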
## Token Budgeting

Each model has a token budget:
```typescript
interface TokenBudget {
  maxTokens: number;            // Total context limit
  reservedForResponse: number;  // Reserved for AI response
}

// Example: Claude with 200K context
{
  maxTokens: 200000,
  reservedForResponse: 8192
}
```

Available tokens = `maxTokens - reservedForResponse`.
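As a sketch, the usable budget follows directly from the two fields (the helper name is illustrative):

```typescript
// Illustrative helper: tokens actually available for assembled context.
function availableTokens(budget: TokenBudget): number {
  return budget.maxTokens - budget.reservedForResponse;
}

availableTokens({ maxTokens: 200000, reservedForResponse: 8192 }); // 191808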
## Context Slices

Context is assembled from "slices": discrete chunks of content.

### Slice Types
| Type | Description |
|---|---|
| `system` | System prompt |
| `tools` | Tool definitions |
| `file` | Individual file contents |
| `folder` | Directory structure listing |
| `search` | Search results |
| `conversation` | Chat history |
| `custom` | Custom context (user prompt) |
### Slice Properties

Each slice has:
```typescript
interface ContextSlice {
  id: string;           // Unique identifier
  type: SliceType;      // Content type
  content: string;      // Actual content
  priority: number;     // Pruning priority
  tokenCount: number;   // Estimated tokens
  prunable: boolean;    // Can be removed?
  source?: string;      // Source file/location
  metadata?: Record<string, unknown>;
}
```
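The `tokenCount` field is an estimate. A common heuristic for English text (an assumption here, not necessarily what the engine uses) is roughly four characters per token:

```typescript
// Rough estimate only: ~4 characters per token is a common heuristic;
// the engine may use a real tokenizer instead.
function estimateTokens(content: string): number {
  return Math.ceil(content.length / 4);
}
```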
## Loading Content

### Loading Files

Add specific files to context:
```typescript
const context = await contextEngine.assemble({
  files: [
    '/path/to/main.ts',
    '/path/to/config.json',
  ],
  prompt: 'Review this code',
});
```

Files are loaded with:

- Size limit (default 100KB)
- Content included in context
- Metadata (filename, size)
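A minimal sketch of the size check, assuming the loader uses Node's `fs` module (the function name is illustrative):

```typescript
import { promises as fs } from 'fs';

// Illustrative loader: skip files above the configured size limit.
async function loadFileContent(path: string, maxFileSize: number): Promise<string | null> {
  const stat = await fs.stat(path);
  if (stat.size > maxFileSize) {
    return null; // Too large; the caller may record this in metadata instead.
  }
  return fs.readFile(path, 'utf8');
}
```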
### Loading Folders

Add directory structure:
```typescript
const context = await contextEngine.assemble({
  folders: ['/path/to/src'],
  prompt: 'What files are in this project?',
});
```

Folder loading:

- Lists files (up to 100)
- Excludes patterns (node_modules, .git, etc.)
- Shows structure without content
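A sketch of the exclusion step, assuming a glob matcher such as `minimatch` (the engine's actual matcher is not specified):

```typescript
import { minimatch } from 'minimatch';

// Illustrative filter: drop paths matching any exclude pattern, cap at 100 entries.
function filterEntries(paths: string[], excludePatterns: string[]): string[] {
  return paths
    .filter((p) => !excludePatterns.some((pattern) => minimatch(p, pattern)))
    .slice(0, 100);
}
```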
### Search Integration

Add search results:
```typescript
const context = await contextEngine.assemble({
  searchQuery: 'authentication',
  prompt: 'How does auth work?',
});
```

## Pruning Strategy
When assembled context exceeds the budget, the engine prunes as follows (a sketch appears after this list):

1. **Sort by Priority.** Slices are sorted by priority (highest first).
2. **Include High Priority.** Non-prunable slices (system prompt, user prompt) are always included.
3. **Fill Remaining Budget.** Prunable slices are added in priority order until the budget is exhausted.
4. **Report Pruned Content.** A list of what was pruned is returned for transparency.
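A minimal sketch of this strategy, using the `ContextSlice` and `TokenBudget` shapes defined above (the function name and return shape are illustrative):

```typescript
// Illustrative pruner: always keep non-prunable slices, then fill by priority.
function prune(slices: ContextSlice[], budget: TokenBudget) {
  const limit = budget.maxTokens - budget.reservedForResponse;
  const sorted = [...slices].sort((a, b) => a.priority - b.priority); // 1 = highest
  const included: ContextSlice[] = [];
  const pruned: ContextSlice[] = [];
  let used = 0;

  for (const slice of sorted) {
    if (!slice.prunable || used + slice.tokenCount <= limit) {
      included.push(slice);
      used += slice.tokenCount;
    } else {
      pruned.push(slice); // Reported back for transparency.
    }
  }
  return { included, pruned, totalTokens: used };
}
```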
### Example

```
Budget: 10,000 tokens

Included:
- System prompt: 500 tokens
- User prompt: 100 tokens
- Tool definitions: 2,000 tokens
- main.ts: 3,000 tokens
- Recent history (5 msgs): 2,500 tokens
Total: 8,100 tokens

Pruned:
- config.ts: 5,000 tokens (would exceed budget)
- Old history: 12,000 tokens
```

## Configuration
### Default Configuration
```typescript
const DEFAULT_CONTEXT_CONFIG = {
  defaultBudget: {
    maxTokens: 100000,
    reservedForResponse: 8192,
  },
  modelBudgets: {
    'claude-sonnet-4': { maxTokens: 200000, reservedForResponse: 8192 },
    'gpt-4o': { maxTokens: 128000, reservedForResponse: 4096 },
  },
  maxFileSize: 100 * 1024, // 100KB
  excludePatterns: [
    '**/node_modules/**',
    '**/.git/**',
    '**/dist/**',
    '**/*.min.js',
  ],
};
```

### Per-Model Budgets
Different models have different context limits:
| Model | Max Tokens | Reserved |
|---|---|---|
| Claude Sonnet 4 | 200,000 | 8,192 |
| Claude Opus | 200,000 | 8,192 |
| GPT-4o | 128,000 | 4,096 |
| Gemini 2.0 | 1,000,000 | 8,192 |
| Mistral Large | 128,000 | 4,096 |
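Resolving a budget presumably falls back to the default when a model is not listed; a sketch under that assumption (the function name is illustrative):

```typescript
// Illustrative lookup: per-model budget with fallback to the default.
const modelBudgets: Record<string, TokenBudget> = DEFAULT_CONTEXT_CONFIG.modelBudgets;

function budgetFor(model: string): TokenBudget {
  return modelBudgets[model] ?? DEFAULT_CONTEXT_CONFIG.defaultBudget;
}
```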
## Conversation History

### History Management

History is tracked per session:
```typescript
// Add to history
contextEngine.addToHistory(sessionId, 'user', 'Hello');
contextEngine.addToHistory(sessionId, 'assistant', 'Hi there!');

// Clear history
contextEngine.clearHistory(sessionId);
```

### History Limits
- Last 50 messages retained
- Older messages can be compressed (see Memory Management)
- History is prunable if budget is tight
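A sketch of the 50-message cap (the store shape is an assumption, not the engine's internals):

```typescript
// Illustrative per-session store that retains only the last 50 messages.
const MAX_HISTORY = 50;
const history = new Map<string, { role: string; content: string }[]>();

function addToHistory(sessionId: string, role: string, content: string): void {
  const messages = history.get(sessionId) ?? [];
  messages.push({ role, content });
  history.set(sessionId, messages.slice(-MAX_HISTORY)); // Drop the oldest beyond 50.
}
```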
## Output Format

### Assembled Context Structure
```typescript
interface AssembledContext {
  slices: ContextSlice[];        // Included slices
  totalTokens: number;           // Total tokens used
  remainingTokens: number;       // Budget remaining
  prunedSlices: ContextSlice[];  // What was cut
  metadata: {
    assembledAt: Date;
    budgetUsed: number;
    budgetTotal: number;
  };
}
```

### Formatting for LLM
Convert to string for the model:
```typescript
const formatted = contextEngine.formatForLLM(assembled);

// Returns:
// --- file: src/main.ts ---
// [file contents]
//
// --- conversation ---
// user: Hello
// assistant: Hi there!
```
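A sketch of what such a formatter might look like, inferred from the output shown above (not the engine's actual implementation):

```typescript
// Illustrative formatter: one delimited section per slice.
function formatForLLM(assembled: AssembledContext): string {
  return assembled.slices
    .map((slice) => {
      const header = slice.source ? `${slice.type}: ${slice.source}` : slice.type;
      return `--- ${header} ---\n${slice.content}`;
    })
    .join('\n\n');
}
```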
## Best Practices

### 1. Be Selective with Files

Only add files that are directly relevant:
```typescript
// Good: specific files
files: ['src/api/users.ts', 'src/models/User.ts']

// Avoid: entire directories
files: ['src/**/*.ts'] // May blow the budget
```
### 2. Use Search for Discovery

Let search find relevant files rather than loading everything:
```typescript
{
  searchQuery: 'user authentication',
  prompt: 'How is user auth implemented?',
}
```

### 3. Monitor Pruning
Check what was pruned to understand context limits:
```typescript
const context = await contextEngine.assemble(request);
if (context.prunedSlices.length > 0) {
  console.log('Pruned:', context.prunedSlices.map(s => s.source));
}
```

### 4. Use Appropriate Models
For large codebases, use models with larger context:
| Task | Recommended Model |
|---|---|
| Small file review | Any model |
| Multi-file analysis | Claude (200K) |
| Large codebase | Gemini (1M context) |