# Memory Management

Automatic conversation memory with intelligent compression.
## Overview
The Memory Manager handles long conversations by:
- Tracking all conversation entries
- Detecting when compression is needed
- Creating summaries of old content
- Preserving important information
- Managing token budgets
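The responsibilities above can be modeled as a minimal token-budget loop. This is an illustrative sketch, not the actual implementation: `MiniMemory`, `estimateTokens`, and the 4-characters-per-token heuristic are all assumptions for demonstration.

```typescript
// Minimal model of the memory loop: track entries, estimate tokens,
// and flag when the budget is exceeded. All names are illustrative.
interface SketchEntry {
  content: string;
  tokenCount: number;
}

// Rough heuristic: ~4 characters per token (an assumption, not the
// tokenizer the real Memory Manager would use).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

class MiniMemory {
  private entries: SketchEntry[] = [];
  constructor(private maxTokens: number) {}

  add(content: string): void {
    this.entries.push({ content, tokenCount: estimateTokens(content) });
  }

  totalTokens(): number {
    return this.entries.reduce((sum, e) => sum + e.tokenCount, 0);
  }

  needsCompression(): boolean {
    return this.totalTokens() > this.maxTokens;
  }
}

const mem = new MiniMemory(10);
mem.add('Hello, how are you?');       // 19 chars -> 5 tokens
mem.add('I am doing well, thanks!');  // 24 chars -> 6 tokens
console.log(mem.totalTokens());       // 11
console.log(mem.needsCompression());  // true: 11 > 10
```

The real manager layers summarization on top of this check, as described in the sections below.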
## How It Works

### Memory Flow
```
New Message
    ↓
Add to Entries
    ↓
Calculate Total Tokens
    ↓
├── Under threshold? → Continue
│
└── Over threshold? → Trigger Compression
        ↓
    Select Old Entries
        ↓
    Generate Summary (LLM)
        ↓
    Mark Entries Compressed
        ↓
    Store Summary
```

## Entry Types
| Type | Description | Role |
|---|---|---|
| `message` | User or assistant message | user/assistant |
| `tool_call` | AI tool invocation | assistant |
| `tool_result` | Tool execution output | system |
| `context` | Added context (files, etc.) | system |
| `summary` | Compressed history | system |
## Memory Entries

### Entry Structure
```typescript
interface MemoryEntry {
  id: string;
  type: MemoryEntryType;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: Date;
  tokenCount: number;
  compressed: boolean;
  summaryId?: string; // If compressed, which summary
  metadata?: Record<string, unknown>;
}
```

### Adding Entries
```typescript
// User message
await memory.addUserMessage(sessionId, 'Hello, how are you?');

// Assistant message
await memory.addAssistantMessage(sessionId, 'I am doing well!');

// Tool call
await memory.addToolCall(sessionId, 'shell', { command: 'ls -la' });

// Tool result
await memory.addToolResult(sessionId, 'shell', 'file1.txt\nfile2.txt');

// Context
await memory.addContext(sessionId, fileContent, 'src/main.ts');
```

## Compression
### When Compression Happens
Compression triggers when:
- Token count exceeds threshold (default: 50,000)
- Entry count exceeds limit (default: 100)
- Manually requested
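The three trigger conditions can be expressed as a single predicate. This is a sketch, not the internal implementation; the default limits mirror the configuration section below, and the `force` flag models a manual request.

```typescript
interface TriggerConfig {
  maxTokens: number;
  maxEntries: number;
}

// Compression fires if either limit is exceeded, or on manual request.
function shouldCompress(
  totalTokens: number,
  entryCount: number,
  config: TriggerConfig,
  force = false,
): boolean {
  return force || totalTokens > config.maxTokens || entryCount > config.maxEntries;
}

const triggerConfig = { maxTokens: 50_000, maxEntries: 100 };
console.log(shouldCompress(60_000, 40, triggerConfig));       // true: token threshold exceeded
console.log(shouldCompress(20_000, 40, triggerConfig));       // false: within both limits
console.log(shouldCompress(20_000, 40, triggerConfig, true)); // true: manual request
```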
### What Gets Compressed
An entry is eligible for compression when it is:
- Older than the recent window (the last 10 entries)
- Not already compressed

Compression only runs when at least the minimum number of eligible entries is available (default: 5).
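Eligibility can be sketched as a filter over the entry list. This is illustrative only; `selectCompressible` is a hypothetical name, and the defaults mirror `recentWindow` and `minEntriesToCompress` from the configuration section.

```typescript
interface CompressibleEntry {
  id: string;
  compressed: boolean;
}

// Everything older than the recent window and not already compressed
// is a candidate; if fewer than the minimum qualify, compress nothing.
function selectCompressible(
  entries: CompressibleEntry[],
  recentWindow = 10,
  minEntriesToCompress = 5,
): CompressibleEntry[] {
  const old = entries.slice(0, Math.max(0, entries.length - recentWindow));
  const eligible = old.filter((e) => !e.compressed);
  return eligible.length >= minEntriesToCompress ? eligible : [];
}

// 20 uncompressed entries: the oldest 10 are eligible.
const sample = Array.from({ length: 20 }, (_, i) => ({
  id: `e${i}`,
  compressed: false,
}));
console.log(selectCompressible(sample).length); // 10

// 12 entries: only 2 fall outside the window, below the minimum of 5.
console.log(selectCompressible(sample.slice(0, 12)).length); // 0
```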
### Compression Process

1. **Select entries** - old, uncompressed entries
2. **Calculate target** - 30% of the original token count
3. **Summarize** - use the LLM to create a summary
4. **Mark compressed** - link entries to the summary
5. **Update totals** - recalculate token counts
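The target-size step is simple arithmetic. A quick sketch (function names are illustrative; the 0.3 ratio matches the default `compressionRatio` in the configuration section):

```typescript
// Target token budget for the summary: a fraction of the original size.
function targetTokens(originalTokens: number, ratio = 0.3): number {
  return Math.round(originalTokens * ratio);
}

// Achieved compression ratio once the summary exists,
// e.g. 10,000 tokens summarized into 3,000 is ~3.33x.
function compressionRatio(originalTokens: number, summaryTokens: number): number {
  return originalTokens / summaryTokens;
}

console.log(targetTokens(10_000));            // 3000
console.log(compressionRatio(10_000, 3_000)); // ~3.33
```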
### Summary Structure
```typescript
interface MemorySummary {
  id: string;
  content: string;            // Summary text
  originalEntryIds: string[]; // What was summarized
  tokenCount: number;         // Summary tokens
  originalTokenCount: number; // Original tokens
  compressionRatio: number;   // e.g., 3.5x compression
  createdAt: Date;
  timeRange: {
    start: Date;
    end: Date;
  };
}
```

## LLM Summarization
### Default Prompt
The summarization prompt:

```
Summarize the following conversation, preserving:
1. Key topics discussed
2. Important decisions made
3. Tools used and their outcomes
4. Any errors or issues encountered
5. Context that would be needed to continue

Keep the summary concise but informative.

{content}
```

### Custom Summarization
You can provide your own summarization callback:

```typescript
memory.setSummarizeCallback(async (prompt: string) => {
  // Use your preferred model
  const response = await myLLM.complete(prompt);
  return response.text;
});
```

### Fallback Summarization
If no LLM callback is set, a heuristic fallback is used:

```
[Previous conversation summary]
5 user messages
First: "Help me debug this API..."
Last: "Thanks, that fixed it!"
Tools used: shell, read_file, write_file
2 errors encountered
```

## Configuration
### Default Configuration
```typescript
const DEFAULT_MEMORY_CONFIG = {
  maxEntries: 100,         // Trigger compression
  maxTokens: 50000,        // Token threshold
  recentWindow: 10,        // Keep recent entries
  minEntriesToCompress: 5, // Minimum for compression
  autoCompress: true,      // Auto-trigger
  compressionRatio: 0.3,   // Target 30% of original
};
```

### Updating Configuration
```typescript
memory.updateConfig({
  maxTokens: 100000,   // Larger budget
  autoCompress: false, // Manual only
});
```

## Session Management
### State Export
Save session state for persistence:
```typescript
const state = memory.exportSession(sessionId);
// Store state.entries and state.summaries
```
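One persistence detail worth noting: `Date` fields such as `timestamp` and `createdAt` serialize to ISO strings under `JSON.stringify` and need reviving before re-import. A minimal sketch of the round-trip (the storage layer itself is up to you):

```typescript
// Round-trip a timestamp through JSON: Date -> ISO string -> Date.
const sampleEntry = { id: 'e1', timestamp: new Date('2024-01-01T00:00:00Z') };

const saved = JSON.stringify(sampleEntry);
const raw = JSON.parse(saved) as { id: string; timestamp: string };

// Revive the Date before handing the state back to importSession.
const restored = { ...raw, timestamp: new Date(raw.timestamp) };
console.log(restored.timestamp instanceof Date); // true
```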
### State Import
Restore a previous session:
```typescript
memory.importSession({
  sessionId: 'restored-session',
  entries: savedEntries,
  summaries: savedSummaries,
  totalTokens: 15000,
});
```

### Clear Session
Reset a session:
```typescript
memory.clearSession(sessionId);
```

## Getting Context for LLM
### Full Context
Get summaries + active entries:
```typescript
const context = memory.getContextForLLM(sessionId);
// Returns:
// === Previous Conversation Summaries ===
// [2024-01-01 - 2024-01-02]
// User discussed API design, created routes...
//
// === Recent Conversation ===
// User: How do I add auth?
// Assistant: You can use JWT...
```
### Active Entries Only
Get non-compressed entries:
```typescript
const entries = memory.getActiveEntries(sessionId);
```

## Memory Stats
Monitor memory usage:
```typescript
const stats = memory.getStats(sessionId);
// {
//   totalEntries: 150,
//   activeEntries: 12,
//   compressedEntries: 138,
//   summaries: 5,
//   totalTokens: 25000,
//   activeTokens: 8000,
// }
```
## Events
The Memory Manager emits events:
```typescript
memory.on('entry:added', (sessionId, entry) => {
  console.log('New entry:', entry.type);
});

memory.on('compressed', (sessionId, result) => {
  console.log('Compressed, saved', result.tokensSaved, 'tokens');
});

memory.on('session:cleared', (sessionId) => {
  console.log('Session cleared:', sessionId);
});
```

## Best Practices
### 1. Monitor Token Usage
Keep an eye on token counts:
```typescript
const stats = memory.getStats(sessionId);
console.log(`Using ${stats.activeTokens} tokens`);
```

### 2. Let Auto-Compression Work
Don't fight the compression:
- It preserves important information
- Summaries capture the key points
- The LLM can continue the conversation from summaries alone
### 3. Add Context Deliberately
Use `addContext` for important files:

```typescript
// Good: add important context explicitly
await memory.addContext(sessionId, codeContent, 'critical-file.ts');
```

### 4. Clear When Starting Fresh
For new topics, clear the session:
```typescript
memory.clearSession(sessionId);
// Now the AI starts without previous context
```
### 5. Export for Long-Running Tasks
Save state for tasks that span multiple sessions:
```typescript
// End of session
const state = memory.exportSession(sessionId);
saveToDatabase(state);

// Start of a new session
const restored = loadFromDatabase();
memory.importSession(restored);
```

## Integration with Context Engine
Memory and Context work together:

```
Context Engine            Memory Manager
      │                         │
      │  Request context        │
      ├────────────────────────►│
      │                         │
      │  Return summaries +     │
      │◄────── active entries ──┤
      │                         │
      │  Include in             │
      │  assembled context      │
      │                         │
```

The Context Engine uses Memory summaries as part of the assembled context, ensuring long conversations maintain coherence while staying within token limits.
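The assembly step can be sketched as plain string composition in the format shown under "Full Context" above. This is illustrative only; the real Context Engine assembles far more than this, and `assembleContext` is a hypothetical name.

```typescript
interface SummaryPart { timeRange: string; content: string; }
interface ActivePart { role: string; content: string; }

// Compose summaries first, then the recent uncompressed entries,
// mirroring the getContextForLLM output format.
function assembleContext(summaries: SummaryPart[], active: ActivePart[]): string {
  const parts: string[] = [];
  if (summaries.length > 0) {
    parts.push('=== Previous Conversation Summaries ===');
    for (const s of summaries) parts.push(`[${s.timeRange}]`, s.content);
  }
  parts.push('=== Recent Conversation ===');
  for (const e of active) parts.push(`${e.role}: ${e.content}`);
  return parts.join('\n');
}

const out = assembleContext(
  [{ timeRange: '2024-01-01 - 2024-01-02', content: 'Discussed API design.' }],
  [{ role: 'User', content: 'How do I add auth?' }],
);
console.log(out.split('\n').length); // 5 lines: two headers, range, summary, entry
```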