# Memory Management

Automatic conversation memory with intelligent compression.
## Overview
The Memory Manager handles long conversations by:
- Tracking all conversation entries
- Detecting when compression is needed
- Creating summaries of old content
- Preserving important information
- Managing token budgets
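The responsibilities above can be modeled as a minimal token-budget loop. This is an illustrative sketch, not the actual implementation: `MiniMemory`, `estimateTokens`, and the 4-characters-per-token heuristic are all assumptions for demonstration.

```typescript
// Minimal model of the memory loop: track entries, estimate tokens,
// and flag when the budget is exceeded. All names are illustrative.
interface SketchEntry {
  content: string;
  tokenCount: number;
}

// Rough heuristic: ~4 characters per token (an assumption, not the
// tokenizer the real Memory Manager would use).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

class MiniMemory {
  private entries: SketchEntry[] = [];
  constructor(private maxTokens: number) {}

  add(content: string): void {
    this.entries.push({ content, tokenCount: estimateTokens(content) });
  }

  totalTokens(): number {
    return this.entries.reduce((sum, e) => sum + e.tokenCount, 0);
  }

  needsCompression(): boolean {
    return this.totalTokens() > this.maxTokens;
  }
}

const mem = new MiniMemory(10);
mem.add('Hello, how are you?');       // 19 chars -> 5 tokens
mem.add('I am doing well, thanks!');  // 24 chars -> 6 tokens
console.log(mem.totalTokens());       // 11
console.log(mem.needsCompression());  // true: 11 > 10
```

The real manager layers summarization on top of this check, as described in the sections below.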
## How It Works

### Memory Flow
```
New Message
    ↓
Add to Entries
    ↓
Calculate Total Tokens
    ↓
├── Under threshold? → Continue
│
└── Over threshold? → Trigger Compression
        ↓
    Select Old Entries
        ↓
    Generate Summary (LLM)
        ↓
    Mark Entries Compressed
        ↓
    Store Summary
```

## Entry Types
| Type | Description | Role |
|---|---|---|
| `message` | User or assistant message | user/assistant |
| `tool_call` | AI tool invocation | assistant |
| `tool_result` | Tool execution output | system |
| `context` | Added context (files, etc.) | system |
| `summary` | Compressed history | system |
## Memory Entries

### Entry Structure
```typescript
interface MemoryEntry {
  id: string;
  type: MemoryEntryType;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: Date;
  tokenCount: number;
  compressed: boolean;
  summaryId?: string; // If compressed, which summary
  metadata?: Record<string, unknown>;
}
```

### Adding Entries
```typescript
// User message
await memory.addUserMessage(sessionId, 'Hello, how are you?');

// Assistant message
await memory.addAssistantMessage(sessionId, 'I am doing well!');

// Tool call
await memory.addToolCall(sessionId, 'shell', { command: 'ls -la' });

// Tool result
await memory.addToolResult(sessionId, 'shell', 'file1.txt\nfile2.txt');

// Context
await memory.addContext(sessionId, fileContent, 'src/main.ts');
```

## Compression
### When Compression Happens
Compression triggers when:
- Token count exceeds threshold (default: 50,000)
- Entry count exceeds limit (default: 100)
- Manually requested
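The three trigger conditions can be expressed as a single predicate. This is a sketch, not the internal implementation; the default limits mirror the configuration section below, and the `force` flag models a manual request.

```typescript
interface TriggerConfig {
  maxTokens: number;
  maxEntries: number;
}

// Compression fires if either limit is exceeded, or on manual request.
function shouldCompress(
  totalTokens: number,
  entryCount: number,
  config: TriggerConfig,
  force = false,
): boolean {
  return force || totalTokens > config.maxTokens || entryCount > config.maxEntries;
}

const triggerConfig = { maxTokens: 50_000, maxEntries: 100 };
console.log(shouldCompress(60_000, 40, triggerConfig));       // true: token threshold exceeded
console.log(shouldCompress(20_000, 40, triggerConfig));       // false: within both limits
console.log(shouldCompress(20_000, 40, triggerConfig, true)); // true: manual request
```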
### What Gets Compressed
An entry is eligible for compression when it is:
- Older than the recent window (the last 10 entries)
- Not already compressed

Compression only runs when at least the minimum number of eligible entries is available (default: 5).
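Eligibility can be sketched as a filter over the entry list. This is illustrative only; `selectCompressible` is a hypothetical name, and the defaults mirror `recentWindow` and `minEntriesToCompress` from the configuration section.

```typescript
interface CompressibleEntry {
  id: string;
  compressed: boolean;
}

// Everything older than the recent window and not already compressed
// is a candidate; if fewer than the minimum qualify, compress nothing.
function selectCompressible(
  entries: CompressibleEntry[],
  recentWindow = 10,
  minEntriesToCompress = 5,
): CompressibleEntry[] {
  const old = entries.slice(0, Math.max(0, entries.length - recentWindow));
  const eligible = old.filter((e) => !e.compressed);
  return eligible.length >= minEntriesToCompress ? eligible : [];
}

// 20 uncompressed entries: the oldest 10 are eligible.
const sample = Array.from({ length: 20 }, (_, i) => ({
  id: `e${i}`,
  compressed: false,
}));
console.log(selectCompressible(sample).length); // 10

// 12 entries: only 2 fall outside the window, below the minimum of 5.
console.log(selectCompressible(sample.slice(0, 12)).length); // 0
```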
### Compression Process

1. **Select entries** - old, uncompressed entries
2. **Calculate target** - 30% of the original token count
3. **Summarize** - use the LLM to create a summary
4. **Mark compressed** - link entries to the summary
5. **Update totals** - recalculate token counts
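The target-size step is simple arithmetic. A quick sketch (function names are illustrative; the 0.3 ratio matches the default `compressionRatio` in the configuration section):

```typescript
// Target token budget for the summary: a fraction of the original size.
function targetTokens(originalTokens: number, ratio = 0.3): number {
  return Math.round(originalTokens * ratio);
}

// Achieved compression ratio once the summary exists,
// e.g. 10,000 tokens summarized into 3,000 is ~3.33x.
function compressionRatio(originalTokens: number, summaryTokens: number): number {
  return originalTokens / summaryTokens;
}

console.log(targetTokens(10_000));            // 3000
console.log(compressionRatio(10_000, 3_000)); // ~3.33
```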
### Summary Structure
```typescript
interface MemorySummary {
  id: string;
  content: string;            // Summary text
  originalEntryIds: string[]; // What was summarized
  tokenCount: number;         // Summary tokens
  originalTokenCount: number; // Original tokens
  compressionRatio: number;   // e.g., 3.5x compression
  createdAt: Date;
  timeRange: {
    start: Date;
    end: Date;
  };
}
```

## LLM Summarization
### Default Prompt
The summarization prompt:

```
Summarize the following conversation, preserving:
1. Key topics discussed
2. Important decisions made
3. Tools used and their outcomes
4. Any errors or issues encountered
5. Context that would be needed to continue

Keep the summary concise but informative.

{content}
```

### Custom Summarization
You can provide your own summarization callback:

```typescript
memory.setSummarizeCallback(async (prompt: string) => {
  // Use your preferred model
  const response = await myLLM.complete(prompt);
  return response.text;
});
```

### Fallback Summarization
If no LLM callback is set, a heuristic fallback is used:

```
[Previous conversation summary]
5 user messages
First: "Help me debug this API..."
Last: "Thanks, that fixed it!"
Tools used: shell, read_file, write_file
2 errors encountered
```

## Configuration
### Default Configuration
```typescript
const DEFAULT_MEMORY_CONFIG = {
  maxEntries: 100,         // Trigger compression
  maxTokens: 50000,        // Token threshold
  recentWindow: 10,        // Keep recent entries
  minEntriesToCompress: 5, // Minimum for compression
  autoCompress: true,      // Auto-trigger
  compressionRatio: 0.3,   // Target 30% of original
};
```

### Updating Configuration
```typescript
memory.updateConfig({
  maxTokens: 100000,   // Larger budget
  autoCompress: false, // Manual only
});
```

## Session Management
### State Export
Save session state for persistence:
```typescript
const state = memory.exportSession(sessionId);
// Store state.entries and state.summaries
```
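One persistence detail worth noting: `Date` fields such as `timestamp` and `createdAt` serialize to ISO strings under `JSON.stringify` and need reviving before re-import. A minimal sketch of the round-trip (the storage layer itself is up to you):

```typescript
// Round-trip a timestamp through JSON: Date -> ISO string -> Date.
const sampleEntry = { id: 'e1', timestamp: new Date('2024-01-01T00:00:00Z') };

const saved = JSON.stringify(sampleEntry);
const raw = JSON.parse(saved) as { id: string; timestamp: string };

// Revive the Date before handing the state back to importSession.
const restored = { ...raw, timestamp: new Date(raw.timestamp) };
console.log(restored.timestamp instanceof Date); // true
```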
### State Import
Restore a previous session:
```typescript
memory.importSession({
  sessionId: 'restored-session',
  entries: savedEntries,
  summaries: savedSummaries,
  totalTokens: 15000,
});
```

### Clear Session
Reset a session:
```typescript
memory.clearSession(sessionId);
```

## Getting Context for LLM
### Full Context
Get summaries + active entries:
```typescript
const context = memory.getContextForLLM(sessionId);
// Returns:
// === Previous Conversation Summaries ===
// [2024-01-01 - 2024-01-02]
// User discussed API design, created routes...
//
// === Recent Conversation ===
// User: How do I add auth?
// Assistant: You can use JWT...
```
### Active Entries Only
Get non-compressed entries:
```typescript
const entries = memory.getActiveEntries(sessionId);
```

## Memory Stats
Monitor memory usage:
```typescript
const stats = memory.getStats(sessionId);
// {
//   totalEntries: 150,
//   activeEntries: 12,
//   compressedEntries: 138,
//   summaries: 5,
//   totalTokens: 25000,
//   activeTokens: 8000,
// }
```
## Events
The Memory Manager emits events:
```typescript
memory.on('entry:added', (sessionId, entry) => {
  console.log('New entry:', entry.type);
});

memory.on('compressed', (sessionId, result) => {
  console.log('Compressed, saved', result.tokensSaved, 'tokens');
});

memory.on('session:cleared', (sessionId) => {
  console.log('Session cleared:', sessionId);
});
```

## Best Practices
### 1. Monitor Token Usage
Keep an eye on token counts:
```typescript
const stats = memory.getStats(sessionId);
console.log(`Using ${stats.activeTokens} tokens`);
```

### 2. Let Auto-Compression Work
Don't fight the compression:
- It preserves important information
- Summaries capture the key points
- The LLM can continue the conversation from summaries alone
### 3. Add Context Deliberately
Use `addContext` for important files:

```typescript
// Good: add important context explicitly
await memory.addContext(sessionId, codeContent, 'critical-file.ts');
```

### 4. Clear When Starting Fresh
For new topics, clear the session:
```typescript
memory.clearSession(sessionId);
// Now the AI starts without previous context
```
### 5. Export for Long-Running Tasks
Save state for tasks that span multiple sessions:
```typescript
// End of session
const state = memory.exportSession(sessionId);
saveToDatabase(state);

// Start of a new session
const restored = loadFromDatabase();
memory.importSession(restored);
```

## Integration with Context Engine
Memory and Context work together:

```
Context Engine            Memory Manager
      │                         │
      │  Request context        │
      ├────────────────────────►│
      │                         │
      │  Return summaries +     │
      │◄────── active entries ──┤
      │                         │
      │  Include in             │
      │  assembled context      │
      │                         │
```

The Context Engine uses Memory summaries as part of the assembled context, ensuring long conversations maintain coherence while staying within token limits.
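The assembly step can be sketched as plain string composition in the format shown under "Full Context" above. This is illustrative only; the real Context Engine assembles far more than this, and `assembleContext` is a hypothetical name.

```typescript
interface SummaryPart { timeRange: string; content: string; }
interface ActivePart { role: string; content: string; }

// Compose summaries first, then the recent uncompressed entries,
// mirroring the getContextForLLM output format.
function assembleContext(summaries: SummaryPart[], active: ActivePart[]): string {
  const parts: string[] = [];
  if (summaries.length > 0) {
    parts.push('=== Previous Conversation Summaries ===');
    for (const s of summaries) parts.push(`[${s.timeRange}]`, s.content);
  }
  parts.push('=== Recent Conversation ===');
  for (const e of active) parts.push(`${e.role}: ${e.content}`);
  return parts.join('\n');
}

const out = assembleContext(
  [{ timeRange: '2024-01-01 - 2024-01-02', content: 'Discussed API design.' }],
  [{ role: 'User', content: 'How do I add auth?' }],
);
console.log(out.split('\n').length); // 5 lines: two headers, range, summary, entry
```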