Memory Management

Automatic conversation memory with intelligent compression.

Overview

The Memory Manager handles long conversations by:

  • Tracking all conversation entries
  • Detecting when compression is needed
  • Creating summaries of old content
  • Preserving important information
  • Managing token budgets
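
A minimal usage sketch, assuming a MemoryManager instance built with the default configuration (construction details may differ in your setup; the methods match those documented below):

const memory = new MemoryManager(DEFAULT_MEMORY_CONFIG);
const sessionId = 'demo-session';

await memory.addUserMessage(sessionId, 'Help me debug this API');
await memory.addAssistantMessage(sessionId, 'Sure - what error are you seeing?');

// Compression runs automatically once thresholds are exceeded
// (autoCompress: true by default).
const context = memory.getContextForLLM(sessionId);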

How It Works

Memory Flow

New Message
    ↓
Add to Entries
    ↓
Calculate Total Tokens
    ↓
    ├── Under threshold? → Continue
    │
    └── Over threshold? → Trigger Compression
                              ↓
                        Select Old Entries
                              ↓
                        Generate Summary (LLM)
                              ↓
                        Mark Entries Compressed
                              ↓
                        Store Summary
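
In code, that flow boils down to roughly the following (a simplified sketch of the internal logic, not the exact implementation):

// Simplified sketch: add an entry, then check whether compression is due.
async function onNewEntry(
  entries: MemoryEntry[],
  entry: MemoryEntry,
  config: { maxTokens: number; maxEntries: number },
  compress: () => Promise<void>,
): Promise<void> {
  entries.push(entry);

  // Total tokens across entries that have not been compressed away.
  const activeTokens = entries
    .filter((e) => !e.compressed)
    .reduce((sum, e) => sum + e.tokenCount, 0);

  // Under threshold? Continue. Over threshold? Trigger compression.
  if (activeTokens > config.maxTokens || entries.length > config.maxEntries) {
    await compress();
  }
}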

Entry Types

Type         Description                   Role
message      User or assistant message     user/assistant
tool_call    AI tool invocation            assistant
tool_result  Tool execution output         system
context      Added context (files, etc.)   system
summary      Compressed history            system

Memory Entries

Entry Structure

interface MemoryEntry {
  id: string;
  type: MemoryEntryType;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: Date;
  tokenCount: number;
  compressed: boolean;
  summaryId?: string;  // If compressed, which summary
  metadata?: Record<string, unknown>;
}

Adding Entries

// User message
await memory.addUserMessage(sessionId, 'Hello, how are you?');

// Assistant message
await memory.addAssistantMessage(sessionId, 'I am doing well!');

// Tool call
await memory.addToolCall(sessionId, 'shell', { command: 'ls -la' });

// Tool result
await memory.addToolResult(sessionId, 'shell', 'file1.txt\nfile2.txt');

// Context
await memory.addContext(sessionId, fileContent, 'src/main.ts');

Compression

When Compression Happens

Compression triggers when any of the following holds:

  1. The token count exceeds the threshold (default: 50,000)
  2. The entry count exceeds the limit (default: 100)
  3. Compression is requested manually (see the sketch below)
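
If auto-compression is disabled, you can trigger a pass yourself. The method name below is an assumption for illustration; check the API of your build:

// Hypothetical manual trigger (method name assumed):
await memory.compress(sessionId);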

What Gets Compressed

An entry is eligible for compression when it:

  • Falls outside the recent window (the last 10 entries are preserved)
  • Has not already been compressed

Compression runs only once enough eligible entries have accumulated (minEntriesToCompress, default: 5), as sketched below.
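
A sketch of that selection logic (the helper name is assumed; the real code may differ):

// Pick entries outside the recent window that haven't been compressed yet.
function selectCompressible(
  entries: MemoryEntry[],
  config: { recentWindow: number; minEntriesToCompress: number },
): MemoryEntry[] {
  const eligible = entries
    .slice(0, Math.max(0, entries.length - config.recentWindow))
    .filter((e) => !e.compressed);

  // Only worth compressing once enough entries have accumulated.
  return eligible.length >= config.minEntriesToCompress ? eligible : [];
}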

Compression Process

  1. Select entries - Old, uncompressed entries
  2. Calculate target - 30% of original token count
  3. Summarize - Use LLM to create summary
  4. Mark compressed - Link entries to summary
  5. Update totals - Recalculate token counts
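
Put together, a compression pass might look like this sketch (it reuses selectCompressible from above; estimateTokens stands in for the manager's real tokenizer):

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

async function compressOnce(
  entries: MemoryEntry[],
  config: { recentWindow: number; minEntriesToCompress: number; compressionRatio: number },
  summarize: (prompt: string) => Promise<string>,
): Promise<MemorySummary | null> {
  // 1. Select entries: old, uncompressed ones outside the recent window.
  const eligible = selectCompressible(entries, config);
  if (eligible.length === 0) return null;

  // 2. Calculate target: 30% of the original token count by default.
  const originalTokenCount = eligible.reduce((sum, e) => sum + e.tokenCount, 0);
  const targetTokens = Math.floor(originalTokenCount * config.compressionRatio);

  // 3. Summarize: hand the transcript to the LLM callback.
  const transcript = eligible.map((e) => `${e.role}: ${e.content}`).join('\n');
  const content = await summarize(
    `Summarize in about ${targetTokens} tokens:\n\n${transcript}`,
  );

  // 4. Mark compressed: link each entry to the new summary.
  const summary: MemorySummary = {
    id: `summary-${Date.now()}`,
    content,
    originalEntryIds: eligible.map((e) => e.id),
    tokenCount: estimateTokens(content),
    originalTokenCount,
    compressionRatio: originalTokenCount / Math.max(1, estimateTokens(content)),
    createdAt: new Date(),
    timeRange: {
      start: eligible[0].timestamp,
      end: eligible[eligible.length - 1].timestamp,
    },
  };
  for (const e of eligible) {
    e.compressed = true;
    e.summaryId = summary.id;
  }

  // 5. Update totals: the caller recalculates active token counts.
  return summary;
}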

Summary Structure

interface MemorySummary {
  id: string;
  content: string;           // Summary text
  originalEntryIds: string[]; // What was summarized
  tokenCount: number;        // Summary tokens
  originalTokenCount: number; // Original tokens
  compressionRatio: number;  // e.g., 3.5x compression
  createdAt: Date;
  timeRange: {
    start: Date;
    end: Date;
  };
}

LLM Summarization

Default Prompt

The default summarization prompt, where {content} is replaced with the entries being summarized:

Summarize the following conversation, preserving:
1. Key topics discussed
2. Important decisions made
3. Tools used and their outcomes
4. Any errors or issues encountered
5. Context that would be needed to continue

Keep the summary concise but informative.

{content}

Custom Summarization

You can provide your own summarization callback:

memory.setSummarizeCallback(async (prompt: string) => {
  // Use your preferred model
  const response = await myLLM.complete(prompt);
  return response.text;
});
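
For example, wiring the callback to the OpenAI Node SDK could look like this (the model choice and SDK usage are illustrative, not a requirement of the Memory Manager):

import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

memory.setSummarizeCallback(async (prompt: string) => {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
  });
  return response.choices[0].message.content ?? '';
});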

Fallback Summarization

If no LLM callback is set, a heuristic fallback is used:

[Previous conversation summary]

5 user messages
First: "Help me debug this API..."
Last: "Thanks, that fixed it!"

Tools used: shell, read_file, write_file

2 errors encountered
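
A rough sketch of how such a heuristic could be assembled (the real fallback may use different heuristics; reading the tool name from metadata is an assumption):

function fallbackSummary(entries: MemoryEntry[]): string {
  const userMsgs = entries.filter((e) => e.type === 'message' && e.role === 'user');
  const tools = [...new Set(
    entries
      .filter((e) => e.type === 'tool_call')
      .map((e) => String(e.metadata?.tool ?? 'unknown')),
  )];
  const errors = entries.filter(
    (e) => e.type === 'tool_result' && /error/i.test(e.content),
  ).length;

  const lines = ['[Previous conversation summary]', ''];
  lines.push(`${userMsgs.length} user messages`);
  if (userMsgs.length > 0) {
    lines.push(`First: "${userMsgs[0].content.slice(0, 40)}..."`);
    lines.push(`Last: "${userMsgs[userMsgs.length - 1].content.slice(0, 40)}..."`);
  }
  if (tools.length > 0) lines.push('', `Tools used: ${tools.join(', ')}`);
  if (errors > 0) lines.push('', `${errors} errors encountered`);
  return lines.join('\n');
}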

Configuration

Default Configuration

const DEFAULT_MEMORY_CONFIG = {
  maxEntries: 100,           // Trigger compression
  maxTokens: 50000,          // Token threshold
  recentWindow: 10,          // Keep recent entries
  minEntriesToCompress: 5,   // Minimum for compression
  autoCompress: true,        // Auto-trigger
  compressionRatio: 0.3,     // Target 30% of original
};

Updating Configuration

memory.updateConfig({
  maxTokens: 100000,  // Larger budget
  autoCompress: false, // Manual only
});

Session Management

State Export

Save session state for persistence:

const state = memory.exportSession(sessionId);
// Store state.entries and state.summaries

State Import

Restore a previous session:

memory.importSession({
  sessionId: 'restored-session',
  entries: savedEntries,
  summaries: savedSummaries,
  totalTokens: 15000,
});

Clear Session

Reset a session:

memory.clearSession(sessionId);

Getting Context for LLM

Full Context

Get summaries + active entries:

const context = memory.getContextForLLM(sessionId);
// Returns:
// === Previous Conversation Summaries ===
// [2024-01-01 - 2024-01-02]
// User discussed API design, created routes...
//
// === Recent Conversation ===
// User: How do I add auth?
// Assistant: You can use JWT...
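
Internally, assembling that string could look roughly like this sketch (the helper name and date formatting are assumptions; the output format matches the example above):

function buildContext(summaries: MemorySummary[], active: MemoryEntry[]): string {
  const parts: string[] = [];

  if (summaries.length > 0) {
    parts.push('=== Previous Conversation Summaries ===');
    for (const s of summaries) {
      const day = (d: Date) => d.toISOString().slice(0, 10);
      parts.push(`[${day(s.timeRange.start)} - ${day(s.timeRange.end)}]`);
      parts.push(s.content);
    }
  }

  parts.push('=== Recent Conversation ===');
  for (const e of active) {
    const label = e.role.charAt(0).toUpperCase() + e.role.slice(1);
    parts.push(`${label}: ${e.content}`);
  }
  return parts.join('\n');
}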

Active Entries Only

Get non-compressed entries:

const entries = memory.getActiveEntries(sessionId);

Memory Stats

Monitor memory usage:

const stats = memory.getStats(sessionId);
// {
//   totalEntries: 150,
//   activeEntries: 12,
//   compressedEntries: 138,
//   summaries: 5,
//   totalTokens: 25000,
//   activeTokens: 8000,
// }

Events

The Memory Manager emits events:

memory.on('entry:added', (sessionId, entry) => {
  console.log('New entry:', entry.type);
});

memory.on('compressed', (sessionId, result) => {
  console.log('Compressed, saved', result.tokensSaved, 'tokens');
});

memory.on('session:cleared', (sessionId) => {
  console.log('Session cleared:', sessionId);
});

Best Practices

1. Monitor Token Usage

Keep an eye on token counts:

const stats = memory.getStats(sessionId);
console.log(`Using ${stats.activeTokens} tokens`);
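
You can also act on the numbers, for example warning as the session approaches the budget (the 80% threshold here is arbitrary):

// Warn when active tokens reach 80% of the configured budget.
if (stats.activeTokens > 0.8 * 50000) {
  console.warn('Approaching token budget; compression will trigger soon');
}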

2. Let Auto-Compression Work

Don’t fight the compression:

  • It preserves important information
  • Summaries capture key points
  • LLM understands summaries

3. Add Context Deliberately

Use addContext for important files:

// Good: add important context explicitly
await memory.addContext(sessionId, codeContent, 'critical-file.ts');

4. Clear When Starting Fresh

For new topics, clear the session:

memory.clearSession(sessionId);
// Now the AI starts without previous context

5. Export for Long-Running Tasks

Save state for tasks that span multiple sessions:

// End of session
const state = memory.exportSession(sessionId);
saveToDatabase(state);

// Start of new session
const restored = loadFromDatabase();
memory.importSession(restored);

Integration with Context Engine

Memory and Context work together:

Context Engine         Memory Manager
      │                      │
      │  Request context     │
      ├─────────────────────►│
      │                      │
      │  Return summaries +  │
      │◄───── active entries─┤
      │                      │
      │  Include in          │
      │  assembled context   │
      │                      │

The Context Engine uses Memory summaries as part of the assembled context, ensuring long conversations maintain coherence while staying within token limits.
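
A sketch of that hand-off (the contextEngine object and its assemble method are hypothetical; only getContextForLLM comes from the Memory Manager API above):

// Memory output becomes one section of the final assembled prompt.
const memoryContext = memory.getContextForLLM(sessionId);

const assembled = contextEngine.assemble({
  sections: [
    { name: 'conversation', content: memoryContext },
    { name: 'files', content: fileContext }, // fileContext gathered elsewhere
  ],
  maxTokens: 100000,
});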