
Dec 7, 2025

Version History Management

Improvement: #4 of 4
Parent Doc: Experiment Improvements
Status: Design Phase

Problem Statement

When UPDATE operations occur, previous memory information is lost. This creates several issues:

  1. No Audit Trail: Cannot track what changed or when

  2. No Rollback: Cannot revert to a previous version if a mistake occurs

  3. Lost Context: Historical information provides valuable context

  4. Compliance Risk: Some domains require change history (legal, compliance)

Example: Lost Information

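// Initial memory
{
  id: "mem_001",
  // "Q1 marketing budget: 50 million KRW. Execution period: January to March."
  content: "Q1 마케팅 예산 5000만원. 집행 기간: 1월~3월."
}

// After UPDATE
{
  id: "mem_001",
  // "Q1 marketing budget increased to 60 million KRW. Additional campaigns possible."
  content: "Q1 마케팅 예산 6000만원으로 증액. 추가 캠페인 가능."
}

// Lost:
// - Original budget (50 million KRW)
// - Execution period (January to March)
// - When the change happened
// - Source of the original information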

Design Goals

Primary Goals

  1. History Preservation: Track all versions of a memory

  2. Source Traceability: Link each version to source (email, user input, etc.)

  3. Storage Efficiency: Minimize storage cost (don't duplicate full content)

  4. Recoverability: Able to reconstruct previous versions

  5. Auditability: Clear trail of what changed, when, why

Non-Goals

  • Full Content Retention: Not keeping full text of all versions (storage cost)

  • Automatic Conflict Resolution: No merge strategies in v1

  • Branching: Not supporting multiple version branches (linear history only)

Core Concepts

Version vs Snapshot

Version: Lightweight reference to a point in time

  • Metadata about change

  • Source link (email thread, messageId)

  • Change summary (what changed)

  • Optional: diff or full content (recent versions only)

Snapshot: Full content at a specific version

  • Used for latest N versions only

  • Older versions pruned to save storage

Storage Strategy: Hybrid Approach

┌─────────────────────────────────────────────────────────────┐
│                   Version Chain Strategy                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  v1 (Oldest)   v2            v3            v4 (Current)     │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐      │
│  │ Source  │   │ Source  │   │ Diff    │   │ Full    │      │
│  │ Link    │ → │ Link    │ → │ Source  │ → │ Content │      │
│  │ Summary │   │ Summary │   │ Link    │   │         │      │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘      │
│                                                             │
│  Ancient       Old           Recent        Current          │
│  (pruned)      (pruned)      (with diff)   (full)           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

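Storage: O(1) full content + O(n) metadata, where n is the number of versions
Recovery: Recent versions are returned instantly; older versions are rebuilt from diffs or re-extracted from the source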
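
The tiering in the diagram can be sketched as a small classification function. This is only an illustration under assumptions: `storageTier`, `TierPolicy`, and `keepDiffCount` are hypothetical names that do not appear in the design, and the cut-offs are chosen to reproduce the four-version chain shown above.

```typescript
// Hypothetical helper: decide how a version is stored under the hybrid strategy.
type StorageTier = 'full' | 'diff' | 'metadata_only';

interface TierPolicy {
  keepFullContentCount: number; // newest N versions keep full content
  keepDiffCount: number;        // assumption: the next M versions keep a diff
}

function storageTier(
  versionNumber: number,
  latestVersionNumber: number,
  policy: TierPolicy
): StorageTier {
  const age = latestVersionNumber - versionNumber; // 0 means the current version
  if (age < 0) {
    throw new RangeError('versionNumber is newer than the latest version');
  }
  if (age < policy.keepFullContentCount) return 'full';
  if (age < policy.keepFullContentCount + policy.keepDiffCount) return 'diff';
  return 'metadata_only'; // source link + change summary only
}

// The four-version chain from the diagram: v4 full, v3 diff, v1 and v2 pruned.
const tiers = [1, 2, 3, 4].map(v =>
  storageTier(v, 4, { keepFullContentCount: 1, keepDiffCount: 1 })
);
console.log(tiers); // [ 'metadata_only', 'metadata_only', 'diff', 'full' ]
```

Because the tier depends only on a version's distance from the latest one, the number of full-content copies stays constant no matter how long the chain grows.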

Data Structures

MemoryVersion

interface MemoryVersion {
  // Version identity
  versionId: string;                 // Unique version ID
  versionNumber: number;             // Sequential number (1, 2, 3, ...)
  previousVersionId: string | null;  // Previous version ID (null for v1)

  // Content (conditional)
  content: string | null;            // Full content (null if pruned)
  contentHash: string;               // SHA256 hash (for verification)
  diff?: VersionDiff;                // Diff from previous (optional)

  // Source information
  sourceType: SourceType;
  sourceReference: SourceReference;  // Link to original source

  // Change metadata
  changeType: ChangeType;
  changedFields: string[];           // Which fields changed
  changeSummary: string;             // LLM-generated summary

  // Timestamps
  createdAt: Date;
  occurredAt?: Date;                 // When original event happened

  // Metadata
  importance: number;
  category: string;
}

enum ChangeType {
  CREATION = 'creation',             // Initial version
  UPDATE = 'update',                 // Property changed
  ENRICHMENT = 'enrichment',         // Added info (existing content unchanged)
  CORRECTION = 'correction',         // Fixed error
  MERGE = 'merge',                   // Merged from multiple sources
}

interface SourceReference {
  // Email source
  threadId?: string;
  messageId?: string;
  extractedAt?: Date;

  // User input source
  userId?: string;
  inputMethod?: 'chat' | 'api' | 'import' | 'rollback';

  // Tool source
  toolName?: string;
  toolRunId?: string;

  // Original URL/link
  sourceUrl?: string;
}

interface VersionDiff {
  // JSON patch format (RFC 6902)
  operations: DiffOperation[];

  // Human-readable summary
  summary: string;
}

interface DiffOperation {
  op: 'add' | 'remove' | 'replace' | 'move' | 'copy' | 'test';
  path: string;                      // JSON Pointer, RFC 6901 (e.g., "/content")
  value?: any;
  from?: string;                     // For move/copy
}
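
The `contentHash` field is described as being for verification. A minimal sketch of that check, assuming Node.js and its built-in `node:crypto` module, could look like the following; `sha256Hex` and `verifyContent` are hypothetical helper names, not part of the design.

```typescript
import { createHash } from 'node:crypto';

// Hex-encoded SHA-256 of the UTF-8 bytes of the content.
function sha256Hex(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// True when a reconstructed candidate matches the hash stored on the version.
function verifyContent(candidate: string, expectedHash: string): boolean {
  return sha256Hex(candidate) === expectedHash;
}

const original = 'Q1 마케팅 예산 5000만원. 집행 기간: 1월~3월.';
const storedHash = sha256Hex(original); // recorded when the version was created

console.log(verifyContent(original, storedHash));               // true
console.log(verifyContent(original + ' (edited)', storedHash)); // false
```

An exact hash match is a strict test: content re-extracted from an email by an LLM may differ by a character and fail it, so a mismatch is probably better treated as "unverified" than as an error.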

VersionedMemoryNode

interface VersionedMemoryNode extends MemoryNode {
  // Current version
  currentVersionNumber: number;
  currentVersionId: string;

  // Version chain
  versions: MemoryVersion[];         // All versions (sorted by versionNumber)

  // Version policy
  versionPolicy: VersionPolicy;

  // Quick access
  firstVersion: MemoryVersion;       // Creation version
  latestVersion: MemoryVersion;      // Current version
}

interface VersionPolicy {
  // Retention
  maxVersions?: number | null;       // Max versions to keep (null = unlimited)
  keepFullContentCount: number;      // How many versions keep full content
  pruneAfterDays?: number;           // Auto-prune versions older than N days

  // Content strategy
  contentStrategy: 'full' | 'diff' | 'source_only';

  // Source preservation
  alwaysKeepSourceLinks: boolean;    // Never delete source references
  keepChangeSummaries: boolean;      // Keep LLM-generated summaries
}

// Default policy
const DEFAULT_VERSION_POLICY: VersionPolicy = {
  maxVersions: null,                 // Unlimited versions
  keepFullContentCount: 2,           // Keep latest 2 full contents
  pruneAfterDays: 365,               // Prune after 1 year
  contentStrategy: 'diff',           // Use diffs for middle versions
  alwaysKeepSourceLinks: true,       // Always preserve sources
  keepChangeSummaries: true,         // Keep summaries
};
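
Individual memories may need a policy that differs from the default. The sketch below shows one way to merge overrides into the defaults and reject inconsistent values. `resolvePolicy` is a hypothetical helper, and the rule that `keepFullContentCount` must be at least 1 is an assumption made here so that the latest version always has content to diff against.

```typescript
interface VersionPolicy {
  maxVersions?: number | null;
  keepFullContentCount: number;
  pruneAfterDays?: number;
  contentStrategy: 'full' | 'diff' | 'source_only';
  alwaysKeepSourceLinks: boolean;
  keepChangeSummaries: boolean;
}

const DEFAULT_VERSION_POLICY: VersionPolicy = {
  maxVersions: null,
  keepFullContentCount: 2,
  pruneAfterDays: 365,
  contentStrategy: 'diff',
  alwaysKeepSourceLinks: true,
  keepChangeSummaries: true,
};

// Hypothetical helper: merge per-memory overrides into the defaults, then validate.
function resolvePolicy(overrides: Partial<VersionPolicy> = {}): VersionPolicy {
  const policy: VersionPolicy = { ...DEFAULT_VERSION_POLICY, ...overrides };

  if (!Number.isInteger(policy.keepFullContentCount) || policy.keepFullContentCount < 1) {
    throw new RangeError('keepFullContentCount must be an integer >= 1');
  }
  if (policy.maxVersions != null && policy.maxVersions < policy.keepFullContentCount) {
    throw new RangeError('maxVersions must be >= keepFullContentCount');
  }
  return policy;
}

// Example: a compliance-sensitive memory that is never pruned by age.
const compliancePolicy = resolvePolicy({
  pruneAfterDays: undefined,
  keepFullContentCount: 5,
});
console.log(compliancePolicy.contentStrategy); // 'diff' (inherited from the default)
```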

Version Manager Implementation

Core API

class VersionManager {
  constructor(
    private storage: VersionStorage,
    private sourceFetcher: SourceFetcher
  ) {}

  /**
   * Create a new version when UPDATE occurs
   */
  async createVersion(
    memoryId: string,
    newContent: string,
    sourceReference: SourceReference,
    changeType: ChangeType,
    changedFields: string[]
  ): Promise<MemoryVersion> {
    const memory = await this.storage.getMemory(memoryId);
    const previousVersion = memory.latestVersion;

    // Generate change summary (LLM)
    const changeSummary = await this.generateChangeSummary(
      previousVersion.content || previousVersion.changeSummary,
      newContent
    );

    // Calculate diff
    const diff = this.calculateDiff(
      previousVersion.content || '',
      newContent
    );

    // Create new version
    const newVersion: MemoryVersion = {
      versionId: generateId(),
      versionNumber: memory.currentVersionNumber + 1,
      previousVersionId: previousVersion.versionId,
      content: newContent,
      contentHash: sha256(newContent),
      diff,
      sourceType: sourceReference.threadId ? 'email_extraction' : 'user_input',
      sourceReference,
      changeType,
      changedFields,
      changeSummary,
      createdAt: new Date(),
      occurredAt: new Date(),
      importance: memory.importance,
      category: memory.category,
    };

    // Add to version chain
    memory.versions.push(newVersion);
    memory.currentVersionNumber++;
    memory.currentVersionId = newVersion.versionId;
    memory.latestVersion = newVersion;

    // Apply pruning policy
    await this.applyPruningPolicy(memory);

    // Save
    await this.storage.updateMemory(memory);

    return newVersion;
  }

  /**
   * Apply pruning policy to old versions
   */
  private async applyPruningPolicy(memory: VersionedMemoryNode): Promise<void> {
    const policy = memory.versionPolicy;
    const versions = memory.versions;

    // Sort by version number (oldest first)
    const sortedVersions = [...versions].sort((a, b) => a.versionNumber - b.versionNumber);

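    // Determine which versions keep full content.
    // Guard against 0: slice(-0) equals slice(0) and would select every version.
    const keepFullCount = policy.keepFullContentCount;
    const latestVersions = keepFullCount > 0 ? sortedVersions.slice(-keepFullCount) : [];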

    for (const version of sortedVersions) {
      const isRecent = latestVersions.includes(version);

      if (!isRecent) {
        // Prune content, keep metadata and source link
        if (version.content !== null) {
          console.log(`Pruning content for version ${version.versionId}`);
          version.content = null;  // Delete full content
          // Keep: sourceReference, changeSummary, contentHash, diff
        }
      }
    }

    // Delete very old versions (if maxVersions set)
    if (policy.maxVersions) {
      const excessCount = versions.length - policy.maxVersions;
      if (excessCount > 0) {
        // Remove oldest versions
        const toDelete = sortedVersions.slice(0, excessCount);
        for (const version of toDelete) {
          const index = memory.versions.findIndex(v => v.versionId === version.versionId);
          if (index >= 0) {
            memory.versions.splice(index, 1);
          }
        }
      }
    }

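    // Delete versions older than pruneAfterDays, but never the current version:
    // a memory that has not changed for a long time must keep its latest state.
    if (policy.pruneAfterDays) {
      const cutoffDate = new Date();
      cutoffDate.setDate(cutoffDate.getDate() - policy.pruneAfterDays);

      memory.versions = memory.versions.filter(
        v => v.versionId === memory.currentVersionId || v.createdAt > cutoffDate
      );
    }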
  }

  /**
   * Reconstruct a previous version
   */
  async reconstructVersion(
    memoryId: string,
    versionId: string
  ): Promise<string> {
    const memory = await this.storage.getMemory(memoryId);
    const version = memory.versions.find(v => v.versionId === versionId);

    if (!version) {
      throw new Error(`Version ${versionId} not found`);
    }

    // If content is available, return it
    if (version.content !== null) {
      return version.content;
    }

    // Content pruned - attempt reconstruction
    console.log(`Reconstructing version ${versionId} from source`);

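    // Strategy 1: Undo diffs backwards from the nearest later version with content.
    // Each diff describes previous -> this version, so reaching the target means
    // undoing the diff of every version after it, newest first.
    const contentVersion = this.findNearestVersionWithContent(memory, version.versionNumber);
    if (contentVersion && contentVersion.versionNumber > version.versionNumber) {
      const toUndo = memory.versions
        .filter(v => v.versionNumber > version.versionNumber
                  && v.versionNumber <= contentVersion.versionNumber)
        .sort((a, b) => b.versionNumber - a.versionNumber);
      if (toUndo.every(v => v.diff)) {
        return toUndo.reduce(
          (content, v) => this.applyDiffBackwards(content, v.diff!),
          contentVersion.content!
        );
      }
    }

    // Strategy 2: Re-extract from source (needs both thread and message IDs)
    if (version.sourceReference.threadId && version.sourceReference.messageId) {
      return await this.reconstructFromEmailSource(version.sourceReference);
    }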

    // Strategy 3: Fallback to change summary
    return `[Content unavailable. Summary: ${version.changeSummary}]`;
  }

  /**
   * Reconstruct from email source
   */
  private async reconstructFromEmailSource(
    sourceRef: SourceReference
  ): Promise<string> {
    if (!sourceRef.threadId || !sourceRef.messageId) {
      throw new Error('Email source reference incomplete');
    }

    // Fetch original email
    const email = await this.sourceFetcher.fetchEmail(
      sourceRef.threadId,
      sourceRef.messageId
    );

    // Re-extract memory from email
    const extracted = await extractMemoryFromEmail(email);

    return extracted.content;
  }

  /**
   * Generate change summary using LLM
   */
  private async generateChangeSummary(
    oldContent: string,
    newContent: string
  ): Promise<string> {
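    // The prompt is in Korean to match the source content. In English:
    // "Compare the previous and new content and summarize the change in one
    // sentence. Previous content: ... New content: ... Change summary (one sentence):"
    const prompt = `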
이전 내용과 새 내용을 비교하여 변경사항을 한 문장으로 요약해주세요.

이전 내용:
"""
${oldContent}
"""

새 내용:
"""
${newContent}
"""

변경 요약 (한 문장):
`;

    const response = await llm.generate({
      prompt,
      model: 'gpt-4o-mini',
      temperature: 0.0,
      max_tokens: 100,
    });

    return response.content.trim();
  }

  /**
   * Calculate diff between two contents
   */
  private calculateDiff(
    oldContent: string,
    newContent: string
  ): VersionDiff {
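    // Use a JSON Patch library (e.g., fast-json-patch).
    // Caveat: comparing { content } wrappers yields a single whole-string
    // `replace` op that carries the full new value, so it saves no storage and
    // cannot be inverted. A text-level diff is needed for both.
    const oldObj = { content: oldContent };
    const newObj = { content: newContent };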

    const operations = jsonpatch.compare(oldObj, newObj);

    // Generate human-readable summary
    const summary = operations.map(op => {
      if (op.op === 'replace') {
        return `Changed ${op.path}`;
      }
      return `${op.op} ${op.path}`;
    }).join(', ');

    return { operations, summary };
  }
}
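
Reconstructing a pruned version by walking diffs backwards only works if each diff records what it removed as well as what it added. The sketch below shows one simple invertible text diff based on the common prefix and suffix of the two strings. It is an illustration, not the design's `VersionDiff`: `TextDiff`, `computeTextDiff`, and `applyDiffForwards` are hypothetical names, and the `applyDiffBackwards` shown here takes a `TextDiff` rather than the JSON Patch structure used above.

```typescript
// Invertible single-span text diff: stores both the removed and the inserted text.
interface TextDiff {
  start: number;    // offset (UTF-16 code units) where old and new content diverge
  removed: string;  // text present only in the old content
  inserted: string; // text present only in the new content
}

function computeTextDiff(oldText: string, newText: string): TextDiff {
  // Length of the common prefix.
  const maxPrefix = Math.min(oldText.length, newText.length);
  let start = 0;
  while (start < maxPrefix && oldText[start] === newText[start]) start++;

  // Trim the common suffix without crossing into the prefix.
  let oldEnd = oldText.length;
  let newEnd = newText.length;
  while (oldEnd > start && newEnd > start && oldText[oldEnd - 1] === newText[newEnd - 1]) {
    oldEnd--;
    newEnd--;
  }

  return {
    start,
    removed: oldText.slice(start, oldEnd),
    inserted: newText.slice(start, newEnd),
  };
}

// old content + diff -> new content
function applyDiffForwards(oldText: string, diff: TextDiff): string {
  return oldText.slice(0, diff.start) + diff.inserted +
         oldText.slice(diff.start + diff.removed.length);
}

// new content + diff -> old content
function applyDiffBackwards(newText: string, diff: TextDiff): string {
  return newText.slice(0, diff.start) + diff.removed +
         newText.slice(diff.start + diff.inserted.length);
}

const v1Content = 'Q1 마케팅 예산 5000만원. 집행 기간: 1월~3월.';
const v2Content = 'Q1 마케팅 예산 6000만원으로 증액. 추가 캠페인 가능.';
const v2Diff = computeTextDiff(v1Content, v2Content);

console.log(applyDiffBackwards(v2Content, v2Diff) === v1Content); // true
```

A single-span diff only saves space when edits are localized. A line-based or character-based diff library would handle scattered edits better, at the cost of a dependency.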

Storage Schema

Graph Storage (FalkorDB)

// Memory node with version metadata
CREATE (m:Memory {
  id: 'mem_001',
  currentVersionNumber: 3,
  currentVersionId: 'v3',
  // ... current content fields
})

// Version nodes
CREATE (v1:MemoryVersion {
  versionId: 'v1',
  versionNumber: 1,
  content: null,  // Pruned
  contentHash: 'abc123...',
  changeSummary: '초기 생성: Q1 예산 5000만원',  // "Initial creation: Q1 budget 50 million KRW"
  sourceThreadId: 'thread_001',
  sourceMessageId: 'msg_001',
  createdAt: '2025-01-10T09:00:00Z'
})

CREATE (v2:MemoryVersion {
  versionId: 'v2',
  versionNumber: 2,
  content: null,  // Pruned
  contentHash: 'def456...',
  changeSummary: '예산 5000만원 → 6000만원으로 증액',  // "Budget increased from 50 million to 60 million KRW"
  sourceThreadId: 'thread_001',
  sourceMessageId: 'msg_002',
  createdAt: '2025-01-15T14:00:00Z'
})

CREATE (v3:MemoryVersion {
  versionId: 'v3',
  versionNumber: 3,
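  content: 'Q1 마케팅 예산 7000만원. 최종 승인.',  // Full content: "Q1 marketing budget 70 million KRW. Final approval."
  contentHash: 'ghi789...',
  changeSummary: '예산 6000만원 → 7000만원, 최종 승인',  // "Budget 60 million → 70 million KRW, final approval"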
  sourceThreadId: 'thread_001',
  sourceMessageId: 'msg_003',
  createdAt: '2025-01-20T16:30:00Z'
})

// Version chain relationships
CREATE (m)-[:HAS_VERSION]->(v1)
CREATE (m)-[:HAS_VERSION]->(v2)
CREATE (m)-[:HAS_VERSION]->(v3)
CREATE (v1)-[:NEXT_VERSION]->(v2)
CREATE (v2)-[:NEXT_VERSION]->(v3)
CREATE (m)-[:CURRENT_VERSION]->(v3)

Querying Version History

// Get all versions of a memory
MATCH (m:Memory {id: 'mem_001'})-[:HAS_VERSION]->(v:MemoryVersion)
RETURN v
ORDER BY v.versionNumber

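// Get version chain (in order, oldest to current).
// Anchoring the end of the path at CURRENT_VERSION returns one chain instead of
// one path per prefix; *0.. also covers a memory that has only one version.
MATCH (m:Memory {id: 'mem_001'})-[:HAS_VERSION]->(first:MemoryVersion {versionNumber: 1}),
      (m)-[:CURRENT_VERSION]->(latest:MemoryVersion),
      path = (first)-[:NEXT_VERSION*0..]->(latest)
RETURN nodes(path) AS versionChain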

// Get source email for a version
MATCH (v:MemoryVersion {versionId: 'v2'})
RETURN v.sourceThreadId, v.sourceMessageId
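
When version nodes arrive unordered from the first query, the chain can also be rebuilt in application code by following `previousVersionId`. The following is a minimal sketch under that assumption; `VersionRecord` and `orderChain` are hypothetical names, and the walk simply stops where the oldest versions have been deleted.

```typescript
interface VersionRecord {
  versionId: string;
  versionNumber: number;
  previousVersionId: string | null;
}

// Walk previousVersionId links back from the current version, then reverse.
function orderChain(records: VersionRecord[], currentVersionId: string): VersionRecord[] {
  const byId = new Map(records.map(r => [r.versionId, r] as const));
  const chain: VersionRecord[] = [];
  const seen = new Set<string>(); // guards against accidental cycles

  let cursor = byId.get(currentVersionId);
  while (cursor && !seen.has(cursor.versionId)) {
    seen.add(cursor.versionId);
    chain.push(cursor);
    cursor = cursor.previousVersionId ? byId.get(cursor.previousVersionId) : undefined;
  }
  return chain.reverse(); // oldest surviving version first
}

const unordered: VersionRecord[] = [
  { versionId: 'v3', versionNumber: 3, previousVersionId: 'v2' },
  { versionId: 'v1', versionNumber: 1, previousVersionId: null },
  { versionId: 'v2', versionNumber: 2, previousVersionId: 'v1' },
];
console.log(orderChain(unordered, 'v3').map(r => r.versionId)); // [ 'v1', 'v2', 'v3' ]
```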

User-Facing Features

Version History UI

interface VersionHistoryView {
  memoryId: string;
  currentVersion: MemoryVersion;
  versionTimeline: VersionTimelineEntry[];
  canRollback: boolean;
}

interface VersionTimelineEntry {
  versionNumber: number;
  changeSummary: string;
  timestamp: Date;
  source: string;              // "Email from marketing@...", "User input"
  hasFullContent: boolean;     // Can be viewed?
  canRestore: boolean;         // Can be restored?
}

// Example timeline
const timeline: VersionTimelineEntry[] = [
  {
    versionNumber: 1,
    changeSummary: "초기 생성: Q1 마케팅 예산 5000만원",  // "Initial creation: Q1 marketing budget 50 million KRW"
    timestamp: new Date('2025-01-10T09:00:00Z'),
    source: "Email from finance@company.com",
    hasFullContent: false,
    canRestore: true,
  },
  {
    versionNumber: 2,
    changeSummary: "예산 5000만원 → 6000만원으로 증액",  // "Budget increased from 50 million to 60 million KRW"
    timestamp: new Date('2025-01-15T14:00:00Z'),
    source: "Email from finance@company.com",
    hasFullContent: false,
    canRestore: true,
  },
  {
    versionNumber: 3,
    changeSummary: "예산 6000만원 → 7000만원, 최종 승인",  // "Budget 60 million → 70 million KRW, final approval"
    timestamp: new Date('2025-01-20T16:30:00Z'),
    source: "Email from ceo@company.com",
    hasFullContent: true,
    canRestore: false,  // Current version
  },
];
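
Timeline entries like these could be derived from stored versions rather than written by hand. The sketch below is one possible mapping. `VersionLike`, `describeSource`, and `toTimelineEntry` are hypothetical names, and the `canRestore` rule (not the current version, and either content or a complete email reference is available) is an assumption that happens to match the example above.

```typescript
interface SourceReference {
  threadId?: string;
  messageId?: string;
  userId?: string;
  toolName?: string;
}

// Only the MemoryVersion fields the timeline needs.
interface VersionLike {
  versionId: string;
  versionNumber: number;
  changeSummary: string;
  createdAt: Date;
  content: string | null;
  sourceReference: SourceReference;
}

interface VersionTimelineEntry {
  versionNumber: number;
  changeSummary: string;
  timestamp: Date;
  source: string;
  hasFullContent: boolean;
  canRestore: boolean;
}

function describeSource(ref: SourceReference): string {
  if (ref.messageId) return `Email ${ref.messageId}`;
  if (ref.userId) return 'User input';
  if (ref.toolName) return `Tool: ${ref.toolName}`;
  return 'Unknown source';
}

function toTimelineEntry(v: VersionLike, currentVersionId: string): VersionTimelineEntry {
  const isCurrent = v.versionId === currentVersionId;
  const hasEmailSource = Boolean(v.sourceReference.threadId && v.sourceReference.messageId);
  return {
    versionNumber: v.versionNumber,
    changeSummary: v.changeSummary,
    timestamp: v.createdAt,
    source: describeSource(v.sourceReference),
    hasFullContent: v.content !== null,
    // Assumption: restorable only if the content can actually be recovered.
    canRestore: !isCurrent && (v.content !== null || hasEmailSource),
  };
}

const prunedV1: VersionLike = {
  versionId: 'v1',
  versionNumber: 1,
  changeSummary: '초기 생성: Q1 마케팅 예산 5000만원',
  createdAt: new Date('2025-01-10T09:00:00Z'),
  content: null, // pruned
  sourceReference: { threadId: 'thread_001', messageId: 'msg_001' },
};
const entry = toTimelineEntry(prunedV1, 'v3');
console.log(entry.hasFullContent, entry.canRestore); // false true
```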

Rollback Operation

async function rollbackToVersion(
  memoryId: string,
  targetVersionId: string,
  reason: string
): Promise<void> {
  const memory = await storage.getMemory(memoryId);
  const targetVersion = memory.versions.find(v => v.versionId === targetVersionId);

  if (!targetVersion) {
    throw new Error('Target version not found');
  }

  // Reconstruct content if pruned
  const content = await versionManager.reconstructVersion(memoryId, targetVersionId);

  // Create new version (rollback type)
  await versionManager.createVersion(
    memoryId,
    content,
    {
      userId: currentUser.id,
      inputMethod: 'rollback',
      sourceUrl: `version://${targetVersionId}`,
    },
    ChangeType.CORRECTION,
    ['content'],
  );

  // Log rollback event
  await auditLog.log({
    action

Testing Strategy

Unit Tests

describe('VersionManager', () => {
  it('should create new version on UPDATE', async () => {
    const memory = await createTestMemory();
    const newVersion = await versionManager.createVersion(
      memory.id,
      'Updated content',
      { threadId: 'thread_001', messageId: 'msg_002' },
      ChangeType.UPDATE,
      ['content']
    );

    expect(newVersion.versionNumber).toBe(2);
    expect(newVersion.content).toBe('Updated content');
    expect(memory.versions).toHaveLength(2);
  });

  it('should prune old version content', async () => {
    const memory = await createMemoryWithVersions(5);
    memory.versionPolicy.keepFullContentCount = 2;

    await versionManager['applyPruningPolicy'](memory);  // bracket access: the method is private

    // Latest 2 versions should have content
    const latest2 = memory.versions.slice(-2);
    expect(latest2.every(v => v.content !== null)).toBe(true);

    // Older versions should be pruned
    const older = memory.versions.slice(0, -2);
    expect(older.every(v => v.content === null)).toBe(true);

    // But should keep source links and summaries
    expect(older.every(v => v.sourceReference !== null)).toBe(true);
    expect(older.every(v => v.changeSummary !== '')).toBe(true);
  });

  it('should reconstruct version from source', async () => {
    const memory = await createTestMemory();
    const version = memory.versions[0];
    version.content = null;  // Simulate pruning

    const content = await versionManager.reconstructVersion(memory.id, version.versionId);

    expect(content).toContain('Q1 마케팅 예산');
  });
});

Integration Tests

describe('Version History Integration', () => {
  it('should preserve history across multiple updates', async () => {
    // Create initial memory
    const memory = await createMemory('Initial content');

    // Update 3 times
    await updateMemory(memory.id, 'Update 1');
    await updateMemory(memory.id, 'Update 2');
    await updateMemory(memory.id, 'Update 3');

    // Verify history
    const updated = await getMemory(memory.id);
    expect(updated.versions).toHaveLength(4);  // Initial + 3 updates
    expect(updated.currentVersionNumber).toBe(4);

    // Verify timeline
    const timeline = await getVersionTimeline(memory.id);
    expect(timeline).toHaveLength(4);
    expect(timeline[0].changeSummary).toContain('Initial');
    expect(timeline[3].changeSummary).toContain('Update 3');
  });

  it('should allow rollback to previous version', async () => {
    const memory = await createMemory('Original');
    await updateMemory(memory.id, 'Wrong update');

    // Rollback to original
    await rollbackToVersion(memory.id, memory.firstVersion.versionId, 'Mistake');

    // Verify content restored
    const restored = await getMemory(memory.id);
    expect(restored.content).toBe('Original');

    // But version history preserved
    expect(restored.versions).toHaveLength(3);  // Original + Wrong + Rollback
  });
});

Performance Considerations

Storage Cost

Assumptions:
- Average memory size: 500 bytes
- Average version count: 5 versions per memory
- keepFullContentCount: 2

Storage per memory:
- 2 full contents: 2 × 500 = 1000 bytes
- 3 pruned versions: 3 × 200 (metadata) = 600 bytes
- Total: 1600 bytes

Ratio: 1600 / 500 = 3.2x the unversioned size (2.2x overhead)

For 100k memories: 160 MB (acceptable)
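
The arithmetic above can be expressed as a small estimator so the assumptions are easy to vary. `StorageAssumptions` and `estimateBytesPerMemory` are hypothetical names. As in the figures above, metadata is counted only for pruned versions.

```typescript
interface StorageAssumptions {
  avgContentBytes: number;
  avgVersionsPerMemory: number;
  keepFullContentCount: number;
  metadataBytesPerPrunedVersion: number;
}

function estimateBytesPerMemory(a: StorageAssumptions): number {
  // A memory with fewer versions than keepFullContentCount keeps all of them in full.
  const fullVersions = Math.min(a.keepFullContentCount, a.avgVersionsPerMemory);
  const prunedVersions = a.avgVersionsPerMemory - fullVersions;
  return fullVersions * a.avgContentBytes +
         prunedVersions * a.metadataBytesPerPrunedVersion;
}

const perMemory = estimateBytesPerMemory({
  avgContentBytes: 500,
  avgVersionsPerMemory: 5,
  keepFullContentCount: 2,
  metadataBytesPerPrunedVersion: 200,
});
const ratio = perMemory / 500;                    // relative to unversioned content
const totalMB = (perMemory * 100_000) / 1_000_000; // 100k memories, decimal MB

console.log(perMemory, ratio, totalMB); // 1600 3.2 160
```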

Reconstruction Performance

// Estimated latencies (not yet benchmarked)
const RECONSTRUCTION_METRICS = {
  from_full_content: {
    latency: '< 10ms',
    cache_hit: true,
  },
  from_diff: {
    latency: '50-100ms',
    requires: 'nearest version with content',
  },
  from_source: {
    latency: '1-2 seconds',
    requires: 'email fetch + re-extraction',
  },
  fallback_summary: {
    latency: '< 5ms',
    quality: 'low (summary only)',
  },
};
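
Since the four paths differ in latency by orders of magnitude, one option is to try them from cheapest to most expensive and stop at the first that succeeds. The sketch below is a generic illustration of that ordering. `Strategy` and `reconstructWithFallback` are hypothetical names, and the demo strategies are stand-ins that simulate pruned content and a missing diff.

```typescript
interface Strategy {
  name: string;
  // Resolves to the content, or null when this strategy does not apply.
  run: () => Promise<string | null>;
}

async function reconstructWithFallback(
  strategies: Strategy[]
): Promise<{ content: string; via: string }> {
  for (const strategy of strategies) {
    try {
      const content = await strategy.run();
      if (content !== null) {
        return { content, via: strategy.name };
      }
    } catch {
      // A failed strategy (e.g. the email can no longer be fetched)
      // falls through to the next, more expensive one.
    }
  }
  throw new Error('All reconstruction strategies failed');
}

// Stand-in strategies, ordered by the estimated latencies above.
const demo: Strategy[] = [
  { name: 'from_full_content', run: async () => null }, // content was pruned
  { name: 'from_diff', run: async () => { throw new Error('no diff stored'); } },
  { name: 'from_source', run: async () => 're-extracted content' },
  { name: 'fallback_summary', run: async () => '[Content unavailable]' },
];

reconstructWithFallback(demo).then(r => console.log(r.via)); // prints "from_source"
```

Returning the name of the strategy that succeeded lets callers distinguish an exact copy from a re-extraction or a summary-only fallback.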

Implementation Plan

Step 1: Types and Storage Schema (Day 7, Morning)

Tasks:

  • Define MemoryVersion type

  • Define VersionedMemoryNode type

  • Define VersionPolicy type

  • Create Cypher schema for version nodes

Deliverable: lib/types/memory-version.ts, schema migrations

Step 2: VersionManager Core (Day 7, Afternoon)

Tasks:

  • Implement VersionManager class

  • Implement createVersion()

  • Implement applyPruningPolicy()

  • Implement change summary generation

Deliverable: lib/storage/version-manager.ts

Step 3: Reconstruction Logic (Day 8, Morning)

Tasks:

  • Implement reconstructVersion()

  • Implement reconstructFromEmailSource()

  • Implement diff calculation and application

  • Add fallback strategies

Deliverable: Updated version-manager.ts with reconstruction

Step 4: Integration and Testing (Day 8, Afternoon)

Tasks:

  • Integrate with DecisionTreeEngine UPDATE logic

  • Create unit tests (15+ test cases)

  • Create integration tests (5+ scenarios)

  • Test pruning and reconstruction

Deliverable: Tests + integrated version management

Success Criteria

Phase 4 Success Metrics

  • Versions created for all UPDATE operations

  • Pruning works correctly (old content deleted, metadata preserved)

  • Reconstruction success rate > 95%

  • Storage overhead < 3x original content size

  • Reconstruction latency < 2 seconds (from source)

  • All tests passing

Production Readiness

  • Version history queryable via API

  • UI can display version timeline

  • Rollback functionality works

  • Audit logs complete

Related Documentation

  • Experiment Improvements - Parent overview

  • UPDATE vs LINK Distinction - Previous phase

  • Decision Tree Logic - Integration point

  • Memory Node Types - Base memory schema

Change Log

Date        Author  Change
2025-12-06  Claude  Initial version history management design