06-consolidation-contextual-similarity

Dec 7, 2025

Contextual Similarity Analysis

Improvement: #2 of 4
Parent Doc: Experiment Improvements
Status: Design Phase

Problem Statement

Current similarity calculation uses simple embedding cosine similarity or word overlap, failing to distinguish semantic context when the same keywords appear in different domains.

Failure Example: CREATE_010

Memory A (Company Growth):

회사 성장 전략 회의: 매출 증대 방안 논의.
목표: 전년 대비 30% 성장
(Company growth strategy meeting: discussing ways to increase revenue. Goal: 30% growth over the previous year.)

Memory B (Employee Growth):

직원 성장 프로그램: 직무 교육, 멘토링, 리더십 과정.
대상: 직원. 신청 마감: 2월 10일
(Employee growth program: job training, mentoring, leadership courses. For: employees. Application deadline: Feb 10.)

Current Problem:

  • Keyword "성장" (growth) appears in both

  • Simple embedding similarity may be HIGH (0.65+)

  • But contexts are completely different:

    • A: Business/Revenue domain

    • B: HR/Training domain

  • Expected: CREATE new memory (different context)

  • Risk: System may incorrectly mark as RELATED and create unwanted link

Root Cause

// Current approach (oversimplified)
const content_similarity = embedding_cosine(A, B);  // → 0.70

// Problem: Single number cannot capture:
// - Domain difference (business vs HR)
// - Intent difference (strategy meeting vs program announcement)
// - Entity type difference (company vs employees)

Design Goals

Primary Goals

  1. Context Awareness: Distinguish same keywords in different semantic contexts

  2. Nuance Detection: Capture subtle differences (update vs new initiative, cause vs effect, etc.)

  3. Explainability: Provide reasoning for similarity judgments

  4. Production Viable: Maintain acceptable latency and cost

Non-Goals

  • Perfect Accuracy: Not aiming for 100% (human-level judgment varies)

  • Real-time Learning: Not implementing online learning from feedback (v1)

  • Multi-hop Reasoning: Not inferring implicit relationships beyond direct comparison

Approach: Semantic Decomposition

Core Idea

Instead of comparing raw content embeddings, decompose each memory into structured semantic components and compare at multiple levels.

Raw Content → Semantic Decomposition → Multi-level Comparison → Context-Aware Similarity

Semantic Decomposition Schema

interface SemanticDecomposition {
  // Core semantic elements
  core: {
    subject: string;            // Main entity/topic
    action: string;             // What's happening
    objects: string[];          // What's being acted upon
  };

  // Contextual metadata
  context: {
    domain: DomainType;         // Business area
    intent: IntentType;         // Purpose of communication
    temporalContext: string;    // Time reference (Q1, 2025, etc.)
    spatialContext?: string;    // Location if relevant
  };

  // Extracted entities
  entities: {
    people: string[];
    organizations: string[];
    projects: string[];
    concepts: string[];         // Abstract concepts
  };

  // Relationship indicators
  relationships: {
    isUpdate: boolean;          // Updates existing info?
    references: string[];       // Mentions other topics
    causality?: {               // Cause-effect relationship
      cause?: string;
      effect?: string;
    };
  };
}

enum DomainType {
  BUSINESS_STRATEGY = 'business_strategy',
  FINANCE = 'finance',
  HR = 'hr',
  MARKETING = 'marketing',
  ENGINEERING = 'engineering',
  OPERATIONS = 'operations',
  LEGAL = 'legal',
  GENERAL = 'general',
}

enum IntentType {
  INFORM = 'inform',            // Sharing information
  REQUEST = 'request',          // Asking for something
  DECISION = 'decision',        // Announcing decision
  DISCUSSION = 'discussion',    // Ongoing conversation
  REPORT = 'report',            // Status update
  ANNOUNCEMENT = 'announcement', // Broadcast
}

Example Decomposition

Memory A (Company Growth):

{
  "core": {
    "subject": "회사 성장 전략",
    "action": "회의 및 논의",
    "objects": ["매출 증대 방안"]
  },
  "context": {
    "domain": "business_strategy",
    "intent": "discussion",
    "temporalContext": "2025"
  },
  "entities": {
    "people": ["ceo@company.com", "executives@company.com"],
    "organizations": ["회사"],
    "projects": [],
    "concepts": ["성장", "매출", "전략"]
  },
  "relationships": {
    "isUpdate": false,
    "references": [],
    "causality": null
  }
}

Memory B (Employee Growth):

{
  "core": {
    "subject": "직원 성장 프로그램",
    "action": "런칭 및 모집",
    "objects": ["직무 교육", "멘토링", "리더십 과정"]
  },
  "context": {
    "domain": "hr",
    "intent": "announcement",
    "temporalContext": "신청 마감: 2월 10일"
  },
  "entities": {
    "people": ["hr@company.com", "all@company.com"],
    "organizations": ["HR"],
    "projects": ["직원 성장 프로그램"],
    "concepts": ["성장", "교육", "멘토링"]
  },
  "relationships": {
    "isUpdate": false,
    "references": [],
    "causality": null
  }
}

Multi-Level Similarity Calculation

Level 1: Domain Match

function calculateDomainSimilarity(
  domainA: DomainType,
  domainB: DomainType
): number {
  // Exact match
  if (domainA === domainB) return 1.0;

  // Related domains (configurable)
  const DOMAIN_RELATIONS: Record<string, Record<string, number>> = {
    'business_strategy': { 'finance': 0.7, 'marketing': 0.6 },
    'finance': { 'business_strategy': 0.7, 'operations': 0.5 },
    'hr': { 'operations': 0.4 },
    // etc.
  };

  // Look up both directions so the relation table need not list each pair twice
  return DOMAIN_RELATIONS[domainA]?.[domainB]
    ?? DOMAIN_RELATIONS[domainB]?.[domainA]
    ?? 0.0;
}

Example:

  • business_strategy vs hr: 0.0 (unrelated)

  • business_strategy vs finance: 0.7 (related)

Level 2: Core Subject Similarity

function calculateCoreSimilarity(
  coreA: SemanticDecomposition['core'],
  coreB: SemanticDecomposition['core']
): number {
  // Use embedding similarity on structured components
  const subjectSim = embeddingSimilarity(coreA.subject, coreB.subject);
  const actionSim = embeddingSimilarity(coreA.action, coreB.action);
  const objectsSim = jaccardSimilarity(coreA.objects, coreB.objects);

  // Weighted combination
  return subjectSim * 0.5 + actionSim * 0.25 + objectsSim * 0.25;
}

Example:

  • Subjects: "회사 성장 전략" (company growth strategy) vs "직원 성장 프로그램" (employee growth program)

    • Embedding similarity: ~0.65 (shares "성장")

  • Actions: "회의 및 논의" (meeting and discussion) vs "런칭 및 모집" (launch and recruitment)

    • Embedding similarity: ~0.25 (different)

  • Objects: ["매출 증대"] (revenue increase) vs ["직무 교육", "멘토링"] (job training, mentoring)

    • Jaccard: 0.0 (no overlap)

Core Similarity: 0.65 * 0.5 + 0.25 * 0.25 + 0.0 * 0.25 ≈ 0.39
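
These levels rely on jaccardSimilarity and cosineSimilarity helpers that this document does not define (embeddingSimilarity would embed both strings, e.g. through a cached embedding client, and apply cosineSimilarity). Minimal sketches, assuming string arrays and same-dimension vectors:

// Jaccard similarity over string sets: |A ∩ B| / |A ∪ B| (0 if both are empty)
function jaccardSimilarity(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let intersection = 0;
  for (const item of setA) {
    if (setB.has(item)) intersection++;
  }
  return intersection / (setA.size + setB.size - intersection);
}

// Cosine similarity between two embedding vectors of the same dimension
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}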

Level 3: Entity Overlap

function calculateEntitySimilarity(
  entitiesA: SemanticDecomposition['entities'],
  entitiesB: SemanticDecomposition['entities']
): number {
  const peopleSim = jaccardSimilarity(entitiesA.people, entitiesB.people);
  const orgsSim = jaccardSimilarity(entitiesA.organizations, entitiesB.organizations);
  const projectsSim = jaccardSimilarity(entitiesA.projects, entitiesB.projects);

  // Concepts overlap less important (keywords can repeat)
  const conceptsSim = jaccardSimilarity(entitiesA.concepts, entitiesB.concepts) * 0.5;

  // Divide by 3.5, the maximum possible sum (1 + 1 + 1 + 0.5), so a perfect match scores 1.0
  return (peopleSim + orgsSim + projectsSim + conceptsSim) / 3.5;
}
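
Example (using the CREATE_010 decompositions above): people, organizations, and projects share nothing (all 0.0), and concepts overlap only on "성장" (Jaccard 1/5 = 0.2, halved to 0.1), giving (0 + 0 + 0 + 0.1) / 3.5 ≈ 0.03.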

Level 4: Intent and Temporal Context

function calculateContextSimilarity(
  contextA: SemanticDecomposition['context'],
  contextB: SemanticDecomposition['context']
): number {
  // Intent match
  const intentMatch = contextA.intent === contextB.intent ? 1.0 : 0.3;

  // Temporal overlap (e.g., both mention Q1)
  const temporalSim = temporalOverlap(contextA.temporalContext, contextB.temporalContext);

  return intentMatch * 0.6 + temporalSim * 0.4;
}
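
temporalOverlap is left undefined in this document; one simple sketch, assuming temporal context strings are compared as coarse tokens ("Q1", "2025", "2월 10일") and that a missing reference on either side scores a neutral 0.5:

// Hypothetical heuristic: token-level Jaccard over time references
function temporalOverlap(a?: string, b?: string): number {
  if (!a || !b) return 0.5;  // Unknown on either side: neutral, not zero
  return jaccardSimilarity(a.split(/\s+/), b.split(/\s+/));
}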

Combined Contextual Similarity

interface ContextualSimilarityResult {
  overall_score: number;
  category: SimilarityCategory;

  // Detailed breakdown
  breakdown: {
    domain_match: number;
    core_similarity: number;
    entity_overlap: number;
    context_similarity: number;
    raw_embedding: number;       // For comparison
  };

  // Context judgment
  same_context: boolean;          // Same semantic context?
  context_distance: number;       // How different? (0=same, 1=totally different)

  // Explanation
  reasoning: string;
}
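
SimilarityCategory is referenced throughout but never defined here; a minimal definition consistent with the thresholds applied below:

enum SimilarityCategory {
  DUPLICATE = 'DUPLICATE',   // >= 0.95: same information
  UPDATE = 'UPDATE',         // >= 0.80: newer version of the same topic
  RELATED = 'RELATED',       // >= 0.50: distinct but worth linking
  UNRELATED = 'UNRELATED',   // <  0.50: separate memory
}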

function calculateContextualSimilarity(
  decompositionA: SemanticDecomposition,
  decompositionB: SemanticDecomposition,
  rawEmbeddingA: number[],
  rawEmbeddingB: number[]
): ContextualSimilarityResult {
  // Calculate all levels
  const domainMatch = calculateDomainSimilarity(
    decompositionA.context.domain,
    decompositionB.context.domain
  );
  const coreSim = calculateCoreSimilarity(decompositionA.core, decompositionB.core);
  const entitySim = calculateEntitySimilarity(decompositionA.entities, decompositionB.entities);
  const contextSim = calculateContextSimilarity(decompositionA.context, decompositionB.context);
  const rawEmbeddingSim = cosineSimilarity(rawEmbeddingA, rawEmbeddingB);

  // Context-aware weighting
  // If domain is different, heavily penalize overall score
  const domainPenalty = domainMatch < 0.5 ? 0.5 : 1.0;

  // Combined score with domain penalty
  const overallScore = domainPenalty * (
    domainMatch * 0.25 +
    coreSim * 0.35 +
    entitySim * 0.20 +
    contextSim * 0.20
  );

  // Context judgment
  const sameContext = domainMatch > 0.8 && coreSim > 0.7;
  const contextDistance = 1 - (domainMatch * 0.6 + coreSim * 0.4);

  // Determine category
  let category: SimilarityCategory;
  if (overallScore >= 0.95) category = SimilarityCategory.DUPLICATE;
  else if (overallScore >= 0.80) category = SimilarityCategory.UPDATE;
  else if (overallScore >= 0.50) category = SimilarityCategory.RELATED;
  else category = SimilarityCategory.UNRELATED;

  // Generate reasoning
  const reasoning = generateReasoning(
    domainMatch, coreSim, entitySim, contextSim, rawEmbeddingSim, sameContext
  );

  return {
    overall_score: overallScore,
    category,
    breakdown: {
      domain_match: domainMatch,
      core_similarity: coreSim,
      entity_overlap: entitySim,
      context_similarity: contextSim,
      raw_embedding: rawEmbeddingSim,
    },
    same_context: sameContext,
    context_distance: contextDistance,
    reasoning,
  };
}
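
generateReasoning is also unspecified; a minimal template-based sketch (a short LLM call could replace it for richer explanations):

function generateReasoning(
  domainMatch: number,
  coreSim: number,
  entitySim: number,
  contextSim: number,
  rawEmbeddingSim: number,
  sameContext: boolean
): string {
  if (sameContext) {
    return `Same context: domain match ${domainMatch.toFixed(2)}, core similarity ${coreSim.toFixed(2)}.`;
  }
  const signals: string[] = [];
  if (domainMatch < 0.5) signals.push(`domains differ (${domainMatch.toFixed(2)})`);
  if (coreSim < 0.5) signals.push(`core subjects differ (${coreSim.toFixed(2)})`);
  if (entitySim < 0.2) signals.push(`few shared entities (${entitySim.toFixed(2)})`);
  if (contextSim < 0.5) signals.push(`intent/temporal context differs (${contextSim.toFixed(2)})`);
  return `Different contexts: ${signals.join(', ')}. ` +
    `Raw embedding (${rawEmbeddingSim.toFixed(2)}) alone would overstate similarity.`;
}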

Example Result

Comparing Company Growth vs Employee Growth:

{
  "overall_score": 0.11,
  "category": "UNRELATED",
  "breakdown": {
    "domain_match": 0.0,
    "core_similarity": 0.39,
    "entity_overlap": 0.03,
    "context_similarity": 0.35,
    "raw_embedding": 0.68
  },
  "same_context": false,
  "context_distance": 0.84,
  "reasoning": "The two memories share the keyword '성장' (growth) but in entirely different contexts. Memory A concerns company revenue growth in the business_strategy domain; Memory B concerns an employee training program in the hr domain. The domains differ and the core subjects differ, so they should be treated as separate memories."
}

Key Insight: Raw embedding similarity was 0.68 (which the thresholds would score as RELATED), but contextual analysis correctly identified the pair as UNRELATED (0.11).

LLM-Based Semantic Decomposition

Extraction Prompt

// The Korean prompt below instructs the model to decompose a memory into the
// SemanticDecomposition JSON schema above, choosing the domain carefully (the
// same keyword can belong to different domains) and limiting concepts to a few
// key terms. Output is JSON only.
const DECOMPOSITION_PROMPT = `
당신은 이메일 내용을 분석하여 구조화된 semantic 정보를 추출하는 전문가입니다.

다음 메모리 내용을 분석하여 JSON 형식으로 분해해주세요:

메모리 내용:
"""
{content}
"""

출력 형식:
{
  "core": {
    "subject": "핵심 주제 (예: Q1 마케팅 캠페인)",
    "action": "진행되는 행동 (예: 예산 승인, 회의 개최)",
    "objects": ["행동의 대상들"]
  },
  "context": {
    "domain": "business_strategy|finance|hr|marketing|engineering|operations|legal|general 중 하나",
    "intent": "inform|request|decision|discussion|report|announcement 중 하나",
    "temporalContext": "시간 관련 언급 (예: Q1, 2025년, 2월 10일 마감)"
  },
  "entities": {
    "people": ["이메일 주소들"],
    "organizations": ["조직명, 팀명"],
    "projects": ["프로젝트명"],
    "concepts": ["핵심 개념 키워드들"]
  },
  "relationships": {
    "isUpdate": true/false,
    "references": ["언급된 다른 주제들"],
    "causality": {
      "cause": "원인",
      "effect": "결과"
    }
  }
}

분석 시 주의사항:
1. domain을 신중하게 선택하세요 (같은 키워드도 다른 domain일 수 있음)
2. subject는 구체적으로 명시하세요
3. concepts는 핵심 키워드만 포함하세요 (너무 많지 않게)

JSON만 출력하세요:
`;

async function extractSemanticDecomposition(
  content: string,
  llmClient: LLMClient
): Promise<SemanticDecomposition> {
  const prompt = DECOMPOSITION_PROMPT.replace('{content}', content);

  const response = await llmClient.generate({
    prompt,
    model: 'gpt-4o-mini',  // Fast, cheap model
    temperature: 0.0,       // Deterministic
    response_format: { type: 'json_object' },  // Force JSON output
  });

  return JSON.parse(response.content) as SemanticDecomposition;
}

Alternative: Direct LLM Judgment

For cases where latency is acceptable, use an LLM to judge context similarity directly, guided by few-shot examples.

Contrastive Few-Shot Prompt

interface ContrastiveExample {
  memoryA: string;
  memoryB: string;
  judgment: 'SAME_CONTEXT' | 'RELATED_CONTEXT' | 'DIFFERENT_CONTEXT';
  reasoning: string;
}

const FEW_SHOT_EXAMPLES: ContrastiveExample[] = [
  {
    memoryA: "Q1 마케팅 캠페인 예산 5000만원 승인",      // Q1 marketing campaign budget of 50M KRW approved
    memoryB: "Q1 마케팅 캠페인 예산 6000만원으로 증액",  // Q1 marketing campaign budget raised to 60M KRW
    judgment: 'SAME_CONTEXT',
    // Budget info for the same Q1 marketing campaign; only the amount changed.
    reasoning: "같은 Q1 마케팅 캠페인의 예산 정보. 금액만 변경됨.",
  },
  {
    memoryA: "Q1 OKR: 사용자 증가 20% 목표",             // Q1 OKR: 20% user growth target
    memoryB: "Q2 OKR: 사용자 증가 15% 목표 설정",        // Q2 OKR: 15% user growth target set
    judgment: 'RELATED_CONTEXT',
    // OKRs from different quarters, but related information with continuity.
    reasoning: "다른 분기의 OKR이지만 연속성 있는 관련 정보",
  },
  {
    memoryA: "회사 성장 전략 회의: 매출 증대 방안",      // Company growth strategy meeting: revenue increase measures
    memoryB: "직원 성장 프로그램: 교육 및 멘토링",       // Employee growth program: training and mentoring
    judgment: 'DIFFERENT_CONTEXT',
    // Same keyword '성장' (growth), but entirely different contexts: company revenue vs employee development.
    reasoning: "'성장' 키워드는 같지만 완전히 다른 맥락. 회사 매출 vs 직원 개발",
  },
  // More examples...
];

// The Korean prompt below presents the few-shot examples, then asks the model
// to judge whether two memories share the same context, answering in JSON with
// judgment, confidence, reasoning, and key_factors.
const JUDGMENT_PROMPT = `
당신은 두 메모리가 같은 맥락(context)인지 판단하는 전문가입니다.

예시들:
${FEW_SHOT_EXAMPLES.map(ex => `
Memory A: "${ex.memoryA}"
Memory B: "${ex.memoryB}"
판단: ${ex.judgment}
이유: ${ex.reasoning}
`).join('\n')}

이제 다음을 판단해주세요:

Memory A: "{memoryA}"
Memory B: "{memoryB}"

JSON 형식으로 답변:
{
  "judgment": "SAME_CONTEXT|RELATED_CONTEXT|DIFFERENT_CONTEXT",
  "confidence": 0.0~1.0,
  "reasoning": "판단 근거 (한국어)",
  "key_factors": ["판단에 중요한 요소들"]
}
`;
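
The judgment function below returns a ContextJudgment, which this document references (and lists as a Step 1 deliverable) but never defines; a minimal shape matching the JSON the prompt requests:

interface ContextJudgment {
  judgment: 'SAME_CONTEXT' | 'RELATED_CONTEXT' | 'DIFFERENT_CONTEXT';
  confidence: number;     // 0.0-1.0
  reasoning: string;      // Rationale (Korean, per the prompt)
  key_factors: string[];  // Factors that drove the judgment
}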

async function judgeContextSimilarity(
  memoryA: string,
  memoryB: string,
  llmClient: LLMClient
): Promise<ContextJudgment> {
  const response = await llmClient.generate({
    prompt: JUDGMENT_PROMPT
      .replace('{memoryA}', memoryA)
      .replace('{memoryB}', memoryB),
    model: 'gpt-4o-mini',  // Mirrors the decomposition call; a larger model could be used here
    temperature: 0.0,
    response_format: { type: 'json_object' },
  });

  return JSON.parse(response.content) as ContextJudgment;
}

Hybrid Approach (Recommended)

Combine both approaches for optimal cost/latency/accuracy:

// trivialResult is a hypothetical helper that fills the remaining
// ContextualSimilarityResult fields (breakdown, same_context, reasoning)
// for the clear-cut cases where no contextual analysis runs.
async function calculateSimilarity(
  memoryA: MemoryNode,
  memoryB: MemoryNode,
  llm: LLMClient
): Promise<ContextualSimilarityResult> {
  // Step 1: Fast embedding similarity
  const rawSim = cosineSimilarity(memoryA.embedding, memoryB.embedding);

  // Step 2: Clear cases - skip LLM
  if (rawSim >= 0.98) {
    // Almost identical - SKIP LLM
    return trivialResult(rawSim, SimilarityCategory.DUPLICATE);
  }
  if (rawSim < 0.30) {
    // Clearly unrelated - SKIP LLM
    return trivialResult(rawSim, SimilarityCategory.UNRELATED);
  }

  // Step 3: Boundary cases (0.30-0.98) - use contextual analysis
  // Option A: Semantic decomposition + multi-level calc (faster, cheaper)
  const decompositionA = await extractSemanticDecomposition(memoryA.content, llm);
  const decompositionB = await extractSemanticDecomposition(memoryB.content, llm);
  const contextualSim = calculateContextualSimilarity(
    decompositionA, decompositionB, memoryA.embedding, memoryB.embedding
  );

  // Option B: Direct LLM judgment (slower, more accurate for ambiguous cases)
  // Use for very ambiguous cases (rawSim 0.45-0.85)
  if (rawSim > 0.45 && rawSim < 0.85) {
    const judgment = await judgeContextSimilarity(memoryA.content, memoryB.content, llm);
    // One possible policy: let a confident DIFFERENT_CONTEXT verdict override
    if (judgment.judgment === 'DIFFERENT_CONTEXT' && judgment.confidence >= 0.8) {
      contextualSim.category = SimilarityCategory.UNRELATED;
      contextualSim.same_context = false;
      contextualSim.reasoning = judgment.reasoning;
    }
  }

  return contextualSim;
}

Performance Optimization

Caching Strategy

interface DecompositionCache {
  memoryId: string;
  decomposition: SemanticDecomposition;
  cachedAt: Date;
  ttl: number;  // Time to live (seconds)
}

class SemanticCache {
  private cache: Map<string, DecompositionCache> = new Map();

  async getOrCompute(
    memoryId: string,
    content: string,
    llm: LLMClient
  ): Promise<SemanticDecomposition> {
    const cached = this.cache.get(memoryId);
    if (cached && !this.isExpired(cached)) {
      return cached.decomposition;
    }

    // Compute and cache
    const decomposition = await extractSemanticDecomposition(content, llm);
    this.cache.set(memoryId, {
      memoryId,
      decomposition,
      cachedAt: new Date(),
      ttl: 3600,  // e.g., one hour; tune to consolidation cadence
    });
    return decomposition;
  }

  // Lets batch paths seed entries directly (used by the batch sketch below)
  set(memoryId: string, decomposition: SemanticDecomposition): void {
    this.cache.set(memoryId, { memoryId, decomposition, cachedAt: new Date(), ttl: 3600 });
  }

  private isExpired(entry: DecompositionCache): boolean {
    return (Date.now() - entry.cachedAt.getTime()) / 1000 > entry.ttl;
  }
}

Batch Processing

async function batchDecompose(
  memories: MemoryNode[],
  llm: LLMClient
): Promise<Map<string, SemanticDecomposition>> {
  // Batch multiple memories in single LLM call
  const batchPrompt = `
다음 메모리들을 각각 분석해주세요:

${memories.map((m, i) => `
[Memory ${i}]
${m.content}
`).join('\n')}

JSON 배열로 출력:
[
  { "index": 0, "decomposition": {...} },
  { "index": 1, "decomposition": {...} },
  ...
]
  `;

  const response = await llm.generate({
    prompt: batchPrompt,
    model: 'gpt-4o-mini',
    temperature: 0.0,
  });
  const results = JSON.parse(response.content);

  return new Map(
    results.map((r: any) => [memories[r.index].id, r.decomposition])
  );
}
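
One way to wire the cache and the batch path together during a consolidation pass (warmCache is a hypothetical helper, using the set() seeding method sketched above):

// Decompose all memories in one batched LLM call, then seed the cache so
// subsequent pairwise comparisons are cache hits.
async function warmCache(
  memories: MemoryNode[],
  cache: SemanticCache,
  llm: LLMClient
): Promise<void> {
  const decompositions = await batchDecompose(memories, llm);
  for (const memory of memories) {
    const decomposition = decompositions.get(memory.id);
    if (decomposition) cache.set(memory.id, decomposition);
  }
}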

Validation Experiments

Experiment 2A: Context Distinction Accuracy

Goal: Validate that contextual similarity outperforms raw embedding on context-sensitive cases

Test Cases (20 new cases):

  • 10 "same keyword, different domain" (like CREATE_010)

  • 10 "different keywords, same context" (opposite case)

Metrics:

  • Context distinction accuracy (correct SAME vs DIFFERENT judgment)

  • False positive rate (marking different contexts as same)

  • False negative rate (marking same context as different)

Baseline: Raw embedding similarity
Test: Contextual similarity (decomposition-based); a scoring-harness sketch follows the success criteria below

Success Criteria:

  • Contextual accuracy > 90%

  • False positive rate < 10%

  • Outperforms raw embedding by > 20% on context-sensitive cases
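
A sketch of a scoring harness for this experiment, assuming a labeled test-case shape (ContextTestCase and runExperiment2A are hypothetical names):

interface ContextTestCase {
  memoryA: MemoryNode;
  memoryB: MemoryNode;
  expectedSameContext: boolean;  // Ground-truth label for the pair
}

async function runExperiment2A(cases: ContextTestCase[], llm: LLMClient) {
  let correct = 0, falsePositives = 0, falseNegatives = 0;
  const negatives = cases.filter(c => !c.expectedSameContext).length;
  const positives = cases.length - negatives;

  for (const c of cases) {
    const result = await calculateSimilarity(c.memoryA, c.memoryB, llm);
    if (result.same_context === c.expectedSameContext) correct++;
    else if (result.same_context) falsePositives++;  // Different contexts marked same
    else falseNegatives++;                           // Same context marked different
  }

  return {
    accuracy: correct / cases.length,
    falsePositiveRate: negatives ? falsePositives / negatives : 0,
    falseNegativeRate: positives ? falseNegatives / positives : 0,
  };
}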

Experiment 2B: Latency and Cost

Goal: Measure production viability

Test: Process 100 memory comparisons

Metrics:

  • Average latency per comparison

  • LLM API cost per comparison

  • Cache hit rate (after warm-up)

Targets:

  • Latency: < 2 seconds per comparison (with cache)

  • Cost: < $0.001 per comparison

  • Cache hit rate: > 80% (after warm-up)

Implementation Plan

Step 1: Core Types (Day 3, Morning)

Tasks:

  • Define SemanticDecomposition type

  • Define ContextualSimilarityResult type

  • Define ContextJudgment type

Deliverable: lib/types/semantic-decomposition.ts

Step 2: LLM Extraction (Day 3, Afternoon)

Tasks:

  • Implement decomposition prompt

  • Implement extractSemanticDecomposition()

  • Implement caching layer

Deliverable: lib/consolidation/semantic-decomposition.ts

Step 3: Multi-Level Similarity (Day 4, Morning)

Tasks:

  • Implement domain, core, entity, context similarity functions

  • Implement combined contextual similarity calculation

  • Implement reasoning generation

Deliverable: lib/consolidation/contextual-similarity.ts

Step 4: Integration and Testing (Day 4, Afternoon)

Tasks:

  • Integrate with existing decision tree

  • Create 20 context-sensitive test cases

  • Run Experiment 2A

  • Analyze results and tune weights

Deliverable: Experiment run + analysis report

Success Criteria

Phase 2 Success Metrics

  • Context distinction accuracy > 90%

  • False positive rate < 10%

  • Latency < 2 seconds per comparison

  • Cost < $0.001 per comparison

  • Outperforms raw embedding by > 20% on context-sensitive cases

Go/No-Go for Phase 3

Proceed if:

  • Context accuracy > 85%

  • Latency < 3 seconds (acceptable)

  • False positive rate < 15%

Block/Revise if:

  • Context accuracy < 80%

  • Latency > 5 seconds (unacceptable)

  • False positive rate > 20%

Related Documentation

  • Experiment Improvements - Parent overview

  • Realistic Dataset Design - Previous phase

  • UPDATE vs LINK Distinction - Next phase

  • Similarity Types - Current similarity system

Change Log

Date | Author | Change
2025-12-06 | Claude | Initial contextual similarity design