Context compaction strategies

Dec 4, 2025

Background: why is session management necessary?

context windows are limited
high costs
need to figure out how to know what content to throw out without losing context

Key takeaway:

overly aggressive compaction strategies lead to info loss
you can’t predict which info is going to be vaulable 10 steps further, so you need to find a way to preserve it in a compacted way

List of compaction strategies:

Keep the last N turns: only keeps most recent N turns of conversation and discards everything older → not recommended
- depends on which value is needed for the specific app you’re trying to build
- if the agent’s task is confined to a very particular task with expected outcomes, info that needs to be retained becomes extremely clear. in this case, semantic compaction would be more reasonable
Token-based Truncation: includes as many messages as possible within predefined token limit → doesn’t really help as well
Recursive Summarization: Older parts of conversation are replaced by AI generated summary → better cuz it doesn’t completely erase older histories
design around kv-cache hit rate (memory of past tokens model can reuse)
- kv-cache exists because generating new tokens is expensive
- agents have huge prefills (reading all previous context) and tiny decodes (producing the next token), so kv-cache helps skip recomputing all previous tokens
- kv-cache hit means it found the prefix to reuse the cache. but even one token difference can breake the cache
- common mistakes that cause cache breaks:
  - changing system prompts by adding timestamps
  - changing tool definitions: They usually sit at the front of the context. Changing them invalidates everything
  - non-deterministic serialization (ex. JSON objects with shuffled key order)
  - removing or inserting lines in the prefix. even whitespace or one token difference breaks it
- How to keep it high:
  - Keep your prompt stable and deterministic
  - Make contexts append-only (never rewrite earlier messages)
  - Avoid dynamic tool loading (use masking instead)
  - Use session IDs or prefix caching if self-hosting
dont dynamically add/remove tools mid-iteration cuz it breaks the kv-cache
- agents usually use hundreds of tools (browser*, shell*, search*, email*, etc)
- if you dynamically remove tools, it destroys the kv-cache
  - ex.
  - Step 1: model uses browser_open()
  - Step 3: you removed that tool definition
    → The past context still says “browser_open()”, but the tool is no longer defined now. → leads to hallucination
- mask the tools instead: keep all the tools in context, but just show the ones that the agent needs to see at that point
using the “file system” as context: content of web page can be dropped from context as long as the URL is preserved → this is designed to be restorable
manipulate attention by constantly rewrititing the todo list, and recite the objectives into the end of the context → so that it focuses on the most recent attention span
keep wrong stuff in: not erasing hallucinations or tool call failures, to keep evidence so that mdoel can adapt
don’t get few-shot too much stuff cuz it can backfire. if context is full of similar past action-observation pairs, the model will tend to follow that pattern even when it’s no longer optimal → increase diversity by controlled randomness

...more to come in this list.

‹ When does context rot happen?

Basics of Context Engineering: Session and Memory ›