Auto-Compaction

Auto-compaction is Claude Code’s built-in feature that compresses conversation history when the context window approaches its limit. Instead of crashing or truncating, it summarizes older messages so the session can continue.

How it works

Claude Code tracks how many tokens the current conversation has consumed. When usage crosses a configurable percentage of the context window, it pauses the session and replaces the oldest messages with a much shorter summary of them. The session then continues with the freed-up space.
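
The steps above can be sketched in a few lines of Python. This is an illustrative model only, not Claude Code's actual implementation: the Message type, the should_compact and compact functions, the keep_recent parameter, and the placeholder summary (sized at one tenth of the original tokens) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    text: str
    tokens: int


def should_compact(used_tokens: int, window: int, threshold_pct: float) -> bool:
    """Return True once usage reaches the given percentage of the window."""
    return used_tokens >= window * threshold_pct / 100


def compact(history: List[Message], window: int, threshold_pct: float,
            keep_recent: int = 2) -> List[Message]:
    """Replace the oldest messages with one summary when over the threshold."""
    used = sum(m.tokens for m in history)
    if not should_compact(used, window, threshold_pct) or len(history) <= keep_recent:
        return history  # under the threshold, or nothing old enough to fold away

    old, recent = history[:-keep_recent], history[-keep_recent:]
    # Hypothetical stand-in: a real system would ask the model to summarize `old`.
    summary = Message(
        text=f"[summary of {len(old)} earlier messages]",
        tokens=max(1, sum(m.tokens for m in old) // 10),
    )
    return [summary] + recent
```

Calling compact(history, window=1000, threshold_pct=75) on a history that totals 1000 tokens folds everything except the two most recent messages into a single summary entry, which is the "freed-up space" the session continues with.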

Provider-specific tuning

Different models have different context windows, so compaction thresholds need to match. claude-multi’s provider templates set two key variables:

CLAUDE_CODE_AUTO_COMPACT_WINDOW: the model’s actual context window size in tokens
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: the percentage threshold that triggers compaction

For example, the GLM template sets the window to 131072 tokens (128K) and the trigger to 75%, so compaction starts at about 98K tokens. Without these overrides, Claude Code assumes a 200K window (the default for Claude models) and measures the threshold against that larger size. The trigger point can then sit above the model’s real 128K limit, so the API call fails before compaction ever runs.
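
The arithmetic behind the GLM example can be checked with a short Python sketch. The trigger_tokens helper is hypothetical, the 200K default is taken as exactly 200,000 tokens, and applying the same 75% to the untuned case is an assumption for illustration, since the default percentage is not stated here.

```python
def trigger_tokens(window: int, pct: int) -> int:
    """Token count at which compaction would start for a window and percentage."""
    return window * pct // 100


GLM_WINDOW = 131072               # CLAUDE_CODE_AUTO_COMPACT_WINDOW in the GLM template
GLM_PCT = 75                      # CLAUDE_AUTOCOMPACT_PCT_OVERRIDE in the GLM template
ASSUMED_DEFAULT_WINDOW = 200_000  # assumption: "200K" read as 200,000 tokens

tuned = trigger_tokens(GLM_WINDOW, GLM_PCT)
# Assumption: the same 75% applied to the untuned 200K window, for comparison only.
untuned = trigger_tokens(ASSUMED_DEFAULT_WINDOW, GLM_PCT)

print(tuned)                 # 98304
print(untuned)               # 150000
print(untuned > GLM_WINDOW)  # True: the trigger sits past the real 128K limit
```

Under these assumptions the untuned trigger point (150,000 tokens) lies beyond the model's actual 131,072-token window, so the request would be rejected before compaction had a chance to run.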

What happens without it

If compaction doesn’t trigger at the right time, the model hits its token limit mid-request. The API returns an error and the conversation dies. That’s why getting these values right per provider matters.
