Read the Context Breakdown Before Blaming MCP

When an AI coding agent starts drifting mid-session, forgetting a constraint, repeating a solved question, giving thinner answers than it did an hour ago, the temptation is to reach for the explanation you already carry. Mine was MCP. I had servers connected, I'd read that connected servers eat the context window, and the conclusion felt obvious. I almost started disconnecting before I measured. Measuring is the whole point of this post, because the breakdown didn't match the story in my head.

Guessing names the suspect you already had

The trap with this kind of slow degradation is that it has no error to follow, so you fill the gap with a prior. I "knew" MCP was heavy, so MCP became the answer before I'd looked at a single number. That's the move worth catching: the explanation that arrives instantly is usually the one you walked in with, not the one the evidence supports.

The agent I use can print a breakdown of what's filling the context window by category. So instead of cutting, I read it. I'll describe it in proportions, since the raw numbers depend on the model and window and the shape is what carries across setups.

What the breakdown actually showed

In a session that had started drifting, the proportions came out roughly like this:

Conversation history, the accumulated back-and-forth, was the largest slice by a wide margin, about a fifth of the whole window on its own.
Fixed startup overhead, the system prompt and tool framework and memory, was a meaningful but stable, one-time chunk.
Connected MCP tool definitions were a small slice, smaller than the margin I'd been worried about.

The thing I was about to blame sat near the bottom. The thing I'd never thought to check, the plain weight of a long conversation, sat at the top.

Why "MCP is heavy" can be true and still not be your problem

I want to be careful, because "MCP is free" is the wrong lesson and isn't what I found.

MCP can be heavy. A server can load its full tool schema and carry it every turn, and a client that front-loads all of that can lose a real chunk of the window before you type. That warning is legitimate, people have measured it, and disconnecting unused servers is sound advice in that setup.

The narrower point is that it depends on how your client loads tools. Some defer the schema and only pull a tool's definition in when it's needed, and there an idle server costs much less than the worst case. "MCP is generally expensive" and "MCP wasn't what filled my window" aren't in conflict; they describe different loading behavior. So the lesson isn't that MCP is innocent. It's that the line item you're sure about and the line item that's actually large are often different, and only the breakdown tells you which is which.

The fix follows the measurement

Once history is the heavy part, the fixes are unremarkable. I stop letting one exploratory session run forever, and start fresh when a thread of work is done rather than carrying its whole transcript into an unrelated task. When I need continuity, I have the agent summarize where things stand and carry the summary into a new session instead of dragging the full history across. Move the gist, not the back-and-forth, because the back-and-forth is the weight I measured.

The model that stuck: the context window is a desk, not a filing cabinet. Everything the model uses at once has to fit on the surface, and a long conversation slowly buries it until there's no room to work. Sometimes clearing the desk beats buying a bigger one.

The transferable part isn't about MCP

Had I followed the instinct, I'd have freed a small slice, watched the drift continue, and learned nothing, because the fix would have missed the cause. So what I keep isn't "history is always the culprit," since on your setup it might genuinely be the servers or the memory or something neither of us is picturing. What I keep is the order: when the agent drifts, read the breakdown before you cut. The suspect you're certain of and the cause that's actually there usually aren't the same, and looking is the only way to tell them apart.

I build WordPress plugins and write about AI tooling and security at https://raplsworks.com/.

Before you disconnect MCP servers, read the context breakdown

Guessing names the suspect you already had

What the breakdown actually showed

Why "MCP is heavy" can be true and still not be your problem

The fix follows the measurement

The transferable part isn't about MCP

Comments

More from this blog

The reader of your error message was never in the room

Ship the rule before it's airtight, and let the comments find the leak

Prompt injection isn't the whole story: secure what the model hands back

A WordPress chatbot that answers from your own content: the design decisions behind it

Command Palette

Guessing names the suspect you already had

What the breakdown actually showed

Why "MCP is heavy" can be true and still not be your problem

The fix follows the measurement

The transferable part isn't about MCP

Comments

More from this blog