Before you disconnect MCP servers, read the context breakdown

My coding agent drifted mid-session and I was sure connected tools were the cause. Measuring first changed which line item I cut, and taught me to stop guessing at the culprit.

Freelance web developer from the Kansai region of Japan. I build WordPress plugins (Rapls AI Chatbot, Thanks Mail for Stripe, Rapls PDF Image Creator) and write about AI coding tools, Claude Code, and the security side of shipping with LLMs. I work in public and write down what breaks. Blog: https://raplsworks.com

When an AI coding agent starts drifting mid-session (forgetting a constraint, repeating a solved question, giving thinner answers than it did an hour ago), the temptation is to reach for the explanation you already carry. Mine was MCP (Model Context Protocol). I had servers connected, I'd read that connected servers eat the context window, and the conclusion felt obvious. I almost started disconnecting before I measured. Measuring is the whole point of this post, because the breakdown didn't match the story in my head.

Guessing names the suspect you already had

The trap with this kind of slow degradation is that it has no error to follow, so you fill the gap with a prior. I "knew" MCP was heavy, so MCP became the answer before I'd looked at a single number. That's the move worth catching: the explanation that arrives instantly is usually the one you walked in with, not the one the evidence supports.

The agent I use can print a breakdown of what's filling the context window by category. So instead of cutting, I read it. I'll describe it in proportions, since the raw numbers depend on the model and window and the shape is what carries across setups.

What the breakdown actually showed

In a session that had started drifting, the proportions came out roughly like this:

  • Conversation history, the accumulated back-and-forth, was the largest slice by a wide margin, about a fifth of the whole window on its own.

  • Fixed startup overhead (the system prompt, the tool framework, and memory) was a meaningful but stable one-time chunk.

  • Connected MCP tool definitions were a small slice, far smaller than the share I'd been worried about.

The thing I was about to blame sat near the bottom. The thing I'd never thought to check, the plain weight of a long conversation, sat at the top.
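Reading the breakdown comes down to ranking line items by their share of the window. Here is a minimal sketch, assuming you have copied the per-category token counts out of your agent's readout by hand. The category names and numbers are illustrative placeholders chosen to match the proportions above, not my actual measurements, and `rank_context_usage` is a hypothetical helper, not part of any agent's API.

```python
def rank_context_usage(tokens_by_category, window_size):
    """Return (category, share_of_window) pairs, largest share first."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    shares = {
        name: count / window_size
        for name, count in tokens_by_category.items()
    }
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Placeholder numbers for illustration only; substitute your own readout.
example_breakdown = {
    "conversation_history": 40_000,
    "startup_overhead": 15_000,
    "mcp_tool_definitions": 3_000,
}
ranked = rank_context_usage(example_breakdown, window_size=200_000)

for name, share in ranked:
    # Print each category with its percentage of the whole window.
    print(f"{name:<24} {share:6.1%}")
```

The point of sorting is that the first row is the one to act on, whatever you expected it to be.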

Why "MCP is heavy" can be true and still not be your problem

I want to be careful, because "MCP is free" is the wrong lesson and isn't what I found.

MCP can be heavy. A server can load its full tool schema and carry it every turn, and a client that front-loads all of that can lose a real chunk of the window before you type. That warning is legitimate, people have measured it, and disconnecting unused servers is sound advice in that setup.

The narrower point is that it depends on how your client loads tools. Some defer the schema and only pull a tool's definition in when it's needed, and there an idle server costs much less than the worst case. "MCP is generally expensive" and "MCP wasn't what filled my window" aren't in conflict; they describe different loading behavior. So the lesson isn't that MCP is innocent. It's that the line item you're sure about and the line item that's actually large are often different, and only the breakdown tells you which is which.
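The difference between the two loading behaviors can be shown with a toy cost model. This is a simplified sketch under stated assumptions, not a description of how any particular client implements deferral: the `Tool` class, the tool names, and every token count below are hypothetical, and I assume a deferred tool costs a short stub until it is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    schema_tokens: int  # assumed cost of the full tool definition
    stub_tokens: int    # assumed cost of a short name-only placeholder


def eager_cost(tools):
    """Every full schema is loaded up front and carried on each turn."""
    return sum(t.schema_tokens for t in tools)


def deferred_cost(tools, used_names):
    """Idle tools cost only a stub; a used tool costs its full schema."""
    return sum(
        t.schema_tokens if t.name in used_names else t.stub_tokens
        for t in tools
    )


# Hypothetical tools and sizes, for illustration only.
tools = [
    Tool("search_issues", schema_tokens=900, stub_tokens=20),
    Tool("create_issue", schema_tokens=1200, stub_tokens=20),
    Tool("query_database", schema_tokens=1500, stub_tokens=20),
]

print(eager_cost(tools))                                   # 3600
print(deferred_cost(tools, used_names={"search_issues"}))  # 940
```

Under these assumptions the same three servers cost a few thousand tokens or under a thousand depending only on loading behavior, which is why a general warning about MCP says little about your own window until you look.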

The fix follows the measurement

Once history is the heavy part, the fixes are unremarkable. I stop letting one exploratory session run forever, and start fresh when a thread of work is done rather than carrying its whole transcript into an unrelated task. When I need continuity, I have the agent summarize where things stand and carry the summary into a new session instead of dragging the full history across. Move the gist, not the back-and-forth, because the back-and-forth is the weight I measured.
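The handoff can be sketched in a few lines. This is a hypothetical illustration, not a feature of any agent: `should_start_fresh` and `build_handoff_prompt` are names I made up, and the 0.2 threshold is an arbitrary assumption rather than a recommended value. The summary itself would come from asking the agent to summarize where things stand.

```python
def should_start_fresh(history_tokens, window_size, threshold=0.2):
    """True once history takes at least `threshold` of the window.

    The default threshold is an arbitrary placeholder; tune it to your setup.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return history_tokens / window_size >= threshold


def build_handoff_prompt(summary, next_task):
    """Compose the first message of a new session from a summary only."""
    return (
        "Context carried over from a previous session (summary only):\n"
        f"{summary.strip()}\n\n"
        f"Next task: {next_task.strip()}"
    )


# Illustrative values only.
if should_start_fresh(history_tokens=40_000, window_size=200_000):
    prompt = build_handoff_prompt(
        summary="Form validation is done; email sending is still untested.",
        next_task="Write tests for the email sender.",
    )
    print(prompt)
```

The new session starts with a few lines of gist instead of the full transcript, which is the whole saving.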

The model that stuck: the context window is a desk, not a filing cabinet. Everything the model uses at once has to fit on the surface, and a long conversation slowly buries it until there's no room to work. Sometimes clearing the desk beats buying a bigger one.

The transferable part isn't about MCP

Had I followed the instinct, I'd have freed a small slice, watched the drift continue, and learned nothing, because the fix would have missed the cause. So what I keep isn't "history is always the culprit," since on your setup it might genuinely be the servers or the memory or something neither of us is picturing. What I keep is the order: when the agent drifts, read the breakdown before you cut. The suspect you're certain of and the cause that's actually there usually aren't the same, and looking is the only way to tell them apart.

I build WordPress plugins and write about AI tooling and security at https://raplsworks.com/.
