MCP's Dirty Secret: Scalability
Every software vendor is converging
on MCP as the standard for tool
connectivity and who can blame them.
Here's an ecosystem that legitimately allows these vendors to extend their offerings to places they've never had access to before.
No shade.
However, what nobody seems to be talking
about is that every MCP server you add
makes your AI system a little dumber.
And it doesn't matter whether
the MCP server is local or remote.
The damage is done.
Welcome to the Enduring Advantage Podcast.
I'm your host, Zachary Alexander.
Look, I need to start by saying that what you're hearing about MCP servers and the protocol is real.
This isn't vaporware.
This isn't another protocol that's going
to die in committee or be forgotten
before the beginning of next quarter.
Anthropic open-sourced MCP in November 2024 so that its competitors wouldn't feel like Anthropic controlled their futures.
And it worked.
By March 2025, OpenAI adopted it.
Sam Altman's words were simple: "People love MCP, and we're excited to add support across products."
Then Google DeepMind
confirmed support for Gemini.
Then Microsoft integrated it into Windows 11 Copilot and Azure.
Anthropic made them all an offer they couldn't refuse: adopt the standard or cede the integration layer.
By December 2025, Anthropic had donated MCP to the Linux Foundation's Agentic AI Foundation, co-founded by Anthropic, with OpenAI, Block, Google, Microsoft, AWS, Cloudflare, and Bloomberg backing it.
Think about that for a second.
Anthropic, OpenAI, Google, and Microsoft, companies that compete on everything, all converged on the same protocol in under 12 months.
You'd be hard pressed to find another
technology standard that achieved this
kind of cross-vendor adoption this fast.
OAuth took four years.
OpenAPI took five years.
Application programming interfaces used to be the best way to programmatically work with third-party apps.
Obviously, MCP is the replacement.
Now there's a lot that goes into
that statement, but let's just
save that for another podcast on
the future of computer interface.
Now, who can really blame all the vendors for coalescing around this thing?
Here's an ecosystem that legitimately
allows these vendors to extend
their offerings to places they've
never had access to before.
Every major software vendor sees
the same thing, a universal standard
for connecting AI to everything.
The write once integrate everywhere
promise that the industry
has been chasing for decades.
No shade.
This is a genuine infrastructure shift, the kind that
rewrites how systems talk to each other.
But infrastructure shifts are
still governed by physics and when
adoption is driven by competitive
pressure rather than considered
architecture, the physics catch up fast.
So here's where the
commentators get it wrong.
Everybody's talking about MCP servers.
How many servers do you have
connected to your agent?
Has your competitor released remote MCP servers, or are they still on standard I/O (stdio), which means local resources?
Servers are the unit of measure in every article, every conference talk, every vendor pitch.
Server count is a vanity metric.
The real number that
matters is tool count.
When you connect an MCP server to
your agent, the LLM doesn't just
learn there's a server over there.
Every tool that server exposes is loaded into the context window: its name, its description, its parameters, its schema.
That's the menu the LLM has to read before it can do anything.
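To make that menu concrete, here's a rough sketch of a single tool entry, the kind of thing a server returns from a tools/list call. The tool itself and its fields are hypothetical, but the name, description, and input-schema shape is what the protocol defines:

```python
# A hypothetical MCP tool definition, shaped like a tools/list entry:
# a name, a description, and a JSON Schema for the parameters.
create_issue_tool = {
    "name": "create_issue",
    "description": "Create a new issue in the given repository. "
                   "Supports labels, assignees, and milestones.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "owner/name"},
            "title": {"type": "string"},
            "body": {"type": "string"},
            "labels": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["repo", "title"],
    },
}
# Every connected server contributes a list of these, and every entry
# gets serialized into the prompt before the model sees your question.
```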
And here's the part that
catches teams off guard.
A single MCP server can expose
somewhere between 17 and 50 tools.
One server connected to your GitHub
repository might expose tools for
creating issues, managing pull requests,
listing repos, handling branches,
managing labels, and on and on.
One server, dozens of tools.
As a middle-market CEO, you think you're being conservative by allowing things to play out, because why would you and your team want to push the envelope?
Say you've got five MCP servers,
maybe a CRM MCP server, a file system
MCP server, a database MCP server, a
communications platform MCP server,
and something for project management.
Five servers.
Sounds manageable.
Except those five servers might be pushing 80, 100, 150 tool descriptions into the context window on every single call.
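Here's a back-of-the-envelope sketch of that overhead. The tokens-per-tool figure is an assumption, not a measured number; real costs depend on how verbose the descriptions and schemas are:

```python
# Hypothetical context cost of five servers' worth of tool definitions.
servers = 5
tools_per_server = 25    # mid-range of the 17-to-50 figure above
tokens_per_tool = 150    # assumed average; verbose schemas cost more

overhead = servers * tools_per_server * tokens_per_tool
print(f"{servers * tools_per_server} tools ~ {overhead:,} tokens per call")
# 125 tools ~ 18,750 tokens per call. Against a 32,000-token window,
# that's more than half the space gone before you've typed a word.
```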
Your operators at level 300 and above
are already maxing out this window.
Cursor hard-capped their tool limit at 40, regardless of how many servers you connect.
That's not a design choice,
that's an admission.
Developers on GitHub are reporting that performance starts to degrade noticeably around 50 tools. Over 80, one developer said, and there are reported issues.
Claude just shipped a feature called MCP Tool Search that lazy-loads tool descriptions instead of dumping them all into the context window at session startup.
Because with seven or more servers, users were losing 50 to 70% of their context window before they'd even typed a prompt.
Let me say that again.
50 to 70% of the AI's working memory
consumed by tool descriptions
before you ask it anything.
That's not a scaling problem you
see on the vendor's pitch deck.
That's the physics catching up.
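The idea behind that kind of lazy loading is simple enough to sketch. What follows is a minimal illustration of the pattern, not Claude's actual implementation; the function names are hypothetical:

```python
# Keep one-line stubs in context; hold the full schemas out of band.
from typing import Dict

FULL_DEFINITIONS: Dict[str, dict] = {}  # name -> full definition

def register(tool: dict) -> str:
    """Store the full definition; return only a cheap one-line stub."""
    FULL_DEFINITIONS[tool["name"]] = tool
    return f'{tool["name"]}: {tool["description"][:60]}'

def load_on_demand(name: str) -> dict:
    """Inject the full schema only when the model picks the tool."""
    return FULL_DEFINITIONS[name]
```

It's a genuine saving, but notice what it doesn't change: the model still has to choose from the stubs, and choosing is exactly where the trouble lives, as we'll see.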
Now, here's where things go sideways.
And I don't mean gradually, I mean like
floors collapsing under a great weight.
The tool count problem I just
described, that's just a snapshot.
It's what your system looks like today.
But MCP (the Model Context Protocol) has more surface area than most vendors are even considering implementing.
Tools are just one primitive.
The protocol also defines resources, prompts, and sampling flows, where servers can request that the LLM do work on their behalf.
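To give you a feel for a sampling flow on the wire, here's a simplified sketch of the request a server sends back to the client. The sampling/createMessage method name comes from the MCP spec; the payload itself is a hypothetical example:

```python
# A server-to-client JSON-RPC request: the server asks the client's
# LLM to do work on its behalf (simplified, illustrative payload).
sampling_request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {"role": "user",
             "content": {"type": "text",
                         "text": "Summarize the last 20 CRM records."}}
        ],
        "maxTokens": 500,
    },
}
# Note the direction: the server is now a consumer of the same model,
# and the same context budget, that your tools are already eating.
```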
And we haven't even gotten to
Roots and Elicitation primitives.
They deserve a podcast
episode of their own.
You may think that doesn't sound like a bad thing.
It is.
It means that your current tool
count is the floor, not the ceiling.
As vendors mature their implementations, and they will, because that's how competitive markets work, each server's context footprint will grow.
The CRM MCP server that ships
with 20 tools today could expose
40 next quarter when the vendor
implements resource subscriptions.
Your database MCP server could double its
context cost when it adds sampling, and
you will not have deployed anything new.
You will not have made a decision
to shrink your context window.
A vendor pushed an update and your context
window got smaller, noticeably smaller.
But the third party vendors
aren't your only exposure.
In past episodes, we've talked
at length about the benefits
of custom internal MCP servers.
We've suggested you think
of them as IT silo busters.
You and your team could decide
to implement more of the MCP
protocol within your organization.
Building custom internal MCP servers means providing better service to your users: richer tool descriptions, more complete schemas, sampling flows that let your servers orchestrate work autonomously.
All good engineering decisions, all
providing genuine value, and every
single one exacerbating the problem.
This is the part that
nobody is telling you.
Competence accelerates the problem.
The teams doing MCP right, using the protocol's primitives, are the ones that will eat up the context window fastest.
The workslop that passes for most third-party vendors' MCP servers will let you push this problem off for a while, but it could cost you your competitive positioning and the company's future.
There's always a price to
be paid for tech trade-offs.
On the other hand, the problem exists
whether you and your team create
custom internal MCP servers or not.
It could just as easily spike because one of your vendors generates buzz using one of the seldom-implemented core primitives.
But if you build internally, and you should, understand that you're accelerating it at a cost that nobody's giving you the tools to monitor.
Then there's also what Scott McNealy said when he was running Sun Microsystems that stuck with me. Every middle-market CEO should already know this in their bones.
You gotta find interesting things for
good people to do or you lose them.
The MCP protocol is full of under-implemented primitives: resources, prompts, sampling, roots, elicitation.
These are all frontier capabilities that
most vendors haven't even touched yet.
For a skilled engineer on
your team, that's not a risk.
That's a brand new opportunity space.
And it's your job to find brand
new opportunity spaces for people.
You don't want to saddle them
with nothing but maintenance.
You want your best people exploring.
You want them pushing the edges of what's
possible within your infrastructure.
That curiosity is the
reason you hire them.
It's the reason they stay.
The moment you tell an engineer that's off-limits is the moment they start looking for another job.
So here's the scenario.
It's a lazy Tuesday afternoon in
the middle of a quarter when things
are running relatively smoothly.
One of your best engineers reads a blog
post about MCP sampling or watches a
conference talk on YouTube or sees someone
demo it on X or LinkedIn or Instagram.
They get excited, legitimately excited, because they see how sampling could enhance the workings of your internal MCP servers in a way that provides capabilities the senior management team has been asking for forever.
They spin it up, it works.
They share it on Slack,
the team's impressed.
Maybe someone else builds on it.
Nobody notices that your system's context window just shrank 30%. No architecture review caught it, because it didn't look like a deployment decision.
It looked like a developer experimenting.
No capacity planning flagged
it because nobody's tracking
average context cost per tool.
No change management process was
triggered because nothing was
deployed in the traditional sense.
An engineer got excited about a protocol primitive, and your AI got measurably dumber. And it's not just that nobody caught it, it's that everyone incentivized it.
The senior management team asked for the
capability, the engineer delivered it.
The Slack channel celebrated it.
The entire organizational reward system pointed toward the thing that caused the problem.
This isn't a process failure.
This is a management paradox.
Scott McNealy understood it. The cost of bored talent is higher than almost any operational risk, because it results in the need for new operators, new engineers, new people.
You can't govern your way out of this
because governance kills the thing
that makes your team worth having.
You can't tell your best
engineers to stop exploring.
You can't put a review gate on
curiosity or the productivity of
your whole department will decrease.
The only resolution is architectural.
You need a layer that lets your engineers
explore freely while managing the
context consequences automatically.
Something that monitors what's entering the context window, knows which servers are relevant to which task, and keeps your AI focused regardless of how many primitives your teams have implemented.
Without that layer, you're
choosing between two losses.
Lose your engineers to boredom, or lose your AI's effectiveness to their ambition.
That's not a choice any middle-market CEO should have to make.
So what's actually happening to your
AI when all of this is accumulating?
There's a term for it.
I think it's the most important
concept in the entire episode.
Context rot.
Context rot happens when the information about the surrounding situation, prior events, and current state that lets an agent assign meaning and choose appropriate actions degrades.
And I need you to understand
this is not a storage problem.
Your context window isn't
a hard drive that fills up.
It's the AI's ability to focus
and focus is a zero sum game.
When you add something, you gotta give up something else. That something is clarity.
Every tool description sitting in your context window is competing for attention against the thing you actually asked the AI to do.
Your prompt, your data, your question.
The stuff that matters. When 150 tool descriptions are loaded into the window, the LLM doesn't just have less room, it has less clarity.
It's trying to reason about your business question while simultaneously holding the complete menu of every tool it could theoretically invoke.
That's not a computer running out of disk space.
That's a person trying to think in a room
where 150 people are talking at once.
Think about what would happen if you were to hand someone a filing cabinet and say, "The answer is in here somewhere. Figure it out," versus handing them the three documents they need.
The person is equally capable in both
scenarios, same brain, same expertise,
but in one, they're spending most of
their effort on retrieval and relevance
filtering, and in the other, they're
spending it on the actual reasoning.
That's context rot.
Your AI isn't getting dumber. It's getting more distracted, and distraction compounds.
As more tool descriptions
enter the context window, the
signal-to-noise ratio degrades.
The AI starts making worse tool selections: choosing the wrong server for the task, misinterpreting parameters, generating responses that are plausible but not precise. Not because the model is bad, but because the model is drowning in menu options instead of focusing on your problem.
And here's the part that should concern every middle-market CEO in the audience: context rot is silent.
There's no error, there's no
alert, no dashboard turning red.
Your AI system just gets gradually and imperceptibly less effective.
The outputs are still formatted correctly.
For the most part.
They still sound confident.
They're just slightly less right than they were before the degradation, and slightly less right each month than the month before.
You won't notice it until someone
asks why the AI's recommendations
aren't landing the way they used to.
By then the rot has been
compounding for months.
Now, the obvious response to the thing I've just described is better search: better retrieval, smarter tool selection. Just use AI to pick the right tools before loading them into the context window. That's the intuitive answer.
It's the kind of thing senior managers
say as they pose for a mic drop
moment when heading out the door.
And it's where most of
the industry is headed.
Claude Code's MCP Tool Search does exactly this.
It builds a lightweight index and
tries to load only relevant tools on
demand, and it's a genuine improvement
over dumping everything in at once,
but it doesn't solve the problem.
It manages a symptom and the reason it
can't solve the problem comes down to how
search actually works programmatically.
Most AI retrieval systems, including the ones selecting which tools to load, rely on vector embeddings. They convert tool descriptions into numerical coordinates in high-dimensional space.
Think of high-dimensional space as characteristics.
Two dimensions would be like
plotting something by cost and speed.
Three dimensions would add a third
characteristic, maybe reliability.
With this information, you increase
the precision of your search.
Similar things end up close together. Dissimilar things end up far apart. When your system needs to decide which tools are relevant to a task, it converts your task into a vector and asks what's closest.
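In code, that selection loop is only a few lines. A minimal sketch, assuming some embed() function, say a sentence-embedding model, that maps text to a vector; everything here is illustrative:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, 0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_tools(task: str, tool_vecs: dict, embed, k: int = 3) -> list:
    """Rank tools by similarity to the task and keep the top k."""
    task_vec = embed(task)
    ranked = sorted(tool_vecs.items(),
                    key=lambda kv: cosine(task_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```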
That sounds elegant, and for simple cases it works. However, there is a well-documented mathematical problem called the curse of dimensionality, which states that as the number of dimensions (characteristics) increases, the distances between all points converge. Everything becomes roughly equidistant, the same distance from everything else.
And for the record, modern vector embeddings use hundreds to thousands of dimensions.
Think about what that
means for tool selection.
Your retrieval system is supposed to
say, these three tools are relevant.
Those 40 aren't.
But in high-dimensional space, the difference between your third most relevant tool and your 40th most relevant tool shrinks to statistical noise.
The system can't confidently say this, not that. So it hedges: it loads more tools than necessary, because the math can't tell them apart.
The very mechanism that's supposed to create precision, more dimensions, more nuance, gradually undermines the distance metric that makes recommendations work.
You end up with retrieval that's
approximately rather than precisely right.
And approximately right at scale is
just a slower path to context rot.
Vectors tell you what's similar.
They cannot tell you what's
related, and those are
fundamentally different questions.
Similarity means these tool descriptions use similar language, or these capabilities overlap conceptually.
Relatedness means this tool has
authority over that task, or this tool
depends on the output of that tool.
This tool is relevant when these
specific conditions are met.
Your CRM MCP server and your project
management MCP server might have
similar vector embeddings because
they both deal with customer related
data, but one has authority over the
customer records and the other doesn't.
Vectors can't encode that
distinction at scale.
They see proximity where
there should be hierarchy.
So you have two kinds of flatness at play.
Structural flatness: vectors encode no relationships, no authority, no hierarchy between the things they represent. And mathematical flatness: high dimensionality degrades the distance metric that's supposed to compensate for that structural absence.
The result is a retrieval system that becomes less decisive as your ecosystem grows. More tools, more embeddings, more dimensions, and progressively less ability to say with confidence which tools your AI actually needs right now.
You can't buy your way out of context rot with better embeddings. The architecture of similarity-based retrieval has a ceiling, and every MCP server you add pushes you closer to it.
So let me bring this to a close.
The MCP era is upon us.
The gold rush is justified.
The model context protocol is the
most important infrastructure shift
in AI connectivity we've seen.
And it's not slowing down. But every MCP server you add loads tool descriptions into your context window.
That's the tool count problem.
The protocol surface area is larger
than what vendors are currently
implementing, which means your exposure
is growing whether you like it or not.
That's the structural bomb.
Your best engineers will accelerate this
because that's what good engineers do.
And you need them to. That's the human problem.
The result is context rot: silent degradation of your AI's ability to assign meaning and choose appropriate actions.
And better search, better embeddings, better retrieval can't solve it, because vectors tell you what's similar, not what's related.
So what do you do?
You build better architectures
using context graphs rather
than vector embeddings.
Graphs encode the things vectors can't.
Which tool has authority over which
domain, which servers depend on each
other, which primitives are relevant
to which task, under which conditions?
When your AI needs to select tools, the graph doesn't ask what's closest. It asks what belongs here.
That's a fundamentally different
question, and it produces
fundamentally better results.
Precise tool selection instead of
an approximate retrieval, three
documents instead of a filing cabinet.
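To show the shape of that question, here's a minimal sketch of tool selection over an explicit graph. The tools, relations, and edges are all hypothetical; the point is that authority and dependency are stored as facts, not inferred from textual similarity:

```python
# A tiny context graph of (subject, relation, object) facts.
GRAPH = {
    ("crm.update_record", "has_authority_over", "customer_records"),
    ("pm.create_task",    "has_authority_over", "project_tasks"),
    ("pm.create_task",    "depends_on",         "crm.update_record"),
}

def tools_for(domain: str) -> list:
    """Select tools by declared authority, then pull in dependencies."""
    owners = [s for s, r, o in GRAPH
              if r == "has_authority_over" and o == domain]
    deps = [o for s, r, o in GRAPH
            if r == "depends_on" and s in owners]
    return owners + deps
```

Ask tools_for("customer_records") and you get exactly the tool with declared authority over that domain, and nothing from the project management server, however similar its description reads.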
Middle-market CEOs should see this as the missing link. It will allow you to capture institutional knowledge and improve your positioning.
Sticking with vector embeddings will save some tokens, but it misses the plot.
We save tokens where we can; however, we fixate on creating enduring advantage.