MCP's Dirty Secret: Scalability
E15

MCP's Dirty Secret: Scalability

Every software vendor is converging
on MCP as the standard for tool

connectivity, and who can blame them?

Here's an ecosystem that legitimately
allows these vendors to extend

their offerings to places they've
never had access to before.

No shade.

However, what nobody seems to be talking
about is that every MCP server you add

makes your AI system a little dumber.

And that doesn't matter whether
the MCP server is local or remote.

The damage is done.

Welcome to the Enduring Advantage Podcast.

I'm your host, Zachary Alexander.

Look, I need to start by saying
that the things you're hearing about

MCP servers and the protocol are real.

This isn't vaporware.

This isn't another protocol that's going
to die in committee or be forgotten

before the beginning of next quarter.

Anthropic open sourced
MCP in November 2024

so that their competitors
wouldn't feel like Anthropic

controlled their futures.

And it worked.

By March 2025, OpenAI adopted it.

Sam Altman's words were simple.

People love MCP, and we're excited
to add support across products.

Then Google DeepMind
confirmed support for Gemini.

Then Microsoft integrated it into
Windows 11 Copilot and Azure.

Anthropic made them all an offer
they couldn't refuse.

Adopt the standard or cede
the integration layer.

By December 2025,

Anthropic had donated MCP to the Linux
Foundation's Agentic AI Foundation,

co-founded by Anthropic, OpenAI, and
Block, with Google, Microsoft, AWS,

Cloudflare, and Bloomberg backing it.

Think about that for a second.

Anthropic, OpenAI, Google, and
Microsoft, companies that compete

on everything, all converged on the
same protocol in under 12 months.

You'd be hard pressed to find another
technology standard that achieved this

kind of cross-vendor adoption this fast.

OAuth took four years.

OpenAPI took five years.

Application programming interfaces used
to be the best way to programmatically

work with third-party apps.

Obviously, MCP is the replacement.

Now there's a lot that goes into
that statement, but let's just

save that for another podcast on
the future of computer interface.

Now, who can really blame all the
vendors for coalescing around this thing?

Here's an ecosystem that legitimately
allows these vendors to extend

their offerings to places they've
never had access to before.

Every major software vendor sees
the same thing, a universal standard

for connecting AI to everything.

The write once integrate everywhere

promise that the industry
has been chasing for decades.

No shade.

This is a genuine infrastructure
shift, the kind that rewrites

how systems talk to each other.

But infrastructure shifts are
still governed by physics, and when

adoption is driven by competitive
pressure rather than considered

architecture, the physics catch up fast.

So here's where the
commentators get it wrong.

Everybody's talking about MCP servers.

How many servers do you have
connected to your agent?

Has your competitor released remote
MCP servers, or are they still using

standard I/O, which means local resources?

Servers are the unit of measure in
every article, every conference

talk, every vendor pitch.

Server count is a vanity metric.

The real number that
matters is tool count.

When you connect an MCP server to
your agent, the LLM doesn't just

learn there's a server over there.

Every tool that that server exposes
is loaded into the context window.

Its name, its description,
its parameters, its schema.

That's the menu the LLM has to
read before it can do anything.

And here's the part that
catches teams off guard.

A single MCP server can expose
somewhere between 17 and 50 tools.

One server connected to your GitHub
repository might expose tools for

creating issues, managing pull requests,
listing repos, handling branches,

managing labels, and on and on.

One server, dozens of tools.

As a middle-market CEO, you think you're
being conservative in allowing things

to play out, because why would you and
your team want to push the envelope?

Say you've got five MCP servers,
maybe a CRM MCP server, a file system

MCP server, a database MCP server, a
communications platform MCP server,

and something for project management.

Five servers.

Sounds manageable.

Except those five servers might
be pushing 80, 100, even 150 tool

descriptions into the context
window on every single call.

Your operators at level 300 and above
are already maxing out this window.
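The arithmetic is worth sketching. Here's a back-of-envelope estimate of that overhead; the numbers (tokens per tool description, context window size, tools per server) are illustrative assumptions, not measurements from any particular model or server:

```python
# Rough estimate of context consumed by MCP tool descriptions before
# the user types anything. All constants are illustrative guesses.

TOKENS_PER_TOOL = 350        # name + description + JSON schema, assumed average
CONTEXT_WINDOW = 200_000     # e.g. a 200k-token model

def tool_overhead(servers: int, tools_per_server: int) -> float:
    """Fraction of the context window eaten by tool descriptions."""
    total_tools = servers * tools_per_server
    return total_tools * TOKENS_PER_TOOL / CONTEXT_WINDOW

for servers in (1, 5, 10):
    pct = tool_overhead(servers, tools_per_server=25) * 100
    print(f"{servers} servers = {servers * 25} tools = {pct:.1f}% of context")
```

Under these assumptions, five mid-sized servers already burn a double-digit percentage of the window on every call, and the overhead scales linearly with every server you add.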

Cursor hard-capped its
tool limit at 40, regardless of

how many servers you connect.

That's not a design choice,
that's an admission.

Developers on GitHub are reporting
that performance starts to degrade

noticeably around 50 tools.

Over 80, and, as one developer put
it, there are reported issues.

Claude just shipped a feature
called MCP tool search that

lazy-loads tool descriptions
instead of dumping them all into the

context window at session startup.

Because with seven or more
servers, users were losing 50

to 70% of their context window
before they'd even typed a prompt.

Let me say that again.

50 to 70% of AI's working memory
consumed by tool descriptions

before you ask it anything.

That's not a scaling problem you
see on the vendor's pitch deck.

That's the physics catching up.
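The lazy-loading idea behind features like MCP tool search can be sketched in a few lines. This is a hypothetical registry, not Claude's actual implementation: only tool names sit in context at startup, and a full description is pulled in when a tool is actually chosen.

```python
# Minimal sketch of lazy-loading tool descriptions. The class and
# tool names are hypothetical, for illustration only.

class LazyToolRegistry:
    """Keep full tool schemas out of the context window; expose only
    names up front and load a description on demand."""

    def __init__(self, schemas: dict[str, str]):
        self._schemas = schemas           # full descriptions, held out of context
        self.index = sorted(schemas)      # the only thing loaded at startup
        self.loaded: dict[str, str] = {}  # descriptions pulled in so far

    def load(self, name: str) -> str:
        """Fetch one tool's full schema into context when it's needed."""
        self.loaded[name] = self._schemas[name]
        return self.loaded[name]

reg = LazyToolRegistry({
    "repo.create_issue": "long schema text",
    "repo.list_branches": "long schema text",
})
print(len(reg.index), "names in context,", len(reg.loaded), "schemas loaded")
```

The startup cost drops from every schema to a list of names, which is why this is an improvement; but, as the rest of the episode argues, it manages the symptom rather than solving the selection problem.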

Now, here's where things go sideways.

And I don't mean gradually, I mean like
floors collapsing under a great weight.

The tool count problem I just
described, that's just a snapshot.

It's what your system looks like today.

But MCP (Model Context Protocol) has
more surface area than most vendors

are even considering implementing.

Tools are just one primitive.

The protocol also defines
resources, prompts, and sampling

flows, where servers can ask
the LLM to do work on their behalf.

And we haven't even gotten to
Roots and Elicitation primitives.

They deserve a podcast
episode of their own.

You may think that
doesn't sound like a bad thing.

It is.

It means that your current tool
count is the floor, not the ceiling.

As vendors mature their implementations,
and they will, because that's

how competitive markets work,

each server's context footprint will grow.

The CRM MCP server that ships
with 20 tools today could expose

40 next quarter when the vendor
implements resource subscriptions.

Your database MCP server could double its
context cost when it adds sampling, and

you will not have deployed anything new.

You will not have made a decision
to shrink your context window.

A vendor pushed an update and your context
window got smaller, noticeably smaller.

But the third party vendors
aren't your only exposure.

In past episodes, we've talked
at length about the benefits

of custom internal MCP servers.

We've suggested you think
of them as IT Silo busters.

You and your team could decide
to implement more of the MCP

protocol within your organization.

Building custom internal MCP servers
provides better service to your users:

richer tool descriptions, more complete
schemas, sampling flows that let your

servers orchestrate work autonomously.

All good engineering decisions, all
providing genuine value, and every

single one exacerbating the problem.

This is the part that
nobody is telling you.

Competence accelerates the problem.

The teams doing MCP right,

using the protocol's primitives,
they are the ones that will eat

up their context windows fastest.

The workslop that passes for most
third-party vendors' MCP servers will

allow you to push this problem off for a while.

It could cost you your competitive
positioning and the company's future.

There's always a price to
be paid for tech trade offs.

On the other hand, the problem exists
whether you and your team create

custom internal MCP servers or not.

It could just as easily spike because one
of your vendors generates buzz using one

of the seldom-implemented core primitives.

But if you build internally, and you
should, understand that you're

accelerating a cost that nobody's
giving you the tools to monitor.

Then there's what Scott McNealy said
when he was running Sun Microsystems that

stuck with me. Every middle-market CEO
should already know this in their bones.

You gotta find interesting things for
good people to do or you lose them.

The MCP protocol is full
of under-implemented

primitives: resources, prompts,
sampling, roots, elicitation.

These are all frontier capabilities that
most vendors haven't even touched yet.

For a skilled engineer on
your team, that's not a risk.

That's a brand new opportunity space.

And it's your job to find brand
new opportunity spaces for people.

You don't want to saddle them
with nothing but maintenance.

You want your best people exploring.

You want them pushing the edges of what's
possible within your infrastructure.

That curiosity is the
reason you hire them.

It's the reason they stay.

The moment you tell an engineer
that's off limits is the moment

they start looking for another job.

So here's the scenario.

It's a lazy Tuesday afternoon in
the middle of a quarter when things

are running relatively smoothly.

One of your best engineers reads a blog
post about MCP sampling, or watches a

conference talk on YouTube, or sees someone
demo it on X or LinkedIn or Instagram.

They get excited, legitimately excited,
because they see how sampling could

enhance the workings of your internal
MCP servers in a way that provides

capabilities the senior management
team has been asking for forever.

They spin it up, it works.

They share it on Slack,
the team's impressed.

Maybe someone else builds on it.

Nobody notices that your system's
context window just shrank 30%.

No architecture review caught it because
it didn't look like a deployment decision.

It looked like a developer experimenting.

No capacity planning flagged
it because nobody's tracking

average context cost per tool.

No change management process was
triggered because nothing was

deployed in the traditional sense.

An engineer got excited about a
protocol primitive and your AI got

measurably dumber. And it's not
just that nobody caught it, it's

that everyone incentivized it.

The senior management team asked for the
capability, the engineer delivered it.

The Slack channel celebrated it.

The entire organizational reward
system pointed toward the

thing that caused the problem.

This isn't a process failure.

This is a management paradox.

Scott McNealy understood it.

The cost of bored talent is higher
than almost any operational risk,

because it results in the need for new
operators, new engineers, new people.

You can't govern your way out of this
because governance kills the thing

that makes your team worth having.

You can't tell your best
engineers to stop exploring.

You can't put a review gate on
curiosity or the productivity of

your whole department will decrease.

The only resolution is architectural.

You need a layer that lets your engineers
explore freely while managing the

context consequences automatically.

Something that monitors what's entering
the context window, knows which servers

are relevant to which task, and keeps
your AI focused regardless of how many

primitives your team has implemented.

Without that layer, you're
choosing between two losses.

Lose your engineers to boredom,
or lose your AI's effectiveness

to their ambition.

That's not a choice any middle
market CEO should have to make.

So what's actually happening to your
AI when all of this is accumulating?

There's a term for it.

I think it's the most important
concept in the entire episode.

Context rot.

Context rot happens when the information
about the surrounding situation,

prior events, and current state that
lets an agent assign meaning and

choose appropriate actions degrades.

And I need you to understand
this is not a storage problem.

Your context window isn't
a hard drive that fills up.

It's the AI's ability to focus
and focus is a zero sum game.

When you add something, you gotta give up
something else, and that something is clarity.

Every tool description sitting in
your context window is competing

for attention against the thing
you actually asked the AI to do.
Your prompt, your data, your question.

The stuff that matters.

When 150 descriptions are loaded into the
window, the LLM doesn't just have

less room, it has less clarity.

It's trying to reason about
your business question while

simultaneously holding the
complete menu of every tool

it could theoretically invoke.

That's not a computer
running out of disk space.

That's a person trying to think in a room
where 150 people are talking at once.

Think about what would happen if you
were to hand someone a filing cabinet

and say, the answer is in here somewhere.

Figure it out.

Versus handing them the
three documents they need.

The person is equally capable in both
scenarios, same brain, same expertise,

but in one, they're spending most of
their effort on retrieval and relevance

filtering, and in the other, they're
spending it on the actual reasoning.

That's context rot.

Your AI isn't getting dumber.

It's getting more distracted,
and distraction compounds.

As more tool descriptions
enter the context window, the

signal to noise ratio degrades.

The AI starts making worse tool
selections: choosing the wrong

server for the task, misinterpreting
parameters, generating responses that

are plausible but not precise, not
because the model is bad, but because

the model is drowning in menu options
instead of focusing on your problem.

And here's the part that should
concern every middle-market CEO in

the audience: context rot is silent.

There's no error, there's no
alert, no dashboard turning red.

Your AI system just gets gradually
and imperceptibly less effective.

The outputs are still formatted correctly.

For the most part.

They still sound confident.

They're just slightly less right than
they were before the degradation, and

slightly less right than the month before.

You won't notice it until someone
asks why the AI's recommendations

aren't landing the way they used to.

By then the rot has been
compounding for months.

Now, the obvious response to the thing
I've just described is better search,

better retrieval, smarter tool selection.

Just use AI to pick the right tools before
loading them into the context window.

That's the intuitive answer.

It's the kind of thing senior managers
say as they pose for a mic drop

moment when heading out the door.

And it's where most of
the industry is headed.

Claude Code's MCP tool
search does exactly this.

It builds a lightweight index and
tries to load only relevant tools on

demand, and it's a genuine improvement
over dumping everything in at once,

but it doesn't solve the problem.

It manages a symptom and the reason it
can't solve the problem comes down to how

search actually works programmatically.

Most AI retrieval systems, including
the ones selecting which tools to

load, rely on vector embeddings: they
convert tool descriptions into numerical

coordinates in high-dimensional space.

Think of high dimensional
space as characteristics.

Two dimensions would be like
plotting something by cost and speed.

Three dimensions would add a third
characteristic, maybe reliability.

With this information, you increase
the precision of your search.

Similar things end up close together;
dissimilar things end up far apart.

When your system needs to decide
which tools are relevant to a task,

it converts your task into a
vector and asks what's closest.

That sounds elegant, and for
simple cases it works.

However, there is a well-documented
mathematical problem called the

curse of dimensionality, which states
that as the number of dimensions

(characteristics) increases, the
distances between all points converge.

Everything becomes roughly equidistant,
the same distance from everything else.

And for the record, modern
vector embeddings use hundreds

to thousands of dimensions.
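You can watch this concentration happen in a few lines of simulation. The sketch below uses random Gaussian points as stand-ins for embeddings and measures how the gap between the nearest and farthest point shrinks, relative to the nearest, as dimensions grow; the point counts and distributions are illustrative assumptions:

```python
# Simulating distance concentration, the curse of dimensionality:
# as dimensions grow, max and min distances from a query converge.
import math
import random

def relative_contrast(dim: int, n_points: int = 200) -> float:
    """(farthest - nearest) / nearest distance from a random query
    to a cloud of random points. Lower means less ability to say
    'this one is relevant, that one isn't'."""
    random.seed(0)  # deterministic for illustration
    query = [random.gauss(0, 1) for _ in range(dim)]
    dists = [
        math.dist(query, [random.gauss(0, 1) for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(f"{dim:>5} dims: (max-min)/min = {relative_contrast(dim):.2f}")
```

At low dimensions the nearest neighbor is dramatically closer than the farthest; at a thousand dimensions the contrast collapses toward noise, which is exactly the hedging behavior described next.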

Think about what that
means for tool selection.

Your retrieval system is supposed to
say, these three tools are relevant.

Those 40 aren't.

But in high dimensional space, the
difference between your third most

relevant tool and your 40th most relevant
tool shrinks to statistical noise.

The system can't confidently
say this, not that.

So it hedges: it loads more
tools than necessary because

the math can't tell them apart.

The very mechanism that's supposed to
create precision, more dimensions, more

nuance, gradually undermines the distance
metric that makes recommendations work.

You end up with retrieval that's
approximately right rather than precisely right.

And approximately right at scale is
just a slower path to context rot.

Vectors tell you what's similar.

They cannot tell you what's
related, and those are

fundamentally different questions.

Similarity means these tool descriptions
use similar language and these

capabilities overlap conceptually.

Relatedness means this tool has
authority over that task, or this tool

depends on the output of that tool.

This tool is relevant when these
specific conditions are met.

Your CRM MCP server and your project
management MCP server might have

similar vector embeddings because
they both deal with customer related

data, but one has authority over the
customer records and the other doesn't.

Vectors can't encode that
distinction at scale.

They see proximity where
there should be hierarchy.

So you have two kinds of flatness at play:

structural flatness, where vectors
encode no relationship, no authority,

no hierarchy between the things
they represent, and mathematical

flatness, where high dimensionality
degrades the distance metric that's

supposed to compensate for
that structural absence.

The result is a retrieval system
that becomes less decisive

as your ecosystem grows.

More tools, more embeddings, more
dimensions, and progressively less

ability to say with confidence which
tools your AI actually needs right now.

You can't buy your way out of
context rot with better embeddings.

The architecture of similarity-based
retrieval has a ceiling, and every MCP

server you add pushes you closer to it.

So let me bring this to a close.

The MCP era is upon us.

The gold rush is justified.

The model context protocol is the
most important infrastructure shift

in AI connectivity we've seen.

And it's not slowing down. But every
MCP server you add loads tool

descriptions into your context window.

That's the tool count problem.

The protocol surface area is larger
than what vendors are currently

implementing, which means your exposure
is growing whether you like it or not.

That's the structural bomb.

Your best engineers will accelerate this
because that's what good engineers do.

And you need them to,
that's the human problem.

The result is context rot:
silent degradation of your AI's

ability to assign meaning and
choose appropriate actions.

And better search, better embeddings,

better retrieval can't solve it,
because vectors tell you what's

similar, not what's related.

So what do you do?

You build better architectures
using context graphs rather

than vector embeddings.

Graphs encode the things vectors can't.

Which tool has authority over which
domain, which servers depend on each

other, which primitives are relevant
to which task, under which conditions?

When your AI needs to select tools,
the graph doesn't ask, what's closest?

It asks, what belongs here?

That's a fundamentally different
question, and it produces

fundamentally better results.

Precise tool selection instead of
approximate retrieval; three

documents instead of a filing cabinet.
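The "what belongs here" question can be made concrete with a tiny sketch. Everything below is hypothetical, the tool names, task tags, and hand-built edges are invented for illustration, but it shows the shape of the idea: authority and dependency are explicit edges you walk, not distances you estimate.

```python
# Minimal sketch of graph-based tool selection. The graph is a
# hand-built adjacency map; all names here are hypothetical.

AUTHORITY = {
    # task -> tools with declared authority over that task
    "update_customer_record": ["crm.update_contact"],
    "schedule_followup": ["calendar.create_event"],
}

DEPENDS_ON = {
    # tool -> tools whose output it needs first
    "crm.update_contact": ["crm.get_contact"],
}

def select_tools(task: str) -> list[str]:
    """Return the tools that belong to a task: its authoritative
    tools plus their declared dependencies, and nothing else."""
    selected: list[str] = []
    for tool in AUTHORITY.get(task, []):
        selected.extend(DEPENDS_ON.get(tool, []))  # prerequisites first
        selected.append(tool)
    return selected

print(select_tools("update_customer_record"))
# -> ['crm.get_contact', 'crm.update_contact']
```

Note what's different from similarity search: an unknown task returns an empty list rather than a ranked pile of near-ties, and the CRM tool's authority over customer records is encoded directly instead of being inferred from overlapping language.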

Middle-market CEOs should
see this as the missing link.

It will allow you to capture institutional
knowledge and improve your positioning.

Sticking with vector embeddings will
save you some tokens, but it misses the plot.

We save tokens where we can, however we
fixate on creating enduring advantage.
