Back-of-the-Envelope Value Density Calculations
There's a truism in management: you can
hide a whole lot of sins in the math
used to describe any situation.
So what I'd like to do is run a
back-of-the-envelope calculation.
If it doesn't work there,
then it doesn't work.
With that in mind, I'd like to talk about
why value density design makes more sense
than agent-to-agent architecture, or any
of the other agent-intensive architectures,
in production-grade AI services.
Welcome to the Value Density
Podcast, where we talk about how
good things come in small prompts.
I'm your host, Zachary Alexander.
Here's the thing.
There's a huge flaw in the way that
AI is being sold to the general
public and middle market companies.
The flaw is the implicit assumption that
AI tokens don't matter when tackling
real business challenges.
Let me break it down for you.
An AI token is the unit of
measure that large language
models use to process prompts.
In a nutshell, your prompt plus some
housekeeping is split into tokens.
Currently, that's roughly
four characters per token.
Those tokens are then fed
to LLMs, and the magic happens.
Then tokens come out the other side in
the form of some bright new shiny object.
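The four-characters-per-token rule of thumb is easy to sketch. Here's a minimal estimator; the example prompt and the 4.0 default are illustrative assumptions, and real tokenizers vary by model and language:

```python
# Rough token estimate using the "about four characters per token" rule of thumb.
# Real tokenizers vary by model and language; this is only an approximation.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the number of tokens a prompt will consume."""
    return max(1, round(len(text) / chars_per_token))

prompt = "Summarize our Q3 inventory report in three bullet points."
print(estimate_tokens(prompt), "tokens (approx.)")
```

A quick sanity check: an eight-character string comes out as about two tokens under this rule.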
Now, if you watch YouTube, where
the majority of people get their
information about the AI revolution,
you'd think that AI tokens are free, or
maybe part of some promotional scheme.
Reality check: no and no.
They cost real money to process and
real money to store, and that includes
money for power, real estate, and
all the overhead that goes with them.
So let's just talk about the
reality of token economics.
Most consumers, roughly 70%,
still use free accounts.
So they think that using a free account
is actually the way that business is done.
Here's where it gets
interesting and expensive.
These consumers can't understand
why AI-capable applications
might cost more than the less capable
applications they're used to.
Further, they also think that the
price of tokens is coming down.
They'd be wrong.
The price of tokens is actually
going up; just ask anyone who has
built a system on Anthropic's Claude.
Now granted, Anthropic's business
model is a little different
from the other major players'.
They are an API-first organization;
API stands for Application
Programming Interface.
What this means is that they cater
to companies building AI systems, not
primarily to consumers and ad hoc AI bots.
Small difference, but it matters.
This gives us a perfect window
into real enterprise cost.
While OpenAI subsidizes consumer
usage with enterprise revenue,
Anthropic shows us the true
cost structure, and guess what?
It's not decreasing.
You would be forgiven for assuming
that the cost of tokens is going down.
We've been led to believe for over a
hundred years that automation brings
down costs. Thank you, Henry Ford.
And what could be more automated
than the production of AI tokens?
Unfortunately, here's where the
glitch in the matrix shows up.
The price of tokens keeps going up because
AI systems are becoming more capable.
Hence the tagline: "Good things
come in small prompts."
Think about it this way.
When Claude Opus processes
your request, it's not running
a simple search algorithm.
It's running billions of parameters
through complex neural networks.
You can think of a neural network
as a distributed set of computer
systems that process patterns,
much like a mini human brain.
They learn to solve problems and make
decisions by finding connections and data.
They also consume massive amounts
of computational resources.
More sophisticated models cost more
per token to operate, which leads to a
conversation about why not every task or
operation should be handled by AI agents.
This creates what I call
the capability paradox.
The better AI gets at solving
complex problems, the more
it costs per interaction.
It's like hiring a brain surgeon when
what you really needed was a general
practitioner or even a handyman.
You get better results,
but you consume resources that are
difficult, if not impossible, to replace.
Spoiler alert, if you think that
it's okay to waste AI resources,
then you may be setting yourself up
for challenges in the near future.
The reason is that LLMs are going
to continue to get more capable
and the most capable are going
to always be in short supply.
With that in mind, let's talk about
why agent-to-agent architecture is a
financial disaster waiting to happen.
It's also the reason I
started this podcast.
It was as if people were going
around talking about things
as though tokens didn't matter,
as though they were some free resource
lying on the carpet there,
waiting for you to pick it up.
The big LLM providers are pushing schemes
like agent-to-agent architectures that
cause you to use more tokens, or more
sophisticated tokens, than you need.
Big surprise, right?
They are in the business
of selling tokens.
Imagine this scenario.
You have an AI agent that needs
to collaborate with three other
AI agents to complete a customer
service request.
Agent A processes the initial
query: tokens consumed.
Agent A communicates with Agent B
about inventory levels: more tokens.
Agent B queries Agent C
about shipping logistics:
even more tokens.
Agent C reports back through the chain:
you guessed it, more tokens.
Sounds complicated?
It is.
Sounds expensive?
It's even worse when you're doing
back-of-the-envelope calculations.
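The handoffs above can be sketched in numbers. This is a minimal back-of-the-envelope sketch: the blended price per thousand tokens and the per-hop token counts are made-up assumptions, chosen only to show how re-sending context at every handoff compounds the bill.

```python
# Back-of-the-envelope: single capable agent vs. a four-agent chain.
# All prices and token counts are illustrative assumptions, not vendor quotes.

PRICE_PER_1K_TOKENS = 0.015  # hypothetical blended input/output price, USD

def cost(tokens: int) -> float:
    """Dollar cost for a given number of tokens at the assumed price."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS

# Single agent: one prompt with full context, one completion.
single_call = cost(2_000 + 800)

# Agent-to-agent chain: each hop re-sends accumulated context,
# so token usage grows with every handoff.
hops = [
    2_000 + 500,  # Agent A processes the initial query
    2_500 + 500,  # Agent A -> Agent B (inventory levels)
    3_000 + 500,  # Agent B -> Agent C (shipping logistics)
    3_500 + 800,  # Agent C reports back through the chain
]
chain_call = sum(cost(t) for t in hops)

print(f"single agent: ${single_call:.3f}")
print(f"agent chain:  ${chain_call:.3f} ({chain_call / single_call:.1f}x)")
```

Under these assumed numbers, the four-agent chain costs several times what a single well-contexted call would, before you account for any retries.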
Here's where we have to talk about
the second glitch in the matrix.
AI workflows are non-deterministic,
which is a fancy way of saying that the
same question will result in slightly
different outputs or completions.
Without the proper context, AI systems
are prone to hallucinations (i.e.,
they can make stuff up).
You can improve the LLM decision-making
process with better context and
additional evaluation stages.
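One common shape for such an evaluation stage is best-of-n sampling: draw several completions, score each, keep the winner. In the sketch below, llm() and evaluate() are stand-ins I made up for illustration, not a real API; in a real system, every extra draft and every evaluation is another token-consuming call.

```python
# Sketch of an "additional evaluation stage": sample several completions,
# score each one, and keep the best. llm() and evaluate() are stand-ins
# (assumptions for illustration); in practice each is a paid model call.
import random

def llm(prompt: str) -> str:
    # Non-deterministic stand-in: same prompt, slightly different completions.
    return f"{prompt} -> draft #{random.randint(1, 100)}"

def evaluate(completion: str) -> float:
    # Stand-in scorer; often this is yet another (token-consuming) model call.
    return random.random()

def best_of_n(prompt: str, n: int = 3) -> str:
    drafts = [llm(prompt) for _ in range(n)]
    return max(drafts, key=evaluate)

print(best_of_n("Route this customer query"))
```

Note the cost structure: n drafts plus n evaluations per request, which is exactly the cash-register sound described next.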
Okay, now let me know if you can
hear the token cash register going.
Kaching, kaching, kaching.
It's almost like pumping gas.
If you listen to the big LLM providers,
you're pumping gas into a network
of agents each with the potential
of going wildly off course because
of the lack of proper context.
And as you can imagine, if you've
gotten this far in the episode, the
answer is value density design.
What does this mean?
Value density design is about doing
better with less: fewer tokens and
fewer, more focused steps.
It's about addressing concerns more
completely, not just skimming the
surface. The disconnect: people think
that starting small is the right way to go.
It's hard to start small because
AI wants to give you options.
It's up to you to pick the best option.
You can shut it down and complain
that it generated something you hated.
But the value density way of thinking
is to take what the
LLM gives you and iterate on it.
Remember, the goal is to do better,
not necessarily complete the task
in the shortest amount of time.
Everybody can do faster.
It's what Henry Ford would do.
He famously said, you can have any
color you want as long as it's black.
AI allows you to generate completions
that are more what you want and need.
Speed is no longer based on how fast
your monitor is or your computer, or even
your internet connection for that matter.
Speed is based on value density.
It's based on how complete the
answer is, how close to what you
desire through deeper interactions.
Value density solves that Christmas
morning phenomenon: you open that
very special gift, play with it, and
it performs exactly as advertised.
Unfortunately, much of what passes as good
AI engineering is without value density.
For more value density insight,
subscribe and check out additional
episodes of the Value Density Podcast.