Why Service Meshes Sound Inevitable
Every team running a sufficiently large system eventually hears the phrase:
We should probably add a service mesh.
It sounds reasonable.
Traffic is complex. Security is hard. Observability feels fragmented.
A service mesh promises to standardize all of this without touching application code. Just install it and let the platform handle the mess.
That promise is not a lie. But it is incomplete.
This note is about the cost of that promise. What you gain. What you lose. And when the right answer is still:
No.

What a Service Mesh Actually Is
And Why That Matters
At its core, a service mesh inserts itself into every request path, usually via sidecar proxies.
Your services no longer talk to each other directly. They talk through infrastructure.
This matters because you are not adding a library. You are adding a distributed system.
One that runs alongside your own. With its own control plane. Data plane. Failure modes. And upgrade cycle.
If you do not model it as a first-class system, it will surprise you later.
Usually at 3 a.m.

The First Trade-off
You Gain Consistency, but Lose Locality
Before a mesh, behavior lives close to the code.
Retries. Timeouts. Authentication. They are explicit.
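To make "explicit" concrete, here is a minimal sketch of retry and timeout logic living in application code. The helper name call_with_retries, its parameters, and its defaults are hypothetical, chosen only for illustration. A real client library would look different.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[float], T],
    attempts: int = 3,
    timeout_s: float = 0.5,
    backoff_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(timeout_s) up to `attempts` times.

    `request` is any callable that takes a timeout in seconds and either
    returns a result or raises TimeoutError / ConnectionError.
    Waits backoff_s, then 2 * backoff_s, and so on between attempts.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return request(timeout_s)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            # Back off only if another attempt is coming.
            if attempt < attempts - 1:
                sleep(backoff_s * (2 ** attempt))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Everything that decides how this call behaves under failure sits in one function, in the same repository as the caller.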
With a mesh, those behaviors move into configuration.
That gives you consistency, but removes locality.
When something breaks, the answer is no longer in the repository you are reading.
It might be in a policy, applied by a controller, rendered into a proxy, that was updated five minutes ago.
Debugging shifts from reasoning about code to reasoning about state.
This is not bad. But it is different. And often underestimated.

Observability
More Visibility Does Not Mean More Understanding
Meshes are observability machines.
Metrics for everything. Spans for every hop. Dashboards that look like flight control.
And yet, teams often feel more blind.
Retries hide failures. Timeouts hide slowness. Success rates lie politely.
The system looks healthy until it suddenly is not.
The hardest question becomes:
Are we being protected, or are we being masked?
If you cannot answer that confidently, you are operating on vibes.
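One way to start answering that question is to compare what callers eventually saw with what happened on the first attempt. The sketch below assumes you can obtain a per-request attempt count from somewhere, for example proxy access logs. The RequestOutcome record and its fields are hypothetical, not any mesh's actual schema, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RequestOutcome:
    attempts: int    # total tries made for this request, including the first
    succeeded: bool  # whether the caller eventually received a success


def success_rates(outcomes: Sequence[RequestOutcome]) -> Tuple[float, float]:
    """Return (first_try_rate, final_rate) for a batch of requests.

    final_rate is what a dashboard that counts only end results shows.
    first_try_rate is what the backend delivered without help from retries.
    """
    total = len(outcomes)
    if total == 0:
        raise ValueError("no outcomes to summarize")
    final_ok = sum(1 for o in outcomes if o.succeeded)
    first_try_ok = sum(1 for o in outcomes if o.succeeded and o.attempts == 1)
    return first_try_ok / total, final_ok / total


# Illustrative batch: 70 clean successes, 28 rescued by retries, 2 failures.
batch = (
    [RequestOutcome(attempts=1, succeeded=True)] * 70
    + [RequestOutcome(attempts=3, succeeded=True)] * 28
    + [RequestOutcome(attempts=3, succeeded=False)] * 2
)
first_try_rate, final_rate = success_rates(batch)
print(f"first try: {first_try_rate:.0%}, after retries: {final_rate:.0%}")
```

A wide gap between the two numbers suggests masking: the backend is failing often and retries are absorbing it. A narrow gap suggests the retries are mostly idle insurance.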

Performance Costs
The Tax You Pay Forever
Every sidecar proxy consumes resources.
CPU. Memory. Latency.
At small scale, this feels negligible. At large scale, it becomes a budget line item.
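A back-of-envelope sketch shows how the same per-sidecar cost moves from noise to line item. Every number here is a hypothetical placeholder, not a measurement of any particular proxy. The latency helper also assumes both ends of each hop run a sidecar, so each hop passes through two proxies.

```python
def sidecar_overhead(pods: int, cpu_millicores_each: int, memory_mib_each: int) -> dict:
    """Total resources reserved by sidecars across a fleet of pods."""
    return {
        "cpu_cores": pods * cpu_millicores_each / 1000,
        "memory_gib": pods * memory_mib_each / 1024,
    }


def added_latency_ms(hops: int, per_proxy_ms: float) -> float:
    """Extra latency for a request crossing `hops` service-to-service calls.

    Assumes a sidecar on both the calling and the receiving side of each hop.
    """
    return hops * 2 * per_proxy_ms


# Hypothetical per-sidecar cost: 100 millicores and 64 MiB.
small = sidecar_overhead(pods=20, cpu_millicores_each=100, memory_mib_each=64)
large = sidecar_overhead(pods=5000, cpu_millicores_each=100, memory_mib_each=64)

print(small)  # 2 cores and 1.25 GiB: easy to ignore
print(large)  # 500 cores and 312.5 GiB: someone will ask about this
print(added_latency_ms(hops=5, per_proxy_ms=1.0))  # 10.0 ms on a five-hop call chain
```

Plug in your own measured values before drawing conclusions. The shape of the arithmetic is the point, not these figures.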
High-throughput systems notice. Latency-sensitive systems suffer. Cost-optimized systems feel it immediately.
A mesh is never free. It is a permanent tax, paid per request and per pod.
If you cannot explain why that tax is worth it, you are already in trouble.

Operational Reality
The Part That Rarely Makes the Slides
Running a mesh means:
Upgrading control planes. Rotating certificates. Debugging broken proxies. Managing version skew.
It also changes team structure.
Platform teams become infrastructure product owners. Application teams become consumers of invisible rules.
If ownership is unclear, the mesh becomes a blame amplifier.
If your Kubernetes fundamentals are shaky, a mesh will not stabilize them. It will magnify every weakness.

Policy Sprawl
When Configuration Becomes Behavior
Service meshes are policy engines.
Over time, policies accumulate.
Traffic rules. Security rules. Exception rules.
Each one makes sense in isolation. Together, they form a system nobody fully understands.
Now behavior lives in YAML. Spread across namespaces. Applied eventually. Remembered vaguely.
When production breaks, you are no longer debugging software.
You are excavating intent.

When a Service Mesh Makes Sense
Despite everything, there are valid reasons to adopt one.
A mesh fits when:
You operate many teams with many services across trust boundaries.
Security requirements demand uniform enforcement, and application teams cannot realistically implement it themselves.
Platform maturity is high, ownership is clear, and engineers understand distributed failure.
In short: when complexity already exists and is unavoidable.

When You Should *Not* Use a Service Mesh
Do not use a mesh when:
Your system is small enough to understand. Your main bottleneck is product velocity. Your team is still learning Kubernetes. Your outages come from basics, not edge cases.
Do not install a mesh to feel mature. Install it because you are already paying the complexity cost elsewhere.
Often, a load balancer, good client libraries, and boring TLS are more than enough.

The Real Question to Ask
The question is not:
Is a service mesh good?
The question is:
What complexity are we choosing to own?
A service mesh trades application simplicity for operational and cognitive complexity.
That trade can be correct. Or disastrous.
Simplicity is not a phase. It is a competitive advantage.
If you can keep your system simple, do it.
And do not apologize.