Growth status: Evergreen | Updated: Feb 9, 2026 | 4 min read

A Practical Critique of Service Meshes: Trade-offs, Pitfalls, and When *Not* to Use One

A service mesh promises to magically standardize everything without touching your application code. Just install it and let the platform clean up the chaos. That promise isn’t a lie. It’s just… not the whole truth.

Why Service Meshes Sound Inevitable

Every sufficiently large system eventually hears the phrase:

We should probably add a service mesh.

It sounds reasonable.

Traffic is complex. Security is hard. Observability feels fragmented.

A service mesh promises to standardize all of this without touching application code. Just install it and let the platform handle the mess.

That promise is not a lie. But it is incomplete.

This note is about the cost of that promise. What you gain. What you lose. And when the right answer is still:

No.


What a Service Mesh Actually Is

And Why That Matters

At its core, a service mesh inserts itself into every request path, usually via sidecar proxies.

Your services no longer talk to each other directly. They talk through infrastructure.

This matters because you are not adding a library. You are adding a distributed system.

One that runs alongside your own. With its own control plane. Data plane. Failure modes. And upgrade cycle.

If you do not model it as a first-class system, it will surprise you later.

Usually at 3 a.m.

The First Trade-off

You Gain Consistency, but Lose Locality

Before a mesh, behavior lives close to the code.

Retries. Timeouts. Authentication. They are explicit.
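
To make "explicit" concrete, here is a minimal sketch of retry and timeout behavior living in application code. Everything in it is an illustration, not anyone's production client: `TransientError`, `call_with_retries`, and the default values are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def call_with_retries(
    operation: Callable[[float], T],
    attempts: int = 3,
    timeout_s: float = 0.5,
    backoff_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(timeout_s) up to `attempts` times with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # The timeout is passed to the operation, which is expected to honor it.
            return operation(timeout_s)
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            # Wait backoff_s, then 2x, then 4x, ... before the next attempt.
            sleep(backoff_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The point is not the helper itself. The point is that the attempt count, the timeout, and the backoff are all visible in the same repository as the call site.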

With a mesh, those behaviors move into configuration.

That gives you consistency, but removes locality.

When something breaks, the answer is no longer in the repository you are reading.

It might be in a policy, applied by a controller, rendered into a proxy, that updated five minutes ago.

Debugging shifts from reasoning about code to reasoning about state.

This is not bad. But it is different. And often underestimated.


Observability

More Visibility Does Not Mean More Understanding

Meshes are observability machines.

Metrics for everything. Spans for every hop. Dashboards that look like flight control.

And yet, teams often feel more blind.

Retries hide failures. Timeouts hide slowness. Success rates lie politely.

The system looks healthy until it suddenly is not.
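
A small worked calculation shows how this happens. It is a sketch under a loud assumption: each attempt fails independently with the same probability, which real outages rarely respect. The function names and the 20% failure rate are hypothetical.

```python
def observed_success_rate(attempt_failure_rate: float, max_attempts: int) -> float:
    """Success rate the caller sees when failed attempts are retried transparently.

    Assumes attempts fail independently with the same probability.
    """
    return 1.0 - attempt_failure_rate ** max_attempts


def expected_attempts(attempt_failure_rate: float, max_attempts: int) -> float:
    """Average number of attempts the backend receives per caller request."""
    # Attempt k+1 only happens if the first k attempts all failed.
    return sum(attempt_failure_rate ** k for k in range(max_attempts))


if __name__ == "__main__":
    # One in five attempts fails, and the proxy allows up to three attempts.
    print(round(observed_success_rate(0.2, 3), 3))  # 0.992
    print(round(expected_attempts(0.2, 3), 2))      # 1.24
```

Under these assumptions, a backend failing 20% of the time shows up as 99.2% success to the caller, while quietly receiving 24% more traffic. A dashboard that only reports the first number is the polite lie.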

The hardest question becomes:

Are we being protected, or are we being masked?

If you cannot answer that confidently, you are operating on vibes.


Performance Costs

The Tax You Pay Forever

Every sidecar proxy has a cost.

CPU. Memory. Latency.

At small scale, this feels negligible. At large scale, it becomes a budget line item.

High-throughput systems notice. Latency-sensitive systems suffer. Cost-optimized systems feel it immediately.

A mesh is never free. It is a permanent tax, paid per request.

If you cannot explain why that tax is worth it, you are already in trouble.
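
One way to start explaining it is to write the arithmetic down. The sketch below is a back-of-the-envelope model, not a benchmark: `SidecarOverhead`, both functions, and every number are placeholders to be replaced with measurements from your own cluster. It also assumes each service-to-service hop passes through two sidecars, the caller's and the callee's.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SidecarOverhead:
    """Per-sidecar cost. All values are placeholders, not measurements."""
    cpu_cores: float
    memory_mib: float
    added_latency_ms: float  # added by one pass through one proxy


def fleet_overhead(pods: int, overhead: SidecarOverhead) -> Dict[str, float]:
    """Total resources reserved for proxies when every pod carries one sidecar."""
    return {
        "cpu_cores": pods * overhead.cpu_cores,
        "memory_gib": pods * overhead.memory_mib / 1024,
    }


def request_latency_tax_ms(hops: int, overhead: SidecarOverhead) -> float:
    """Latency added to one request that crosses `hops` service-to-service calls."""
    # Assumption: each hop traverses the caller's sidecar and the callee's sidecar.
    return hops * 2 * overhead.added_latency_ms


if __name__ == "__main__":
    assumed = SidecarOverhead(cpu_cores=0.125, memory_mib=64, added_latency_ms=0.5)
    print(fleet_overhead(2000, assumed))       # {'cpu_cores': 250.0, 'memory_gib': 125.0}
    print(request_latency_tax_ms(4, assumed))  # 4.0
```

With these placeholder inputs, 2,000 pods reserve 250 cores and 125 GiB for proxies alone, and a request four hops deep pays 4 ms. Your numbers will differ. The exercise is having numbers at all.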


Operational Reality

The Part That Rarely Makes the Slides

Running a mesh means:

Upgrading control planes. Rotating certificates. Debugging broken proxies. Managing version skew.
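
Version skew is the easiest of these to sketch. The check below assumes versions are `major.minor.patch` strings and that proxies may lag the control plane by at most one minor version. That tolerance is an assumption for the example; each mesh documents its own supported skew. The workload names and version strings are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from a 'major.minor.patch' string."""
    major, minor, *_ = text.split(".")
    return int(major), int(minor)


def skewed_proxies(
    control_plane: str,
    proxies: Dict[str, str],
    max_minor_skew: int = 1,  # assumed tolerance, not a universal rule
) -> List[str]:
    """Return workloads whose proxy version is too far from the control plane."""
    cp_major, cp_minor = parse_version(control_plane)
    flagged = []
    for workload, version in sorted(proxies.items()):
        major, minor = parse_version(version)
        if major != cp_major or abs(cp_minor - minor) > max_minor_skew:
            flagged.append(workload)
    return flagged


if __name__ == "__main__":
    fleet = {"checkout": "1.20.1", "search": "1.18.6", "cart": "1.19.0"}
    print(skewed_proxies("1.20.3", fleet))  # ['search']
```

How you collect the proxy versions in the first place depends on the mesh. This sketch deliberately leaves that part out.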

It also changes team structure.

Platform teams become infrastructure product owners. Application teams become consumers of invisible rules.

If ownership is unclear, the mesh becomes a blame amplifier.

If your Kubernetes fundamentals are shaky, a mesh will not stabilize them. It will magnify every weakness.


Policy Sprawl

When Configuration Becomes Behavior

Service meshes are policy engines.

Over time, policies accumulate.

Traffic rules. Security rules. Exception rules.

Each one makes sense in isolation. Together, they form a system nobody fully understands.

Now behavior lives in YAML. Spread across namespaces. Applied eventually. Remembered vaguely.

When production breaks, you are no longer debugging software.

You are excavating intent.
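
A toy model of that excavation, for illustration only. It assumes a "most specific scope wins" precedence rule, which is a simplification: real meshes define their own merge and precedence semantics. The `Policy` type, its fields, and the policy names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Policy:
    name: str
    timeout_s: float
    namespace: Optional[str] = None  # None means mesh-wide
    workload: Optional[str] = None   # None means every workload in scope


def matches(policy: Policy, namespace: str, workload: str) -> bool:
    """True if the policy's scope covers the given workload."""
    if policy.namespace is not None and policy.namespace != namespace:
        return False
    if policy.workload is not None and policy.workload != workload:
        return False
    return True


def specificity(policy: Policy) -> int:
    """0 for mesh-wide, 1 for namespace-scoped, 2 for a single workload."""
    return int(policy.namespace is not None) + int(policy.workload is not None)


def explain_timeout(policies: List[Policy], namespace: str, workload: str) -> List[Policy]:
    """Return every matching policy, most specific first. The first entry wins."""
    applicable = [p for p in policies if matches(p, namespace, workload)]
    return sorted(applicable, key=specificity, reverse=True)


if __name__ == "__main__":
    policies = [
        Policy("mesh-default", timeout_s=10.0),
        Policy("shop-defaults", timeout_s=5.0, namespace="shop"),
        Policy("checkout-exception", timeout_s=30.0, namespace="shop", workload="checkout"),
    ]
    for p in explain_timeout(policies, "shop", "checkout"):
        print(p.name, p.timeout_s)
```

Three policies already produce a 30-second timeout that nobody reading the mesh default or the namespace default would expect. The useful output is not the winner. It is the full list of everything that had an opinion.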


When a Service Mesh Makes Sense

Despite everything, there are valid reasons to adopt one.

A mesh fits when:

You operate many teams with many services across trust boundaries.

Security requirements demand uniform enforcement, and application teams cannot realistically implement it themselves.

Platform maturity is high, ownership is clear, and engineers understand distributed failure.

In short: when complexity already exists and is unavoidable.


When You Should *Not* Use a Service Mesh

Do not use a mesh when:

Your system is small enough to understand. Your main bottleneck is product velocity. Your team is still learning Kubernetes. Your outages come from basics, not edge cases.

Do not install a mesh to feel mature. Install it because you are already paying the complexity cost elsewhere.

Often, a load balancer, good client libraries, and boring TLS are more than enough.
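
As one illustration of "boring TLS", here is a sketch using only the Python standard library. The function names are hypothetical, and the TLS 1.2 floor is an assumption for the example, not a recommendation from any particular policy.

```python
import ssl
import urllib.request


def boring_tls_context() -> ssl.SSLContext:
    """Client-side TLS context built from the standard library defaults."""
    # create_default_context() verifies certificates against the system
    # trust store and checks hostnames.
    context = ssl.create_default_context()
    # Assumed floor for this example: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def fetch(url: str, timeout_s: float = 2.0) -> bytes:
    """Fetch a URL over verified TLS with an explicit timeout."""
    with urllib.request.urlopen(url, timeout=timeout_s, context=boring_tls_context()) as response:
        return response.read()
```

No sidecar, no control plane. Certificate verification, hostname checking, and a timeout, all in a dozen lines that sit next to the code making the call.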


The Real Question to Ask

The question is not:

Is a service mesh good?

The question is:

What complexity are we choosing to own?

A service mesh trades application simplicity for operational and cognitive complexity.

That trade can be correct. Or disastrous.

Simplicity is not a phase. It is a competitive advantage.

If you can keep your system simple, do it.

And do not apologize.

Update History

Feb 9, 2026: The Real Question to Ask
Feb 9, 2026: When You Should Not Use a Service Mesh
Feb 9, 2026: When a Service Mesh Makes Sense
Feb 9, 2026: Policy Sprawl
Feb 9, 2026: Operational Reality
Feb 9, 2026: Performance Costs
Feb 9, 2026: The First Trade-off
Feb 9, 2026: What a Service Mesh Actually Is
Feb 9, 2026: Why Service Meshes Sound Inevitable
Feb 9, 2026: Introduction
