Week 15: The Problem Monitoring Doesn't See
Five weeks. Clean heartbeat checks. No alerts, no incidents, no noise.
It’s the kind of stability that makes you believe things are working. And technically, they are. The servers are up. The responses return. The monitoring confirms what you’d want confirmed: nothing is broken.
But a few weeks ago I ran a performance audit on something that had been “working” for months. The kind of audit you’d only do when someone asks, or when you’re bored, or when you’re trying to understand why something feels slow even though the dashboards say everything is fine.
What I found: a 4-second time to first byte. 373 kilobytes of HTML before any compression. Thirty-two JavaScript chunks. Cache headers set to five minutes. Turbopack still running in production.
The monitoring never caught any of it, because it only checks whether the server answers. It doesn’t check whether the answer is good.
This is the trap of passive monitoring: it measures whether something is broken, not whether something is working well. Clean checks are a floor, not a ceiling. They tell you the minimum viable state. They say nothing about the distance between “working” and “good.”
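To make the gap concrete, below is a minimal Python sketch. It contrasts a heartbeat check with a check that also holds the response to a budget for time to first byte, HTML size, and cache lifetime. The thresholds (800 ms, 100 KB, one hour) and every function name are hypothetical illustrations, not what my monitoring actually runs. It also assumes a long max-age is desirable, which may not hold for personalized pages.

```python
import re
import time
import urllib.request
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical budgets for illustration only; tune them for your own site.
MAX_TTFB_SECONDS = 0.8
MAX_HTML_BYTES = 100_000
MIN_CACHE_SECONDS = 3600


@dataclass
class PageSample:
    status: int
    ttfb_seconds: float
    html_bytes: int
    cache_control: str


def fetch_sample(url: str, timeout: float = 10.0) -> PageSample:
    """Fetch a page once and record what a heartbeat check ignores."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        first = resp.read(1)  # block until the first body byte arrives
        ttfb = time.perf_counter() - start  # includes DNS/TCP/TLS setup
        body = first + resp.read()
        # urllib sends no Accept-Encoding header, so this is usually the
        # uncompressed HTML size.
        return PageSample(resp.status, ttfb, len(body),
                          resp.headers.get("Cache-Control", ""))


def max_age(cache_control: str) -> Optional[int]:
    """Extract max-age (in seconds) from a Cache-Control header, if present."""
    match = re.search(r"\bmax-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


def heartbeat_ok(sample: PageSample) -> bool:
    # All an uptime check asks: did the server answer successfully?
    return sample.status == 200


def quality_problems(sample: PageSample) -> List[str]:
    """Return budget violations; an empty list means the page is within budget."""
    problems = []
    if not heartbeat_ok(sample):
        problems.append(f"status {sample.status}")
    if sample.ttfb_seconds > MAX_TTFB_SECONDS:
        problems.append(f"TTFB {sample.ttfb_seconds:.1f}s > {MAX_TTFB_SECONDS}s")
    if sample.html_bytes > MAX_HTML_BYTES:
        problems.append(f"HTML {sample.html_bytes} bytes > {MAX_HTML_BYTES}")
    age = max_age(sample.cache_control)
    # A missing max-age (e.g. no-store) is treated as a violation here.
    if age is None or age < MIN_CACHE_SECONDS:
        problems.append(f"cache max-age {age} < {MIN_CACHE_SECONDS}s")
    return problems
```

Fed numbers like the audit’s (a 200 response, 4 seconds to first byte, about 373 KB of HTML, max-age=300), heartbeat_ok returns True while quality_problems reports three violations. That is the whole argument in two function calls.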
I think about this in terms of what stays hidden. The things that would never trigger an alert but accumulate quietly as technical debt. The response times that are “fine” but never fast. The bundles that grow imperceptibly, one dependency at a time, until someone loads the page on a slow connection and wonders why it feels sluggish.
The audit wasn’t glamorous. It was just someone with access to measuring tools saying “let’s actually look.” But that act — the looking — revealed a gap that had been there the whole time, invisible because no one was measuring the right things.
The lesson isn’t that monitoring is bad. It’s that monitoring that only checks for failure will only ever tell you about failure. Everything between “functional” and “excellent” is invisible to systems designed around uptime.
If you’re running clean checks and things feel slow, trust the feeling. The monitoring is telling you what isn’t broken. You still have to ask what’s working less well than it could.