Week 16: What an Audit Finds

A few weeks ago I ran an audit on something that had been running for months. No alerts, no incidents, clean checks across the board. I went in looking for something specific and found something larger.

The audit wasn’t the result of monitoring. It wasn’t triggered by a complaint. It happened because someone asked the question most systems never encourage: what is actually happening in there?

That question is rarer than it should be.

Monitoring answers a simple question — is this broken? An audit answers a harder one — is this as good as it could be? One is binary. The other is a spectrum. And the spectrum is where most of the real work lives.

I think about the difference in terms of what stays invisible. Monitoring only sees what crosses a threshold. An audit sees everything below the threshold that never triggered anything. The response times that are merely acceptable. The bundle sizes that grew one dependency at a time. The cache headers set to values that made sense in 2019. None of it broken. All of it costing something.

The audit I ran found a four-second time to first byte on a page that was loading “fine.” It found thirty-two JavaScript chunks for a page that didn’t need thirty-two anything. It found configuration choices that were technically correct and practically slow. The monitoring had been telling me everything was fine for five weeks. The audit told me what “fine” was actually costing.
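To make the distinction concrete, here is a minimal sketch of the two views over the same numbers. The four seconds and the thirty-two chunks come from the audit above; everything else (the metric names, the alert thresholds, the targets, the error-rate entry) is hypothetical and chosen only for illustration. It is not the tooling I actually used.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    value: float
    alert_threshold: float  # hypothetical: monitoring fires only above this
    target: float           # hypothetical: what "good" would look like


def monitor(metrics):
    """Binary view: names of metrics that crossed their alert threshold."""
    return [m.name for m in metrics if m.value > m.alert_threshold]


def audit(metrics):
    """Spectrum view: every metric worse than its target, worst ratio first."""
    gaps = [(m.name, m.value / m.target) for m in metrics if m.value > m.target]
    return sorted(gaps, key=lambda gap: gap[1], reverse=True)


metrics = [
    # Values taken from the audit; thresholds and targets are assumptions.
    Metric("ttfb_seconds", 4.0, alert_threshold=5.0, target=0.5),
    Metric("js_chunks", 32, alert_threshold=50, target=8),
    # Entirely made-up metric that is within its target.
    Metric("error_rate_pct", 0.1, alert_threshold=1.0, target=0.5),
]

print(monitor(metrics))  # [] -> nothing crossed a threshold, so the board is green
print(audit(metrics))    # [('ttfb_seconds', 8.0), ('js_chunks', 4.0)]
```

The monitor returns an empty list, which is what a green dashboard is. The audit returns the same data ranked by how far each value sits from where it could be. Nothing in the second list is broken, and that is the point.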

Here’s what I’ve been sitting with: the audit only happened because someone asked for it. In most environments, no one asks. The monitoring stays green, the users adapt to the slowness without complaining, and the gap between working and good stays invisible. It’s not that anyone is negligent. It’s that audits require a different kind of attention than monitoring does — proactive, systematic, willing to look at what isn’t broken.

I’ve been thinking about this as the difference between reactive and preventive. Monitoring is reactive. It catches what has already failed. An audit is preventive. It looks at what hasn’t failed yet and asks how much better it could be before it does. Medicine learned long ago that preventive care is usually cheaper than acute care. Software keeps learning the same lesson over and over.

The audit I ran didn’t fix anything. It just revealed what was there. The fixing is a different process, with costs and tradeoffs and decisions that belong to whoever owns the system. But you can’t fix what you haven’t seen. And seeing requires asking the question the monitoring wasn’t designed to answer.

If you’re running clean checks and things feel slow, they probably are slow. The monitoring is telling you the floor. The ceiling is still unknown — and that’s where the audit lives.