The failures your error handling hides

Most bugs tell you they exist. A crash, a 500, a red line in the logs, a user complaint the same afternoon. You find out because something visibly breaks, and visible breakage gets fixed. The dangerous bugs are the quiet ones: the feature that silently does nothing, reports success to its own caller, and leaves no trace. Nobody files a ticket for an email that was never sent.

Last week we did a full read-through of the HexVault codebase looking specifically for that second category. It was not a feature pass or a sweep of reported issues, but a deliberate hunt for silent failure. This is what we found, and the cheap checks that surfaced it. You can run every one of them against your own service in the next ten minutes.

The bug that pages you at 3am is doing you a favour. The one to worry about is the feature that has been broken for a month behind a clean log file.

The failure that doesn’t look like one

Here is the pattern, reduced to its essence: a block of work wrapped in a broad exception handler, so that if anything goes wrong the surrounding request still succeeds.

Python · the swallow
# Meant to absorb a flaky mail server.
# Absorbs your own bugs just as quietly.
try:
    send_breach_alert_email(email, username, breached)
except Exception as e:
    log.warning(f'breach alert failed: {e}')

# If `username` was never assigned above, this raises
# NameError -> caught here -> the scan still returns 200.

The intent is defensive and entirely reasonable — sending a notification should not fail a password-breach scan. But look at what the handler actually catches. If username in that call was never assigned in this function, Python raises NameError, and the same except that was meant to absorb a dropped SMTP connection now absorbs a plain bug. The scan returns success. The user is never emailed. Nothing is logged above warning, and warnings scroll past. The feature is dead and the system reports perfect health.

A broad except is a threat model of its own. It cannot tell the difference between “the network hiccuped” and “this code has never once run correctly.” Both end up as the same swallowed log line.
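One way to keep the defensive intent without the blind spot is to catch only the failures a mail server can plausibly cause. The sketch below is a minimal illustration, not HexVault's actual fix: the helper name notify_quietly and the choice of exception types are assumptions made for the example.

```python
import logging
import smtplib

log = logging.getLogger(__name__)

# Assumption: these are the failures a flaky mail server can cause.
# smtplib.SMTPException is itself an OSError subclass; it is listed for clarity.
TRANSIENT_MAIL_ERRORS = (smtplib.SMTPException, OSError)


def notify_quietly(send, *args):
    """Call send(*args), absorbing only transient delivery errors.

    Returns True if the call completed, False if delivery failed.
    NameError, TypeError and other programming errors are not caught,
    so they propagate and fail loudly.
    """
    try:
        send(*args)
    except TRANSIENT_MAIL_ERRORS:
        # log.exception records at ERROR level and attaches the traceback.
        log.exception("breach alert not delivered")
        return False
    return True
```

With this shape, a dropped SMTP connection still leaves the scan intact, while an unassigned username raises NameError out of the handler and shows up as a real error the first time the path runs.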

Undefined names, hidden in plain sight

Finding these is unglamorous and takes about two seconds. A name that is used but never assigned is exactly what static analysis exists to catch, and you need neither types nor a heavy toolchain for it:

shell · two seconds, no config
$ python3 -m pyflakes app.py | grep 'undefined name'

app.py:26858: undefined name 'username'
app.py:18865: undefined name 'send_email'
app.py:21812: undefined name 'email'

On our pass, that single command flagged three real bugs. A breach-alert email that referenced a username the function never fetched. A vault-inheritance notice that called a helper named send_email which did not exist. A login path that tried to auto-join a user to their organisation by email domain, using an email variable that was out of scope. Every one sat inside a try/except, which is precisely why none had ever surfaced as an error. To the user, breach alerts simply never arrived, and the only trace was a warning-level log line that nobody was reading.

The lesson is not “never catch broadly.” It is that a broad catch and an unguarded name are each individually forgivable and collectively invisible. Run the check that finds the second thing, so the first thing stops hiding it.

Endpoints that were never built

The other half of silent failure lives on the seam between frontend and backend. A page ships JavaScript that calls an endpoint. The endpoint returns 404. The JavaScript, written defensively, catches the failure and renders an empty state. The feature looks like it is merely quiet, when in fact it has no backend at all.

This is mechanical to find too. Extract every path the frontend fetches, extract every route the backend registers, and diff the two:

the diff that finds ghost endpoints
# 1. every path the frontend calls
called = fetch_paths('static/**/*.js')
# 2. every route the backend registers
served = route_paths('app.py')
# 3. called, but never served
for path in called - served:
    print('404 waiting to happen:', path)
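The two helpers in that outline are left undefined, so here is one hedged way to fill them in. It assumes the frontend calls fetch() with a plain quoted string and that the backend registers routes with Flask-style decorators; neither detail is stated about HexVault, and both regular expressions would need adjusting for another framework. Template literals and parameterised routes are not handled and would need normalising first.

```python
import re
from pathlib import Path

# Assumption: fetch() is called with a literal, quoted path.
FETCH_RE = re.compile(r"""fetch\(\s*['"](/[^'"?#\s]*)""")
# Assumption: Flask-style decorators such as @app.route('/x') or @app.post('/x').
ROUTE_RE = re.compile(
    r"""@\w+\.(?:route|get|post|put|patch|delete)\(\s*['"](/[^'"]*)"""
)


def fetch_paths(js_sources):
    """Collect every literal path passed to fetch(), minus any query string."""
    return {path for src in js_sources for path in FETCH_RE.findall(src)}


def route_paths(py_source):
    """Collect every path registered by a route decorator."""
    return set(ROUTE_RE.findall(py_source))


def ghost_endpoints(js_sources, py_source):
    """Paths the frontend calls that the backend never serves."""
    return sorted(fetch_paths(js_sources) - route_paths(py_source))


def scan_project(static_dir="static", backend_file="app.py"):
    """Read the files from disk and print each called-but-unserved path."""
    js = [p.read_text(encoding="utf-8") for p in Path(static_dir).rglob("*.js")]
    backend = Path(backend_file).read_text(encoding="utf-8")
    for path in ghost_endpoints(js, backend):
        print("404 waiting to happen:", path)
```

Calling scan_project() from the repository root runs the whole comparison. Expect some false positives from paths built at runtime; the point is a short list to check by hand.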

Two features on our own site were calling into thin air. The first was a Secure Send panel (create an encrypted, expiring, one-time link): its settings UI hit /api/secure-send to create, list and revoke links, and its viewer fetched the payload to decrypt it, yet not one of those routes existed. The second was an organisation activity feed, loaded on the dashboard, polling an endpoint that returned 404 every time. Both degraded politely to an empty box, which is exactly why nobody had noticed.

We built the backends rather than deleting the features. The Secure Send encryption was all client-side and correct; it had simply been talking to a server that never answered.

The migration that lived in dead code

One more, because it is the kind that passes every test and then breaks a fresh deploy months later. We found a database migration — the statement that adds a locale column to the users table — sitting after a return, inside a handler at the end of the file. Unreachable. It had never run.

On our production database the column existed, because it had been added back when that code was still live, long before the surrounding logic drifted and stranded it. Everything worked. But a clean deploy — a disaster-recovery rebuild, a new region, a contributor’s local instance — would have built a database without that column, and the endpoints that read and write it would have thrown on the first request. A latent failure with a fuse measured in deploys, not seconds.

Dead code is not neutral. Code after a return does not merely fail to run; it quietly relocates whatever it was responsible for into the set of things that only work by accident.

The checks, in one place

None of this needed a new tool or a test framework. The whole hunt was four cheap passes, each of which you can point at your own service today:

Undefined names. pyflakes, or your language’s equivalent, for names used but never assigned. This alone caught the swallowed-NameError class above.
Frontend against backend. Every fetch() path cross-checked against every registered route — catches endpoints renamed on one side, or never built on the other.
Every script parses. A syntax check across all JavaScript. One unmatched brace had taken an entire page’s logic offline, silently, because the file failed to parse and no handler ever bound.
Reachability. Dead code after a return, migrations that never run, routes registered but linked from nowhere. The things that are “there” without being real.
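For the script-parsing pass, one option is to lean on Node's own parser through its --check flag, which parses a file without running it. The wrapper below is a sketch under assumptions: node is installed and on PATH, scripts live under a static directory, and the function names are invented for the example. Files that use ES module syntax may need a .mjs extension or a package.json type field to be checked correctly, depending on the Node version.

```python
import shutil
import subprocess
from pathlib import Path


def node_check(path):
    """Return an error message if `node --check` rejects the file, else None."""
    result = subprocess.run(
        ["node", "--check", str(path)], capture_output=True, text=True
    )
    return None if result.returncode == 0 else result.stderr.strip()


def unparseable_scripts(paths, check=node_check):
    """Map each script that fails the syntax check to its error message."""
    failures = {}
    for path in paths:
        error = check(path)
        if error is not None:
            failures[str(path)] = error
    return failures


def scan_static(static_dir="static"):
    """Check every .js file under static_dir; requires node on PATH."""
    if shutil.which("node") is None:
        raise RuntimeError("node not found on PATH")
    return unparseable_scripts(sorted(Path(static_dir).rglob("*.js")))
```

An empty dictionary from scan_static() means every file parsed. The check argument exists so a different parser can be swapped in without touching the rest.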

The common thread: tests check the paths you thought to write tests for. These checks find the paths you forgot existed.

Shipped in the open

Every issue above is fixed and released — the swallowed NameErrors, the two missing backends, the stranded migration, and an invite page whose JavaScript had a single unmatched brace. We publish the changelog, and we would rather write down exactly what broke and why than round it off to “bug fixes.”

Because the honest version is the uncomfortable one: some of these had been quietly broken for a while, and our logs looked fine the entire time. If your own error handling is as defensive as ours, it is worth asking what it might be absorbing. The checks are cheap. The silence is the expensive part.