Design by error
Errors are the universal language of programming. Make them count.
A well-designed error helps fix bugs
The other day, I saw this server error in my inbox:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.8/site-packages/aiormq/base.py", line 115, in <lambda>
    future.add_done_callback(lambda x: x.exception())
asyncio.exceptions.CancelledError
… and the service stalled. I had to restart it manually. There was no error message at all.
This is unacceptable.
I wrote a new library, carehare, to replace aiormq. Then I saw this new error:
Error during render of workflow [REDACTED]
Traceback (most recent call last):
...
carehare._exceptions.ConnectionClosedByServer: RabbitMQ closed the connection: 501 FRAME_ERROR - type 3, all octets = <<>>: {frame_too_large,160193,131064}
In the meantime, the service restarted automatically and I enjoyed the rest of my weekend.
What joy! The error message told me exactly what I wanted to know. It took 10 minutes to diagnose and fix this bug.
A good error is the difference between “the site’s down — panic!” and “no problemo — I’ll fix it Monday.”
The most important part of every API is its errors
Healthy error design helps programs hum.
Carehare was designed to raise the right error (in this case, ConnectionClosedByServer) in the right place. With zero lines of code, the service auto-restarted instead of stalling. I enjoyed a peaceful weekend and fixed the bug on Monday.
This is what healthy error design can do for you.
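Here is a minimal sketch of that idea. It is not carehare's actual code: the exception class and `consume_messages()` are hypothetical stand-ins. The service contains no error handling at all, so the error ends the process, and I'm assuming some supervisor (systemd, Docker, Kubernetes) is configured to start a fresh one.

```python
import asyncio


class ConnectionClosedByServer(Exception):
    """Stand-in for a library error. Hypothetical; not carehare's real class."""


async def consume_messages() -> None:
    """Hypothetical message loop whose broker drops the connection."""
    await asyncio.sleep(0)  # pretend to wait for the next message
    raise ConnectionClosedByServer(
        "RabbitMQ closed the connection: 501 FRAME_ERROR"
    )


def run_service() -> None:
    # No try/except here: the error propagates out of asyncio.run().
    # Called from a script's entry point, it would end the process with a
    # traceback and a non-zero exit status.
    asyncio.run(consume_messages())
```

Contrast this with the first traceback: a swallowed CancelledError leaves the process alive but stalled, so no supervisor knows it needs restarting.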
Users have features; programmers have errors
Users play until something breaks. Programmers play until nothing breaks.
A user follows the happy path. Alice buys a product. Bob files taxes. Charlotte shares documents. Users usually see no errors at all.
A programmer dances from error to error. When you’re editing code, you encounter a syntax error, failing test or broken behavior every five minutes. Once you’ve cleared all the errors, you commit your code and move on.
And why does a programmer edit code in the first place? You’re either fixing a bug (an old happy path isn’t working) or creating a new feature (a new happy path isn’t working). Either way, you’re done when there are no more deviations from expected behavior — that is, no errors.
In production, everything that can go wrong will go wrong. Kind programmers contribute to a culture of helping people discuss and resolve errors.
… and now, some shade for Python asyncio
Over my 20 years of programming in over 20 languages, I have never seen an error as badly designed as Python’s asyncio.CancelledError. It can happen at any time, for any reason, without a stack trace or error message. Python’s own documentation says that, in almost all situations, code that catches it must re-raise it.
And to add insult to insult, Python includes a function, asyncio.shield(), that purports to “shield” an awaitable from CancelledError. Everybody wants to shield code from CancelledError, but shield() doesn’t do that one thing. (It only shields the called function from a CancelledError raised in the caller. That doesn’t help when CancelledError often comes from elsewhere, without cause or documentation.)
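To see the gap, here is a small sketch; the names `inner`, `caller` and `demo` are mine. The caller awaits a shielded coroutine and is then cancelled from outside. The shielded work keeps running, but the caller still receives CancelledError.

```python
import asyncio


async def inner(log: list) -> None:
    # The work we would like to protect.
    await asyncio.sleep(0.05)
    log.append("inner finished")


async def caller(log: list) -> None:
    try:
        await asyncio.shield(inner(log))
    except asyncio.CancelledError:
        # shield() protected inner(), not us: the caller is still cancelled.
        log.append("caller cancelled")
        raise


async def demo() -> list:
    log: list = []
    task = asyncio.ensure_future(caller(log))
    await asyncio.sleep(0.01)  # let caller() start awaiting the shield
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.1)  # give inner() time to finish on its own
    return log
```

`asyncio.run(demo())` should return `['caller cancelled', 'inner finished']`: the inner coroutine survived, and the code that called it still had to cope with a CancelledError.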
Heck, task cancellation isn’t even an error! Nor is it, well, feasible. Java famously deprecated its similar Thread.stop() in 1998 because everybody who called it fell into a trap. Python’s asyncio came 14 years later. It’s a trap. You can safely cancel asyncio.Queue.get(), and that’s about it.
Don’t use asyncio.Future.cancel() or asyncio.wait_for(). They lead to catastrophe, because the errors they produce are hard to handle. For cancellation, pass a “stop” argument (maybe an asyncio.Future[None]). For timeouts, use asyncio.wait(), which returns the tasks that are still pending instead of raising.
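One way this might look, as a rough sketch rather than a recipe: `worker`, `demo` and the queue contents are hypothetical. The worker watches a “stop” future alongside its queue, and the caller bounds its own wait with the `timeout` argument of asyncio.wait(). The only thing ever cancelled is a pending asyncio.Queue.get().

```python
import asyncio


async def worker(queue: asyncio.Queue, stop: asyncio.Future, results: list) -> None:
    """Handle queue items until the `stop` future resolves."""
    while True:
        get_item = asyncio.ensure_future(queue.get())
        done, _pending = await asyncio.wait(
            {get_item, stop}, return_when=asyncio.FIRST_COMPLETED
        )
        if get_item in done:
            results.append(get_item.result())
        if stop in done:
            get_item.cancel()  # a no-op if get_item already finished
            return


async def demo() -> list:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    stop: asyncio.Future = loop.create_future()
    results: list = []

    worker_task = asyncio.ensure_future(worker(queue, stop, results))
    await queue.put("a")
    await queue.put("b")
    await asyncio.sleep(0.05)  # let the worker drain the queue
    stop.set_result(None)  # ask the worker to stop; nobody calls cancel() on it

    # Timeout without wait_for(): asyncio.wait() reports what is still pending.
    _done, pending = await asyncio.wait({worker_task}, timeout=1.0)
    if pending:
        results.append("worker did not stop in time")
    return results
```

The worker exits by returning normally, so its caller never has to catch a CancelledError, and a missed deadline shows up as a non-empty `pending` set that the caller can handle however it likes.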
Python’s asyncio breaks all the rules. Let’s use it as inspiration to formalize the Rules of Errors:
The Rules of Errors
- Obey your programming language. Java, Python and Ruby use Exceptions. Go uses “errors”. C uses errno. Rust uses Result. Scala uses Either. Don’t waste everyone’s time writing Results in Python or catching panics in Rust.
- Design for all possible errors. If you’re using a network, design for disconnects. If you’re writing to disk, design for failed permissions and full disks. If you’re accessing the database, design for an SQL error. And remember: you can have two errors at once.
- Design error abstraction layers. An HTTP library should treat ClosedConnectionException and SSLException as distinct errors. A Twitter API wrapper can probably treat them all as IOException. Many languages use inheritance for this.
- Document every function’s errors. Emulate Java’s API documentation: for every function, list all the possible problems, what they mean, and how to fix them.
- Plan what happens next. Should an SSL error crash an app? Decide! Write documentation and error-message text to suggest what your hapless reader should do next. Make stack traces legible. If an error causes another error, help the programmer log and address both.
- Test each error. Unit tests are quickest. They prove that a programmer who experiences the error can see it and handle it.
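Several of these rules fit in a few lines of Python. What follows is a sketch under loud assumptions: every name is hypothetical, and `http_get()` is a stub that always fails so the example stays self-contained. It shows layered errors via inheritance, errors listed in docstrings, a message that says what to do next, `raise ... from` to keep both errors visible, and a unit test per error.

```python
import unittest


class HttpError(Exception):
    """Base class for every error the hypothetical HTTP layer raises."""


class ClosedConnectionError(HttpError):
    """The server closed the connection. Retrying may help."""


class TlsError(HttpError):
    """The TLS handshake failed. Retrying will not help; check certificates."""


class TimelineUnavailable(Exception):
    """Raised by the API wrapper for any transport problem."""


def http_get(url: str) -> str:
    """Stub transport that always fails.

    Raises:
        TlsError: the URL contains "bad-cert" (simulated certificate failure).
        ClosedConnectionError: any other URL (simulated hang-up).
    """
    if "bad-cert" in url:
        raise TlsError(f"certificate verification failed for {url}")
    raise ClosedConnectionError(f"server closed the connection to {url}")


def fetch_timeline(host: str, user: str) -> str:
    """Fetch a user's timeline.

    Raises:
        TimelineUnavailable: the request failed; `__cause__` holds the
            low-level HttpError that explains why.
    """
    try:
        return http_get(f"https://{host}/{user}")
    except HttpError as err:
        # `from err` keeps both errors in the traceback.
        raise TimelineUnavailable(
            f"could not fetch timeline for {user!r}; check the network and retry"
        ) from err


class FetchTimelineErrorTest(unittest.TestCase):
    def test_closed_connection_is_wrapped(self):
        with self.assertRaises(TimelineUnavailable) as ctx:
            fetch_timeline("api.example.invalid", "alice")
        self.assertIsInstance(ctx.exception.__cause__, ClosedConnectionError)

    def test_tls_failure_is_wrapped(self):
        with self.assertRaises(TimelineUnavailable) as ctx:
            fetch_timeline("bad-cert.example.invalid", "alice")
        self.assertIsInstance(ctx.exception.__cause__, TlsError)
```

Callers of the wrapper catch one exception type, while anyone debugging can still read the original transport error through `__cause__`.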
And then … you’re done!
The next time you sit down to work, try this: design and implement all your errors first.
You’ll be shocked: after you’ve built and tested all the errors, you’ll be done!
The “happy path” manifests, like an epiphany, once errors are handled correctly.