Date APIs and their impossible promises
When we brainstorm Workbench features, I have a mantra: don’t promise the impossible.
Last month we designed our date and time system. My mantra came up again and again. A lot of APIs promise the impossible.
TL;DR when you’re programming with dates and times, store data in two formats:
- “Timestamp” or “Instant”: a moment on Earth. Almost everyone should use UTC (Coordinated Universal Time) — not in a timezone.
- “Date” or “Calendar date”: a wishy-washy concept. We read dates in news articles and publish dates in annual reports. Store them in a format very different from a timestamp — not in UTC, and not in a timezone.
If you try and introduce timezones, you can easily promise the impossible.
Here are some impossible promises. Treat them as cautionary tales.
Impossible promise: our API can handle timestamps in local time
First, let’s clear up confusion. UTC, Coordinated Universal Time, is not a timezone. UTC is a standard for pinpointing moments in time. UNIX timestamps (integer “seconds since the epoch”) fit this description.
Computers think in UTC. Humans think in timezones.
Date APIs bridge the gap. They can do the easy thing — store timestamps in UTC and format them in the requested timezone — or they can do something else.
So many APIs do something else. So, so many.
SQL may have landed in the worst place. I can’t find two SQL databases that treat
TIMESTAMP the same way.
- Microsoft SQL Server has
DATETIME2for UTC timestamps and
DATETIMEOFFSETfor the misguided idea, “timestamp in an offset” — but oops, offsets aren’t timezones.
- Oracle has
TIMESTAMP. It can be used for UTC. Then there’s
TIMESTAMP WITH TIME ZONE(stored as a local-time timestamp and an arbitrary timezone) and as almost an admission of defeat,
TIMESTAMP WITH LOCAL TIMEZONE(which stores values in something called the “database timezone” — which can be UTC, even though UTC isn’t a timezone — and displays values in another timezone).
- PostgreSQL is simply backwards. Its
TIMESTAMPis shorthand for
TIMESTAMP WITHOUT TIME ZONE… which actually means, “datetime” (“timestamp in a timezone but we don’t know which one”). Always write
TIMESTAMP WITH TIME ZONEbecause — prepare for your brain to melt —
WITH TIME ZONEforces the value to always be UTC. (Remember: UTC isn’t a timezone!)
- MySQL gets it right.
TIMESTAMPvalues are always UTC. MySQL has a
DATETIMEtype for developers with less-common requirements.
SQL promised a standard; but this is clearly not standardized.
It’s no wonder there’s no standard: the promise of storing timestamps in timezones leads to impossible math.
It is impossible to convert from “2010–11–07T05:30Z” to Eastern Time without extra instruction. Because of daylight savings, that UTC timestamp has two different Eastern datetimes. Similarly, it is impossible to convert “2010–03–14T02:30 Eastern” to UTC, because that local datetime does not exist. Date APIs invent unintuitive settings like, “when a datetime is ambiguous, pick the earlier;” or “when a datetime doesn’t exist, round it down.”
Nobody deserves this level of confusion. Always store timestamps in UTC.
Here are some tricks for avoiding wasted effort:
- In Python, never store a
datetime.datetimevalue. Attach the “UTC” timezone [sic — UTC is not a timezone] only in the one line of code prior to converting to the user’s timezone for display. (In Django, set
USE_TZ=Falseimmediately upon creating any project.)
- In Java, use
- In C, use
Impossible promise: our API hide timezones from users
Date stores a UTC date. So far so good.
Date. What does a reader in Los Angeles read?
Programmers get confused by this. They think they did something wrong. (Really, the API broke its promise.) Programmers think they can fix it with local-datetime math. Really, that’s impossible. date-fns-tz promises the impossible.
Date. Only a completely different library can convey the information, “this story is in Eastern time.” (The “Temporal” language proposal should eventually lead to a solution.)
So how do I store timezones along with timestamps?
The same way you store units alongside numbers: in two separate places in memory.
Here’s why: timezones and timestamps always come from different places. Every timestamp (UTC) comes from a global cabal of caesium-addled scientists. Every timezone comes from the context: the user’s settings, in an online calendar app; headquarters, in a medium-sized company; the writer, in the case of a news article; the user’s web browser, surprisingly rarely; and so on.
We humans always transmit timestamp and timezone together, fluently. We have an uncanny knack for inferring timezone from the context. I can tell my brother, “See you tomorrow at one” and he’ll know exactly what I mean.
Computers can’t infer timezones the way humans do. Don’t use date/time APIs that pretend otherwise.
Sometimes, we don’t know or care about timezone. Canada’s yearly population statistics are gathered in five timezones. Nobody knows or cares about the timezone — people ask for yearly population counts. Well, here come more broken promises….
Impossible promise: our Timestamp type can represent a Datetime
Date math and timestamp math are very different.
We compare dates using a calendar. Date math counts days, weeks, months, quarters and years. Different days have different lengths. (Think Daylight Savings.) Different months have different durations.
Date to store, well, a date, you can’t perform date math.
Date stores a timestamp, and most timestamps have two dates.
So they shouldn’t have called it
Workbench’s “Date” type stores, “days since 1970–01–01”. Workbench’s “Timestamp” type stores, “nanoseconds since the Epoch, UTC.”
What’s the difference between a timestamp and a datetime?
Most date APIs have a “datetime” type. Don’t confuse it with a timestamp.
A datetime is a tuple:
year, month, day, hour, minute, second, fraction.
Some date APIs call that a “naive datetime.” They might add a timezone and call it “datetime with timezone.”
Python’s datetime.datetime is a good example. It’s an object with nine fields, costing oodles of memory.
Datetimes have their place. They’re critical for converting to and from date and for parsing and formatting timestamps. They’re useful for calendar math — changing an appointment from Wednesday to Thursday, for instance. They’re useful if you’re moving your alarm clock into another timezone and you want it to wake you up at the “new” 7 p.m.
In other words: datetimes are important for date API creators.
But datetimes aren’t important for the rest of us. In most cases, datetimes are useless:
- To calculate number of hours between datetimes, you need timestamps. (Daylight savings means you don’t know how many hours are between 1am and 3am.)
- You can’t convert a future datetime to a timestamp. (Timezones are chosen by politicians; the conversion math depends on laws that haven’t been written yet.)
- You can’t pass datetimes between two programs using a standard storage format. (ISO8601 can’t store timezones or even indicate “naive datetime”.)
And datetimes are downright dangerous:
- Datetimes look and feel like timestamps.
Some tools, like Apache Parquet and SQL, let you store datetimes using 64-bit integers. This is clever — too clever, in my opinion. Sure, a datetime type is 2–4 bytes more efficient than storing date and time-of-day in two separate columns. But I’ll wager most people using datetime columns didn’t choose datetime to save bytes: they chose datetime because they were misled by documentation or names.
In my 21 years of programming experience, I have never seen an appropriate reason to store datetimes in a file or database, or to transmit datetimes over the Internet. (There are valid reasons; but they’re rare.) On the other hand, I cannot count the number of bugs I’ve seen in code — much of it mine — that accidentally used datetime instead of timestamp.
API designers, please: make it easy to use timestamps and make it hard to use datetimes. The java.time package got this right (much too late).
Impossible promise: our API can magically parse any date
I could write a long list of broken parsers here. Better is to write a list of parsers that can magically parse any date:
That’s right. It is impossible to magically parse any date.
A date-parsing library must state what formats accepts and what formats it does not accept. The most impressive aspect of a date-parsing library is not its speed or its ability to convert “anything” to a date: the impressive aspects are its settings, its documentation, and its ability to detect and report errors.
timeparser packs plenty of power in one string of settings and five paragraphs of documentation. It doesn’t cover everything; but what it does handle, it handles well.
strptimebehaves as advertised.
Storing moments in time? Use timestamps.
Don’t store timezones until you literally cannot solve your problems without timezones.
When you store a timezone alongside a timestamp, name it! Don’t call it
timezone. Call it
formattingTimezoneBecauseWeOnlyEverFormatThisTimestampInThisTimezone, or something equally descriptive.
Disagree with me about the dangers of datetimes? I’d love to hear why.