Date APIs and their impossible promises
When we brainstorm Workbench features, I have a mantra: don’t promise the impossible.
Last month we designed our date and time system. My mantra came up again and again. A lot of APIs promise the impossible.
TL;DR when you’re programming with dates and times, store data in two formats:
- “Timestamp” or “Instant”: a moment on Earth. Almost everyone should use UTC (Coordinated Universal Time) — not in a timezone.
- “Date” or “Calendar date”: a wishy-washy concept. We read dates in news articles and publish dates in annual reports. Store them in a format very different from a timestamp — not in UTC, and not in a timezone.
That’s it.
If you try and introduce timezones, you can easily promise the impossible.
Here are some impossible promises. Treat them as cautionary tales.
Impossible promise: our API can handle timestamps in local time
First, let’s clear up confusion. UTC, Coordinated Universal Time, is not a timezone. UTC is a standard for pinpointing moments in time. UNIX timestamps (integer “seconds since the epoch”) fit this description.
Computers think in UTC. Humans think in timezones.
Date APIs bridge the gap. They can do the easy thing — store timestamps in UTC and format them in the requested timezone — or they can do something else.
So many APIs do something else. So, so many.
SQL may have landed in the worst place. I can’t find two SQL databases that treat TIMESTAMP
the same way.
- Microsoft SQL Server has
DATETIME2
for UTC timestamps andDATETIMEOFFSET
for the misguided idea, “timestamp in an offset” — but oops, offsets aren’t timezones. - Oracle has
TIMESTAMP
. It can be used for UTC. Then there’sTIMESTAMP WITH TIME ZONE
(stored as a local-time timestamp and an arbitrary timezone) and as almost an admission of defeat,TIMESTAMP WITH LOCAL TIMEZONE
(which stores values in something called the “database timezone” — which can be UTC, even though UTC isn’t a timezone — and displays values in another timezone). - PostgreSQL is simply backwards. Its
TIMESTAMP
is shorthand forTIMESTAMP WITHOUT TIME ZONE
… which actually means, “datetime” (“timestamp in a timezone but we don’t know which one”). Always writeTIMESTAMP WITH TIME ZONE
because — prepare for your brain to melt —WITH TIME ZONE
forces the value to always be UTC. (Remember: UTC isn’t a timezone!) - MySQL gets it right.
TIMESTAMP
values are always UTC. MySQL has aDATETIME
type for developers with less-common requirements.
SQL promised a standard; but this is clearly not standardized.
It’s no wonder there’s no standard: the promise of storing timestamps in timezones leads to impossible math.
It is impossible to convert from “2010–11–07T05:30Z” to Eastern Time without extra instruction. Because of daylight savings, that UTC timestamp has two different Eastern datetimes. Similarly, it is impossible to convert “2010–03–14T02:30 Eastern” to UTC, because that local datetime does not exist. Date APIs invent unintuitive settings like, “when a datetime is ambiguous, pick the earlier;” or “when a datetime doesn’t exist, round it down.”
Nobody deserves this level of confusion. Always store timestamps in UTC.
Here are some tricks for avoiding wasted effort:
- In Python, never store a
tzinfo
in yourdatetime.datetime
value. Attach the “UTC” timezone [sic — UTC is not a timezone] only in the one line of code prior to converting to the user’s timezone for display. (In Django, setUSE_TZ=False
immediately upon creating any project.) - In Java, use
java.time.Instant
. - In C, use
gmtime
and notlocaltime
. - In JavaScript … sigh. JavaScript
Date
does store UTC as it should, but JavaScript made another impossible promise:
Impossible promise: our API hide timezones from users
A JavaScript Date
stores a UTC date. So far so good.
But it also “stores” a timezone. That’s always the user’s timezone — the user viewing the web page. JavaScript promises: “the user’s timezone is always the timezone you want.”
It isn’t.
Picture an online news article about a road accident in Connecticut. The journalist wrote a timeline — “8 p.m. Mary set off in her car…” and alongside that, a JavaScript-powered visual displays timestamps. The article text is HTML; the JavaScript timestamps are UTC Date
. What does a reader in Los Angeles read?
The answer is: in Los Angeles, the HTML will say “8 p.m.” and the JavaScript visual will say “11 p.m.”
Programmers get confused by this. They think they did something wrong. (Really, the API broke its promise.) Programmers think they can fix it with local-datetime math. Really, that’s impossible. date-fns-tz promises the impossible.
The solution: well, you can use some hacks with the Intl API. (Format dates, then parse them using a regular-expression library….) But truly, if you need timezones, avoid JavaScript’s Date
. Only a completely different library can convey the information, “this story is in Eastern time.” (The “Temporal” language proposal should eventually lead to a solution.)
JavaScript promised that timezones could be simple. That was impossible.
So how do I store timezones along with timestamps?
The same way you store units alongside numbers: in two separate places in memory.
Here’s why: timezones and timestamps always come from different places. Every timestamp (UTC) comes from a global cabal of caesium-addled scientists. Every timezone comes from the context: the user’s settings, in an online calendar app; headquarters, in a medium-sized company; the writer, in the case of a news article; the user’s web browser, surprisingly rarely; and so on.
We humans always transmit timestamp and timezone together, fluently. We have an uncanny knack for inferring timezone from the context. I can tell my brother, “See you tomorrow at one” and he’ll know exactly what I mean.
Computers can’t infer timezones the way humans do. Don’t use date/time APIs that pretend otherwise.
Sometimes, we don’t know or care about timezone. Canada’s yearly population statistics are gathered in five timezones. Nobody knows or cares about the timezone — people ask for yearly population counts. Well, here come more broken promises….
Impossible promise: our Timestamp type can represent a Datetime
Date math and timestamp math are very different.
We compare dates using a calendar. Date math counts days, weeks, months, quarters and years. Different days have different lengths. (Think Daylight Savings.) Different months have different durations.
If you use a JavaScript Date
to store, well, a date, you can’t perform date math. Date
stores a timestamp, and most timestamps have two dates.
So they shouldn’t have called it Date
, see!
Workbench’s “Date” type stores, “days since 1970–01–01”. Workbench’s “Timestamp” type stores, “nanoseconds since the Epoch, UTC.”
What’s the difference between a timestamp and a datetime?
Most date APIs have a “datetime” type. Don’t confuse it with a timestamp.
A datetime is a tuple: year, month, day, hour, minute, second, fraction
.
Some date APIs call that a “naive datetime.” They might add a timezone and call it “datetime with timezone.”
Python’s datetime.datetime is a good example. It’s an object with nine fields, costing oodles of memory.
Datetimes have their place. They’re critical for converting to and from date and for parsing and formatting timestamps. They’re useful for calendar math — changing an appointment from Wednesday to Thursday, for instance. They’re useful if you’re moving your alarm clock into another timezone and you want it to wake you up at the “new” 7 p.m.
In other words: datetimes are important for date API creators.
But datetimes aren’t important for the rest of us. In most cases, datetimes are useless:
- To calculate number of hours between datetimes, you need timestamps. (Daylight savings means you don’t know how many hours are between 1am and 3am.)
- You can’t convert a future datetime to a timestamp. (Timezones are chosen by politicians; the conversion math depends on laws that haven’t been written yet.)
- You can’t pass datetimes between two programs using a standard storage format. (ISO8601 can’t store timezones or even indicate “naive datetime”.)
And datetimes are downright dangerous:
- Datetimes look and feel like timestamps.
Some tools, like Apache Parquet and SQL, let you store datetimes using 64-bit integers. This is clever — too clever, in my opinion. Sure, a datetime type is 2–4 bytes more efficient than storing date and time-of-day in two separate columns. But I’ll wager most people using datetime columns didn’t choose datetime to save bytes: they chose datetime because they were misled by documentation or names.
In my 21 years of programming experience, I have never seen an appropriate reason to store datetimes in a file or database, or to transmit datetimes over the Internet. (There are valid reasons; but they’re rare.) On the other hand, I cannot count the number of bugs I’ve seen in code — much of it mine — that accidentally used datetime instead of timestamp.
Who can fault us? We’re only human. And date APIs from Postgres’ to Java’s to JavaScript’s swap names and invert meanings. It’s a jungle out there.
API designers, please: make it easy to use timestamps and make it hard to use datetimes. The java.time package got this right (much too late).
Impossible promise: our API can magically parse any date
I could write a long list of broken parsers here. Better is to write a list of parsers that can magically parse any date:
- …
That’s right. It is impossible to magically parse any date.
A date-parsing library must state what formats accepts and what formats it does not accept. The most impressive aspect of a date-parsing library is not its speed or its ability to convert “anything” to a date: the impressive aspects are its settings, its documentation, and its ability to detect and report errors.
Nice libraries:
- Go’s
time
parser packs plenty of power in one string of settings and five paragraphs of documentation. It doesn’t cover everything; but what it does handle, it handles well. - POSIX
strptime
behaves as advertised.
In summary
Storing moments in time? Use timestamps.
Don’t store timezones until you literally cannot solve your problems without timezones.
When you store a timezone alongside a timestamp, name it! Don’t call it timezone
. Call it parsedTimezone
, userTimezone
, companyTimezone
, serverTimezone
, originRecordTimezone
, formattingTimezoneBecauseWeOnlyEverFormatThisTimestampInThisTimezone
, or something equally descriptive.
Disagree with me about the dangers of datetimes? I’d love to hear why.