Adam Hooper
Jan 13, 2022

MySQL’s utf8_general_ci is misleading in two ways:

  • utf8 means “proprietary encoding” (as described in this article)
  • general means “proprietary sorting algorithm”
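
To make the first point concrete, here is a minimal Python sketch. It assumes that MySQL's utf8 stores at most three bytes per character, while real UTF-8 needs up to four. The helper name fits_in_mysql_utf8 is hypothetical, made up for this illustration; it is not part of MySQL or any library.

```python
def fits_in_mysql_utf8(text: str) -> bool:
    """Hypothetical helper: True if every character needs at most 3 bytes in UTF-8.

    Assumption: MySQL's "utf8" stores at most 3 bytes per character,
    so anything needing 4 bytes (emoji, for example) does not fit.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


if __name__ == "__main__":
    for sample in ["cafe", "café", "€", "😀"]:
        size = len(sample.encode("utf-8"))
        print(repr(sample), size, "bytes, fits:", fits_in_mysql_utf8(sample))
```

Text that fails this check is the kind that needs utf8mb4 rather than utf8.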

MySQL corrected both problems with utf8mb4_unicode_ci … but that led to a third MySQL-specific problem:

  • Unicode text sort order changes between Unicode versions. There is no “Unicode sort order”: there’s a “Unicode 9.0.0 sort order.”

At the time I write this, you should collate using utf8mb4_0900_ai_ci. There are also language-specific options. See https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html.

Or you can just collate with binary.
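
As a rough illustration of what binary collation trades away, the Python sketch below sorts a few words two ways. It assumes a binary collation orders strings by their encoded bytes, and it uses Python's str.casefold as a crude stand-in for a case-insensitive collation. Neither is MySQL's actual implementation, and the word list is made up.

```python
# Made-up sample data for illustration.
words = ["apple", "Zebra", "banana", "Éclair"]

# Approximation of a binary collation: compare the raw UTF-8 bytes.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# Crude stand-in for a case-insensitive collation: compare case-folded text.
# This is NOT the Unicode Collation Algorithm, so accents still sort last.
caseless_order = sorted(words, key=str.casefold)

if __name__ == "__main__":
    print("binary:  ", binary_order)
    print("caseless:", caseless_order)
```

Byte order puts every uppercase ASCII letter before every lowercase one, so "Zebra" sorts ahead of "apple". That order never changes between Unicode versions, but it is rarely the order a human reader expects.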
