Adam Hooper
Jan 13, 2022

--

MySQL’s utf8_general_ci is misleading in two ways:

  • utf8 means “proprietary encoding” (as described in this article)
  • general means “proprietary sorting algorithm”
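
To make the first point concrete, here is a minimal Python sketch. It assumes, as the article describes, that MySQL's utf8 stores at most 3 bytes per character; the helper name fits_in_mysql_utf8 is hypothetical and only for illustration.

```python
def fits_in_mysql_utf8(text: str) -> bool:
    """Return True if every character needs at most 3 bytes in real UTF-8.

    Assumption: MySQL's legacy utf8 charset caps characters at 3 bytes,
    so anything that needs 4 bytes (emoji, for example) cannot be stored.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


samples = ["café", "日本語", "😀"]
for sample in samples:
    # Show each sample alongside whether it would fit in a 3-byte-per-character column.
    print(sample, fits_in_mysql_utf8(sample))
```

Accented Latin and CJK text pass the check; the emoji does not, because it takes 4 bytes in standard UTF-8.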

MySQL corrected both problems with utf8mb4_unicode_ci … but that led to a third MySQL-specific problem:

  • Unicode text sort order changes between Unicode versions. There is no “Unicode sort order”: there’s a “Unicode 9.0.0 sort order.”

At the time I write this, you should collate using utf8mb4_0900_ai_ci. There are also language-specific options. See https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html.

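Or you can just collate with binary (for example, utf8mb4_bin): it compares raw code point values, so the order never changes between Unicode versions.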
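
As a rough illustration of what a binary collation trades away, this Python sketch sorts the same words two ways. Sorting by encoded bytes stands in for a binary collation; str.casefold is only a crude stand-in for a case-insensitive collation and is not the Unicode Collation Algorithm that MySQL's _ci collations use. The word list is made up for the example.

```python
words = ["banana", "Apple", "cherry", "apple", "Zebra"]

# Binary-style ordering: compare the raw UTF-8 bytes.
# Every uppercase ASCII letter sorts before every lowercase one.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# Case-insensitive approximation: compare case-folded strings.
# sorted() is stable, so "Apple" stays ahead of "apple".
caseless_order = sorted(words, key=str.casefold)

print(binary_order)    # ['Apple', 'Zebra', 'apple', 'banana', 'cherry']
print(caseless_order)  # ['Apple', 'apple', 'banana', 'cherry', 'Zebra']
```

The binary order is stable across Unicode versions but puts "Zebra" before "apple", which is rarely what a human reader expects.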
