Jan 13, 2022
MySQL’s utf8_general_ci is misleading in two ways:
- utf8 means “proprietary encoding” (as described in this article)
- general means “proprietary sorting algorithm”
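To make the first point concrete: MySQL’s utf8 (also called utf8mb3) stores at most three bytes per character, so it cannot hold code points above U+FFFF, such as most emoji. The following is a minimal Python sketch of that limit, not anything MySQL ships; the function name fits_in_mysql_utf8mb3 is hypothetical.

```python
def fits_in_mysql_utf8mb3(text: str) -> bool:
    """Return True if every character is in the Basic Multilingual Plane.

    Assumption: this only models the three-byte limit of MySQL's utf8
    (utf8mb3) character set. Code points above U+FFFF need four bytes in
    real UTF-8 and are therefore not representable there.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # "\U0001F600" is an emoji outside the Basic Multilingual Plane.
    for sample in ["cafe", "caf\u00e9", "\u65e5\u672c\u8a9e", "\U0001F600"]:
        print(repr(sample), fits_in_mysql_utf8mb3(sample))
```

A check like this can flag text that needs a utf8mb4 column before it is written to a legacy utf8 column.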
MySQL corrected both problems with utf8mb4_unicode_ci, but that led to a third MySQL-specific problem:
- Unicode text sort order changes between Unicode versions. There is no “Unicode sort order”: there’s a “Unicode 9.0.0 sort order.”
At the time I write this, you should collate using utf8mb4_0900_ai_ci. There are also language-specific options. See https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html.
Or you can just collate with binary.
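To get a feel for the difference between a binary collation and an accent-insensitive, case-insensitive one (the “ai_ci” in the name), here is a rough Python sketch. It is only an approximation under stated assumptions: rough_ai_ci_key is a hypothetical helper that strips combining marks and case-folds, which is not the Unicode Collation Algorithm that MySQL’s 0900 collations are based on.

```python
import unicodedata


def binary_key(s: str) -> bytes:
    """Sort key for a binary comparison: the raw UTF-8 bytes."""
    return s.encode("utf-8")


def rough_ai_ci_key(s: str) -> str:
    """Very rough stand-in for an accent- and case-insensitive key.

    Assumption: decomposing, dropping combining marks, and case-folding
    is enough for this demo. A real Unicode collation does much more.
    """
    decomposed = unicodedata.normalize("NFD", s)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


if __name__ == "__main__":
    # "\u00e9lan" starts with a precomposed e-acute.
    words = ["zebra", "Apple", "\u00e9lan", "banana"]
    print(sorted(words, key=binary_key))
    print(sorted(words, key=rough_ai_ci_key))
```

The binary ordering is stable across Unicode versions because it only compares bytes, but it places accented words after every unaccented ASCII word, which is rarely what a human reader expects.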