Skip to content

MySQL 8 Character set conversion

As explained during the ASDF session of March 30th 2023 during the DBOD Works & Deadlines 2023 talk, with the migration to MySQL version 8, the character sets utf8 or utf8mb3 become deprecated and will be removed in any future release. MySQL recommends to convert those character sets to utf8mb4. In fact, or even worse, utf8 that is currently an alias for the character set utf8mb3, will be an alias to utf8mb4 in a future release. So it is also recommended to use utf8mb4 in order to be unambiguous.

The good aspect of converting to utf8mb4 is that you can store more symbols (including emojis), new collations (specially for Asian languages), and it's generally faster [1][2] using utf8mb4 on MySQL 8 vs utf8mb3 on MySQL 5.7. utf8mb4 is also the new reference for testing, improvements, and optimizations for MySQL developers.

[1] - https://dev.mysql.com/blog-archive/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/

[2] - https://www.percona.com/blog/charset-and-collation-settings-impact-on-mysql-performance/

On the other hand, using utf8mb4 means that the maximum number of bytes that one character can take is four bytes instead of three, and this may affect the amount of information that you can store in your text columns and indexes (the length of a column or index in bytes will still be the same, but you could be using more bytes for the same amount of characters in your CHAR, VARCHAR, or TEXT columns and indexes)

So converting to utf8mb4 may require to change some column or index definitions:

  • increase column or key size, or use a larger data type to make space for up to 4 bytes/char.
  • if hitting limits, reduce column or key size (in char) if the application is still okay with it.

You can also read this Percona article about things to consider when migrating to utf8mb4 for more information.

So once you have migrated your instance from MySQL 5.7 to MySQL 8, migrating to utf8mb4 is not mandatory but it is very much recommended, specially if you don't want to get a bad surprise in a next minor release of MySQL 8.

Fortunately, we have automated this process to make migrations easier.

From your instance view in the DBOD web interface, if your instance is in version 8, you will see this icon:

instance view button

If you click in it, you will open the Character Set Conversion window:

character set conversion window

where you can select if you want to upgrade to utf8mb4 only the tables and columns defined with utf8/utf8mb3 or any other existing character set too, like latin, etc.

We strongly recommend you to try this first in a clone, to evaluate the changes and how these may affect your applications. You can also try first a 'dry run', in which case the job will only show you the changes that it would apply to your schema but nothing else.

Once you Confirm these changes in the window, a job will be executed to execute the conversion. Once it finishes, you can go to your CERNBox account and check a new folder called charset_conversion_reports under Shares --> Shared with me --> \<your_instance_name> folder.

CERNBox charset conversion reports folder

In this folder you will find three or five different files (depending on if you selected just a dry run or not).

files generated after charset conversion

  • the alter_statements-<timestamp>.sql file contains ALL the SQL statements that will be executed to perform the character set conversion of the instance.
  • the alter_statements_result-<timestamp>.log file constains the result of the execution of those statements.
  • the dump-before-charset_conversion-<timestamp>.sql file contains a DDL dump of your database objects before the character conversion.
  • the dump-after-charset_conversion-<timestamp>.sql file contains a DDL dump of your database objects after the character conversion.
  • finally, the dump-diff-<timestamp>.log file contains a diff of the previous two files to better identify what changed.