utf8mb4

Getting Emoji and multibyte characters on your Drupal 7 site with 7.50

Almost exactly a year ago, I wrote a blog post titled Solving the Emoji/character encoding problem in Drupal 7.

Since writing that post, Drupal 7 bugfixes and improvements have started to pick up steam as (a) many members of the community who were focused on launching Drupal 8 had time to take a breather and fix up some long-standing Drupal 7 bugs and improvements that hadn't yet been backported, and (b) there are two new D7 core maintainers. One of the patches I've been applying to many sites and hoping would get pulled into core for a long time was adding support for full UTF-8, which allows the entry of emojis, Asian symbols, and mathematical symbols that would break Drupal 7 sites running on MySQL previously.

My old blog post had a few steps that you could follow to make your Drupal 7 site 'mostly' support UTF-8, but there were some rough edges. Now that support is in core, the process for converting your existing site's database is more straightforward:

Solving the Emoji/character encoding problem in Drupal 7

Update: As of Drupal 7.50, Emoji/UTF-8 mb4 is now supported for MySQL (and other databases) in core! See the documentation page here for more information on how to configure it: Multi-byte UTF-8 support in Drupal 7. This blog post exists for historical purposes only—please see the Drupal.org documentation for the most up-to-date instructions!

On many Drupal 7 sites, I have encountered issues with Emoji (mostly) and other special characters (rarely) when importing content from social media feeds, during content migrations, and in other situations, so I finally decided to add a quick blog post about it.

Have you ever noticed an error in your logs complaining about incorrect string values, with an emoji or other special character, like the following:

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x98\x89" ...' for column 'body_value' at row 1: INSERT INTO {field_data_body} (entity_type, entity_id, revision_id, bundle, delta, language, body_value, body_summary, body_format) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8); Array ( [:db_insert_placeholder_0] => node [:db_insert_placeholder_1] => 538551 [:db_insert_placeholder_2] => 538550 [:db_insert_placeholder_3] => story [:db_insert_placeholder_4] => 0 [:db_insert_placeholder_5] => und [:db_insert_placeholder_6] => <p>[EMOJI_HERE]</p> [:db_insert_placeholder_7] => [:db_insert_placeholder_8] => filtered_html ) in field_sql_storage_field_storage_write() (line 514 of /drupal/modules/field/modules/field_sql_storage/field_sql_storage.module).

(Note: Actual Emoji was removed from this summary post to prevent Drupal Planet's aggregator from barfing on the feed... due to this very issue!).

To fix this, you need to switch the affected MySQL table's encoding to utf8mb4, and also switch any table columns ('fields', in Drupal parlance) which will store Emojis or other exotic UTF-8 characters. This will allow these special characters to be stored in the database, and stop the PDOExceptions.