The Ultimate Guide to ICU Message Format

ICU message format is one of the most widely used standards for translatable messages in i18n. This article covers the basics of ICU and the ICU message format and shows how Crowdin handles it, since translating ICU strings can be tricky and a preview helps translators get them right.

What Does ICU Stand for?

As the ICU documentation states, ICU stands for International Components for Unicode – a widely used set of C/C++ and Java libraries providing Unicode and globalization support for software and applications. ICU is released under a nonrestrictive open source license that is suitable for use with both commercial software and open source or free software.

Which Companies Are Using ICU and Why

Adobe, Amazon (Kindle), Apple, Dell, eBay, Google, HP, IBM, Intel, Mozilla, Node.js, WMS Gaming, and many other companies and organizations use ICU to simplify the process of localizing their software. The ICU library provides utilities for working with Unicode in Java and C/C++, along with i18n functionality.

ICU format allows you to create user-friendly texts that combine different plural, gender, date, and time forms in one string, which varies depending on who the user is. For example, “Sam sent you 2 messages” and “Emma sent you 1 message”.

If your goal is to maintain an application that supports a wide variety of languages, the International Components for Unicode (ICU) libraries are what you are looking for. ICU also has a sister project, ICU4J, that extends the internationalization capabilities of Java to a level similar to ICU. You can learn more about them by visiting the official ICU documentation page.

I18n Libraries That Support ICU Message Format

Many i18n libraries across programming languages and platforms have implemented ICU message format support. We’ll share the most popular with you, but remember that some of them implement different subsets of ICU message format. So be sure to read the documentation carefully to know what ICU features are supported by the library you choose.

C/C++

  • ICU4C – the C/C++ library and a complete implementation of ICU.

Dart/Flutter

  • intl – the first-party Dart i18n package, which implements ICU message formatting.

  • Flutter i18n – Flutter’s first-party i18n library also uses ICU message formats.

Java

  • ICU4J – a complete implementation of ICU.

JavaScript

JavaScript doesn’t have an official, first-party i18n message format, but you can pick one of the third-party libraries.

  • messageformat – built around the ICU MessageFormat standard and supports all the languages included in the Unicode CLDR.

  • i18next with ICU module – an official ICU extension for the i18next library.

PHP

  • Symfony – a web framework with ICU message support. The Symfony documentation explains how to translate messages using ICU.

Python

  • PyICU – Python wrappers for the ICU C++ libraries.

The ICU Message Format and Module

Messages (i.e., strings) in applications are rarely completely static. They contain variables or more complex forms like pluralization. The ICU message format (hereinafter referred to as ICU) is an i18n format and a part of the formatting and parsing module in the ICU library. It allows you to format these “messages” according to the conventions of each language.

Here’s an example where ICU is used:

{username}, you have {messages, plural, one {# unread message} other {# unread messages}}.

And the output you can receive:

Mark, you have 3 unread messages.

You can see that this string uses a variable for username ({username}) and plural for messages ({messages, plural, …}). In this article, we will talk about all of these and more types that are also called “arguments.”
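
To make the selection step concrete, here is a minimal TypeScript sketch of what a formatter roughly does with a plural argument like the one above. It is not an ICU MessageFormat implementation: it relies on the built-in `Intl.PluralRules` and `Intl.NumberFormat`, and the `formatPlural` helper and the `PluralBranches` type are hypothetical names introduced only for illustration.

```typescript
// Hypothetical shape for the branches of one plural argument; "other" is required.
type PluralBranches = { other: string; [category: string]: string | undefined };

// Hypothetical helper: picks the branch for `count` and substitutes "#".
function formatPlural(locale: string, count: number, branches: PluralBranches): string {
  // Intl.PluralRules applies the CLDR plural rules for the locale.
  const category = new Intl.PluralRules(locale).select(count);
  // Fall back to "other" when the selected category has no text of its own.
  const template = branches[category] ?? branches.other;
  // "#" stands for the count, formatted for the locale.
  return template.replace(/#/g, new Intl.NumberFormat(locale).format(count));
}

const unread: PluralBranches = {
  one: "# unread message",
  other: "# unread messages",
};

const username = "Mark";
const line = `${username}, you have ${formatPlural("en", 3, unread)}.`;
console.log(line); // Mark, you have 3 unread messages.
```

A real ICU library parses the message string itself and also handles nesting and exact matches such as `=0`; the sketch covers only category selection and `#` substitution.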

ICU Usage and Translation

The basic usage of the ICU allows you to use placeholders (arguments) in your messages:

# translations/messages.en.yaml
say_hello: 'Hello {name}!'

Everything within the curly braces ({…}) is processed and replaced by the value passed for that placeholder:

// prints "Hello Donald!"
echo $translator->trans('say_hello', ['name' => 'Donald']);
// prints "Hello Mary!"
echo $translator->trans('say_hello', ['name' => 'Mary']);

The curly-brace syntax also lets you “modify” the output of the variable, for instance with the select function. It acts like PHP’s switch statement and allows you to use different strings based on the variable’s value. A typical use is gender. Here’s an example:

ICU-example

The basic syntax for all functions is {variable_name, function_name, function_statement} (where, as you will see later, function_statement is optional for some functions). In this case, the function name is select, and its statement contains the “cases” of this select (female, male, other). This function is applied to the organizer_gender variable:

// prints "Donald has invited you to his webinar!"
echo $translator->trans('invitation_title', [
    'organizer_name' => 'Donald',
    'organizer_gender' => 'male',
]);
// prints "Donald & Mary have invited you to their webinar!"
echo $translator->trans('invitation_title', [
    'organizer_name' => 'Donald & Mary',
    'organizer_gender' => 'not_applicable',
]);
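
The select message itself is not reproduced in the text above, so the TypeScript sketch below uses an assumed reconstruction of its three cases, chosen only to be consistent with the two outputs shown. The `invitationTitle` object and the `formatSelect` helper are hypothetical; they illustrate the matching rule (exact match on the value, otherwise `other`), not how Symfony or ICU implement it.

```typescript
// Assumed reconstruction of the select cases; the original message is not shown in the text.
const invitationTitle: Record<string, string> = {
  female: "{organizer_name} has invited you to her webinar!",
  male: "{organizer_name} has invited you to his webinar!",
  other: "{organizer_name} have invited you to their webinar!",
};

// Hypothetical helper: select compares the value literally and falls back to "other".
function formatSelect(
  cases: Record<string, string>,
  value: string,
  args: Record<string, string>
): string {
  const hasCase = Object.prototype.hasOwnProperty.call(cases, value);
  const template = hasCase ? cases[value] : cases["other"];
  // Replace simple {name} placeholders; unknown names are left untouched.
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) => args[name] ?? match);
}

console.log(formatSelect(invitationTitle, "male", { organizer_name: "Donald" }));
// Donald has invited you to his webinar!
console.log(formatSelect(invitationTitle, "not_applicable", { organizer_name: "Donald & Mary" }));
// Donald & Mary have invited you to their webinar!
```

Because `not_applicable` matches none of the named cases, it lands in `other`, which is why that case should always be present.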

Working with the ICU Message Format in Crowdin

Crowdin not only supports ICU Message syntax and arguments, but also enables a preview option for translators.

Crowdin supports ICU Message syntax for the following types of arguments:

  • Plurals
  • Numbers
  • Time
  • Dates
  • Select

This way, the syntax is highlighted, and the translation process stays convenient and easy. It lets translators minimize mistakes, review defined message structure, and maintain the quality of the translation outcome.

ICU Translation in the Crowdin Editor

ICU syntax arguments are always highlighted in the translation Editor, so translators will see which part of the string shouldn’t be translated. It’s possible to change the position of arguments in a translation string to follow the natural word order in the target language.

You can work with ICU strings as part of your files, or filter them to translate separately. To begin your work with ICU, go to your project > open Editor > Advanced Filter > String Type > ICU.

One of the handiest features for working with ICU strings is the preview mode. With it, translators see how the translation will be displayed in the final UI and can make sure that all the translatable elements are translated.

ICU Message Syntax in the Crowdin Editor

Plural

Plural type arguments are used to handle plural category variations, as each language has its own rules for handling plurals. Pluralization uses CLDR rules for the given locale.

What is CLDR? Let’s step aside for a moment to find out. CLDR stands for Common Locale Data Repository, and it’s the official Unicode collection of l10n data. For a given locale, the CLDR provides its script, preferred calendar, number system, date formats, pluralization rules, and more. The CLDR is used by the main ICU project and other libraries that implement ICU features.

Back to plurals. Some languages, like English, have two plural forms. Others have only a single form, and some have several.

Plural categories include:

  • zero
  • one (singular)
  • two (dual)
  • few (paucal)
  • many (also used for fractions if they have a separate class)
  • other (required; the general plural form, also used if the language only has a single form)
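
To see how these categories map to actual numbers, the TypeScript sketch below asks the built-in `Intl.PluralRules` (which is backed by CLDR data in engines such as Node.js) for the category of a few sample numbers in four locales. The `pluralCategories` helper and the sample values are illustrative choices, and the sketch assumes a runtime that ships full locale data.

```typescript
// Hypothetical helper: maps each number to its CLDR plural category for the locale.
function pluralCategories(locale: string, numbers: number[]): string[] {
  const rules = new Intl.PluralRules(locale);
  return numbers.map((n) => rules.select(n));
}

const samples = [0, 1, 2, 3, 5, 11];
for (const locale of ["en", "ru", "ar", "ja"]) {
  console.log(`${locale}: ${pluralCategories(locale, samples).join(" ")}`);
}
// en: other one other other other other
// ru: many one few few many many
// ar: zero one two few few many
// ja: other other other other other other
```

English uses only one and other, Japanese uses only other, while Russian and Arabic need more branches. This is why a translated plural message often has a different number of cases than its English source.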

In the Crowdin Editor, you don’t have to manually add or remove plural categories in the translations you are making. To make sure that your translators won’t break the code, click Copy Source, and the source string will be copied to the translation field with the number of plural categories suitable for the current target language. Be sure to view a list of Language Plural Rules.

ICU Translation:Plurals at Crowdin

Select

The Select type is often used to represent the appropriate gender-based inflections or verb conjugations for a given message.

ICU Translation:Select at Crowdin

Number

The number type displays different values such as percentages, decimal numbers, and currency based on the locale conventions. This enables adjustment of the message output to the formats used for different locales.
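
In ICU message syntax, a number argument is written as `{value, number}` or with a style such as `{value, number, percent}`. The TypeScript sketch below does not format an ICU message; it uses the built-in `Intl.NumberFormat` only to illustrate how the same value changes with locale and style. The sample values and variable names are arbitrary, and the output assumes a runtime with full locale data.

```typescript
const amount = 1234567.891;

// Plain decimal formatting: grouping and decimal separators depend on the locale.
const enDecimal = new Intl.NumberFormat("en-US").format(amount);
const deDecimal = new Intl.NumberFormat("de-DE").format(amount);

// Percent and currency styles.
const enPercent = new Intl.NumberFormat("en-US", { style: "percent" }).format(0.25);
const enCurrency = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
}).format(1234.5);

console.log(enDecimal);  // 1,234,567.891
console.log(deDecimal);  // 1.234.567,891
console.log(enPercent);  // 25%
console.log(enCurrency); // $1,234.50
```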

ICU:Number at Crowdin

Date and Time

Date and time types show date and time values according to the preferred formatting for a given locale. ICU message syntax supports predefined date formats as well as custom ones.

There are 4 predefined date formats:

  • Short
  • Medium
  • Long
  • Full
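
In a message, these formats are written as, for example, `{when, date, short}`. As a rough illustration of what each one produces, the TypeScript sketch below formats a single date with the built-in `Intl.DateTimeFormat`, whose `dateStyle` option uses the same four names. The sample date, the `en-US` locale, and the variable names are arbitrary choices; the time zone is pinned to UTC so the result does not depend on where the code runs.

```typescript
// 15 April 2022 in UTC (months are zero-based in Date.UTC).
const releaseDate = new Date(Date.UTC(2022, 3, 15));

const styles = ["short", "medium", "long", "full"] as const;
const formatted = styles.map((dateStyle) =>
  new Intl.DateTimeFormat("en-US", { dateStyle, timeZone: "UTC" }).format(releaseDate)
);

styles.forEach((style, i) => console.log(`${style}: ${formatted[i]}`));
// short: 4/15/22
// medium: Apr 15, 2022
// long: April 15, 2022
// full: Friday, April 15, 2022
```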

ICU:Date and Time at Crowdin

Find Mistakes More Easily with Syntax Error Detection

Syntax error detection is one of the quality assurance (QA) check parameters in Crowdin. QA checks help you efficiently handle different language-specific aspects in translations. They detect common mistakes so that you can fix them quickly before building your project and downloading translations.

Syntax error detection reduces confusion when translating ICU Message syntax, as the Crowdin platform automatically identifies potential mistakes in the translation and shows them in the preview section of the Editor. To enable detection, go to project > Home > QA > ICU Syntax. If a syntax error is found, translators will see a “Syntax error” message suggesting what should be fixed, like in the example below.

ICU: Find Mistakes More Easily with Syntax Error Detection

Translation errors, however small, can have a big impact on your product’s success. Try Crowdin QA checks to avoid them. With the help of QA checks, you can ensure that there are no missed commas, extra spaces, or typos, and that translations are formatted the same way as the source strings, thus fitting the UI just as well. Learn how to ensure quality translations with Crowdin.

Localize with Crowdin

When you’re working on localization, a few things will make you more efficient, and the Crowdin platform is one of them. Crowdin supports ICU translation and gives your translators automatic ICU syntax checking and highlighting.

In addition, Crowdin lets you automate localization, track l10n progress, give your team an intuitive UI to work in, and let developers sync translation files through the CLI. You can also release several multilingual versions of your app or software simultaneously. To discover more Crowdin features, book a demo or watch an on-demand one.

Diana Voroniak