Published 26 December 2024 by Ray Morgan. Updated 3 January 2025.

International Components for Unicode

International Components for Unicode (ICU) is a set of open-source libraries and tools for Unicode text handling and globalization. It lets software process text, dates, numbers, and other locale-sensitive data consistently across languages and cultures.

ICU is widely used in applications and systems that require internationalization and localization. It is distributed as libraries for C/C++ and Java, and other languages build on it: PHP's Intl extension, for example, is a wrapper around the C/C++ library.

Key Features of ICU

  1. Unicode Support — Handles complex text processing with full Unicode compliance, including support for UTF-8, UTF-16, and UTF-32. Provides normalization and collation for consistent text comparison and sorting.
  2. Locale Support — Supports hundreds of locales, enabling applications to cater to specific linguistic, cultural, and regional preferences.
  3. Date and Time Formatting — Formats and parses dates and times according to locale-specific conventions, including calendars like Gregorian, Islamic, and more.
  4. Number and Currency Formatting — Formats numbers, percentages, and currencies based on locale-specific rules.
  5. Text Processing — Includes utilities for case conversion, string comparison, and bidirectional (BiDi) text processing for languages like Arabic and Hebrew.
  6. Collation — Provides locale-aware sorting, allowing strings to be compared and ordered correctly in different languages.
  7. Transliteration — Converts text from one script to another, such as from Cyrillic to Latin or Devanagari to Latin.
  8. Resource Bundles — Manages locale-specific resources like strings, making it easier to localize applications.
  9. Pluralization Rules — Handles language-specific plural forms, crucial for generating grammatically correct text in different languages.
  10. Regular Expressions — Provides a regular expression engine with Unicode-aware matching and searching.
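
A few of these features can be tried from PHP's Intl extension, which wraps ICU. The following is a minimal sketch, assuming the extension is loaded; the sample strings and the choice of the de_DE locale are illustrative assumptions, not taken from the ICU documentation.

```php
<?php
// Sketch only: assumes PHP with the intl extension enabled.

// 1. Normalization: "é" as one code point vs. "e" plus a combining accent.
$composed   = "\u{00E9}";   // U+00E9
$decomposed = "e\u{0301}";  // U+0065 followed by U+0301
$sameBytes    = ($composed === $decomposed); // false: the UTF-8 bytes differ
$sameAfterNfc = (Normalizer::normalize($decomposed, Normalizer::FORM_C) === $composed);

// 2. Collation: byte-order sort vs. locale-aware sort.
$byteOrder = ['Zebra', 'apple', 'Äpfel'];
sort($byteOrder, SORT_STRING);   // compares raw bytes

$localeOrder = ['Zebra', 'apple', 'Äpfel'];
$collator = new Collator('de_DE');
$collator->sort($localeOrder);   // compares using German collation rules

// 3. Transliteration: Cyrillic to Latin with one of ICU's built-in transforms.
$translit = Transliterator::create('Cyrillic-Latin');
$latin = $translit->transliterate('Привет');

var_dump($sameBytes, $sameAfterNfc);
print_r($byteOrder);
print_r($localeOrder);
echo $latin, PHP_EOL;
```

On a typical installation, the two forms of "é" compare equal only after normalization, the byte-order sort leaves "Äpfel" last while the collator places it first, and the transliteration prints "Privet".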

ICU in Programming

ICU includes two main components:

  • ICU4C: The C and C++ library for Unicode and globalization.
  • ICU4J: The Java library for Unicode and globalization.

It is often used via libraries or APIs in various programming languages:

  1. PHP — ICU functionality is accessible through the Intl extension, which provides classes like NumberFormatter, IntlDateFormatter, and Normalizer.
  2. Java — The JDK's own classes such as java.text.Collator and java.util.Locale cover the basics, and since JDK 9 they use CLDR locale data by default. The full ICU feature set is available by adding ICU4J (package com.ibm.icu) as a dependency.
  3. C and C++ — Applications link against ICU4C directly, which suits high-performance code.
  4. Python — ICU features are available via libraries like PyICU.
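
To make the PHP route concrete, here is a small sketch using IntlDateFormatter from the Intl extension. The date, the three locales, and the helper name formatLongDate are illustrative assumptions.

```php
<?php
// Sketch only: assumes PHP with the intl extension enabled.

/** Format a date in "long" style for the given locale (hypothetical helper). */
function formatLongDate(string $locale, DateTimeInterface $date): string
{
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::LONG,  // date style
        IntlDateFormatter::NONE,  // no time part
        'UTC'                     // fixed time zone, so the result does not depend on the server
    );

    $result = $formatter->format($date);
    if ($result === false) {
        throw new RuntimeException($formatter->getErrorMessage());
    }
    return $result;
}

$date = new DateTimeImmutable('2025-01-03 12:00:00', new DateTimeZone('UTC'));
foreach (['en_US', 'de_DE', 'ja_JP'] as $locale) {
    echo $locale, ': ', formatLongDate($locale, $date), PHP_EOL;
}
```

The same date should come out as "January 3, 2025", "3. Januar 2025", and "2025年1月3日" respectively. Only the locale identifier changes between calls; the ordering, month names, and punctuation come from ICU's locale data.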

Advantages of ICU

  1. Consistency Across Platforms — Ensures uniform behavior for Unicode and localization tasks, regardless of the platform.
  2. Extensive Locale Data — Backed by the Unicode Common Locale Data Repository (CLDR), the most comprehensive and up-to-date source of locale data.
  3. Open Source — Actively maintained and freely available, making it a reliable choice for global applications.
  4. Scalable and Reliable — Designed for large-scale applications, ICU is used in platforms such as Android and in many server-side technologies.

Common Use Cases

  1. Web Applications — Handling multilingual content, normalizing text, and ensuring locale-aware formatting of dates, numbers, and currencies.
  2. Databases — Collation and indexing for multilingual text.
  3. Mobile Applications — Supporting users in diverse languages and locales, including bidirectional text.
  4. Enterprise Systems — Ensuring consistent behavior across a company’s global IT infrastructure.

Limitations of ICU

  1. Complexity — With so many features, the library can feel overwhelming for simple tasks.
  2. Resource-Intensive — The library and its locale data are large, which can be costly in memory and binary size on smaller systems.
  3. Learning Curve — Requires understanding Unicode and internationalization concepts to use effectively.

Getting Started

Installing ICU

ICU comes pre-installed on many operating systems. For a custom installation:

  • On Ubuntu: sudo apt install libicu-dev
  • On macOS (via Homebrew): brew install icu4c
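
Installing the ICU libraries does not by itself enable PHP's Intl extension, which is packaged separately (on Ubuntu, typically as php-intl). A quick way to check what a given PHP installation sees is sketched below; the helper name icuInfo is a hypothetical one chosen for this example.

```php
<?php
// Sketch only: reports whether the intl extension is available and which ICU it uses.

/** Return ICU details, or null when the intl extension is not loaded (hypothetical helper). */
function icuInfo(): ?array
{
    if (!extension_loaded('intl')) {
        return null;
    }
    return [
        'icu_version'    => INTL_ICU_VERSION,       // version of the linked ICU library
        'data_version'   => INTL_ICU_DATA_VERSION,  // version of the bundled ICU data
        'default_locale' => Locale::getDefault(),   // locale used when none is passed explicitly
    ];
}

$info = icuInfo();
if ($info === null) {
    echo 'The intl extension is not loaded.', PHP_EOL;
} else {
    foreach ($info as $key => $value) {
        echo $key, ': ', $value, PHP_EOL;
    }
}
```

The reported ICU version is worth noting, because formatting output can differ slightly between ICU releases.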

Example in PHP (Intl Extension)

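<?php
// Requires the intl extension. NumberFormatter is a global class, so a `use` import is only needed inside a namespace.
$locale = 'fr_FR'; // French (France)
$formatter = new NumberFormatter($locale, NumberFormatter::CURRENCY);
// Prints something like "12 345,67 €". The separators are no-break space characters, and which ones depends on the ICU version.
echo $formatter->formatCurrency(12345.67, 'EUR'), PHP_EOL;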
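
Pluralization follows the same pattern. The sketch below uses MessageFormatter from the Intl extension with ICU message patterns; the patterns, locales, and counts are illustrative assumptions.

```php
<?php
// Sketch only: assumes PHP with the intl extension enabled.

// ICU MessageFormat patterns. Which plural categories a language needs
// (one, few, many, other, ...) is defined by the CLDR plural rules.
$patterns = [
    'en_US' => '{count, plural, one {# file} other {# files}}',
    'ru_RU' => '{count, plural, one {# файл} few {# файла} many {# файлов} other {# файла}}',
];

foreach ($patterns as $locale => $pattern) {
    foreach ([1, 3, 5] as $count) {
        // "#" inside a plural branch is replaced by the formatted number.
        $text = MessageFormatter::formatMessage($locale, $pattern, ['count' => $count]);
        echo $locale, ': ', $text, PHP_EOL;
    }
}
```

English needs only two forms ("1 file", "3 files", "5 files"), while Russian selects among three for the same counts ("1 файл", "3 файла", "5 файлов"). The application code stays the same; only the pattern and locale change.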