Published 26 December 2024 by Ray Morgan, updated 3 January 2025

Normalization in PHP

PHP supports Unicode normalization through the Normalizer class, which is part of the intl (Internationalization) extension. The class offers static methods for converting a string to a given Unicode normalization form and for checking whether a string is already in that form.

Normalizer::normalize()

Normalizes a string to the specified normalization form.

Syntax:

public static Normalizer::normalize(string $string, int $form = Normalizer::FORM_C): string|false

Parameters:

$string: The string to be normalized. It must be valid UTF-8.

$form: The desired normalization form. Constants available:

Normalizer::FORM_C (Canonical Composition, NFC)

Normalizer::FORM_D (Canonical Decomposition, NFD)

Normalizer::FORM_KC (Compatibility Composition, NFKC)

Normalizer::FORM_KD (Compatibility Decomposition, NFKD)
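To make the difference between the four forms concrete, the following sketch normalizes one sample string with each constant and prints the result in escaped form. The sample string and the use of json_encode() for display are illustrative choices, not part of the Normalizer API.

```php
<?php
// Sketch only: requires the intl extension.
// Sample: "fi" ligature (U+FB01) followed by "e" + combining acute accent (U+0301).
$sample = "\u{FB01}e\u{0301}";

$forms = [
    'NFC'  => Normalizer::FORM_C,
    'NFD'  => Normalizer::FORM_D,
    'NFKC' => Normalizer::FORM_KC,
    'NFKD' => Normalizer::FORM_KD,
];

$results = [];
foreach ($forms as $name => $form) {
    $results[$name] = Normalizer::normalize($sample, $form);
    // json_encode() escapes non-ASCII code points, which makes the differences visible.
    printf("%-4s %s\n", $name, json_encode($results[$name]));
}
// NFC  "\ufb01\u00e9"    ligature kept, accent composed
// NFD  "\ufb01e\u0301"   ligature kept, accent decomposed
// NFKC "fi\u00e9"        ligature expanded, accent composed
// NFKD "fie\u0301"       ligature expanded, accent decomposed
```

The canonical forms (NFC, NFD) leave the ligature alone because it is only compatibility-equivalent to "fi"; the compatibility forms (NFKC, NFKD) replace it, which loses the distinction between the two spellings.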

Return Value:

The normalized string, or false on failure (for example, when the input is not valid UTF-8).

Example:

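// Requires the intl extension. In namespaced code, import the class with "use Normalizer;".
$str = "e\u{0301}"; // "e" + combining acute accent (U+0301)
$normalized = Normalizer::normalize($str, Normalizer::FORM_C);
echo $normalized; // Outputs "é" (U+00E9, a single code point)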
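Because normalize() returns false on failure, calling code usually needs to check the result before using it. The sketch below wraps the call in a hypothetical helper, normalizeOrFail(), which is not part of intl; it assumes that throwing an exception is the preferred way to report the error in your application.

```php
<?php
// Hypothetical helper (not part of intl): converts the false return value into an exception.
function normalizeOrFail(string $input, int $form = Normalizer::FORM_C): string
{
    $result = Normalizer::normalize($input, $form);
    if ($result === false) {
        // intl_get_error_message() describes the last error raised by an intl function.
        throw new InvalidArgumentException('Normalization failed: ' . intl_get_error_message());
    }
    return $result;
}

try {
    echo normalizeOrFail("e\u{0301}"), "\n"; // valid UTF-8
    echo normalizeOrFail("\xFF"), "\n";      // a lone 0xFF byte is not valid UTF-8
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

A strict comparison with false matters here: an empty input string normalizes to an empty string, which a loose check would mistake for a failure.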


Normalizer::isNormalized()

Checks if a string is already in the specified normalization form.

Syntax:

public static Normalizer::isNormalized(string $string, int $form = Normalizer::FORM_C): bool

Parameters:

$string: The string to check. It must be valid UTF-8.

$form: The normalization form to check against.

Return Value:

true if the string is in the given normalization form, false otherwise or if an error occurred.

Example:

// Requires the intl extension. In namespaced code, import the class with "use Normalizer;".
$str = "e\u{0301}"; // "e" + combining acute accent (U+0301)
if (!Normalizer::isNormalized($str, Normalizer::FORM_C)) {
    echo "String is not normalized.\n";
    $normalized = Normalizer::normalize($str, Normalizer::FORM_C);
    echo "Normalized string: $normalized\n"; // Outputs "Normalized string: é"
}


Requirements

The intl extension must be enabled in PHP.

To check whether it is loaded in the current PHP runtime:

var_dump(extension_loaded("intl"));


Installation

To install the extension (on Ubuntu, for example), run the command below, then restart the web server or PHP-FPM so that the extension is loaded:

sudo apt install php-intl


Use Cases

Text Processing: Compare user input or text from different sources reliably, since the same visible character can be encoded as different code point sequences.

Database Storage: Normalize strings to a single form (commonly NFC) before storing them, so that lookups and uniqueness checks behave consistently.

Multilingual Applications: Normalize text in internationalization workflows so that diacritics and combining characters are handled uniformly.
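The comparison use case can be sketched as follows. The helper name sameText() is hypothetical, and the choice of NFC as the common form is an assumption; any single form works as long as both sides use it.

```php
<?php
// Hypothetical helper: compares two strings for canonical equivalence via NFC.
function sameText(string $a, string $b): bool
{
    $na = Normalizer::normalize($a, Normalizer::FORM_C);
    $nb = Normalizer::normalize($b, Normalizer::FORM_C);
    // A failed normalization (false) is treated as "not equal",
    // so two invalid inputs never compare as the same text.
    return $na !== false && $nb !== false && $na === $nb;
}

$precomposed = "caf\u{00E9}";  // "café" written with U+00E9
$decomposed  = "cafe\u{0301}"; // "café" written with "e" + U+0301

var_dump($precomposed === $decomposed);        // bool(false): the byte sequences differ
var_dump(sameText($precomposed, $decomposed)); // bool(true)
```

The same idea applies to storage: normalizing once on write means later comparisons and lookups can use plain string equality.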


Limitations

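Performance: Normalizing every string adds overhead that can become noticeable on large datasets. Checking with Normalizer::isNormalized() first may avoid rewriting strings that are already in the target form.

Availability: The intl extension is not enabled in every PHP build and may be unavailable in some shared hosting environments.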
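Both limitations can be softened in application code. The sketch below uses a hypothetical helper, toNfc(), that skips strings already in NFC and returns the input unchanged when no Normalizer class is available. Whether the extra isNormalized() call pays off depends on your data, so measure before relying on it; returning the input unchanged is an assumption that suits some applications and not others.

```php
<?php
// Hypothetical helper: normalize to NFC only when needed, and degrade gracefully without intl.
function toNfc(string $input): string
{
    // class_exists() also succeeds when a userland polyfill provides the class.
    if (!class_exists('Normalizer')) {
        return $input; // assumption: unnormalized text is acceptable as a fallback
    }
    if (Normalizer::isNormalized($input, Normalizer::FORM_C)) {
        return $input; // already NFC, nothing to do
    }
    $result = Normalizer::normalize($input, Normalizer::FORM_C);
    // On failure (for example, invalid UTF-8) hand back the original string.
    return $result === false ? $input : $result;
}

echo toNfc("plain ASCII text"), "\n";
echo toNfc("e\u{0301}"), "\n";
```

If falling back silently is not acceptable, replacing the first return with an exception makes the missing extension visible at the point of use.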