Normalization in PHP
PHP provides functions for Unicode normalization through the Intl (Internationalization) extension, which includes the Normalizer class. This class provides methods for normalizing strings in different Unicode normalization forms.
Normalizer::normalize()
Normalizes a string to the specified normalization form.
Syntax:
string|false Normalizer::normalize(string $input, int $form = Normalizer::FORM_C)
Parameters:
$input:
The string to be normalized.
$form:
The desired normalization form. Constants available:
Normalizer::FORM_C (Canonical Composition, NFC)
Normalizer::FORM_D (Canonical Decomposition, NFD)
Normalizer::FORM_KC (Compatibility Composition, NFKC)
Normalizer::FORM_KD (Compatibility Decomposition, NFKD)
Return Value:
The normalized string, or false on failure.
Example:
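// In a namespaced file, add "use Normalizer;" or refer to the class as \Normalizer.
$str = "e\u{0301}"; // "e" + combining acute accent (U+0301); PHP 7+ escape syntax
$normalized = Normalizer::normalize($str, Normalizer::FORM_C);
echo $normalized; // Outputs "é"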
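To make the difference between the four forms concrete, the following sketch normalizes one string with each constant and prints the resulting bytes as hex. The test string is an assumption chosen for illustration: it combines a decomposed "é" with the "ﬁ" ligature (U+FB01), which only the compatibility forms expand to "fi".

```php
<?php
// Illustrative input: "e" + U+0301 (combining acute accent) + U+FB01 ("ﬁ" ligature).
$input = "e\u{0301}\u{FB01}";

$forms = [
    'NFC'  => Normalizer::FORM_C,
    'NFD'  => Normalizer::FORM_D,
    'NFKC' => Normalizer::FORM_KC,
    'NFKD' => Normalizer::FORM_KD,
];

$hex = [];
foreach ($forms as $name => $form) {
    // bin2hex() exposes the UTF-8 bytes, so visually identical results can be told apart.
    $hex[$name] = bin2hex(Normalizer::normalize($input, $form));
    echo $name, ': ', $hex[$name], "\n";
}
// NFC:  c3a9efac81    (é composed, ligature kept)
// NFD:  65cc81efac81  (e + accent, ligature kept)
// NFKC: c3a96669      (é composed, ligature expanded to "fi")
// NFKD: 65cc816669    (e + accent, ligature expanded to "fi")
```

The canonical forms (NFC, NFD) leave the ligature untouched, while the compatibility forms (NFKC, NFKD) replace it with plain letters, so the compatibility forms may change how text looks as well as how it is encoded.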
Normalizer::isNormalized()
Checks if a string is already in the specified normalization form.
Syntax:
bool Normalizer::isNormalized(string $input, int $form = Normalizer::FORM_C)
Parameters:
$input:
The string to check.
$form:
The normalization form to check against.
Return Value:
true if the string is normalized, false otherwise.
Example:
// In a namespaced file, add "use Normalizer;" or refer to the class as \Normalizer.
$str = "e\u{0301}"; // "e" + combining acute accent (U+0301); PHP 7+ escape syntax
if (!Normalizer::isNormalized($str, Normalizer::FORM_C)) {
    echo "String is not normalized.\n";
    $normalized = Normalizer::normalize($str, Normalizer::FORM_C);
    echo "Normalized string: $normalized\n"; // Outputs "Normalized string: é"
}
Requirements
The intl extension must be enabled in PHP.
To check if it’s installed:
var_dump(extension_loaded("intl"));
Installation
To install the extension (on Ubuntu, for example):
sudo apt install php-intl
Use Cases
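Text Processing: useful for comparing user input or text from different sources, where the same visible character may arrive either as a single precomposed code point or as a base letter followed by combining marks.
Database Storage: normalize strings to a single form before storing them, so that lookups and uniqueness checks behave consistently.
Multilingual Applications: normalize text in internationalization workflows so that diacritics and combining characters are handled uniformly.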
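The comparison use case can be sketched as follows. The helper name equalsNormalized() is hypothetical, and the choice of NFC as the common form is an assumption; any single form works as long as both sides use the same one.

```php
<?php
// Hypothetical helper: compare two strings after normalizing both to NFC.
function equalsNormalized(string $a, string $b): bool
{
    $na = Normalizer::normalize($a, Normalizer::FORM_C);
    $nb = Normalizer::normalize($b, Normalizer::FORM_C);

    // normalize() returns false on failure; treat that as "not equal".
    if ($na === false || $nb === false) {
        return false;
    }
    return $na === $nb;
}

$composed   = "caf\u{00E9}";   // "café" with a precomposed é
$decomposed = "cafe\u{0301}";  // "café" as "e" + combining acute accent

var_dump($composed === $decomposed);                // bool(false): the bytes differ
var_dump(equalsNormalized($composed, $decomposed)); // bool(true)
```

The same idea applies to storage: normalizing once on write means later comparisons can use plain string equality.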
Limitations
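Performance: normalization adds processing cost that can become noticeable on large datasets. Checking with Normalizer::isNormalized() first avoids rewriting strings that are already in the target form.
Availability: the intl extension must be enabled, which may not be the case in some shared hosting environments.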
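One way to work within both limitations is a small wrapper like the sketch below. The function name toNfcIfAvailable() is hypothetical, and returning the input unchanged when intl is missing or normalization fails is an assumption that suits some applications but not others, since unnormalized text then passes through silently.

```php
<?php
// Hypothetical helper: normalize to NFC when possible, otherwise return the input unchanged.
function toNfcIfAvailable(string $text): string
{
    // Without the intl extension the Normalizer class does not exist.
    if (!class_exists('Normalizer')) {
        return $text;
    }

    // Skip the conversion when the string is already in NFC.
    if (Normalizer::isNormalized($text, Normalizer::FORM_C)) {
        return $text;
    }

    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);

    // normalize() returns false on failure; fall back to the original text.
    return $normalized === false ? $text : $normalized;
}

echo bin2hex(toNfcIfAvailable("plain ascii")), "\n";
```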