Normalization in JavaScript
The .normalize() method of String objects returns a copy of the string converted to the Unicode normalization form you specify. This is particularly useful for ensuring that strings containing characters with diacritical marks (like é or ê) or combining characters are treated consistently in comparisons or processing.
.normalize() is useful for:
- Text comparison: normalizing strings so that diacritics or variations in Unicode representation don't cause unexpected results.
- Searching and indexing: normalizing text before indexing or searching to ensure consistency.
- Data cleaning: cleaning user-entered text to remove unintended encoding variations.
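As a rough illustration of the searching and indexing case, the sketch below builds an accent-insensitive search key. The helper name toSearchKey and the sample titles are hypothetical, and the regular expression only removes marks in the Combining Diacritical Marks block (U+0300 to U+036F), so treat it as a starting point rather than a complete solution.

```javascript
// Hypothetical helper: builds a key for accent-insensitive matching.
function toSearchKey(text) {
  return text
    .normalize("NFD")                 // split base letters from their combining marks
    .replace(/[\u0300-\u036f]/g, "")  // drop marks in the Combining Diacritical Marks block
    .toLowerCase();
}

// Sample data (assumed for illustration): the same word in composed and decomposed form.
const titles = ["Caf\u00E9 au lait", "Cafe\u0301 noir", "Tea"];
const query = toSearchKey("cafe");
const matches = titles.filter((title) => toSearchKey(title).includes(query));

console.log(matches.length); // 2
```

Decomposing with NFD first is what makes the mark removal work on precomposed characters such as "\u00E9", which contain no separate combining mark until they are decomposed.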
Syntax: str.normalize([form])
Parameters:
form (optional): Specifies the Unicode normalization form. Possible values are:
- NFC (default): Normalization Form Canonical Composition
- NFD: Normalization Form Canonical Decomposition
- NFKC: Normalization Form Compatibility Composition
- NFKD: Normalization Form Compatibility Decomposition
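To make the difference between the canonical forms (NFC, NFD) and the compatibility forms (NFKC, NFKD) more concrete, here is a small sketch that prints the code points each form produces. The codePoints helper is hypothetical and exists only to make the results visible.

```javascript
// Hypothetical helper: lists a string's code points as "U+XXXX" values.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

const accented = "\u00E9"; // "é" as one precomposed code point
const ligature = "\uFB01"; // "ﬁ" ligature, a compatibility character

console.log(codePoints(accented.normalize("NFC")));  // U+00E9
console.log(codePoints(accented.normalize("NFD")));  // U+0065 U+0301
console.log(codePoints(ligature.normalize("NFC")));  // U+FB01
console.log(codePoints(ligature.normalize("NFKC"))); // U+0066 U+0069
```

The canonical forms only switch between composed and decomposed spellings of the same character, while the compatibility forms also replace characters such as ligatures with their plain equivalents, which can lose formatting information.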
Return Value: A new string in the specified normalization form.
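A short sketch of the return behavior: the original string is left untouched, and passing a form name outside the four listed values throws a RangeError. The variable names and the invalid form "NFX" are chosen here purely for illustration.

```javascript
const original = "e\u0301";                 // e + combining acute accent (2 code units)
const composed = original.normalize("NFC"); // new string; original is unchanged

console.log(original.length); // 2
console.log(composed.length); // 1

let errorName = null;
try {
  original.normalize("NFX"); // "NFX" is not a valid normalization form
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // "RangeError"
```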
Examples:
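// Basic normalization:
const str = "\u1E9B\u0323"; // "ẛ̣" (long s with dot above, followed by a combining dot below)
console.log(str.normalize("NFC")); // "\u1E9B\u0323" (unchanged: no single precomposed character exists)
console.log(str.normalize("NFD")); // "\u017F\u0323\u0307" (long s + combining dot below + combining dot above)
console.log(str.normalize("NFKC")); // "\u1E69", i.e. "ṩ" (compatibility mapping turns long s into s, then composes)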
Use Case: Comparing strings. Two strings may look identical but differ in their underlying code points. Normalizing both to the same form before comparing makes the comparison reliable.
// Normalizing for comparison:
const str1 = "e\u0301"; // e + combining acute accent
const str2 = "\u00E9"; // "é" as a single precomposed character
console.log(str1 === str2); // false
console.log(str1.normalize() === str2.normalize()); // true (both are now in NFC)
// Normalizing for compatibility:
const circled = "\u2460"; // "①" (circled digit one)
console.log(circled.normalize("NFKC")); // "1"
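The comparison and compatibility examples above can be combined into one small helper. This is a minimal sketch: equalsNormalized is a hypothetical name, and defaulting to NFC is an assumption that suits canonical equivalence but not compatibility characters.

```javascript
// Hypothetical helper: compares two strings after normalizing both to the same form.
function equalsNormalized(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

console.log(equalsNormalized("e\u0301", "\u00E9"));    // true  (decomposed vs precomposed é)
console.log(equalsNormalized("\u212B", "\u00C5"));     // true  (angstrom sign vs Å)
console.log(equalsNormalized("\uFB01", "fi"));         // false (NFC keeps the ligature)
console.log(equalsNormalized("\uFB01", "fi", "NFKC")); // true  (NFKC expands the ligature)
```

Normalizing both arguments avoids relying on either input already being in the target form. Whether a compatibility form is appropriate depends on the application, since it discards distinctions such as ligatures and circled digits.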