Published 28 December 2024 by Ray Morgan
Updated 3 January 2025

Code Point Planes

By the end of this lesson, you should understand:

  • How Unicode organizes over 1 million code points into 17 planes for efficient processing and referencing.
  • The role of the Basic Multilingual Plane (BMP) in supporting most common characters and ensuring compatibility with older systems.
  • The purpose of supplementary planes for specialized, historical, and rarely used scripts and symbols.
  • The flexibility provided by private use planes for creating custom characters and symbols without conflicting with standard Unicode assignments.
  • The practical importance of Unicode planes in ensuring scalability, compatibility, and adaptability in text processing and software development.

In Unicode, code points are divided into 17 planes to organize the vast number of characters efficiently and to make referencing and processing easier. Each plane contains 65,536 (2^16) code points, so the code space holds 17 × 65,536 = 1,114,112 code points in total, numbered U+0000 to U+10FFFF.
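
The arithmetic above can be checked with a short Python sketch. The constant names and the helper `plane_of` are illustrative choices, not part of any standard API; the only assumption is that a plane number is the code point divided by 65,536 (equivalently, shifted right by 16 bits).

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
PLANE_COUNT = 17
MAX_CODE_POINT = 0x10FFFF   # last code point of Plane 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16  # same as code_point // PLANE_SIZE


print(PLANE_SIZE * PLANE_COUNT)  # 1114112
print(plane_of(ord("A")))        # 0
print(plane_of(0x1F600))         # 1
print(plane_of(0x10FFFF))        # 16
```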

The division into planes lines up with how the Unicode encoding forms work. In UTF-16, every BMP code point fits in a single 16-bit code unit, while every supplementary code point takes a surrogate pair (two code units). In UTF-8, BMP code points take one to three bytes and supplementary code points take four. Software can take advantage of this: checking whether a character belongs to the BMP or a supplementary plane is a single comparison against U+FFFF, and the result often dictates different handling.
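
As a rough illustration of how the BMP/supplementary split shows up in practice, the sketch below measures encoded sizes with Python's built-in codecs and computes a UTF-16 surrogate pair by hand. The function names (`is_bmp`, `encoded_sizes`, `surrogate_pair`) are made up for this example.

```python
def is_bmp(ch: str) -> bool:
    """True if the character's code point lies in Plane 0."""
    return ord(ch) <= 0xFFFF


def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 code units) needed for one character."""
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-le")) // 2  # 2 bytes per code unit
    return utf8_bytes, utf16_units


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X} bmp={is_bmp(ch)} sizes={encoded_sizes(ch)}")

high, low = surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00
```

The three BMP characters need one UTF-16 code unit each and one, two, and three UTF-8 bytes respectively; U+1F600 needs two code units and four bytes.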

The division of code points into planes is a pragmatic design choice that organizes Unicode into manageable sections, maintains compatibility with legacy systems, and supports future growth. It balances simplicity for common cases (most characters in BMP) with flexibility for more extensive and specialized character sets.

Logical Organization

Unicode planes group characters by type or purpose. For example:

Plane 0, the Basic Multilingual Plane (BMP), contains the most commonly used characters, including the scripts of most modern languages. BMP code points (U+0000 to U+FFFF) fit in a single 16-bit value, which keeps them compatible with older 16-bit systems and lets UTF-16 encode each one as a single two-byte code unit. Characters in supplementary planes are less compact in UTF-16: each one needs a surrogate pair, that is, two code units (four bytes).

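Plane 1, the Supplementary Multilingual Plane (SMP), holds historic and rare scripts along with many symbols, including musical notation, mathematical alphanumeric symbols, and most emoji.

Plane 2, the Supplementary Ideographic Plane (SIP), holds additional CJK (Chinese, Japanese, Korean) ideographs.

Other supplementary planes: Plane 3, the Tertiary Ideographic Plane (TIP), holds further CJK ideographs; Plane 14, the Supplementary Special-purpose Plane (SSP), holds tag characters and variation selectors; Planes 4 through 13 are currently unassigned.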

Planes 15 and 16, the Private Use Planes, are reserved for private use, allowing organizations or industries to define their own characters without conflicting with the standard.
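
The plane layout can be expressed as a small lookup table. This is a sketch only: `PLANE_NAMES` and `plane_name` are hypothetical names, and the entries for Planes 3 and 14 and the "unassigned" label for Planes 4 through 13 follow the Unicode Standard's plane names. Note that the musical symbol and the emoji used as examples both fall in Plane 1.

```python
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    3: "Tertiary Ideographic Plane (TIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Supplementary Private Use Area-A",
    16: "Supplementary Private Use Area-B",
}


def plane_name(ch: str) -> str:
    """Return a human-readable plane name for a single character."""
    plane = ord(ch) >> 16
    return PLANE_NAMES.get(plane, f"Plane {plane} (unassigned)")


print(plane_name("A"))           # Basic Multilingual Plane (BMP)
print(plane_name("\U0001D11E"))  # musical G clef: Supplementary Multilingual Plane (SMP)
print(plane_name("\U0001F600"))  # emoji: Supplementary Multilingual Plane (SMP)
print(plane_name("\U00020000"))  # CJK ideograph: Supplementary Ideographic Plane (SIP)
print(plane_name("\U00040000"))  # Plane 4 (unassigned)
```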

Ease of Extension

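Unassigned code points and planes leave room for Unicode to grow without disrupting existing assignments. The code space is fixed at 17 planes, but large parts of it are still unallocated, and once a character is assigned, its code point never changes. When new scripts or symbols are needed, they are added to free ranges without rearranging existing code points, which preserves backward compatibility.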

Private Use Areas

Private use areas are ranges of code points set aside for users or organizations to define their own characters, symbols, or glyphs without conflicting with officially assigned Unicode characters. These areas are part of the Unicode standard, but the standard intentionally assigns no characters to them, leaving them free for custom extensions.

Unicode includes three private use areas (PUAs):

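  • In the Basic Multilingual Plane (Plane 0), the range of 6,400 code points from U+E000 to U+F8FF is reserved as the “Private Use Area.”
  • Plane 15 (U+F0000 to U+FFFFD, 65,534 code points): Supplementary Private Use Area-A (PUA-A). The last two code points of the plane, U+FFFFE and U+FFFFF, are noncharacters and are not part of it.
  • Plane 16 (U+100000 to U+10FFFD, 65,534 code points): Supplementary Private Use Area-B (PUA-B). U+10FFFE and U+10FFFF are likewise noncharacters.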
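
The three ranges listed above translate directly into a membership test. The following is a minimal sketch; `PRIVATE_USE_RANGES` and `is_private_use` are names chosen for this example.

```python
# (first, last) code point of each private use area, inclusive
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)


def is_private_use(ch: str) -> bool:
    """True if the character falls in any of the three private use areas."""
    cp = ord(ch)
    return any(first <= cp <= last for first, last in PRIVATE_USE_RANGES)


sizes = [last - first + 1 for first, last in PRIVATE_USE_RANGES]
print(sizes)       # [6400, 65534, 65534]
print(sum(sizes))  # 137468

print(is_private_use("\uE001"))      # True
print(is_private_use("A"))           # False
print(is_private_use("\U000FFFFE"))  # False (noncharacter at the end of Plane 15)
```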

What Are They Used For?

Private use planes offer a flexible way to add custom characters and symbols to Unicode text, catering to specialized or proprietary needs. Their use requires coordination and thorough documentation to maintain consistency and usability within a specific context, as they are intended for cases where the required elements are not part of the official Unicode standard.

Common use cases include:

Corporate Logos and Symbols: Organizations might encode their logos, icons, or proprietary symbols for internal use.

Custom Script Support: Linguists or cultural groups might define characters for historical or lesser-known scripts not yet included in Unicode.

Font-Specific Glyphs: Fonts often use PUA code points to define glyphs not present in Unicode, such as alternate letterforms, ligatures, or stylistic variants.

Software and System Symbols: Operating systems or applications might use private use characters for non-standard icons, UI elements, or control codes.

Gaming or Specialized Content: Game developers might encode unique symbols or pictograms for in-game use.

Experimental Characters: Researchers testing new character sets or scripts can use PUAs before submitting them for inclusion in Unicode.

Iconography (e.g., Emoji-like Symbols): Some applications encode private-use emoji-like symbols for custom sets.

How Are They Used?

Font Implementation — A font maps specific glyphs to private use code points. The association is internal to the font and works seamlessly when the font is used.

Custom Encoding — Developers map private use code points to application-specific meanings or characters. For example, an application might interpret U+E001 as a particular symbol or action.

Interchange Agreements — When exchanging text that uses private use characters, all parties must agree on their meaning since they have no universal semantics.

Avoiding Conflicts — To prevent clashes, organizations and developers typically reserve specific ranges for their use and document the mappings.

Fallback Handling — If a font or system doesn't recognize private use characters, they may display as empty boxes, question marks, or other generic placeholders.
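
The custom-encoding and fallback ideas above can be sketched together. Everything application-specific here is hypothetical: the `CUSTOM_SYMBOLS` table, the meanings given to U+E000 and U+E001, and the choice of U+FFFD as the placeholder are assumptions made for illustration. The only library call used is `unicodedata.category`, which reports `"Co"` for private use code points.

```python
import unicodedata

# Hypothetical, application-specific meanings for private use code points.
# A real project would document this table and share it with every party
# that exchanges the text.
CUSTOM_SYMBOLS = {
    "\uE000": "[company-logo]",
    "\uE001": "[save-icon]",
}

PLACEHOLDER = "\uFFFD"  # REPLACEMENT CHARACTER, used here as a generic fallback


def render_plain(text: str) -> str:
    """Expand known private use characters and mask unknown ones."""
    out = []
    for ch in text:
        if ch in CUSTOM_SYMBOLS:
            out.append(CUSTOM_SYMBOLS[ch])        # agreed-upon meaning
        elif unicodedata.category(ch) == "Co":    # private use, but unmapped
            out.append(PLACEHOLDER)
        else:
            out.append(ch)                        # ordinary character
    return "".join(out)


print(render_plain("Press \uE001 to save"))  # Press [save-icon] to save
print(render_plain("Unknown: \uE123") == "Unknown: \uFFFD")  # True
```

Keeping the mapping in one documented table is one way to honor the interchange-agreement point: any party that lacks the table sees only placeholders.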

Limitations of Private Use Planes

No Standardized Meaning — Unlike officially assigned Unicode characters, private use code points have no defined semantics, which can cause interoperability issues.

Font and System Dependency — The glyphs associated with private use code points are specific to the font or system. Without the correct font or application, the characters may not render correctly.

Not Suitable for Public Standards — Private use areas are meant for internal or agreed-upon use only. Characters that need broad adoption should be proposed for inclusion in Unicode.
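
The lack of standardized meaning is visible in Python's `unicodedata` module, which exposes the Unicode Character Database. In this small sketch, the helper `describe` and the fallback string are illustrative; `unicodedata.name` accepts a default that is returned when a character has no name.

```python
import unicodedata


def describe(ch: str) -> str:
    """Summarize what the Unicode Character Database records for ch."""
    name = unicodedata.name(ch, "<no standard name>")
    return f"U+{ord(ch):04X} category={unicodedata.category(ch)} name={name}"


print(describe("A"))       # U+0041 category=Lu name=LATIN CAPITAL LETTER A
print(describe("\uE001"))  # U+E001 category=Co name=<no standard name>
```

The database records only that U+E001 is a private use code point (category `Co`); any further meaning has to come from the font or the application.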