A computing standard for consistent encoding and handling of text in most world writing systems.

What Is Unicode? Definition & Examples

What is Unicode?

Unicode is a universal computing standard for the consistent encoding, representation, and handling of text in most of the world's writing systems. In the domain industry, Unicode enables Internationalized Domain Names (IDNs), which contain non-ASCII characters such as accented Latin letters and characters from Chinese, Arabic, Cyrillic, and other scripts. Unicode assigns a unique code point to each character it encodes, so the same character is represented consistently across different systems.

Unicode in Domain Names

IDN Support

Unicode enables domains like:

例え.jp (Japanese)
مثال.مصر (Arabic)
пример.рф (Russian Cyrillic)
例子.中国 (Chinese)

Punycode Conversion

DNS hostnames are limited to ASCII, so each Unicode label in a domain is converted to Punycode and prefixed with xn--:

Unicode: münchen.de
Punycode: xn--mnchen-3ya.de

Unicode: 北京.中国
Punycode: xn--1lq90i.xn--fiqs8s
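The conversion above can be sketched with the punycode codec from Python's standard library. This is a minimal illustration, not a full IDNA implementation: the helper names to_ascii and to_unicode are chosen here for the example, and the sketch skips the case mapping, normalization, and validation that real registries and browsers apply before encoding.

```python
def to_ascii(domain: str) -> str:
    """Convert each non-ASCII label to its xn-- (Punycode) form."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            # ASCII labels pass through unchanged.
            labels.append(label)
        else:
            # Punycode-encode the label and add the xn-- prefix.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def to_unicode(domain: str) -> str:
    """Decode every label that carries the xn-- prefix."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(to_ascii("m\u00fcnchen.de"))           # xn--mnchen-3ya.de
print(to_unicode("xn--1lq90i.xn--fiqs8s"))   # 北京.中国
```

Each label is encoded on its own, which is why 北京.中国 produces two separate xn-- labels.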

Unicode Code Points

Structure

Format: U+XXXX (hexadecimal)

Examples:
A = U+0041 (Latin A)
а = U+0430 (Cyrillic a)
中 = U+4E2D (Chinese character)
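As a small illustration of the U+XXXX notation, Python's built-in ord and chr convert between a character and its code point. The helper name code_point is an assumption made for this example.

```python
def code_point(ch: str) -> str:
    """Format a single character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# Latin A, Cyrillic a (written as an escape to keep it distinguishable), and 中.
for ch in ["A", "\u0430", "中"]:
    print(ch, code_point(ch))

# chr() goes the other way, from a code point to the character.
print(chr(0x4E2D))  # 中
```

The Latin and Cyrillic letters look alike on screen, but their code points differ, which is what the formatted output makes visible.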

Character Blocks

Block	Range	Script
Basic Latin	U+0000-007F	English/ASCII
Cyrillic	U+0400-04FF	Russian, etc.
Arabic	U+0600-06FF	Arabic
CJK Unified Ideographs	U+4E00-9FFF	Chinese/Japanese/Korean
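A block lookup is just a range check on the code point. The sketch below hard-codes only the four ranges from the table above (the CJK row is the CJK Unified Ideographs block); the names BLOCKS and block_of are hypothetical, and a complete lookup would load every block from the Unicode Character Database.

```python
# (block name, first code point, last code point), taken from the table above.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Cyrillic", 0x0400, 0x04FF),
    ("Arabic", 0x0600, 0x06FF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
]


def block_of(ch: str) -> str:
    """Return the block containing ch, limited to the four blocks listed here."""
    cp = ord(ch)
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return "not in this table"


print(block_of("m"))        # Basic Latin
print(block_of("\u043f"))   # Cyrillic
print(block_of("中"))       # CJK Unified Ideographs
```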

Security Concerns

Homoglyph Attacks

Characters from different scripts can look nearly identical:

Latin 'a' (U+0061) vs Cyrillic 'а' (U+0430)
Latin 'o' (U+006F) vs Cyrillic 'о' (U+043E)

Attack: аpple.com (with Cyrillic 'а', U+0430) looks like apple.com but is a different domain.
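One way to catch such lookalikes is to map confusable characters to a common "skeleton" and compare the results. The sketch below is illustrative only: CONFUSABLES holds just the two pairs listed above, and the names skeleton and looks_like are assumptions for this example. Production checks would rely on the much larger confusables data published with the Unicode security mechanisms.

```python
# Tiny, hypothetical confusables table: only the two pairs listed above.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic а -> Latin a
    "\u043e": "o",  # Cyrillic о -> Latin o
}


def skeleton(text: str) -> str:
    """Replace each known confusable character with its Latin lookalike."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def looks_like(candidate: str, trusted: str) -> bool:
    """True when candidate differs from trusted but has the same skeleton."""
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)


print(looks_like("\u0430pple.com", "apple.com"))  # True
print(looks_like("apple.com", "apple.com"))       # False
```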

Browser Protections

Browsers may display the Punycode form instead of the Unicode form when a domain mixes scripts in a suspicious way.
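A rough sketch of such a display policy follows. It is not how any particular browser decides: the script of each letter is approximated from the first word of its Unicode character name, because Python's standard library does not expose the Script property, and the names scripts_in and display_form are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the scripts in a label from each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def display_form(domain: str) -> str:
    """Show a label as Punycode when it mixes scripts, otherwise as typed."""
    shown = []
    for label in domain.split("."):
        if len(scripts_in(label)) > 1:
            shown.append("xn--" + label.encode("punycode").decode("ascii"))
        else:
            shown.append(label)
    return ".".join(shown)


print(scripts_in("\u0430pple"))        # contains both CYRILLIC and LATIN
print(display_form("\u0430pple.com"))  # first label is shown in xn-- form
print(display_form("apple.com"))       # apple.com
```

This heuristic is stricter than real policies, which generally allow some legitimate combinations, such as Japanese labels that mix Han characters with kana. Here those labels would also fall back to Punycode.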

Unicode Normalization

The same character can be represented in different ways:

é = U+00E9 (precomposed)
é = U+0065 + U+0301 (decomposed: e + combining accent)

Normalization forms: NFC, NFD, NFKC, NFKD
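The effect of normalization can be checked with unicodedata.normalize from Python's standard library. The snippet below is a small demonstration using the é example above, plus one compatibility character (the ﬁ ligature, U+FB01) added here to show what the K forms do.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent

# The two strings render the same but compare as different.
print(precomposed == decomposed)  # False

# NFC composes, NFD decomposes; after normalizing, the strings match.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# The K forms also fold compatibility characters, such as the ﬁ ligature.
print(unicodedata.normalize("NFKC", "\ufb01"))  # fi
```

Comparing or storing domain labels only after normalizing them to a single form avoids treating visually identical strings as different names.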

Unicode is fundamental to global internet accessibility, enabling users worldwide to register and access domain names in their native scripts and languages.
