Punycode

Domain Fundamentals
An encoding system that converts Unicode domain names to ASCII-compatible format, enabling internationalized domain names to work with DNS.

What is Punycode?

Punycode is an encoding syntax defined in RFC 3492 that transforms Unicode strings into the limited ASCII character set supported by DNS. It's the technical foundation that makes Internationalized Domain Names (IDNs) possible, allowing domain names in any language to work with the existing DNS infrastructure.

The Problem Punycode Solves

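The Domain Name System was designed in the 1980s with only ASCII characters in mind. Under the traditional hostname rules, a DNS label can contain only ASCII letters (a-z, case-insensitive), digits (0-9), and hyphens.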

This limitation excluded billions of internet users who don't primarily use the Latin alphabet. Punycode bridges this gap by encoding any Unicode string into valid ASCII.
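To make the restriction concrete, here is a minimal sketch of a check for the traditional letters-digits-hyphen rule. The name `is_ldh_label` is hypothetical, and the 63-character limit and the ban on leading or trailing hyphens come from the usual hostname rules rather than from this article.

```python
import re

# Letters, digits and hyphens only; 1 to 63 characters;
# no hyphen at the start or the end of the label.
_LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` fits the classic ASCII hostname rules."""
    return _LDH_LABEL.fullmatch(label) is not None


print(is_ldh_label("example"))         # True
print(is_ldh_label("münchen"))         # False: 'ü' is not ASCII
print(is_ldh_label("xn--mnchen-3ya"))  # True: the encoded form passes
```

The last call shows why an ASCII-compatible encoding is enough: the encoded label passes a check that was written with no knowledge of Unicode.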

How Punycode Encoding Works

Punycode keeps the ASCII characters of a label readable and encodes the non-ASCII characters, together with their positions, into a compact ASCII suffix.

The Encoding Process

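1. Separate Characters: Split the label into ASCII and non-ASCII characters

2. Copy ASCII: Copy the ASCII characters to the output in their original order, followed by a hyphen delimiter if there are any

3. Encode Non-ASCII: Append each non-ASCII character and its insertion position using a generalized variable-length integer encoding written with the digits a-z and 0-9

4. Add Prefix: Prepend "xn--" to mark the label as Punycode-encoded (this step belongs to IDNA rather than to Punycode itself)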
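The encoding steps can be sketched in a few dozen lines. The code below is an illustrative reading of the RFC 3492 encoder with the RFC's published parameter values, not a production implementation. The names `punycode_encode`, `_adapt` and `_digit` are made up for this example, and the "xn--" prefix is left out because it is added at the IDNA layer.

```python
# Parameter values given in RFC 3492 for Punycode.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _digit(d: int) -> str:
    """Map 0..25 to 'a'..'z' and 26..35 to '0'..'9'."""
    return chr(d + 97) if d < 26 else chr(d - 26 + 48)


def _adapt(delta: int, numpoints: int, first: bool) -> int:
    """Bias adaptation: keeps typical deltas short to encode."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def punycode_encode(label: str) -> str:
    # Steps 1 and 2: copy the ASCII characters first, in order.
    out = [c for c in label if ord(c) < 128]
    handled = basic = len(out)
    if basic:
        out.append("-")  # delimiter between ASCII part and encoded part
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    # Step 3: insert non-ASCII code points in increasing order.
    while handled < len(label):
        m = min(ord(c) for c in label if ord(c) >= n)
        delta += (m - n) * (handled + 1)
        n = m
        for c in label:
            cp = ord(c)
            if cp < n:
                delta += 1
            elif cp == n:
                # Write delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    out.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(_digit(q))
                bias = _adapt(delta, handled + 1, handled == basic)
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return "".join(out)


print(punycode_encode("münchen"))  # mnchen-3ya
```

In practice the standard library's `"punycode"` codec does the same job; the sketch only exposes the moving parts.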

Examples

| Original (Unicode) | Encoded (Punycode) |
|---|---|
| münchen | xn--mnchen-3ya |
| 北京 | xn--1lq90i |
| münchen.de | xn--mnchen-3ya.de |
| 中文.com | xn--fiq228c.com |
| café.com | xn--caf-dma.com |

The "xn--" Prefix

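The "xn--" prefix is called the ACE (ASCII Compatible Encoding) prefix. It signals to DNS resolvers and applications that the label contains Punycode-encoded content. The prefix is applied per label, so each dot-separated part of a domain name is encoded independently and all-ASCII labels are left unchanged.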
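As a rough illustration of how the prefix is applied label by label, the sketch below uses Python's built-in `"punycode"` codec. The helper names `label_to_ace` and `domain_to_ace` are hypothetical, and the code deliberately skips the Unicode normalization and validation that full IDNA processing performs.

```python
ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Encode one label; all-ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def domain_to_ace(domain: str) -> str:
    """Encode each dot-separated label on its own."""
    return ".".join(label_to_ace(part) for part in domain.lower().split("."))


print(domain_to_ace("münchen.de"))   # xn--mnchen-3ya.de
print(domain_to_ace("example.com"))  # example.com
```

Only the first label of `münchen.de` gains the prefix; `de` is already ASCII and is copied as-is.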

Punycode in Practice

Browser Handling

Modern browsers automatically handle Punycode:

User types: 中文.com

Browser sends: xn--fiq228c.com (to DNS)

Browser displays: 中文.com (in address bar)

Developer Implementation

JavaScript (Node.js):
const punycode = require('punycode/');

// Encode to Punycode
const encoded = punycode.toASCII('münchen.de');
// Result: xn--mnchen-3ya.de

// Decode from Punycode
const decoded = punycode.toUnicode('xn--mnchen-3ya.de');
// Result: münchen.de

Python:
domain = 'münchen.de'

# The built-in 'idna' codec implements the older IDNA 2003 rules (RFC 3490)
encoded = domain.encode('idna').decode('ascii')
# Result: xn--mnchen-3ya.de

URL Handling

When working with URLs containing IDNs:

// URL API handles Punycode automatically
const url = new URL('https://中文.com/path');

console.log(url.hostname); // xn--fiq228c.com
console.log(url.href); // https://xn--fiq228c.com/path
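Python's `urllib.parse` does not convert the host for you, so a comparable result needs one extra step. The following is a minimal sketch under simplifying assumptions: the helper name `url_with_ascii_host` is hypothetical, and it ignores user info and IPv6 literals in the authority part.

```python
from urllib.parse import urlsplit, urlunsplit


def url_with_ascii_host(url: str) -> str:
    """Return `url` with its host converted to the ASCII (Punycode) form."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # The built-in 'idna' codec encodes each label of the host separately.
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(url_with_ascii_host("https://中文.com/path"))
# https://xn--fiq228c.com/path
```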

Security Implications

Punycode's ability to represent any Unicode character creates security risks:

Visual Spoofing

Attackers can register domains that look identical to legitimate sites, a technique known as an IDN homograph attack:

аррlе.com (Cyrillic 'а' and 'р')

apple.com (Latin letters)

Both display identically in some fonts but are different domains.
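Listing the code points makes the difference visible even when the glyphs are not. In this sketch the spoofed label is written with explicit escapes so the example does not depend on how the text is rendered. Which letters are Cyrillic here is an assumption made for illustration, and `describe` is a hypothetical helper.

```python
import unicodedata


def describe(label: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per character in the label."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in label]


spoof = "\u0430\u0440\u0440l\u0435"  # Cyrillic a, er, er + Latin l + Cyrillic ie
real = "apple"

for line in describe(spoof):
    print(line)

print(spoof == real)  # False
```

The two strings also encode differently: `spoof.encode("punycode")` and `real.encode("punycode")` return different byte strings, which is why DNS treats them as unrelated names.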

Browser Protections

To combat spoofing, browsers implement protections:

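1. Mixed Script Detection: Display the Punycode form when a label mixes scripts in a suspicious way, such as Latin with Cyrillic

2. Confusable Detection: Flag labels that use characters easily confused with ASCII letters

3. Allowlisting: Some browsers have shown Unicode only for TLDs whose registries enforce policies against look-alike registrations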
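A very rough version of mixed-script detection can be sketched with the standard library alone. This is not how any particular browser implements it: the script is guessed from the first word of each character's Unicode name, which is a crude stand-in for the real Unicode script property, and the names `scripts_in` and `looks_mixed_script` are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


print(looks_mixed_script("\u0430\u0440\u0440le"))  # True: Cyrillic plus Latin
print(looks_mixed_script("münchen"))               # False: Latin only
```

A label written entirely in one non-Latin script, such as an all-Cyrillic look-alike, passes this check, which is why browsers combine it with confusable detection.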

Working with Punycode in APIs

When building domain tools:

1. Always Store Punycode: Use the ASCII form internally for consistency and database indexing.

2. Accept Both Forms: Let users input either Unicode or Punycode, converting as needed.

3. Display Unicode: Show the human-readable form in user interfaces.

const punycode = require('punycode/');

// Accepts Unicode or Punycode input and returns the lowercase ASCII form
function normalizeDomain(input) {
  return punycode.toASCII(input.toLowerCase());
}
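The same store-ASCII, display-Unicode pattern can be sketched in Python with the built-in `idna` codec. The function names are hypothetical, and the codec follows the older IDNA 2003 rules, so treat this as an outline rather than a complete validator.

```python
def normalize_domain(user_input: str) -> str:
    """Accept Unicode or Punycode input; return the lowercase ASCII form."""
    return user_input.strip().lower().encode("idna").decode("ascii")


def display_domain(stored: str) -> str:
    """Turn the stored ASCII form back into Unicode for the UI."""
    return stored.encode("ascii").decode("idna")


print(normalize_domain(" München.DE "))     # xn--mnchen-3ya.de
print(display_domain("xn--mnchen-3ya.de"))  # münchen.de
```

Because both input forms normalize to the same string, the stored value can be compared and indexed directly.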

Punycode is transparent to most users but essential knowledge for developers building internationalized web applications.
