Homoglyph Substitutions

When characters look alike but are technically different, confusion and typos are inevitable.

What Are Homoglyphs?

Homoglyphs are characters that look similar or identical but are encoded as different characters. In domain names, a homoglyph substitution swaps one character (or short sequence) for a look-alike, producing a domain that appears legitimate but is in fact a different domain.

These substitutions are particularly concerning because they're often difficult for users to detect visually, making them a common technique in phishing attacks and brand impersonation.

Common Homoglyph Pairs
Characters that are frequently confused with each other

0 ↔ O
Zero and uppercase letter O
Example: google.com → g00gle.com

1 ↔ l ↔ I
Number one, lowercase L, uppercase i
Example: paypal.com → paypa1.com

5 ↔ S
Number five and uppercase S
Example: chase.com → cha5e.com

8 ↔ B
Number eight and uppercase B
Example: bitcoin.com → 8itcoin.com

m ↔ rn
Lowercase m and lowercase r+n
Example: amazon.com → arnazon.com

vv ↔ w
Double lowercase v and lowercase w
Example: www.site.com → vvvvvv.site.com

cl ↔ d
Lowercase c+l and lowercase d
Example: discord.com → cliscord.com

nn ↔ m
Double lowercase n and lowercase m
Example: gmail.com → gnnail.com

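
The pairs above can also be applied in reverse, to check whether a suspicious domain collapses onto a known brand. The following is a minimal sketch of that idea, not the detection engine described later on this page. The names skeleton and looks_like are hypothetical, and the folding table contains only the eight pairs listed above.

```python
# Fold each look-alike onto one canonical form, then compare the folded strings.
# Digits are folded first so that chains such as "c1" -> "cl" -> "d" also collapse.
SINGLE = str.maketrans({"0": "o", "1": "l", "5": "s", "8": "b"})
SEQUENCES = [("rn", "m"), ("nn", "m"), ("vv", "w"), ("cl", "d")]


def skeleton(domain):
    """Return the domain with every listed look-alike folded to its canonical form."""
    folded = domain.lower().translate(SINGLE)
    for lookalike, canonical in SEQUENCES:
        folded = folded.replace(lookalike, canonical)
    return folded


def looks_like(candidate, brand):
    """True if candidate differs from brand but folds to the same skeleton."""
    return candidate.lower() != brand.lower() and skeleton(candidate) == skeleton(brand)


print(looks_like("paypa1.com", "paypal.com"))   # True
print(looks_like("arnazon.com", "amazon.com"))  # True
print(looks_like("example.com", "paypal.com"))  # False
```

Folding is lossy, so this check can produce false positives: modern.com and modem.com fold to the same string even though both are ordinary words. A match is best treated as a reason to look closer, not as proof of abuse.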
Homoglyph Typosquatting
How homoglyphs are used in malicious domains

Homoglyph substitution is a common technique in typosquatting and phishing. Attackers register domains that look nearly identical to legitimate ones and use them for:

  • Phishing Attacks

    Create fake login pages that appear legitimate to steal credentials.

    Example: paypa1.com instead of paypal.com
  • Malware Distribution

    Trick users into downloading malicious software from what appears to be a trusted source.

    Example: g00gle.com instead of google.com
  • Brand Impersonation

    Create fake websites that mimic legitimate brands to damage reputation or steal traffic.

    Example: arnazon.com instead of amazon.com

These attacks work because the substituted characters render almost identically in many fonts, so a quick glance at a link or address bar rarely reveals the difference.

Homoglyph Detection
How our system identifies potential homoglyph typos

Our typo analysis engine uses a comprehensive database of homoglyph pairs to identify potential substitutions in domain names. We assign a probability score of 55% to homoglyph substitutions, reflecting their deliberate nature.

Homoglyph Detection Algorithm

  1. Map each character in the domain to potential homoglyphs
  2. Generate all possible combinations of homoglyph substitutions
  3. Filter out implausible combinations
  4. Rank by likelihood based on character frequency and position
  5. Prioritize substitutions that maintain visual similarity

This approach allows us to identify the most likely homoglyph typos for any domain name.
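
The five steps above can be sketched as follows. This is an illustration under stated assumptions, not our production engine: the substitution table holds only the pairs listed earlier on this page, the plausibility filter (step 3) is assumed to be "a syntactically valid label that differs from the input", and the ranking (steps 4 and 5) is assumed to be "fewest substitutions first" as a stand-in for frequency and position weighting. The names label_variants, homoglyph_typos, and max_subs are hypothetical, and only the leftmost label of the domain is varied.

```python
import re

# Look-alike pairs from this page, lowercased because DNS is case-insensitive.
# "i" stands in for uppercase I, which is how a spoofed name would be displayed.
SUBSTITUTIONS = {
    "o": ["0"], "0": ["o"],
    "l": ["1", "i"], "1": ["l", "i"], "i": ["l", "1"],
    "s": ["5"], "5": ["s"],
    "b": ["8"], "8": ["b"],
    "m": ["rn", "nn"], "rn": ["m"], "nn": ["m"],
    "w": ["vv"], "vv": ["w"],
    "d": ["cl"], "cl": ["d"],
}

VALID_LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")


def label_variants(label, max_subs):
    """Steps 1-2: map each position to its look-alikes and combine them.

    Returns {variant: number_of_substitutions}, capped at max_subs.
    """
    found = {}

    def walk(pos, built, used):
        if pos == len(label):
            if used:
                found[built] = min(used, found.get(built, used))
            return
        walk(pos + 1, built + label[pos], used)  # keep this character
        if used == max_subs:
            return
        for width in (1, 2):  # single characters, then two-character sequences
            chunk = label[pos:pos + width]
            if len(chunk) == width:
                for replacement in SUBSTITUTIONS.get(chunk, []):
                    walk(pos + width, built + replacement, used + 1)

    walk(0, "", 0)
    return found


def homoglyph_typos(domain, max_subs=2):
    """Return ranked homoglyph variants of the leftmost label of domain."""
    label, dot, rest = domain.lower().partition(".")
    candidates = label_variants(label, max_subs)
    # Step 3 (assumed filter): valid DNS labels that differ from the original.
    plausible = {v: n for v, n in candidates.items()
                 if v != label and len(v) <= 63 and VALID_LABEL.match(v)}
    # Steps 4-5 (assumed ranking): fewer substitutions first, then alphabetical.
    ranked = sorted(plausible, key=lambda v: (plausible[v], v))
    return [v + dot + rest for v in ranked]


print(homoglyph_typos("paypal.com", max_subs=1))
```

With max_subs=1 the call above returns ["paypa1.com", "paypai.com"]. Capping the number of substitutions keeps the combination step from growing exponentially on long names, at the cost of missing variants that change more characters than the cap allows.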

Protecting Against Homoglyph Attacks

Defending against homoglyph-based typosquatting requires a proactive approach to domain registration and monitoring. Our typo generator can help you identify potential homoglyph variations of your domain that might be used in phishing attacks.

International Homoglyphs

The problem of homoglyphs extends beyond the Latin alphabet. With the introduction of Internationalized Domain Names (IDNs), characters from different scripts can appear visually identical to Latin characters, creating even more opportunities for confusion.

For example, the Cyrillic letter 'о' (U+043E) looks identical to the Latin letter 'o' (U+006F) in most fonts, but they are different Unicode characters. Attacks that exploit this are known as "IDN homograph attacks."

Modern browsers implement various protections against these attacks, such as displaying the Punycode representation of IDNs that mix scripts, but users should remain vigilant when clicking on links or typing domain names.
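
To make the idea concrete, here is a rough sketch of a mixed-script check and a Punycode conversion using only the Python standard library. It is a heuristic, not a reproduction of any browser's actual policy: the script of each letter is approximated from the first word of its Unicode character name, and the helper names scripts_in, is_mixed_script, and to_punycode are hypothetical. Python's built-in "idna" codec implements the older IDNA 2003 rules.

```python
import unicodedata


def scripts_in(label):
    """Approximate the set of scripts in a label from Unicode character names."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch.isalpha()}


def is_mixed_script(domain):
    """True if any single label mixes letters from more than one script."""
    return any(len(scripts_in(label)) > 1 for label in domain.split("."))


def to_punycode(domain):
    """Return the ASCII (xn--) form of a domain via the built-in idna codec."""
    return domain.encode("idna").decode("ascii")


latin = "google.com"
spoof = "g\u043e\u043egle.com"  # both o's are CYRILLIC SMALL LETTER O (U+043E)

print(latin == spoof)           # False, although the two render alike
print(is_mixed_script(latin))   # False
print(is_mixed_script(spoof))   # True
print(to_punycode(spoof))       # ASCII form beginning with "xn--"
```

A label written entirely in one non-Latin script passes this check, so it catches only the mixed-script case mentioned above. Whole-script look-alikes need a confusables table to detect.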