Homoglyph Substitutions
When characters look alike but are technically different, confusion and typos are inevitable.
What Are Homoglyphs?
Homoglyphs are distinct characters that look similar or identical to one another. In domain names, a homoglyph substitution swaps one character for a visually similar one, producing a domain that appears legitimate but is actually a different address.
These substitutions are particularly concerning because they're often difficult for users to detect visually, making them a common technique in phishing attacks and brand impersonation.
Common homoglyph pairs include:
- Zero (0) and uppercase letter O
- Number one (1), lowercase L (l), and uppercase I
- Number five (5) and uppercase S
- Number eight (8) and uppercase B
- Lowercase m and lowercase r+n (rn)
- Double lowercase v (vv) and lowercase w
- Lowercase c+l (cl) and lowercase d
- Double lowercase n (nn) and lowercase m
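As a rough illustration, the pairs above can be folded into a lookup table that reduces a domain label to a canonical "skeleton", so that two labels that look alike compare equal. The Python sketch below is an assumption, not the engine's actual implementation: the names SINGLE_CHAR, MULTI_CHAR, skeleton, and is_lookalike are hypothetical, and the table covers only the pairs listed here, lowercased because domain names are case-insensitive.

```python
# Hypothetical lookup tables built from the pairs listed above.
# Multi-character sequences are folded first, then single characters.
MULTI_CHAR = {"rn": "m", "nn": "m", "vv": "w", "cl": "d"}
SINGLE_CHAR = str.maketrans({"0": "o", "1": "l", "5": "s", "8": "b"})


def skeleton(label: str) -> str:
    """Fold look-alike characters so visually similar labels match."""
    folded = label.lower()
    for sequence, replacement in MULTI_CHAR.items():
        folded = folded.replace(sequence, replacement)
    return folded.translate(SINGLE_CHAR)


def is_lookalike(candidate: str, legitimate: str) -> bool:
    """True when the labels differ but fold to the same skeleton."""
    if candidate.lower() == legitimate.lower():
        return False
    return skeleton(candidate) == skeleton(legitimate)


if __name__ == "__main__":
    for candidate, legitimate in [("paypa1", "paypal"), ("arnazon", "amazon")]:
        print(candidate, legitimate, is_lookalike(candidate, legitimate))
```

Both labels pass through the same folding, so the comparison is symmetric. The trade-off is that any two labels differing only by a listed pair are flagged, whether or not one of them is malicious.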
Homoglyph substitution is a common technique in typosquatting and phishing. By registering domains that look nearly identical to legitimate ones, attackers can carry out:
- Phishing Attacks: fake login pages that appear legitimate and steal credentials. Example: paypa1.com instead of paypal.com
- Malware Distribution: tricking users into downloading malicious software from what appears to be a trusted source. Example: g00gle.com instead of google.com
- Brand Impersonation: fake websites that mimic legitimate brands to damage reputation or steal traffic. Example: arnazon.com instead of amazon.com
Homoglyph attacks are particularly effective because they're difficult for users to detect visually.
Our typo analysis engine uses a comprehensive database of homoglyph pairs to identify potential substitutions in domain names. We assign a probability score of 55% to homoglyph substitutions, reflecting their deliberate nature.
Homoglyph Detection Algorithm
- Map each character in the domain to potential homoglyphs
- Generate all possible combinations of homoglyph substitutions
- Filter out implausible combinations
- Rank by likelihood based on character frequency and position
- Prioritize substitutions that maintain visual similarity
This approach allows us to identify the most likely homoglyph typos for any domain name.
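The first three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the engine's actual code: the HOMOGLYPHS table, the homoglyph_variants function, and the max_subs cutoff are hypothetical. The "implausible" filter is approximated by capping the number of substitutions, and the ranking simply puts variants with fewer substitutions first instead of using character frequency and position.

```python
from itertools import product

# Hypothetical substitution table (lowercase, since domains are
# case-insensitive). Each character maps to its look-alike replacements.
HOMOGLYPHS = {
    "o": ["0"],
    "l": ["1"],
    "s": ["5"],
    "b": ["8"],
    "m": ["rn", "nn"],
    "w": ["vv"],
    "d": ["cl"],
}


def homoglyph_variants(label: str, max_subs: int = 2) -> list[str]:
    """Return look-alike variants of a label, fewest substitutions first."""
    label = label.lower()
    # Step 1: map each character to itself plus its potential homoglyphs.
    options = [[ch] + HOMOGLYPHS.get(ch, []) for ch in label]
    scored = []
    # Step 2: generate every combination of choices.
    for combo in product(*options):
        subs = sum(1 for old, new in zip(label, combo) if old != new)
        # Step 3 (assumed filter): skip the original and heavy rewrites.
        if 0 < subs <= max_subs:
            scored.append((subs, "".join(combo)))
    # Assumed ranking: fewer substitutions first, then alphabetical.
    scored.sort()
    return [variant for _, variant in scored]


if __name__ == "__main__":
    print(homoglyph_variants("amazon"))
```

The number of combinations grows exponentially with the number of substitutable characters, so a real implementation would likely prune during generation instead of filtering afterwards.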
Protecting Against Homoglyph Attacks
Defending against homoglyph-based typosquatting requires a proactive approach to domain registration and monitoring. Our typo generator can help you identify potential homoglyph variations of your domain that might be used in phishing attacks.
International Homoglyphs
The problem of homoglyphs extends beyond the Latin alphabet. With the introduction of Internationalized Domain Names (IDNs), characters from different scripts can appear visually identical to Latin characters, creating even more opportunities for confusion.
For example, the Cyrillic letter 'о' (U+043E) looks identical to the Latin letter 'o' (U+006F), but they are different Unicode characters. This has led to sophisticated phishing attacks known as "IDN homograph attacks."
Modern browsers implement various protections against these attacks, such as displaying the Punycode representation of IDNs that mix scripts, but users should remain vigilant when clicking on links or typing domain names.
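To make the mixed-script case concrete, the Python sketch below checks which scripts appear in a label and converts a domain to its Punycode form. It uses only the standard library; the helper names scripts_in, is_mixed_script, and to_ascii are hypothetical, and taking the first word of the Unicode character name as the script is a simplifying assumption, not how browsers actually decide what to display.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the scripts in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER O" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True when letters from more than one script appear in the label."""
    return len(scripts_in(label)) > 1


def to_ascii(domain: str) -> str:
    """Encode a domain with Python's built-in IDNA codec (Punycode)."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    spoof = "g\u043e\u043egle"  # both o's are Cyrillic U+043E
    print(scripts_in(spoof))
    print(is_mixed_script(spoof))
    print(to_ascii(spoof + ".com"))  # prints an xn-- prefixed name
```

A label written entirely in one non-Latin script would pass this check, so it only catches the mixed-script variant of the attack.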