Greg Toombs edited this page Oct 24, 2015 · 12 revisions

A mimic user can write text that, by its appearance, means one thing to the victim reading it and something quite different to the victim's computer. This has several implications.

Scamming / spoofing

In situations where text alone identifies a third party, such as an email address or a domain name, an attacker could use a mimicked identifier to impersonate that third party.
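As a minimal illustration of the point above, the following Python sketch compares a genuine domain name with a look-alike. The specific substitution (Cyrillic "о" for Latin "o") is chosen here for illustration and is not taken from mimic's own tables.

```python
import unicodedata

genuine = "google.com"
# Same visible text, but both Latin o's are replaced by
# CYRILLIC SMALL LETTER O (U+043E). Illustrative choice only.
spoofed = "g\u043e\u043egle.com"

# The strings render alike in many fonts, yet compare as different.
print(genuine == spoofed)  # False

# List the code points that give the spoof away.
for ch in spoofed:
    if ord(ch) > 127:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

To a reader the two strings are likely indistinguishable, while any software comparing them sees two different identifiers.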

Plagiarism

Someone could mimic a plagiarized copy of a text to evade automated detection software, which would be unlikely to match it against the original source.

Spamming

In a context where anti-spam software bases detection on matches of common phrases, a spammer could mimic spam text in a different way every time so that the anti-spam software never considers it a match to known spam.
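The effect on phrase-based filtering can be sketched as follows. The filter, the blocklist `KNOWN_SPAM_PHRASES`, and the function `is_spam` are all hypothetical and far simpler than real anti-spam software; they only show why an exact phrase match fails.

```python
# Hypothetical blocklist, for illustration only.
KNOWN_SPAM_PHRASES = ["free money"]

def is_spam(message):
    """Naive filter: flag a message if it contains a known phrase."""
    lowered = message.lower()
    return any(phrase in lowered for phrase in KNOWN_SPAM_PHRASES)

plain = "Claim your free money now"
# Same visible text, with Cyrillic U+0435 and U+043E swapped in
# for the Latin e's in "free" and the o in "money".
mimicked = "Claim your fr\u0435\u0435 m\u043eney now"

print(is_spam(plain))     # True
print(is_spam(mimicked))  # False
```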

Evasion of indexing

If someone, for legitimate or malicious reasons, wants to keep text from being meaningfully indexed by a search engine, they could mimic it. The search engine would still index the text, but ordinary search terms would no longer match it.

Where do we go from here?

In contexts sensitive to the above issues, several approaches could be taken:

  • As mimic is already able to do, check for unusual or suspicious characters that are in unexpected Unicode ranges.
  • Again, as mimic is already able to do, attempt to replace such characters with characters that are considered more conventional.
  • Maintain and utilize an index of known popular "truth" terms (e.g. google.com), and warn if a potential homoglyph attack is attempting to spoof such a truth term.
  • As a heavier, more general solution, apply a round-trip render->OCR algorithm to check for discrepancies.
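
The first three approaches above can be sketched in Python roughly as follows. This is a simplified illustration, not mimic's implementation: it treats every non-ASCII character as suspicious, and the table `HOMOGLYPHS`, the set `TRUSTED_TERMS`, and all function names are assumptions made for this example.

```python
import unicodedata

# Tiny hand-picked homoglyph table (an assumption for illustration;
# a real table would be far larger).
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

# Hypothetical index of known "truth" terms.
TRUSTED_TERMS = {"google.com"}

def suspicious_chars(text):
    """Return (index, char, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

def normalize(text):
    """Replace known homoglyphs with their conventional counterparts."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def spoofed_term(text):
    """Return the trusted term that text imitates, or None."""
    folded = normalize(text)
    if folded != text and folded in TRUSTED_TERMS:
        return folded
    return None

candidate = "g\u043e\u043egle.com"
print(suspicious_chars(candidate))
print(normalize(candidate))
print(spoofed_term(candidate))
```

A context where non-ASCII text is legitimate would need a narrower notion of "suspicious" than this sketch uses, for example flagging only characters whose script differs from that of their neighbours.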