Anuj Shilpakar edited this page Feb 6, 2022 · 4 revisions

How to deal with translations

I have seen some discussions around translating the responses in this and other ports of dropbox-zxcvbn and I would like to raise some concerns and suggest some solutions.

First and foremost, everyone agrees that keeping the number of dependencies low is desirable. Besides, having the library pick one translation tool over any other (say, i18n or gettext) is far from the best approach.

Beyond that, most of the messages are fixed constants, which makes them easy to translate with any tool, and the output format is very predictable, including the time_estimates, which use a simple interpolation. Even an ad-hoc solution is not hard to implement, using a hash with the original messages as keys.
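As a minimal sketch of that ad-hoc approach: the table below maps original messages to translations and falls back to the English text when a message has no entry. The constant `TRANSLATIONS_PT`, the helper names, and the Portuguese strings are hypothetical, and the feedback hash shape (`"warning"` and `"suggestions"` keys) is an assumption about what the library returns, so adjust it to the result you actually get.

```ruby
# Hypothetical lookup table: original English message => translation.
# Only two entries are shown; a real table would cover the full list of messages.
TRANSLATIONS_PT = {
  "Avoid sequences" => "Evite sequências",
  "This is a very common password" => "Esta é uma senha muito comum"
}.freeze

# Returns the translation, or the original message when no entry exists.
def translate_message(message, table = TRANSLATIONS_PT)
  table.fetch(message, message)
end

# Assumes feedback looks like { "warning" => String, "suggestions" => [String] }.
def translate_feedback(feedback, table = TRANSLATIONS_PT)
  {
    "warning" => translate_message(feedback["warning"].to_s, table),
    "suggestions" => Array(feedback["suggestions"]).map { |s| translate_message(s, table) }
  }
end
```

Falling back to the original string means an untranslated message still reaches the user in English instead of raising an error.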

This is an example using i18n: https://github.com/18F/identity-idp/blob/a13d7652c74f74f1c1fe3d35bd0e364d4df2f0f4/app/javascript/packs/pw-strength.js#L47-L74. Gettext implementations are even more straightforward, using its _(string) method.

The last aspect worth mentioning is the scoring itself. This and most other zxcvbn implementations (including the original one from Dropbox) use a frequency list of English words (https://github.com/formigarafa/zxcvbn-rb/blob/master/lib/zxcvbn/frequency_lists.rb). You may be able to augment that list with words in other languages, but it may be tricky to get a good list in the first place. Also, increasing the size of that dictionary may affect the performance of the algorithm, and it will affect the resulting score (that would be the point). Although this step is optional, I think it is important to keep that in mind.

Translatable strings

This is an exhaustive list of all the messages zxcvbn may emit in its results, grouped by the hash key under which they appear.

Just copy them and use them to create the translations in your tool of choice.

suggestions:
  "Use a few words, avoid common phrases"
  "No need for symbols, digits, or uppercase letters"
  "Add another word or two. Uncommon words are better."
  "Use a longer keyboard pattern with more turns"
  "Avoid repeated words and characters"
  "Avoid sequences"
  "Avoid recent years"
  "Avoid years that are associated with you"
  "Avoid dates and years that are associated with you"
  "Capitalization doesn't help very much"
  "All-uppercase is almost as easy to guess as all-lowercase"
  "Reversed words aren't much harder to guess"
  "Predictable substitutions like '@' instead of 'a' don't help very much"

warnings:
  "Straight rows of keys are easy to guess"
  "Short keyboard patterns are easy to guess"
  'Repeats like "aaa" are easy to guess'
  'Repeats like "abcabcabc" are only slightly harder to guess than "abc"'
  "Sequences like abc or 6543 are easy to guess"
  "Recent years are easy to guess"
  "Dates are often easy to guess"
  "This is a top-10 common password"
  "This is a top-100 common password"
  "This is a very common password"
  "This is similar to a commonly used password"
  "A word by itself is easy to guess" # only when the word is the sole match
  "Names and surnames by themselves are easy to guess"
  "Common names and surnames are easy to guess"

time_estimates: # with pluralization depending on an interpolated `base` number.
  "less than a second"
  "#{base} second"
  "#{base} minute"
  "#{base} hour"
  "#{base} day"
  "#{base} month"
  "#{base} year"
  "centuries"
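
Because the time estimates interpolate a number, a plain message lookup is not enough for them. The sketch below shows one way to handle that: parse the number and unit out of the English string and rebuild it in the target language. The names `UNITS_PT`, `FIXED_PT` and `translate_time_estimate` are hypothetical, the Portuguese strings are only illustrative, and the regular expression assumes plurals are formed by appending "s" to the unit, as the list above suggests.

```ruby
# Hypothetical table: English unit => [singular, plural] in the target language.
UNITS_PT = {
  "second" => %w[segundo segundos],
  "minute" => %w[minuto minutos],
  "hour"   => %w[hora horas],
  "day"    => %w[dia dias],
  "month"  => %w[mês meses],
  "year"   => %w[ano anos]
}.freeze

# Messages that carry no interpolated number.
FIXED_PT = {
  "less than a second" => "menos de um segundo",
  "centuries" => "séculos"
}.freeze

# Translates strings such as "3 hours" or "1 month"; returns the input
# unchanged when it does not match any known pattern.
def translate_time_estimate(text)
  return FIXED_PT[text] if FIXED_PT.key?(text)

  match = text.match(/\A(\d+) (second|minute|hour|day|month|year)s?\z/)
  return text unless match

  base = match[1].to_i
  singular, plural = UNITS_PT.fetch(match[2])
  "#{base} #{base == 1 ? singular : plural}"
end
```

Languages with more than two plural forms would need a richer rule than the singular/plural pair used here, which is where a tool such as i18n or gettext starts to pay off.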