rblib: Validate.uses_mixed_capitals should be made unicode-aware #12

mhl · 2014-01-02T12:24:46Z

Currently uses_mixed_capitals uses the regular expressions /[A-Z]/ and /[a-z]/ to detect upper and lower case letters. This doesn't take into account non-ASCII upper and lower case letters. In fixing this, care needs to be taken to preserve Ruby 1.8.7 compatibility: 1.8.7 has no support for Unicode character classes in its regular expressions, so one couldn't just use /[[:upper:]]/, for example.
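To make the problem concrete, here is a minimal sketch (assuming Ruby 1.9 or later and a UTF-8 string; the variable names are illustrative only) comparing the ASCII-only class with the POSIX bracket expression on a word whose only capital is non-ASCII:

```ruby
# "École" written with an escape so the example does not depend on the
# source file's encoding. Its only capital letter is U+00C9.
word = "\u00C9cole"

ascii_upper = /[A-Z]/        # the kind of class currently used
posix_upper = /[[:upper:]]/  # Unicode-aware on UTF-8 strings in Ruby 1.9+

# =~ returns the match index or nil, so compare against nil explicitly.
ascii_hit = !(word =~ ascii_upper).nil?
posix_hit = !(word =~ posix_upper).nil?

puts "ASCII class matched: #{ascii_hit}"
puts "POSIX class matched: #{posix_hit}"
```

The ASCII class finds no capital in this word, while the POSIX class does.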

mhl · 2014-01-02T13:12:06Z

Elsewhere, Alaveteli uses literal character classes to fake Unicode character classes, e.g. here, although that's hugely incomplete. One could generate correct character classes similarly, corresponding to [[:upper:]] and [[:lower:]], but there are over 1000 characters in each category, and they don't collapse nicely into ranges.

To see all upper and lower case letters in Unicode, grouped into ranges of contiguous integers, you can use this script, which produces the output below.
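The linked script is not reproduced here. As a rough sketch of the same idea (not the original script; the method names are hypothetical, and it assumes Ruby 2.2 or later for slice_when), one could test each code point against the POSIX class and group the matches into contiguous runs:

```ruby
# Collapse a list of integers into ranges of contiguous values.
def contiguous_ranges(codepoints)
  codepoints.sort
            .slice_when { |a, b| b != a + 1 }
            .map { |run| run.first..run.last }
end

# Return every code point up to `limit` whose character matches `regex`.
# Surrogates (U+D800..U+DFFF) are skipped because they are not valid
# characters in a UTF-8 string.
def letters_matching(regex, limit = 0xFFFF)
  (0..limit).select do |cp|
    next false if cp >= 0xD800 && cp <= 0xDFFF
    [cp].pack('U') =~ regex
  end
end
```

For example, `contiguous_ranges(letters_matching(/[[:upper:]]/))` lists the upper case ranges in the Basic Multilingual Plane; the exact ranges depend on the Unicode version your Ruby was built with.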

Probably the pragmatic solution is to deal with the commonest ranges under Ruby 1.8.7 (checking they include those used by redeployers of our software) and use the POSIX character classes under Ruby 1.9 and later.
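A minimal sketch of that approach follows. The module and method names are hypothetical and this is a simplified check, not the existing rblib code. The 1.8.7 branch is an untested assumption: it covers only ASCII plus the Latin-1 Supplement letters, and relies on the 1.8-only three-argument Regexp.new with 'u' to treat the pattern as UTF-8.

```ruby
# Hypothetical sketch: names are illustrative, not rblib's API.
module MixedCapitalsSketch
  if ''.respond_to?(:encoding)
    # Ruby 1.9 and later: POSIX bracket expressions are Unicode-aware
    # when matched against UTF-8 strings.
    UPPER = /[[:upper:]]/
    LOWER = /[[:lower:]]/
  else
    # Ruby 1.8.7 (assumption, untested): literal UTF-8 byte ranges.
    # Upper: A-Z, U+00C0..U+00D6, U+00D8..U+00DE
    # Lower: a-z, U+00DF..U+00F6, U+00F8..U+00FF
    UPPER = Regexp.new("[A-Z\xC3\x80-\xC3\x96\xC3\x98-\xC3\x9E]", nil, 'u')
    LOWER = Regexp.new("[a-z\xC3\x9F-\xC3\xB6\xC3\xB8-\xC3\xBF]", nil, 'u')
  end

  # True if the text contains at least one upper case and at least one
  # lower case letter according to the classes chosen above.
  def self.upper_and_lower?(text)
    !(text =~ UPPER).nil? && !(text =~ LOWER).nil?
  end
end
```

Detecting the Ruby version via String#encoding avoids comparing version strings. The literal ranges would still need extending to whatever scripts redeployers actually use.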
