rblib: Validate.uses_mixed_capitals should be made unicode-aware #12

Open
mhl opened this issue Jan 2, 2014 · 1 comment

Comments

@mhl
Contributor

mhl commented Jan 2, 2014

Currently uses_mixed_capitals uses the regular expressions /[A-Z]/ and /[a-z]/ to detect upper-case and lower-case letters. This doesn't take into account non-ASCII upper-case and lower-case letters. In fixing this, care needs to be taken to preserve Ruby 1.8.7 compatibility: its regular expressions have no support for Unicode character classes, so one couldn't just use /[[:upper:]]/, for example.
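
To illustrate the limitation, here is a minimal sketch. The method `ascii_mixed_capitals?` is a hypothetical, simplified stand-in, not the actual rblib implementation; it only assumes that the check relies on the two ASCII-only expressions quoted above.

```ruby
# encoding: utf-8

# Hypothetical stand-in for an ASCII-only check (not the real rblib code):
# true when the text contains both an upper-case and a lower-case letter,
# as judged by /[A-Z]/ and /[a-z]/.
def ascii_mixed_capitals?(text)
  !!(text =~ /[A-Z]/ && text =~ /[a-z]/)
end

puts ascii_mixed_capitals?("Hello")   # ASCII letters are detected
puts ascii_mixed_capitals?("Привет")  # Cyrillic letters are invisible to both expressions
puts ascii_mixed_capitals?("École")   # the capital É is not in A-Z
```

Under these assumptions only the first call reports mixed case, even though all three strings mix upper-case and lower-case letters.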

@mhl
Contributor Author

mhl commented Jan 2, 2014

Elsewhere, Alaveteli uses literal character classes to fake Unicode character classes, e.g. here, although that's hugely incomplete. One could generate correct character classes similarly, corresponding to [[:upper:]] and [[:lower:]], but there are over 1000 characters in each category, and they don't nicely collapse into ranges.

To see all upper-case and lower-case letters in Unicode, grouped into ranges of contiguous code points, you can use this script, which produces the output below.
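
For readers without access to the linked script, the grouping can be sketched roughly as follows. This is not the script referred to above; `contiguous_ranges` is a hypothetical helper, and it assumes Ruby 1.9 or later, where [[:upper:]] is Unicode-aware on UTF-8 strings.

```ruby
# encoding: utf-8

# Hypothetical helper: collect the code points up to `max` whose character
# matches `regex`, merged into [first, last] pairs of contiguous code points.
def contiguous_ranges(regex, max = 0x10FFFF)
  ranges = []
  (0..max).each do |cp|
    next if cp >= 0xD800 && cp <= 0xDFFF # surrogates are not valid characters
    next unless [cp].pack("U") =~ regex
    if ranges.last && ranges.last[1] == cp - 1
      ranges.last[1] = cp   # extend the current run
    else
      ranges << [cp, cp]    # start a new run
    end
  end
  ranges
end

# Limited to Latin-1 here to keep the run short; drop the second argument
# to scan the whole code space.
contiguous_ranges(/[[:upper:]]/, 0xFF).each do |first, last|
  puts format("U+%04X-U+%04X", first, last)
end
```

Passing /[[:lower:]]/ instead lists the lower-case ranges in the same way.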

The pragmatic solution is probably to handle only the commonest ranges under Ruby 1.8.7 (checking that they include those used by redeployers of our software) and to use the POSIX character classes under Ruby 1.9 and later.
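
A rough sketch of that approach might look like the following. The helper names (`upper_regex`, `lower_regex`, `mixed_case?`) are hypothetical, the literal ranges cover only Latin-1 as a placeholder for "the commonest ranges", and the Ruby 1.8.7 branch is an untested assumption that literal multibyte ranges work there with the /u flag.

```ruby
# encoding: utf-8

# Hypothetical helpers, not part of rblib.
def upper_regex
  if RUBY_VERSION >= "1.9"
    /[[:upper:]]/          # Unicode-aware on UTF-8 strings
  else
    /[A-ZÀ-ÖØ-Þ]/u         # assumption: Latin-1 capitals only, extend as needed
  end
end

def lower_regex
  if RUBY_VERSION >= "1.9"
    /[[:lower:]]/
  else
    /[a-zß-öø-ÿ]/u         # assumption: Latin-1 lower-case letters only
  end
end

# True when the text contains at least one upper-case and one lower-case letter.
def mixed_case?(text)
  !!(text =~ upper_regex && text =~ lower_regex)
end

puts mixed_case?("Привет")
puts mixed_case?("ПРИВЕТ")
```

Keeping the version switch inside two small helpers means the literal ranges for Ruby 1.8.7 can be widened later without touching the callers.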
