Best-Fit Mappings: Test core Python string encoding APIs #2

Open
cweb opened this issue Aug 2, 2013 · 0 comments

See: http://websec.github.io/unicode-security-guide/character-transformations/#best-fit

Identify Python core string encoding APIs, and test major Python versions to document:

  • best-fit mapping behavior - does the API best-fit characters by default?
  • override options - can default be overridden?
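For the override question, a possible starting point is the `errors` argument that `str.encode` accepts. The sketch below is only a probe, not a finding: the helper name `probe_error_handlers` is made up for illustration, and it assumes Python 3. It records what each built-in error handler produces for one character, with `None` meaning the handler raised instead of substituting.

```python
def probe_error_handlers(ch, encoding):
    """Encode `ch` with each built-in error handler and collect the results.

    Returns a dict mapping handler name to the encoded bytes, or to None
    when the handler raised UnicodeEncodeError instead of substituting.
    (Hypothetical helper, not part of any library.)
    """
    handlers = ("strict", "ignore", "replace",
                "xmlcharrefreplace", "backslashreplace")
    results = {}
    for handler in handlers:
        try:
            results[handler] = ch.encode(encoding, errors=handler)
        except UnicodeEncodeError:
            results[handler] = None
    return results


if __name__ == "__main__":
    # U+00E9 (e with acute accent) has no representation in ASCII.
    for name, value in probe_error_handlers("\u00e9", "ascii").items():
        print(name, value)
```

Running the same probe across several Python versions and encodings would show whether any handler silently maps a character to a look-alike rather than raising or emitting an obvious placeholder.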

One way to test this might be to brute-force a large set of Unicode characters: convert each one to a target encoding and check whether the result falls in the 7-bit ASCII range (0x00 to 0x7F).

// Loop through all available encodings
for each available encoding {
  // Loop through the code points up to 0xffff, starting at 0x80 to avoid
  // using 7-bit ASCII as the source, because we want to test
  // if ASCII is the outcome!
  for each Unicode character 0x0080 to 0xffff {
    convert the Unicode character from UTF-8 or UTF-16 to the target encoding (e.g. shift_jis, ISO-8859-1, etc.)
    test if the target character is ASCII 0x00 to 0x7f after the conversion
  }
  }
}
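The loop above could be sketched in Python 3 roughly as follows. This is one possible reading, not a definitive test harness. The function names `find_ascii_fits` and `scan_all_encodings` are hypothetical, and the round-trip comparison is an added assumption: checking the raw bytes alone would flag codecs such as UTF-7 or the EBCDIC code pages, which legitimately emit bytes below 0x80 for non-ASCII characters.

```python
from encodings.aliases import aliases


def find_ascii_fits(encoding, errors="strict", start=0x80, stop=0xFFFF):
    """Return (code_point, encoded_bytes) pairs for non-ASCII characters
    that come back from `encoding` as a different, ASCII-only string.

    Hypothetical helper: the round-trip check is an assumption made to
    avoid false positives from codecs that are not ASCII-compatible.
    """
    hits = []
    for cp in range(start, stop + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not encodable characters
        ch = chr(cp)
        try:
            encoded = ch.encode(encoding, errors)
            decoded = encoded.decode(encoding)
        except (UnicodeError, LookupError):
            # Unencodable character, unknown codec, or non-text codec.
            continue
        # A hit means the character changed into something purely ASCII.
        if decoded and decoded != ch and decoded.isascii():
            hits.append((cp, encoded))
    return hits


def scan_all_encodings(errors="strict", stop=0xFFFF):
    """Run find_ascii_fits over every codec name listed in the standard
    alias table and return only the encodings that produced hits."""
    report = {}
    for name in sorted(set(aliases.values())):
        hits = find_ascii_fits(name, errors=errors, stop=stop)
        if hits:
            report[name] = hits
    return report
```

Calling `scan_all_encodings()` covers the full 0x80 to 0xffff range and can take a while; passing a smaller `stop` is handy for a quick check. Passing `errors="replace"` should surface the `?` substitutions, which gives a baseline to compare against whatever the default `strict` mode reports on each Python version.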