Add new options for convert to allow/disallow accepting different input and invalids with replacement strings #12358

Closed
ScottPJones opened this issue Jul 29, 2015 · 1 comment
Labels
unicode Related to unicode characters and encodings

Comments

@ScottPJones
Contributor

Currently, there is a single convert method that allows conversion to UTF8String with an argument for a replacement string, but it is defined only for input of type Vector{UInt8}, and not for the many other argument types that convert(UTF8String, data) accepts.
There are other places in the code where a fixed replacement, 0xfffd, is hard-coded.

  1. Conversion from vectors of code units (UInt8, UInt16, and UInt32) to the (currently) four string types (ASCIIString, UTF8String, UTF16String, UTF32String) needs consistent handling. It should let the caller specify how Modified UTF-8, CESU-8, and overly long encodings are handled, and whether invalid sequences cause an exception, are replaced by a default replacement character (0xfffd for Unicode, 0x1a [SUB] for ASCII), or are replaced by a user-supplied string.
  2. This also needs to work with the fixes for #11501 (convert function for UTF-16 from an AbstractArray{UInt8} problems) and #11502 (Problems with convert from AbstractArray{UInt8} to UTF32String).
  3. As laid out in #11558 (RFC: Roadmap for improving string support in Julia), make sure only valid strings are produced (for security reasons and, in the future, for performance reasons).
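
To make the three policies in point 1 concrete (raise an exception, substitute a default replacement character, or substitute a user-supplied string), here is a minimal sketch written in Python rather than Julia. The function name `decode_utf8` and its `on_invalid` and `replacement` parameters are hypothetical and do not correspond to any existing Julia or Python API. It covers UTF-8 input only and relies on Python's strict decoder, which rejects overlong encodings and encoded surrogates, so it does not model the options for accepting Modified UTF-8 or CESU-8.

```python
def decode_utf8(data: bytes, on_invalid: str = "error", replacement: str = "\ufffd") -> str:
    """Decode UTF-8 bytes with a caller-selected policy for invalid sequences.

    Hypothetical illustration only:
      on_invalid="error"   -> raise UnicodeDecodeError on the first invalid sequence
      on_invalid="replace" -> substitute `replacement` (default U+FFFD) for each one
    """
    if on_invalid == "error":
        return data.decode("utf-8", errors="strict")
    if on_invalid != "replace":
        raise ValueError("on_invalid must be 'error' or 'replace'")

    pieces = []
    pos = 0
    while pos < len(data):
        try:
            # The rest of the input is valid: keep it and stop.
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into the slice data[pos:].
            pieces.append(data[pos:pos + err.start].decode("utf-8"))
            pieces.append(replacement)
            pos += err.end  # resume after the invalid bytes
    return "".join(pieces)


if __name__ == "__main__":
    raw = b"caf\xc3\xa9 \xff end"
    print(decode_utf8(raw, on_invalid="replace"))
    print(decode_utf8(raw, on_invalid="replace", replacement="<?>"))
```

The point of the sketch is that one entry point takes the policy as an argument, instead of a replacement value being hard-coded at each call site. A version for ASCII output would presumably default `replacement` to 0x1a (SUB), as suggested above.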

3 participants