JSONC parser fails to correctly parse non-BMP escape sequences #31

Open
KiloJuliett opened this issue Sep 2, 2022 · 1 comment
Labels
bug Something isn't working

KiloJuliett commented Sep 2, 2022

According to RFC 8259 § 7, a non-BMP character such as 𝄞 (U+1D11E) is escaped in a JSON string as a 12-character sequence encoding its UTF-16 surrogate pair: \uD834\uDD1E. Therefore, I expect the following Rust code to compile and run successfully:

use jsonc_parser::JsonValue;
use jsonc_parser::parse_to_value;

fn main() {
    let src = r#""\uD834\uDD1E""#;
    let v = parse_to_value(src, &Default::default()).unwrap().unwrap();
    if let JsonValue::String(s) = v {
        assert_eq!("\u{1D11E}", s)
    }
    else {
        panic!();
    }
}

However, on the latest version of jsonc-parser (as of writing, this is version 0.21.0), this code panics at the unwrap on line 6 with the message "Invalid unicode escape sequence. 'D834' is not a valid UTF8 character".
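
For reference, here is a minimal sketch of the surrogate-pair arithmetic a parser would need once it has read two consecutive \uXXXX escapes. The function name `combine_surrogates` is hypothetical and is not part of jsonc-parser's API; it only illustrates the expected decoding, not how the crate is or should be implemented.

```rust
/// Combine a UTF-16 surrogate pair into a single Unicode scalar value.
/// Hypothetical helper for illustration; not part of jsonc-parser.
/// Returns None unless `high` is a high surrogate (D800..=DBFF)
/// and `low` is a low surrogate (DC00..=DFFF).
fn combine_surrogates(high: u16, low: u16) -> Option<char> {
    if !(0xD800..=0xDBFF).contains(&high) || !(0xDC00..=0xDFFF).contains(&low) {
        return None;
    }
    // Each surrogate carries 10 bits of the code point, offset by 0x10000.
    let code_point =
        0x10000 + (((high as u32) - 0xD800) << 10) + ((low as u32) - 0xDC00);
    char::from_u32(code_point)
}
```

With this helper, `combine_surrogates(0xD834, 0xDD1E)` yields `Some('\u{1D11E}')`, which is the character the assertion above expects. A lone `D834` is not a valid character by itself, which matches the error message: the parser appears to convert each escape individually instead of pairing it with the following one.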

dsherret added the bug label Sep 2, 2022
@polarathene

Not entirely sure, but this recently merged RFC might be relevant.

RON adopted it in its v0.9 release, in place of base64, to properly support round-tripping byte strings. serde_json didn't have this issue, though.
