JSON parser does not match the JSON specification (Unicode escapes) #6

Open
mvangeest opened this issue May 27, 2016 · 2 comments

@mvangeest
Contributor

mvangeest commented May 27, 2016

The JSON specification allows escapes of the form \uXXXX, that is, \u followed by four hexadecimal digits. Since the newline character is Unicode code point U+000A, I expect parseJSON "\"\\u000A\"" to yield Right "\n" (written here as show would print it), but it actually yields a Left containing:

(line 1, column 3):
unexpected "u"
expecting space, "&" or escape code

Furthermore, JSON allows a surrogate pair (two consecutive \u escapes) to encode a code point beyond U+FFFF. A fully conforming parseJSON would parse "\"\\uD835\\uDD4C\"" into "𝕌", a string containing the single code point U+1D54C.
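
For reference, the surrogate-pair arithmetic can be sketched in a few lines of Haskell using only base. The function names are made up for illustration and are not part of Ideas; the formula itself is the standard UTF-16 one.

```haskell
import Data.Char (chr)

-- | Is this 16-bit unit a high (leading) surrogate, U+D800..U+DBFF?
isHighSurrogate :: Int -> Bool
isHighSurrogate u = u >= 0xD800 && u <= 0xDBFF

-- | Is this 16-bit unit a low (trailing) surrogate, U+DC00..U+DFFF?
isLowSurrogate :: Int -> Bool
isLowSurrogate u = u >= 0xDC00 && u <= 0xDFFF

-- | Combine a high/low surrogate pair into a single code point.
-- Returns Nothing when the two units do not form a valid pair.
combineSurrogates :: Int -> Int -> Maybe Char
combineSurrogates hi lo
  | isHighSurrogate hi && isLowSurrogate lo =
      Just (chr (0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)))
  | otherwise = Nothing
```

With the pair from the example, combineSurrogates 0xD835 0xDD4C gives Just '\x1D54C', since 0x10000 + 0x35 * 0x400 + 0x14C = 0x1D54C.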

This incompatibility is caused by the use of Parsec's stringLiteral parser, which follows Haskell's syntax rules for string literals. This parser also supports a large number of escapes that are not valid in JSON, such as \a and \&.
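
If staying on Parsec is preferred, a JSON-specific string parser could replace stringLiteral. The following is only a rough sketch of what that might look like, not code from Ideas: the names jsonString, jsonChar, escape, hex4 and unicodeEscape are hypothetical, and rejecting unpaired surrogates is my own assumption about the desired behaviour (the JSON grammar itself does not forbid them).

```haskell
import Data.Char (chr, digitToInt)
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Hypothetical stand-in for the Haskell-style stringLiteral:
-- parses a JSON string, including the surrounding double quotes.
jsonString :: Parser String
jsonString = between (char '"') (char '"') (many jsonChar)

-- | One character of a JSON string: either an escape or a plain character.
jsonChar :: Parser Char
jsonChar = (char '\\' *> escape) <|> satisfy unescaped
  where
    -- JSON forbids raw quotes, backslashes and control characters here.
    unescaped c = c /= '"' && c /= '\\' && c >= '\x20'

-- | The escapes JSON allows after a backslash; \a, \& and friends are rejected.
escape :: Parser Char
escape = choice
  [ '"'  <$ char '"'
  , '\\' <$ char '\\'
  , '/'  <$ char '/'
  , '\b' <$ char 'b'
  , '\f' <$ char 'f'
  , '\n' <$ char 'n'
  , '\r' <$ char 'r'
  , '\t' <$ char 't'
  , char 'u' *> unicodeEscape
  ]

-- | Exactly four hexadecimal digits, read as one 16-bit unit.
hex4 :: Parser Int
hex4 = foldl (\acc d -> acc * 16 + digitToInt d) 0 <$> count 4 hexDigit

-- | The part of a \uXXXX escape after the 'u', with surrogate pairs combined.
unicodeEscape :: Parser Char
unicodeEscape = do
  hi <- hex4
  if hi >= 0xD800 && hi <= 0xDBFF
    then do
      -- A high surrogate must be followed by a second \uXXXX escape.
      lo <- string "\\u" *> hex4
      if lo >= 0xDC00 && lo <= 0xDFFF
        then return (chr (0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)))
        else parserFail "invalid low surrogate"
    else if hi >= 0xDC00 && hi <= 0xDFFF
      then parserFail "unpaired low surrogate"  -- assumption: reject these
      else return (chr hi)
```

With this sketch, parse jsonString "" "\"\\u000A\"" yields Right "\n" and the surrogate-pair example yields a string containing U+1D54C, while "\"\\a\"" is rejected.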

This issue is not critical for Communicate (certainly not important enough to delay ideas-1.5), but we expect to start working with JSON generated by many different encoders, some of which are likely to use this feature.

@BastiaanHeeren
Member

I am considering the aeson package for future releases. I guess that would solve this issue.

@mvangeest
Contributor Author

Yes, I just tried my examples and aeson gets all of this right.

However, the dependency philosophy of aeson doesn't seem to match that of Ideas very well. Aeson uses Attoparsec and ByteString, i.e. the more "advanced" and performance-focused text tools, and has four dependencies outside the Haskell Platform (dlist, fail, tagged, semigroups - fortunately no build problems on Windows). Ideas, on the other hand, uses the "basic" String for almost all operations and implements many things internally (XML parsing, UTF-8 encoding), taking on very few dependencies. I don't know whether this is intentional (or historical?) or whether you would want to switch to relying on more external packages.
