JSON parser does not match the JSON specification (Unicode escapes) #6

mvangeest · 2016-05-27T12:03:13Z

The JSON specification allows escapes of the form \uXXXX, i.e. \u followed by four hexadecimal digits. Since the newline character is Unicode code point U+000A, I expect parseJSON "\"\\u000A\"" to yield Right "\n" (written here as show would print it), but in fact it yields a Left containing

(line 1, column 3):
unexpected "u"
expecting space, "&" or escape code

Furthermore, JSON allows a surrogate pair (two consecutive \uXXXX escapes) to encode a code point past U+FFFF. A fully conforming parseJSON would parse "\"\\uD835\\uDD4C\"" into "𝕌", a string containing the single code point U+1D54C.
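For reference, the arithmetic behind that surrogate pair can be sketched as follows. This is only an illustration, not code from Ideas; the name combineSurrogates is made up for this example.

```haskell
import Data.Char (chr)

-- | Combine a UTF-16 high and low surrogate into one code point.
-- Hypothetical helper for illustration; returns Nothing when either
-- value lies outside its surrogate range.
combineSurrogates :: Int -> Int -> Maybe Char
combineSurrogates hi lo
  | isHigh hi && isLow lo =
      -- Each surrogate carries 10 bits of the offset above U+10000.
      Just (chr (0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)))
  | otherwise = Nothing
  where
    isHigh x = x >= 0xD800 && x <= 0xDBFF
    isLow  x = x >= 0xDC00 && x <= 0xDFFF
```

With the values from the example, combineSurrogates 0xD835 0xDD4C gives Just '\x1D54C': (0xD835 - 0xD800) * 0x400 = 0xD400, plus (0xDD4C - 0xDC00) = 0x14C, plus 0x10000.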

This incompatibility is caused by the use of Parsec's stringLiteral parser, which follows Haskell's syntax rules for string literals. This parser also supports a large number of escapes that are not valid in JSON, such as \a and \&.
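One possible direction, if Parsec is kept, is a small hand-written string parser that accepts only the escapes JSON defines. The sketch below is an assumption about how that could look, not the actual Ideas code; jsonString, jsonChar, escape, hex4 and unicodeEscape are names invented for this example, and a lone low surrogate is passed through unchanged rather than rejected.

```haskell
import Data.Char (chr, digitToInt)
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Parse a JSON string literal (hypothetical replacement for stringLiteral).
jsonString :: Parser String
jsonString = between (char '"') (char '"') (many jsonChar)

jsonChar :: Parser Char
jsonChar = (char '\\' *> escape) <|> satisfy plain
  where
    -- JSON forbids unescaped quotes, backslashes and control characters.
    plain c = c /= '"' && c /= '\\' && c >= '\x20'

-- | Only the escapes listed in the JSON grammar; \a, \& etc. are rejected.
escape :: Parser Char
escape = choice
  [ '"'  <$ char '"'
  , '\\' <$ char '\\'
  , '/'  <$ char '/'
  , '\b' <$ char 'b'
  , '\f' <$ char 'f'
  , '\n' <$ char 'n'
  , '\r' <$ char 'r'
  , '\t' <$ char 't'
  , char 'u' *> unicodeEscape
  ]

-- | Exactly four hexadecimal digits, as a number.
hex4 :: Parser Int
hex4 = foldl (\acc d -> acc * 16 + digitToInt d) 0 <$> count 4 hexDigit

-- | The part after "\u"; a high surrogate must be followed by a low one.
unicodeEscape :: Parser Char
unicodeEscape = do
  hi <- hex4
  if hi >= 0xD800 && hi <= 0xDBFF
    then do
      lo <- string "\\u" *> hex4
      if lo >= 0xDC00 && lo <= 0xDFFF
        then pure (chr (0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)))
        else parserFail "expected low surrogate"
    else pure (chr hi)
```

With this sketch, parse jsonString "" "\"\\u000A\"" should give Right "\n", the surrogate pair example should give a one-character string containing U+1D54C, and "\"\\a\"" should be rejected.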

This issue is not critical for Communicate (certainly not important enough to delay ideas-1.5), but we expect to start working with JSON generated by many different encoders, some of which are likely to use this feature.


BastiaanHeeren · 2016-05-27T12:11:30Z

I am considering the aeson package for future releases. I guess that would solve this issue.

mvangeest · 2016-05-27T14:42:04Z

Yes, I just tried my examples and aeson gets all of this right.

However, the dependency philosophy of aeson doesn't seem to match that of Ideas very well. Aeson uses Attoparsec and ByteString, i.e. the more "advanced" and performance-focused text tools, and has four dependencies outside the Haskell Platform (dlist, fail, tagged, semigroups - fortunately no build problems on Windows). Ideas, on the other hand, uses the "basic" String for almost all operations and implements many things internally (XML parsing, UTF-8 encoding), taking on very few dependencies. I don't know if this is intentional (or historical?) and whether you would want to switch to relying on more external packages.

mvangeest mentioned this issue Aug 29, 2019

Decode Unicode in XML requests earlier. #47

Merged
