Fix unicode escapes #11

Closed

@masklinn masklinn commented Jan 16, 2017

  • Added Rust
  • Python's \u takes exactly 4 hex digits; \U takes 8 (and can thus encode astral characters)
  • In JavaScript, JSON and Java, the \u escape takes exactly 4 hex digits; astral characters must be written explicitly as a surrogate pair
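
For reference, the surrogate pair for an astral code point can be computed as in the sketch below. The helper name `surrogate_pair` is made up for illustration and is not part of this patch; it just applies the standard UTF-16 split.

```python
import json


def surrogate_pair(code_point):
    """Split an astral code point (above U+FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not an astral code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = surrogate_pair(0x1F596)
escape = "\\u{:04X}\\u{:04X}".format(high, low)
print(escape)  # \uD83D\uDD96

# Python's JSON decoder joins the pair back into one code point.
decoded = json.loads('"' + escape + '"')
print(hex(ord(decoded)))  # 0x1f596
```

The resulting `\uD83D\uDD96` is the form that a 4-digit `\u` escape needs for U+1F596.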

The Python, JS, JSON and Java snippets encoded U+1F59 {GREEK CAPITAL LETTER UPSILON WITH DASIA} followed by U+0036 {DIGIT SIX} rather than U+1F596 {RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS}.
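
A quick way to see the problem in Python 3 (a minimal sketch; the variable names are only for illustration):

```python
# "\u" consumes exactly four hex digits, so the trailing "6" is left over
# as an ordinary character.
wrong = "\u1F596"
# "\U" consumes eight hex digits and yields a single astral code point.
right = "\U0001F596"

print([hex(ord(c)) for c in wrong])  # ['0x1f59', '0x36']
print([hex(ord(c)) for c in right])  # ['0x1f596']
```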

Other languages may have the same issue. According to cppreference, C++ Unicode escapes work the same way as Python's, and the C11 draft I have (ISO/IEC 9899:201x, April 12, 2011, § 6.4.3) specifies essentially the same thing: \u is followed by 4 hex digits and \U by 8:

6.4.3 Universal character names

Syntax

universal-character-name:
    \u hex-quad
    \U hex-quad hex-quad

hex-quad:
    hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
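
The grammar above can be approximated with a regular expression. This is only a sketch of the syntax rule: `UCN` and `match_ucn` are hypothetical names, and it ignores the additional constraints the standard places on which code points are allowed.

```python
import re

# hex-quad: exactly four hexadecimal digits
HEX_QUAD = r"[0-9A-Fa-f]{4}"
# universal-character-name: \u hex-quad | \U hex-quad hex-quad
UCN = re.compile(r"\\u" + HEX_QUAD + r"|\\U" + HEX_QUAD + HEX_QUAD)


def match_ucn(text):
    """Return the universal-character-name at the start of text, or None."""
    m = UCN.match(text)
    return m.group(0) if m else None


print(match_ucn(r"\u1F596"))     # \u1F59 (the final 6 is not part of the escape)
print(match_ucn(r"\U0001F596"))  # \U0001F596
print(match_ucn(r"\U1F596"))     # None (fewer than 8 hex digits after \U)
```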

@masklinn

Ah, I now see that #5 fixes the exact same issues.

@masklinn masklinn closed this Jan 16, 2017
@masklinn masklinn deleted the patch-2 branch January 16, 2017 10:42