
Encourage denoting character-attributable errors by the REPLACEMENT CHARACTER #819

Open
hsivonen opened this issue Feb 2, 2024 · 0 comments
Labels: editorial (Changes that do not affect how the standard is understood), topic: idna


@hsivonen
Member

hsivonen commented Feb 2, 2024

What is the issue with the URL Standard?

The URL Standard gives advice about URL rendering:
https://url.spec.whatwg.org/#ref-for-concept-domain-to-unicode%E2%91%A0

It also says, in the host parser section (https://url.spec.whatwg.org/#concept-host-parser): "Alternatively UTF-8 decode without BOM or fail can be used, coupled with an early return for failure, as domain to ASCII fails on U+FFFD (�)." That remark recommends the opposite of what I'm asking for here.
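To make the contrast concrete, here is a minimal Python sketch of the two decoding strategies. It assumes Python's built-in `utf-8` codec as a stand-in for the Encoding Standard's decoders (like "UTF-8 decode without BOM", it does not strip a leading BOM); the function names `decode_or_fail` and `decode_with_replacement` are hypothetical.

```python
from typing import Optional


def decode_or_fail(raw: bytes) -> Optional[str]:
    """Early-return strategy: any malformed sequence aborts decoding."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None  # the caller returns failure immediately


def decode_with_replacement(raw: bytes) -> str:
    """Replacing strategy: malformed sequences become U+FFFD in place."""
    return raw.decode("utf-8", errors="replace")


raw = b"ex\xffample.com"  # 0xFF never occurs in well-formed UTF-8
assert decode_or_fail(raw) is None
assert decode_with_replacement(raw) == "ex\ufffdample.com"
```

With the first strategy the domain is simply rejected. With the second, the U+FFFD marks exactly where the bad byte was and can be shown to the user.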

UTS 46 says: "Implementations may make further modifications to the resulting Unicode string when showing it to the user. For example, it is recommended that disallowed characters be replaced by a U+FFFD to make them visible to the user."

It would be useful for the URL Standard to highlight this technique and to include a Note encouraging implementations to let U+FFFD produced by UTF-8 decode flow through processing, and to replace erroneous code points found during UTS 46 processing and forbidden domain code point checks with U+FFFD, so that errors attributable to specific parts of the domain are made visible to the user. Since U+FFFD is itself a disallowed character, this technique preserves the overall failure status of the domain.
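A minimal Python sketch of the display technique follows, under stated assumptions. It checks only the URL Standard's forbidden domain code points plus U+FFFD. The set `PLACEHOLDER_UTS46_DISALLOWED` is a hypothetical stand-in for full UTS 46 disallowed status (a real implementation would consult the UTS 46 mapping table and also apply mapping and normalization, which this sketch omits), and `domain_for_display` is likewise a hypothetical name.

```python
from typing import Tuple

# Forbidden host code points as listed in the URL Standard.
FORBIDDEN_HOST = set("\x00\t\n\r #/:<>?@[\\]^|")
# Forbidden domain code points: forbidden host code points, C0 controls
# (U+0000..U+001F), U+0025 (%), and U+007F DELETE.
FORBIDDEN_DOMAIN = FORBIDDEN_HOST | {chr(c) for c in range(0x20)} | {"%", "\x7f"}

# HYPOTHETICAL placeholder for UTS 46 "disallowed" status. A real
# implementation would look code points up in the UTS 46 mapping table;
# only U+FFFD is listed here so that decoder output remains an error.
PLACEHOLDER_UTS46_DISALLOWED = {"\ufffd"}

REPLACEMENT = "\ufffd"


def domain_for_display(domain: str) -> Tuple[str, bool]:
    """Return (display_string, failed).

    Each offending code point is replaced by U+FFFD so the user can see
    where the error is, instead of the whole domain being discarded.
    """
    out = []
    failed = False
    for ch in domain:
        if ch in FORBIDDEN_DOMAIN or ch in PLACEHOLDER_UTS46_DISALLOWED:
            out.append(REPLACEMENT)
            failed = True
        else:
            out.append(ch)
    return "".join(out), failed


# U+FFFD from a replacing UTF-8 decode flows through and still fails.
decoded = b"ex\xffample.com".decode("utf-8", errors="replace")
shown, failed = domain_for_display(decoded)
assert shown == "ex\ufffdample.com" and failed

# A forbidden code point (here a space) is replaced in place.
shown, failed = domain_for_display("a b.example")
assert shown == "a\ufffdb.example" and failed

# Re-checking the display string keeps the failure status.
assert domain_for_display(shown)[1]
```

Because U+FFFD is in the disallowed set, running the display string back through the same check still reports failure, so replacing errors for visibility never turns a failing domain into a passing one.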

@hsivonen added the topic: idna and editorial labels on Feb 5, 2024.