Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statusandheaders: if setting a header fails on Headers object, retry with encoded header #83

Merged
merged 5 commits into from
Nov 11, 2024

Conversation

ikreymer
Copy link
Member

@ikreymer ikreymer commented Nov 7, 2024

instead of skipping invalid headers:

@ikreymer
Copy link
Member Author

ikreymer commented Nov 7, 2024

Trying to decide if we want to go with encodeURI or with latin-1 encoding here... I think latin-1 is technically more 'correct', though encodeURI() more portable / easier to understand. As mentioned in #81, fetch() does the equivalent of latin-1 / iso-8859-1, so maybe we do go with that?

@tw4l
Copy link
Member

tw4l commented Nov 7, 2024

Trying to decide if we want to go with encodeURI or with latin-1 encoding here... I think latin-1 is technically more 'correct', though encodeURI() more portable / easier to understand. As mentioned in #81, fetch() does the equivalent of latin-1 / iso-8859-1, so maybe we do go with that?

I can see the arguments either way but my vote would be for ISO-8859-1, since that's what the HTTP spec seems to have originally supported and is consistent with fetch.

@ikreymer
Copy link
Member Author

ikreymer commented Nov 8, 2024

Trying to decide if we want to go with encodeURI or with latin-1 encoding here... I think latin-1 is technically more 'correct', though encodeURI() more portable / easier to understand. As mentioned in #81, fetch() does the equivalent of latin-1 / iso-8859-1, so maybe we do go with that?

I can see the arguments either way but my vote would be for ISO-8859-1, since that's what the HTTP spec seems to have originally supported and is consistent with fetch.

Updated it to the latin-1 encoding again.

Copy link
Member

@tw4l tw4l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion to take or leave re: encoding, otherwise looks good and thanks for the tests

Comment on lines 213 to 214
buf.forEach((x) => (str += String.fromCharCode(x)));
//buf.forEach(x => str += x < 128 ? String.fromCharCode(x) : `\\x${x}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using buffer.transcode() might be a good idea here, as there might be edge cases where TextEncoder().encode(value) might encode certain characters as two bytes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, though buffer is not available in the browser, so would have to use the equivalent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah true, so used to working with node I forgot! It is also an edge case within an edge case, so I trust how much attention you want to give this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we change this, it will break this assertion from the test case:

function decodeLatin1(buf) {
  let str = "";
  for (let i = 0; i < buf.length; i++) {
    str += String.fromCharCode(buf[i]);
  }
  return str;
}
d = decodeLatin1(new TextEncoder().encode("https://wiki.archlinux.jp/index.php/Arch_User_Repository?rdfrom=https%3A%2F%2Fwiki.archlinux.org%2Findex.php%3Ftitle%3DAUR_Metadata_%28%25E6%2597%25A5%25E6%259C%25AC%25E8%25AA%259E%29%26redirect%3Dno#AUR_メタデータ"));

// this matches the headers from node fetch:
a = await fetch("https://wiki.archlinux.org/title/AUR_Metadata_(%E6%97%A5%E6%9C%AC%E8%AA%9E)", {redirect: "manual"})
assert(a.headers.get("location") === d)

If we do: decodeLatin1(Buffer.from("https://wiki.archlinux.jp/index.php/Arch_User_Repository?rdfrom=https%3A%2F%2Fwiki.archlinux.org%2Findex.php%3Ftitle%3DAUR_Metadata_%28%25E6%2597%25A5%25E6%259C%25AC%25E8%25AA%259E%29%26redirect%3Dno#AUR_メタデータ", "ascii")); it will give a different result than what node fetch is doing..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved this in the most correct way, I think. Still encoding utf8-latin1 to store in Headers object, but then reencoding back when serializing

- add a UTF8ToLatin1 and Latin1ToUTF8 conversions for using Headers
- when using Headers, convert UTF8 to Latin1, set 'needsReencode' flag
- when serializing, if reencoding flag set, be sure to reencode Latin1 back to UTF8
@ikreymer ikreymer requested a review from tw4l November 11, 2024 20:45
@ikreymer ikreymer merged commit d20880d into main Nov 11, 2024
6 checks passed
@ikreymer ikreymer deleted the encode-non-ascii-headers branch November 11, 2024 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Headers with non-ASCII characters silently disappear
2 participants