statusandheaders: if setting a header fails on Headers object, retry with encoded header #83

ikreymer · 2024-11-07T06:06:35Z

Instead of skipping invalid headers:

- encode the header value as latin-1 for use with the Headers object
- encode it back to the original when serializing

Fixes "Headers with non-ASCII characters silently disappear" #81

…with encodeURIComponent-encoded value instead of just skipping the header fixes #81
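The two steps in the description amount to a lossless round trip between a normal (UTF-8) string and a "byte string" in which every character is at most 0xFF, which the Headers object will accept. Below is a minimal sketch, assuming TextEncoder/TextDecoder are available (they are in browsers and in Node). The helper names utf8ToLatin1 and latin1ToUtf8 are hypothetical and are not necessarily the PR's actual code.

```typescript
// Hypothetical helper: map each UTF-8 byte of the value to one char (0x00-0xFF),
// producing a string that a Headers object will accept.
function utf8ToLatin1(value: string): string {
  const bytes = new TextEncoder().encode(value);
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Hypothetical inverse: treat each char as one byte and decode the bytes as UTF-8.
function latin1ToUtf8(value: string): string {
  const bytes = Uint8Array.from(value, (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const originalValue = "AUR_メタデータ";
const encodedValue = utf8ToLatin1(originalValue);
console.log(encodedValue.length); // 19 (4 ASCII bytes + 5 chars * 3 UTF-8 bytes)
console.log(latin1ToUtf8(encodedValue) === originalValue); // true
```

Because pure-ASCII values map to themselves, only headers that actually contain non-ASCII characters are changed by this conversion.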

ikreymer · 2024-11-07T20:55:19Z

Trying to decide if we want to go with encodeURI or with latin-1 encoding here... I think latin-1 is technically more 'correct', though encodeURI() is more portable and easier to understand. As mentioned in #81, fetch() does the equivalent of latin-1 / iso-8859-1, so maybe we should go with that?
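To make the trade-off concrete, here is a small side-by-side of the two candidate encodings on the same values. It relies only on the standard encodeURI and TextEncoder; toLatin1ByteString is a hypothetical helper. One practical difference it surfaces: encodeURI also escapes "%", so a value that already contains percent-escapes (common in Location headers) changes even when it is pure ASCII, while the latin-1 byte-string mapping leaves ASCII untouched.

```typescript
// Hypothetical helper for the "latin-1" option: one char per UTF-8 byte.
function toLatin1ByteString(value: string): string {
  const bytes = new TextEncoder().encode(value);
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Compare both encodings on a non-ASCII value and an already-escaped ASCII value.
const samples = ["AUR_メタデータ", "a%20b"];
for (const s of samples) {
  console.log(JSON.stringify(s));
  console.log("  encodeURI:", encodeURI(s));
  console.log("  latin-1:  ", JSON.stringify(toLatin1ByteString(s)));
}
```

Either way the value has to be decoded again when serializing; the latin-1 form has the extra property of matching what fetch() itself hands back.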

tw4l · 2024-11-07T21:06:47Z

> Trying to decide if we want to go with encodeURI or with latin-1 encoding here... I think latin-1 is technically more 'correct', though encodeURI() is more portable and easier to understand. As mentioned in #81, fetch() does the equivalent of latin-1 / iso-8859-1, so maybe we should go with that?

I can see the arguments either way, but my vote would be for ISO-8859-1, since that's what the HTTP spec seems to have originally supported and it is consistent with fetch.

ikreymer · 2024-11-08T02:07:07Z

> Trying to decide if we want to go with encodeURI or with latin-1 encoding here... I think latin-1 is technically more 'correct', though encodeURI() is more portable and easier to understand. As mentioned in #81, fetch() does the equivalent of latin-1 / iso-8859-1, so maybe we should go with that?

> I can see the arguments either way, but my vote would be for ISO-8859-1, since that's what the HTTP spec seems to have originally supported and it is consistent with fetch.

Updated it to the latin-1 encoding again.

tw4l

One suggestion to take or leave re: encoding; otherwise looks good, and thanks for the tests!

tw4l · 2024-11-08T16:24:24Z

src/lib/statusandheaders.ts

```diff
+  buf.forEach((x) => (str += String.fromCharCode(x)));
+  //buf.forEach(x => str += x < 128 ? String.fromCharCode(x) : `\\x${x}`);
```


I wonder if using buffer.transcode() would be a good idea here, since there may be edge cases where TextEncoder().encode(value) encodes certain characters as two bytes.
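The concern can be shown directly: a character that already exists in ISO-8859-1, such as "é" (U+00E9), becomes two bytes under TextEncoder (UTF-8) but is a single byte in Latin-1. The sketch below is a rough stand-in for a strict Latin-1 encoding that does not need Node's buffer module; strictLatin1Bytes is a hypothetical name, and returning null for characters above U+00FF is our own assumption (buffer.transcode substitutes a replacement character instead).

```typescript
// UTF-8 bytes of a string, as produced by TextEncoder.
function utf8Bytes(value: string): number[] {
  return Array.from(new TextEncoder().encode(value));
}

// Hypothetical strict Latin-1 encoder: one byte per UTF-16 code unit,
// or null if any code unit is above 0xFF (not representable in Latin-1).
function strictLatin1Bytes(value: string): number[] | null {
  const out: number[] = [];
  for (let i = 0; i < value.length; i++) {
    const code = value.charCodeAt(i);
    if (code > 0xff) return null;
    out.push(code);
  }
  return out;
}

console.log(utf8Bytes("é"));          // 195, 169 (0xC3 0xA9)
console.log(strictLatin1Bytes("é"));  // 233 (0xE9)
console.log(strictLatin1Bytes("メ")); // null: outside Latin-1
```

Note that a strict Latin-1 encoding cannot represent characters like "メ" at all, so it would still need a fallback for the headers that motivated this PR.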

Hmm, yeah, though buffer is not available in the browser, so we would have to use an equivalent.

Ah, true, I'm so used to working with Node that I forgot! It's also an edge case within an edge case, so I trust your judgment on how much attention to give it.

I guess if we change this, it will break this assertion from the test case:

```js
function decodeLatin1(buf) {
  let str = "";
  for (let i = 0; i < buf.length; i++) {
    str += String.fromCharCode(buf[i]);
  }
  return str;
}

const d = decodeLatin1(new TextEncoder().encode("https://wiki.archlinux.jp/index.php/Arch_User_Repository?rdfrom=https%3A%2F%2Fwiki.archlinux.org%2Findex.php%3Ftitle%3DAUR_Metadata_%28%25E6%2597%25A5%25E6%259C%25AC%25E8%25AA%259E%29%26redirect%3Dno#AUR_メタデータ"));

// this matches the headers from node fetch:
const a = await fetch("https://wiki.archlinux.org/title/AUR_Metadata_(%E6%97%A5%E6%9C%AC%E8%AA%9E)", { redirect: "manual" });
assert(a.headers.get("location") === d);
```

If we do:

```js
decodeLatin1(Buffer.from("https://wiki.archlinux.jp/index.php/Arch_User_Repository?rdfrom=https%3A%2F%2Fwiki.archlinux.org%2Findex.php%3Ftitle%3DAUR_Metadata_%28%25E6%2597%25A5%25E6%259C%25AC%25E8%25AA%259E%29%26redirect%3Dno#AUR_メタデータ", "ascii"));
```

it will give a different result than what node fetch is doing.
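The difference comes from how each path turns a non-Latin-1 character into bytes. The Node.js Buffer documentation describes the "ascii" encoding, when encoding a string, as equivalent to "latin1", which uses one byte per character and truncates anything above U+00FF; TextEncoder instead emits the full UTF-8 sequence, which is what the fetch-derived Location value in the test case matches. The sketch below emulates the truncating path portably, assuming truncation keeps the low 8 bits of each UTF-16 code unit (our assumption about Node's behavior); truncateToByteString and utf8ToByteString are hypothetical names.

```typescript
// Emulates a one-byte-per-char encoding that truncates chars above 0xFF
// (assumption: the low 8 bits of each UTF-16 code unit are kept).
function truncateToByteString(value: string): string {
  let out = "";
  for (let i = 0; i < value.length; i++) {
    out += String.fromCharCode(value.charCodeAt(i) & 0xff);
  }
  return out;
}

// Same mapping as decodeLatin1(new TextEncoder().encode(...)) in the test case.
function utf8ToByteString(value: string): string {
  const bytes = new TextEncoder().encode(value);
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

const fragment = "#AUR_メタデータ";
const truncated = truncateToByteString(fragment);
const utf8 = utf8ToByteString(fragment);
console.log(truncated.length, utf8.length); // 10 20
console.log(truncated === utf8);            // false
```

The truncating form is also lossy: distinct characters that share a low byte collapse to the same output, so it could not be reversed when serializing.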

Resolved this in what I think is the most correct way: values are still encoded from UTF-8 to Latin-1 to store them in the Headers object, but are then re-encoded back to UTF-8 when serializing.

- add UTF8ToLatin1 and Latin1ToUTF8 conversions for using Headers
- when using Headers, convert UTF-8 to Latin-1 and set a 'needsReencode' flag
- when serializing, if the reencoding flag is set, be sure to re-encode Latin-1 back to UTF-8
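The three steps above could be sketched as follows. This is a hypothetical, simplified wrapper, not the library's actual StatusAndHeaders class: HeaderStore, StrictHeaders, and reencodeNames are illustrative names, and StrictHeaders mimics only one aspect of the Fetch Headers object (rejecting values with characters above U+00FF via a TypeError) so the example runs without a real Headers implementation. Following the later "use set of headers that need reencoding" commit, it tracks a set of header names rather than a single flag.

```typescript
// Minimal stand-in for the Fetch API Headers object: rejects values
// containing characters above U+00FF, as a real Headers object would.
class StrictHeaders {
  private map = new Map<string, string>();

  set(name: string, value: string): void {
    for (let i = 0; i < value.length; i++) {
      if (value.charCodeAt(i) > 0xff) {
        throw new TypeError(`invalid header value for ${name}`);
      }
    }
    this.map.set(name.toLowerCase(), value);
  }

  get(name: string): string | null {
    return this.map.get(name.toLowerCase()) ?? null;
  }

  entries(): [string, string][] {
    return Array.from(this.map.entries());
  }
}

function utf8ToLatin1(value: string): string {
  const bytes = new TextEncoder().encode(value);
  let out = "";
  for (let i = 0; i < bytes.length; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

function latin1ToUtf8(value: string): string {
  return new TextDecoder().decode(Uint8Array.from(value, (c) => c.charCodeAt(0)));
}

// Hypothetical wrapper: remembers which header names were converted so that
// serialization can restore the original UTF-8 values.
class HeaderStore {
  readonly headers = new StrictHeaders();
  private reencodeNames = new Set<string>();

  set(name: string, value: string): void {
    const key = name.toLowerCase();
    try {
      this.headers.set(key, value);
      this.reencodeNames.delete(key);
    } catch (e) {
      if (!(e instanceof TypeError)) throw e;
      // Retry with the Latin-1 byte-string form instead of dropping the header.
      this.headers.set(key, utf8ToLatin1(value));
      this.reencodeNames.add(key);
    }
  }

  // Serialize as "name: value\r\n" lines, converting flagged values back to UTF-8.
  serialize(): string {
    let out = "";
    for (const [key, value] of this.headers.entries()) {
      const v = this.reencodeNames.has(key) ? latin1ToUtf8(value) : value;
      out += `${key}: ${v}\r\n`;
    }
    return out;
  }
}

const store = new HeaderStore();
store.set("Content-Type", "text/html");
store.set("Location", "https://example.com/#AUR_メタデータ");
console.log(store.serialize());
```

With this split, lookups through the Headers object see the Latin-1 form (the same shape fetch() returns), while the serialized output keeps the original bytes, which is the "lossless conversion" the follow-up commit describes.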

ikreymer added 2 commits November 6, 2024 22:03

statusandheaders: if setting a header fails on Headers object, retry …

e03bca9

…with encodeURIComponent-encoded value instead of just skipping the header fixes #81

use encodeURI instead of encodeURIComponent, update test

7eb05ce

ikreymer mentioned this pull request Nov 7, 2024

Headers with non-ASCII characters silently disappear #81

Closed

ikreymer requested a review from tw4l November 7, 2024 20:54

back to encodeLatin1 for consistency

26f7444

tw4l approved these changes Nov 8, 2024


ikreymer added 2 commits November 8, 2024 13:27

fix lossless conversion of headers:

0fd5b85

- add UTF8ToLatin1 and Latin1ToUTF8 conversions for using Headers
- when using Headers, convert UTF-8 to Latin-1 and set a 'needsReencode' flag
- when serializing, if the reencoding flag is set, be sure to re-encode Latin-1 back to UTF-8

use set of headers that need reencoding

aac0a56

ikreymer requested a review from tw4l November 11, 2024 20:45

tw4l approved these changes Nov 11, 2024


ikreymer merged commit d20880d into main Nov 11, 2024
6 checks passed

ikreymer deleted the encode-non-ascii-headers branch November 11, 2024 21:54
