Add "decompress" response utility #3423

kettanaito · 2024-07-26T17:01:37Z

This relates to...

Fixes Expose "Content-Encoding" handling publicly #3412
Related to feat: add decompress interceptor #3274

Rationale

Exposing the response decompression logic (i.e. the handling of the Content-Encoding header) will allow other Node.js libraries to reuse it, resulting in a consistent experience for everyone.

See more detailed reasoning in the referenced issue.

Changes

Features

Add a new decompress utility, abstracting the Content-Type handling from the existing lib/fetch/index.js.
Use the decompress utility in the onHeaders callback.
Add unit tests for decompress.
Add unit tests for decompressStream.
Export the decompress utility publicly from undici.
Export the decompressStream utility publicly from undici.
Add types to the types module
Add type tests
Export the functions from under the Util namespace

Bug Fixes

Breaking Changes and Deprecations

Status

KhafraDev · 2024-07-26T17:25:14Z

it'd be nice if this worked with the rest of undici too, not just fetch

kettanaito · 2024-07-26T17:41:33Z

@KhafraDev, can you point me to other usages where this should work?

I admit, we mostly need it for fetch, since http does not handle response encoding by design (it returns it as-is). I presume Undici has other use cases. I can see what I can do there. That being said, keeping a (request, response) call signature would be nice publicly.

KhafraDev · 2024-07-26T18:02:43Z

can you point me to other usages where this should work?

undici.request, stream, etc. #1155

kettanaito · 2024-07-26T18:07:55Z

Added a basic set of unit tests for the decompress function.

✔ ignores responses without the "Content-Encoding" header (32.383292ms)
✔ ignores responses with empty "Content-Encoding" header (0.780042ms)
✔ ignores redirect responses (0.439708ms)
✔ ignores HEAD requests (0.456958ms)
﹣ ignores CONNECT requests (0.0775ms) # SKIP
✔ ignores responses with unsupported encoding (0.219292ms)
✔ decompresses responses with "gzip" encoding (15.465084ms)

Still need to cover multiple Content-Encoding values, like gzip, br, etc.

kettanaito · 2024-07-26T18:21:02Z

@KhafraDev, thinking of tailoring this to a generic body decompression purposes, I can imagine it being used this way:

function createDecompressionStream(args): TransformStream 

// Then, usage:
anyCompressedBodyStream.pipe(createDecompressionStream(args))

The most challenging part is figuring out the args here.

KhafraDev · 2024-07-26T19:18:38Z

I think for arguments a stream and a header value would be fine

kettanaito · 2024-07-26T19:37:09Z

Got it. Added a decompressStream() utility, reused it in the decompress().

Now, decompressStream can be used with any ReadableStream as input, while decompress operates on Fetch API request and response arguments (also decides additional logic like redirects, etc.)

metcoder95

So far, it LGTM;

I believe the decompressStream can be used within an interceptor if we want to aim for that in the other PR as well.

lib/web/fetch/decompress.js

ronag · 2024-07-28T10:26:39Z

lib/web/fetch/decompress.js

+      decoders.length = 0
+      break


Suggested change

decoders.length = 0

break

return null

Maybe return null or otherwise there is no way to know if it "failed".

Also decompressStream should probably not live under /web/

@ronag, can it fail though? The intention was that, if there's no known compression provided, the stream is returned as-is.

You can still handle failures as you normally would:

decompressStream(myStream).on('error', callback)

if there's no known compression provided

The logic seems to be that if an unknown coding is sent, the body isn't decompressed at all. I'm not sure if that sounds right either.

Content-Encoding: gzip, zstd, br -> ignored (zstd is not supported)
Content-Encoding: gzip, br -> decompressed

As a consumer, I expect the body stream to be returned as-is if:

It has no Content-Encoding set (no codings provided as an argument);

It has unknown Content-Encoding set or an unknown coding(s) is provided as an argument.

I wouldn't expect it to throw because that's a disruptive behavior that implies I need to add a guard before trying to call decompress/decompressStream. At least, not at a level this low.

If the API is potentially disruptive, I need to have an extra check in place to prevent scenarios where decompression is impossible. This puts extra work on me as a consumer but also results in a more frail API since the list of supported encodings is baked in Undici and may change across releases.

I strongly suggest to return the body stream as-is in those two cases I described above. decompressString() must never error, save for the streams errors itself (and those are irrelevant in the context of this function).

@metcoder95 @KhafraDev do you agree with my reasoning in the post above?

Hah, so there's an existing expectation that decompression is skipped if encountered unknown encoding:

✖ should skip decompression if unsupported codings

I suppose that answers the question.

throwing an error will make fetch slower

Regarding the decoders customization discussion, I don't think it justifies the cost of making decompressStream more complex. The only thing you need is the list of decodings, which you can get using a one-liner:

const codings = contentEncoding .toLowerCase() .split(',') .map((coding) => coding.trim())

The purpose of decompressStream is to grab the exact behavior Undici has under the hood. If you need a different behavior, you should (1) parse the Content-Encoding header; (2) map encodings to decompression streams with custom options.

I can see how you'd want to extend or override certain things from the default decompression, and for that we, perhaps, can consider exporting those options separately?

module.exports = { gzipDecompressionOptions: { flush: zlib.constants.Z_SYNC_FLUSH, finishFlush: zlib.constants.Z_SYNC_FLUSH } }

Even then, that's all the options Undici uses right now. To me this looks like a nice thing to have without substantial use case argumentation. I lean toward iterating on this when we have more of the latter.

Regarding erroring or throwing; As you suggest @kettanaito, I'd like to fail faster and inform the implementer the encoding is not supported without even disturbing the input stream, so fallbacks can be applied if they want to.

For fetch, as @KhafraDev points out, the throwing might be slower, so we can disable by default, and not throw at all, but rather ignore the non-supported encodings.

The issue I'd like to avoid, is putting the implementer in the spot of all or nothing.

Even then, that's all the options Undici uses right now. To me this looks like a nice thing to have without substantial use case argumentation. I lean toward iterating on this when we have more of the latter.

Agree, let's do that 👍

kettanaito · 2024-07-31T10:57:58Z

I've added unit tests for x-gzip and gzip, br encodings. Both seem to pass 🎉

Also added test cases for deflate and deflate, gzip.

kettanaito · 2024-07-31T13:58:51Z

The CI is failing with some issues download Node.js binary, it seems. Doesn't appear to be related to the changes.

kettanaito · 2024-07-31T14:09:02Z

@KhafraDev, what do you think if decompressStream becomes createDecompressStream, dropping the input stream argument and allowing any stream to be piped through it?

myStream.pipe(createDecompressStream(codings))

I'd love to learn from you about the difference/implications here. Thanks.

kettanaito · 2024-08-01T13:06:58Z

Update

I've added remaining unit tests for the decompressStream utility, all are passing locally.

Based on the conclusion of the discussion around unknown encodings, and taking into the consideration that there's an explicit existing test suite that unknown encodings must be ignored when decompressing the fetch() response body, looks like the way forward is to leave the implementation as-is. Please let me know if I misunderstood the conclusion.

I suggest adding the throwing logic to decompressStream once we discover a use case for that (decompression in the interceptor may be that use case). From then, we can open a new pull request to extend the decompressStream utility with options or anything else we find proper.

With this, I believe this pull request is ready for review cc @KhafraDev @mcollina

tsctx · 2024-08-01T13:53:43Z

lib/web/fetch/decompress.js

+ * @param {Response} response
+ * @returns {ReadableStream<Uint8Array> | null}
+ */
+function decompress (request, response) {


I think decompressStream is sufficient. It should already be redirected.

I'd like to have a designated utility that performs the decompression including the handling of redirect responses, etc. decompressStream implies you handle that yourself.

It looks to me like you are targeting the Fetch api response.
In this case, have you already been redirected?

@tsctx, I'm moving the existing content encoding logic to this utility function. In the existing logic, Undici does accept the response, which is the redirect response, not the redirected response. This is correct. This utility must also decide if decompression is unnecessary if the response is a redirect response.

I would recommend adding 'Web' to the suffix since it targets the web api.

tsctx · 2024-08-01T13:57:51Z

lib/web/fetch/index.js

+            body: decompress(request, {
+              status,
+              statusText,
+              headers: headersList,


headersList is not an instance of Headers

Technically, not, but its APIs are compatible, from what I can see. At least, the entire test suite hasn't proven me wrong.

I can construct Headers instance out of headersList but it may have performance implications. Would you advise me to do that?

index.js

metcoder95

LGTM, just having the considerations of @tsctx about the util namespace and the following thread: https://github.com/nodejs/undici/pull/3423/files#r1694215822

mcollina · 2024-08-02T16:15:46Z

Can you please add the types for this and the related type tests?

kettanaito · 2024-08-07T15:24:17Z

On a related subject, I've recently learned about DecompressionStream global API, which is also available in Node.js as of now. Can it be utilize to the same effect as the API being proposed here?

response.body.pipeTo(new DecompressionStream('gzip'))

One practical downside of this is that there's no connection between the response content encoding and the decompression used. You can also provide only one decompression format (gzip, deflate, deflate-raw), so in case of response streams encoded with multiple codings, you'd have to create a pipe of decompression streams.

Does anybody know if there's any difference in DecompressionStream and the content encoding handling Undici has as of now?

I'd still much like to be consistent with Unidici here, but knowing more would be good.

index.js

kettanaito · 2024-08-07T15:45:02Z

Moved the tests to the root level of test, and now random tests are failing:

✖ should include encodedBodySize in performance entry (1.82966ms)
  TypeError [Error]: webidl.converters.USVString is not a function
      at fetch (/Users/kettanaito/Projects/contrib/undici/index.js:121:13)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async Server.<anonymous> (/Users/kettanaito/Projects/contrib/undici/test/fetch/resource-timing.js:68:18)

The tests are in their own modules, no existing tests were edited. The errors themselves don't seem to be related to the changes.

Does anybody know what's causing this?

mcollina · 2024-09-11T16:40:39Z

@kettanaito are you still considering finishing this?

kettanaito · 2024-09-15T08:34:30Z

@mcollina, yes. I've fallen off lately, doing other things. This is still on my todo list, and I'd love to see this merged. Still got those test shenanigans, no idea what causing seemingly unrelated tests to fail. I would appreciate your patience with me here, it may take me time to get back to this.

kettanaito · 2024-10-14T10:59:23Z

I've rebased the branch against the latest main, and have one question:

Is the rightmost err stream for the body necessary just to have 3+ streams here, or does it serve a different purpose? I suspect stream errors will already propagate as the error event on the stream itself.

-body: decoders.length
-  ? pipeline(this.body, ...decoders, (err) => {
-    if (err) {
-      this.onError(err)
-    }
-  }).on('error', onError)
-  : this.body.on('error', onError)

+body: decompress(request, {
+  status,
+  statusText,
+  headers: headersList,
+  body: this.body
+}).on('error', onError)

Basically, my rebase removed the (err) => {} part of the pipeline. Please let me know if that's okay.

Also, since decompress() now returns the body stream as-is if there are no transformations found to decompress that stream, there's no need to have a conditional body assignment.

kettanaito · 2024-10-14T11:09:55Z

On code formatting

I undid the code formatting to keep the diff sane, but I highly encourage you to investigate your Prettier setup. Running npm run lint:fix produces a lot of formatting diff. That's an indication that either it's not set up properly, or the contributors are ignoring that and pushing changes with their own formatter settings instead of using those set up in the project.

kettanaito · 2024-10-14T11:17:10Z

Can I please get some help with these failing tests? They don't seem to be related to the changes I'm introducing. Looks more like I need to build something or update something...

✖ Buffer will be returned as string via text() (0.754584ms)
  TypeError [Error]: webidl.converters.USVString is not a function
      at webidl.converters.RequestInfo (/Users/kettanaito/Projects/contrib/undici/lib/web/fetch/request.js:996:30)
      at new Request (/Users/kettanaito/Projects/contrib/undici/lib/web/fetch/request.js:104:31)
      at fetch (/Users/kettanaito/Projects/contrib/undici/lib/web/fetch/index.js:145:21)
      at TestContext.<anonymous> (/Users/kettanaito/Projects/contrib/undici/test/mock-interceptor.js:376:30)
      at Test.runInAsyncScope (node:async_hooks:206:9)
      at Test.run (node:internal/test_runner/test:631:25)
      at Suite.processPendingSubtests (node:internal/test_runner/test:374:18)
      at Test.postRun (node:internal/test_runner/test:715:19)
      at Test.run (node:internal/test_runner/test:673:12)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Test run

Why is webidl.converters.USVString suddenly became undefined? Would appreciate assistance on this so I could proceed with adding type tests and having this merged.

cc @KhafraDev @mcollina

KhafraDev · 2024-10-14T13:53:51Z

It's a circular require, lib/web/fetch/util.js is importing lib/core/util.js and vice versa.

kettanaito · 2024-10-14T13:59:41Z

lib/core/util.js

+let createInflate
+
+function lazyCreateInflate (zlibOptions) {
+  createInflate ??= require('../web/fetch/util').createInflate


@KhafraDev, wouldn't it make more sense to lift createInflate to ../core/util in this case? If multiple modules depend on it, perhaps it's a sign to make it a generic utility.

From what I can see, it was collocated with fetch only because it was used for the body decompression. Now that the decompression logic is a core util, maybe it should be there, too.

I went for the path of least resistance, I have no objections in either case.

metcoder95 · 2024-10-15T06:50:11Z

test/decompress.js

+/**
+ * @note Undici does not support the "CONNECT" request method.
+ */
+test.skip('ignores CONNECT requests', async () => {


We should, or what do you mean exactly?

CONNECT is a special method which opens a tunnel.

Sure, I get that but we do support it (maybe fetch doesn’t, and that’s the part that i missed?)

mcollina

Should this utility get publicly exposed?

kettanaito · 2024-10-15T10:20:41Z

@mcollina, yes, that's the intention. We would like to reuse it in MSW to have a consistent body decompression across the board. As suggested by prior reviewers, we can expose this from utils, but I suppose that still means the root export of the package.

kettanaito · 2024-10-15T10:21:29Z

@KhafraDev, thank you so much for your support on this one!

I believe there are still type tests remaining on my end, and making sure we export this correctly (perhaps @mcollina has some thoughts on this).

mcollina · 2024-10-15T12:31:58Z

@kettanaito add the export from the root package and related docs.

kettanaito · 2024-10-15T16:57:21Z

Will do! I should have some time tomorrow to finalize this.

KhafraDev · 2024-10-16T14:00:27Z

Is this needed after mswjs/interceptors#661? I think there are some semver issues to consider along with the general weirdness of exposing internals for public use

kettanaito · 2024-10-17T13:50:40Z

@KhafraDev, that's a good question. I've come to realize that we wouldn't be able to reuse this compression straight away, and would have to either ship undici alongside Interceptors, or inline its decompression logic anyway.

I would rather not ship undici because it creates two separate universes of the bundled Undici in the user's Node, and explicit Undici they install with MSW. Not a good thing.

I was surprised to learn about the Compression Streams API. Sadly, it doesn't support Brotli, and I'm researching the possible ways to move forward here.

That being said, I still find the decompress logic being exposed as beneficial to Undici and its consumers. I believe there were issues that linked to its need as well.

Uzlopak · 2024-10-17T14:02:11Z

I think that moving the code from fetch to core is wrong. decompress expects Request and Response from fetch and not the core Request and Response instances. So the decompression needs to be moved back to the fetch folder. Or do I understand that wrong?

Add "decompress" response utility

498a4c0

kettanaito mentioned this pull request Jul 26, 2024

Expose "Content-Encoding" handling publicly #3412

Open

Use "decompress" in "onHeaders"

0a15555

Add unit tests for "decompress"

a6ca426

Add "decompressStream" utility

3e89cd4

metcoder95 reviewed Jul 28, 2024

View reviewed changes

lib/web/fetch/decompress.js Outdated Show resolved Hide resolved

ronag reviewed Jul 28, 2024

View reviewed changes

Add missing compression tests

1b06bf1

kettanaito force-pushed the response-decompress branch from 788a1df to 1b06bf1 Compare July 31, 2024 10:57

kettanaito added 2 commits July 31, 2024 13:02

Add "deflate" and "deflate, gzip" unit tests

e386ac2

Improves unit tests

2bfbc44

kettanaito added 2 commits August 1, 2024 14:55

Export "decompress" and "decompressStream" publicly

2382184

Add unit tests for "decompressStream"

3137c6e

kettanaito marked this pull request as ready for review August 1, 2024 13:04

tsctx reviewed Aug 1, 2024

View reviewed changes

index.js Outdated Show resolved Hide resolved

metcoder95 reviewed Aug 2, 2024

View reviewed changes

kettanaito added 2 commits August 7, 2024 17:29

Move utilities to "util.js"

22cc0da

Fix missing imports in tests

b17059b

kettanaito commented Aug 7, 2024

View reviewed changes

index.js Outdated Show resolved Hide resolved

kettanaito mentioned this pull request Aug 11, 2024

Request by got + compression hangs forever mswjs/msw#2200

Closed

4 tasks

Merge branch 'main' into response-decompress

5895ad0

Undo code formatting

c2b6953

fixup

f92c6cb

kettanaito commented Oct 14, 2024

View reviewed changes

KhafraDev approved these changes Oct 14, 2024

View reviewed changes

Merge branch 'main' into response-decompress

2b5906a

KhafraDev approved these changes Oct 14, 2024

View reviewed changes

metcoder95 reviewed Oct 15, 2024

View reviewed changes

mcollina reviewed Oct 15, 2024

View reviewed changes

fix up

4cd64f3

Uzlopak force-pushed the response-decompress branch from cc18ceb to 4cd64f3 Compare October 15, 2024 11:23

Add "decompress" response utility #3423

Are you sure you want to change the base?

Add "decompress" response utility #3423

Conversation

kettanaito commented Jul 26, 2024 • edited Loading

This relates to...

Rationale

Changes

Features

Bug Fixes

Breaking Changes and Deprecations

Status

KhafraDev commented Jul 26, 2024

kettanaito commented Jul 26, 2024 • edited Loading

KhafraDev commented Jul 26, 2024

kettanaito commented Jul 26, 2024

kettanaito commented Jul 26, 2024

KhafraDev commented Jul 26, 2024

kettanaito commented Jul 26, 2024 • edited Loading

metcoder95 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kettanaito Jul 31, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kettanaito commented Jul 31, 2024 • edited Loading

kettanaito commented Jul 31, 2024

kettanaito commented Jul 31, 2024

kettanaito commented Aug 1, 2024 • edited Loading

Update

tsctx Aug 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

metcoder95 left a comment

Choose a reason for hiding this comment

mcollina commented Aug 2, 2024

kettanaito commented Aug 7, 2024

kettanaito commented Aug 7, 2024

mcollina commented Sep 11, 2024

kettanaito commented Sep 15, 2024

kettanaito commented Oct 14, 2024

kettanaito commented Oct 14, 2024

On code formatting

kettanaito commented Oct 14, 2024

KhafraDev commented Oct 14, 2024

kettanaito Oct 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mcollina left a comment

Choose a reason for hiding this comment

kettanaito commented Oct 15, 2024

kettanaito commented Oct 15, 2024

mcollina commented Oct 15, 2024

kettanaito commented Oct 15, 2024

KhafraDev commented Oct 16, 2024

kettanaito commented Oct 17, 2024

Uzlopak commented Oct 17, 2024

kettanaito commented Jul 26, 2024 •

edited

Loading

kettanaito commented Jul 26, 2024 •

edited

Loading

kettanaito commented Jul 26, 2024 •

edited

Loading

kettanaito Jul 31, 2024 •

edited

Loading

kettanaito commented Jul 31, 2024 •

edited

Loading

kettanaito commented Aug 1, 2024 •

edited

Loading

tsctx Aug 1, 2024 •

edited

Loading

kettanaito Oct 14, 2024 •

edited

Loading