
Add "decompress" response utility #3423

Open · wants to merge 16 commits into main
Conversation

@kettanaito (Contributor) commented Jul 26, 2024:

This relates to...

Rationale

Exposing the response decompression logic (i.e. the handling of the Content-Encoding header) will allow other Node.js libraries to reuse it, resulting in a consistent experience for everyone.

See more detailed reasoning in the referenced issue.

Changes

Features

  • Add a new decompress utility, abstracting the Content-Encoding handling from the existing lib/fetch/index.js.
  • Use the decompress utility in the onHeaders callback.
  • Add unit tests for decompress.
  • Add unit tests for decompressStream.
  • Export the decompress utility publicly from undici.
  • Export the decompressStream utility publicly from undici.
  • Add types to the types module.
  • Add type tests.
  • Export the functions under the Util namespace.

Bug Fixes

Breaking Changes and Deprecations

Status

@KhafraDev (Member):

it'd be nice if this worked with the rest of undici too, not just fetch

@kettanaito (Contributor, Author) commented Jul 26, 2024:

@KhafraDev, can you point me to other usages where this should work?

I admit, we mostly need it for fetch, since http does not handle response encoding by design (it returns the body as-is). I presume Undici has other use cases, though; I'll see what I can do there. That being said, keeping a (request, response) call signature would be nice publicly.

@KhafraDev (Member):

can you point me to other usages where this should work?

undici.request, stream, etc. #1155

@kettanaito (Contributor, Author):

Added a basic set of unit tests for the decompress function.

✔ ignores responses without the "Content-Encoding" header (32.383292ms)
✔ ignores responses with empty "Content-Encoding" header (0.780042ms)
✔ ignores redirect responses (0.439708ms)
✔ ignores HEAD requests (0.456958ms)
﹣ ignores CONNECT requests (0.0775ms) # SKIP
✔ ignores responses with unsupported encoding (0.219292ms)
✔ decompresses responses with "gzip" encoding (15.465084ms)

Still need to cover multiple Content-Encoding values, like gzip, br, etc.
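
For illustration, one of those multi-coding cases could look like this sketch (it assumes the proposed decompress(request, response) signature, which is not yet exported from undici):

const test = require('node:test')
const assert = require('node:assert')
const zlib = require('node:zlib')

test('decompresses responses with "gzip, br" encoding', async () => {
  // "gzip, br" means gzip was applied first, then Brotli on top.
  const compressed = zlib.brotliCompressSync(zlib.gzipSync('hello world'))
  const response = new Response(compressed, {
    headers: { 'content-encoding': 'gzip, br' }
  })
  const body = decompress(new Request('http://localhost'), response)
  const text = await new Response(body).text()
  assert.strictEqual(text, 'hello world')
})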

@kettanaito (Contributor, Author):

@KhafraDev, thinking of tailoring this for generic body decompression purposes, I can imagine it being used this way:

function createDecompressionStream(args): TransformStream 

// Then, usage:
anyCompressedBodyStream.pipe(createDecompressionStream(args))

The most challenging part is figuring out the args here.

@KhafraDev (Member):

I think for arguments a stream and a header value would be fine

@kettanaito (Contributor, Author) commented Jul 26, 2024:

Got it. Added a decompressStream() utility and reused it in decompress().

Now, decompressStream can be used with any ReadableStream as input, while decompress operates on Fetch API request and response arguments (it also handles additional logic like redirects, etc.).
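
For illustration, a minimal sketch of that split, assuming the (stream, headerValue) signature suggested above (decompressStream is not yet published, so this is hypothetical usage):

const { createGzip } = require('node:zlib')
const { Readable } = require('node:stream')
const { decompressStream } = require('undici') // hypothetical import

const compressed = Readable.from(['hello world']).pipe(createGzip())
// Low-level utility: any stream plus the raw Content-Encoding value.
const decoded = decompressStream(compressed, 'gzip')
decoded.setEncoding('utf8')
decoded.on('data', (chunk) => console.log(chunk)) // "hello world"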

@metcoder95 (Member) left a comment:

So far, LGTM.

I believe decompressStream can be used within an interceptor if we want to aim for that in the other PR as well.

Review thread on lib/web/fetch/decompress.js, lines 80 to 81:
decoders.length = 0
break
Member:

Suggested change:

-  decoders.length = 0
-  break
+  return null

Maybe return null, otherwise there is no way to know whether it "failed".

Member:

Also decompressStream should probably not live under /web/

@kettanaito (Contributor, Author):

@ronag, can it fail though? The intention was that, if there's no known compression provided, the stream is returned as-is.

You can still handle failures as you normally would:

decompressStream(myStream).on('error', callback)

Member:

if there's no known compression provided

The logic seems to be that if an unknown coding is sent, the body isn't decompressed at all. I'm not sure if that sounds right either.

Content-Encoding: gzip, zstd, br -> ignored (zstd is not supported)
Content-Encoding: gzip, br -> decompressed
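
For illustration, a sketch of the all-or-nothing semantics described above (assumed names; this mirrors the behavior being discussed, not undici's exact code):

const SUPPORTED_CODINGS = new Set(['gzip', 'x-gzip', 'deflate', 'br'])

// If any coding in the chain is unsupported, skip decoding entirely:
// "gzip, zstd, br" yields the raw body, "gzip, br" is fully decompressed.
function shouldDecompress (codings) {
  return codings.length > 0 &&
    codings.every((coding) => SUPPORTED_CODINGS.has(coding))
}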

@kettanaito (Contributor, Author):

As a consumer, I expect the body stream to be returned as-is if:

  1. It has no Content-Encoding set (no codings provided as an argument);
  2. It has an unknown Content-Encoding set, or an unknown coding is provided as an argument.

I wouldn't expect it to throw, because that's disruptive behavior implying I need to add a guard before calling decompress/decompressStream. At least, not at a level this low.

If the API is potentially disruptive, I need an extra check in place to prevent scenarios where decompression is impossible. This puts extra work on me as a consumer, but also results in a more fragile API, since the list of supported encodings is baked into Undici and may change across releases.

I strongly suggest returning the body stream as-is in the two cases I described above. decompressStream() must never error, save for stream errors themselves (and those are irrelevant in the context of this function).

@kettanaito (Contributor, Author):

@metcoder95 @KhafraDev do you agree with my reasoning in the post above?

@kettanaito (Contributor, Author):

Hah, so there's an existing expectation that decompression is skipped when an unknown encoding is encountered:

 ✖ should skip decompression if unsupported codings

I suppose that answers the question.

@KhafraDev (Member):

throwing an error will make fetch slower

@kettanaito (Contributor, Author) commented Jul 31, 2024:

Regarding the decoders customization discussion: I don't think it justifies the cost of making decompressStream more complex. The only thing you need is the list of codings, which you can get with a one-liner:

const codings = contentEncoding
  .toLowerCase()
  .split(',')
  .map((coding) => coding.trim())

The purpose of decompressStream is to give you the exact behavior Undici has under the hood. If you need different behavior, you should (1) parse the Content-Encoding header yourself; (2) map encodings to decompression streams with custom options.

I can see how you'd want to extend or override certain parts of the default decompression, and for that, perhaps, we can consider exporting those options separately:

module.exports = {
  gzipDecompressionOptions: {
    flush: zlib.constants.Z_SYNC_FLUSH,
    finishFlush: zlib.constants.Z_SYNC_FLUSH
  }
}

Even then, these are all the options Undici uses right now. To me, this looks like a nice-to-have without substantial use-case justification. I lean toward iterating on this once we have more of the latter.
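
If those options were ever exported (gzipDecompressionOptions is a hypothetical export name), consumption could look like this sketch:

const zlib = require('node:zlib')
// Hypothetical export discussed above; not part of undici's public API.
const { gzipDecompressionOptions } = require('undici')

// Z_SYNC_FLUSH makes the decoder flush whatever output is available
// instead of erroring on truncated gzip bodies.
const gunzip = zlib.createGunzip(gzipDecompressionOptions)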

Member:

Regarding erroring or throwing: as you suggest, @kettanaito, I'd like to fail fast and inform the implementer that the encoding is not supported without even disturbing the input stream, so fallbacks can be applied if they want to.

For fetch, as @KhafraDev points out, throwing might be slower, so we can disable it by default and not throw at all, but rather ignore the unsupported encodings.

The issue I'd like to avoid is putting the implementer in an all-or-nothing spot.

Even then, these are all the options Undici uses right now. To me, this looks like a nice-to-have without substantial use-case justification. I lean toward iterating on this once we have more of the latter.

Agree, let's do that 👍

@kettanaito (Contributor, Author) commented Jul 31, 2024:

I've added unit tests for x-gzip and gzip, br encodings. Both seem to pass 🎉

Also added test cases for deflate and deflate, gzip.

@kettanaito (Contributor, Author):

The CI is failing with issues downloading the Node.js binary, it seems. This doesn't appear to be related to the changes.

@kettanaito (Contributor, Author):

@KhafraDev, what do you think about decompressStream becoming createDecompressStream, dropping the input stream argument and allowing any stream to be piped through it?

myStream.pipe(createDecompressStream(codings))

I'd love to learn from you about the difference/implications here. Thanks.
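
For illustration, a minimal sketch of that factory shape (assumed coding list; note that, unlike this sketch, the actual implementation skips decompression entirely when it meets an unknown coding):

const zlib = require('node:zlib')
const { PassThrough, compose } = require('node:stream')

function createDecompressStream (codings) {
  // Codings are listed in the order they were applied, so decode in reverse.
  const decoders = [...codings].reverse().map((coding) => {
    if (coding === 'gzip' || coding === 'x-gzip') return zlib.createGunzip()
    if (coding === 'deflate') return zlib.createInflate()
    if (coding === 'br') return zlib.createBrotliDecompress()
    return new PassThrough() // unknown coding: pass the bytes through as-is
  })
  // compose() collapses the chain into one Duplex that can be piped through.
  return compose(...decoders)
}

// myStream.pipe(createDecompressStream(['gzip', 'br']))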

@kettanaito kettanaito marked this pull request as ready for review August 1, 2024 13:04
@kettanaito (Contributor, Author) commented Aug 1, 2024:

Update

I've added the remaining unit tests for the decompressStream utility; all are passing locally.

Based on the conclusion of the discussion around unknown encodings, and taking into consideration that there's an explicit existing test asserting that unknown encodings must be ignored when decompressing the fetch() response body, it looks like the way forward is to leave the implementation as-is. Please let me know if I misunderstood the conclusion.

I suggest adding the throwing logic to decompressStream once we discover a use case for it (decompression in the interceptor may be that use case). From there, we can open a new pull request to extend the decompressStream utility with options or anything else we find appropriate.

With this, I believe this pull request is ready for review. cc @KhafraDev @mcollina

 * @param {Response} response
 * @returns {ReadableStream<Uint8Array> | null}
 */
function decompress (request, response) {
@tsctx (Member) commented Aug 1, 2024:

I think decompressStream is sufficient. It should already be redirected.

@kettanaito (Contributor, Author):

I'd like to have a designated utility that performs the decompression, including the handling of redirect responses, etc. decompressStream implies you handle that yourself.

@tsctx (Member):

It looks to me like you are targeting the Fetch API response. In that case, have you already been redirected?

@kettanaito (Contributor, Author):

@tsctx, I'm moving the existing content-encoding logic to this utility function. In the existing logic, Undici accepts the response, which is the redirect response, not the redirected response. This is correct. This utility must also decide that decompression is unnecessary when the response is a redirect response.

@tsctx (Member):

I would recommend adding 'Web' to the suffix since it targets the Web API.

body: decompress(request, {
  status,
  statusText,
  headers: headersList,
Member:

headersList is not an instance of Headers

@kettanaito (Contributor, Author):

Technically it's not, but its APIs are compatible, from what I can see. At least, the entire test suite hasn't proven me wrong.

I can construct a Headers instance out of headersList, but it may have performance implications. Would you advise me to do that?

@metcoder95 (Member) left a comment:

LGTM, just noting the considerations from @tsctx about the util namespace and the following thread: https://github.com/nodejs/undici/pull/3423/files#r1694215822

@mcollina (Member) commented Aug 2, 2024:

Can you please add the types for this and the related type tests?

@kettanaito (Contributor, Author):

On a related subject, I've recently learned about the DecompressionStream global API, which is also available in Node.js by now. Can it be utilized to the same effect as the API being proposed here?

response.body.pipeThrough(new DecompressionStream('gzip'))

One practical downside is that there's no connection between the response's content encoding and the decompression used. You can also provide only one decompression format at a time (gzip, deflate, deflate-raw), so for response streams encoded with multiple codings you'd have to create a pipe of decompression streams.

Does anybody know if there's any difference between DecompressionStream and the content-encoding handling Undici has as of now?

I'd still very much like to be consistent with Undici here, but knowing more would be good.
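
For illustration, such a pipe of standard DecompressionStream instances could look like this (decompressWebStream is a hypothetical helper, not part of this PR; note DecompressionStream throws on formats it doesn't support, such as br):

function decompressWebStream (body, contentEncoding) {
  const codings = contentEncoding
    .toLowerCase()
    .split(',')
    .map((coding) => coding.trim())
    .reverse() // codings are applied in order, so decode in reverse

  // Chain one DecompressionStream per coding, e.g. "deflate, gzip".
  return codings.reduce(
    (stream, coding) => stream.pipeThrough(new DecompressionStream(coding)),
    body
  )
}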

@kettanaito (Contributor, Author):

Moved the tests to the root level of test/, and now random tests are failing:

 should include encodedBodySize in performance entry (1.82966ms)
  TypeError [Error]: webidl.converters.USVString is not a function
      at fetch (/Users/kettanaito/Projects/contrib/undici/index.js:121:13)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async Server.<anonymous> (/Users/kettanaito/Projects/contrib/undici/test/fetch/resource-timing.js:68:18)

The tests are in their own modules; no existing tests were edited. The errors themselves don't seem to be related to the changes.

Does anybody know what's causing this?

@mcollina (Member):

@kettanaito are you still considering finishing this?

@kettanaito (Contributor, Author):

@mcollina, yes. I've fallen off lately, doing other things. This is still on my todo list, and I'd love to see this merged. I still have those test shenanigans and no idea what's causing seemingly unrelated tests to fail. I'd appreciate your patience with me here; it may take me some time to get back to this.

@kettanaito (Contributor, Author):

I've rebased the branch against the latest main, and have one question:

Is the rightmost (err) => {} callback for the body pipeline necessary just to satisfy the 3+ stream signature, or does it serve a different purpose? I suspect stream errors will already propagate as the error event on the stream itself.

-body: decoders.length
-  ? pipeline(this.body, ...decoders, (err) => {
-    if (err) {
-      this.onError(err)
-    }
-  }).on('error', onError)
-  : this.body.on('error', onError)

+body: decompress(request, {
+  status,
+  statusText,
+  headers: headersList,
+  body: this.body
+}).on('error', onError)

Basically, my rebase removed the (err) => {} part of the pipeline. Please let me know if that's okay.

Also, since decompress() now returns the body stream as-is when it finds no transformations to apply, there's no need for a conditional body assignment.

@kettanaito (Contributor, Author):

On code formatting

I undid the code formatting to keep the diff sane, but I highly encourage you to investigate your Prettier setup. Running npm run lint:fix produces a large formatting diff. That indicates either that it's not set up properly, or that contributors are ignoring it and pushing changes with their own formatter settings instead of using those set up in the project.

@kettanaito (Contributor, Author):

Can I please get some help with these failing tests? They don't seem to be related to the changes I'm introducing. It looks more like I need to build or update something...

✖ Buffer will be returned as string via text() (0.754584ms)
  TypeError [Error]: webidl.converters.USVString is not a function
      at webidl.converters.RequestInfo (/Users/kettanaito/Projects/contrib/undici/lib/web/fetch/request.js:996:30)
      at new Request (/Users/kettanaito/Projects/contrib/undici/lib/web/fetch/request.js:104:31)
      at fetch (/Users/kettanaito/Projects/contrib/undici/lib/web/fetch/index.js:145:21)
      at TestContext.<anonymous> (/Users/kettanaito/Projects/contrib/undici/test/mock-interceptor.js:376:30)
      at Test.runInAsyncScope (node:async_hooks:206:9)
      at Test.run (node:internal/test_runner/test:631:25)
      at Suite.processPendingSubtests (node:internal/test_runner/test:374:18)
      at Test.postRun (node:internal/test_runner/test:715:19)
      at Test.run (node:internal/test_runner/test:673:12)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Why has webidl.converters.USVString suddenly become undefined? I'd appreciate assistance on this so I can proceed with adding type tests and getting this merged.

cc @KhafraDev @mcollina

@KhafraDev (Member):

It's a circular require, lib/web/fetch/util.js is importing lib/core/util.js and vice versa.
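
For illustration, a minimal reproduction of that failure mode with hypothetical modules a.js and b.js: the module that closes the cycle receives a partially-initialized exports object, so properties read at require time can be undefined.

// a.js
const b = require('./b') // starts loading b.js...
module.exports = { fromA: () => 'a' }

// b.js
const a = require('./a') // ...which re-enters a.js and gets its exports
                         // object before module.exports was assigned
console.log(a.fromA) // undefined at load time
module.exports = { fromB: () => 'b' }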

Review thread on lib/core/util.js:

let createInflate

function lazyCreateInflate (zlibOptions) {
  createInflate ??= require('../web/fetch/util').createInflate
@kettanaito (Contributor, Author) commented Oct 14, 2024:

@KhafraDev, wouldn't it make more sense to lift createInflate to ../core/util in this case? If multiple modules depend on it, perhaps that's a sign to make it a generic utility.

From what I can see, it was collocated with fetch only because it was used for body decompression. Now that the decompression logic is a core util, maybe it should live there, too.

@KhafraDev (Member):

I went for the path of least resistance; I have no objections either way.

/**
 * @note Undici does not support the "CONNECT" request method.
 */
test.skip('ignores CONNECT requests', async () => {
Member:

We should, or what do you mean exactly?

Contributor:

CONNECT is a special method which opens a tunnel.

Member:

Sure, I get that, but we do support it (maybe fetch doesn't, and that's the part that I missed?)

@mcollina (Member) left a comment:

Should this utility get publicly exposed?

@kettanaito (Contributor, Author):

@mcollina, yes, that's the intention. We would like to reuse it in MSW to have a consistent body decompression across the board. As suggested by prior reviewers, we can expose this from utils, but I suppose that still means the root export of the package.

@kettanaito (Contributor, Author):

@KhafraDev, thank you so much for your support on this one!

I believe there are still type tests remaining on my end, and making sure we export this correctly (perhaps @mcollina has some thoughts on this).

@mcollina (Member):

@kettanaito add the export from the root package and related docs.

@kettanaito (Contributor, Author):

Will do! I should have some time tomorrow to finalize this.

@KhafraDev (Member):

Is this needed after mswjs/interceptors#661? I think there are some semver issues to consider along with the general weirdness of exposing internals for public use

@kettanaito (Contributor, Author):

@KhafraDev, that's a good question. I've come to realize that we wouldn't be able to reuse this decompression straight away, and would have to either ship undici alongside Interceptors or inline its decompression logic anyway.

I would rather not ship undici, because it creates two separate universes: the Undici bundled in the user's Node.js, and the explicit Undici they install with MSW. Not a good thing.

I was surprised to learn about the Compression Streams API. Sadly, it doesn't support Brotli, and I'm researching possible ways to move forward here.

That being said, I still find exposing the decompress logic beneficial to Undici and its consumers. I believe there were issues linking to the need for it as well.

@Uzlopak (Contributor) commented Oct 17, 2024:

I think that moving the code from fetch to core is wrong. decompress expects the Request and Response from fetch, not the core Request and Response instances. So the decompression needs to be moved back to the fetch folder. Or am I understanding that wrong?

Successfully merging this pull request may close these issues.

Expose "Content-Encoding" handling publicly
7 participants