Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HTTPCLIENT-1843] - Delegate compression handling to Apache Commons Compress #580

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

arturobernalg
Copy link
Member

This PR addresses HTTPCLIENT-1843, allowing Apache HttpClient to delegate compression handling to Apache Commons Compress. The goal is to provide broader support for various compression algorithms while reducing custom implementation and maintenance.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.27.1 is the current version of Commons Compress.

@arturobernalg
Copy link
Member Author

1.27.1

changed.


private static final Map<String, String> COMPRESSION_ALIASES;
static {
final Map<String, String> aliases = new HashMap<>();
Copy link
Member

@garydgregory garydgregory Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arturobernalg
Why just those stream types?
How about using:

  • org.apache.commons.compress.compressors.CompressorStreamProvider.getInputStreamCompressorNames()
  • org.apache.commons.compress.compressors.CompressorStreamProvider.getOutputStreamCompressorNames()
  • Should I add an API to return org.apache.commons.compress.compressors.CompressorStreamFactory.ALL_NAMES?

public InputStream getContent() throws IOException {
if (super.isStreaming()) {
if (content == null) {
content = getDecompressingStream();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not lock, we should document this class as not thread-safe.

@arturobernalg
Copy link
Member Author

HI @garydgregory
Please do another pass.
TY

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arturobernalg
Thank you for your update.

  • I have comments embedded throughout.
  • There is some lazy-initialization code here and there, this should be called out in the class Javadoc making the class thread-unsafe, we have an annotation for that: org.apache.hc.core5.annotation.ThreadingBehavior.

final GzipDecompressingEntity gunzipe = new GzipDecompressingEntity(out);
Assertions.assertEquals("some kind of text", EntityUtils.toString(gunzipe, StandardCharsets.US_ASCII));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra blank lines, the comments suffice.

@arturobernalg
Copy link
Member Author

Thank you for your update.

* I have comments embedded throughout.

* There is some lazy-initialization code here and there, this should be called out in the class Javadoc making the class thread-unsafe, we have an annotation for that: `org.apache.hc.core5.annotation.ThreadingBehavior`.

HI @garydgregory
I believe I’ve covered all the changes you requested.
TY

* @since 5.5
*/
@Contract(threading = ThreadingBehavior.UNSAFE)
public class DecompressEntity extends HttpEntityWrapper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class name is mismatched with the "CompressingEntity" name, either they should both be "...ing" names, or not.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arturobernalg

I git-hub resolved most of the comment I wrote. I think we should consistently name ALL our new classes either with or without the "ing" word part, either [Compress|Decompress]Entity or [Compressing|Decompressing]Entity. I think "ing" is better and in keeping with our existing other HttpEntity implementations like IncomingHttpEntity and DecompressingEntity (conflicting).
I also think that all this new compression/decompression code should live in its own package, maybe org.apache.hc.client5.http.entity.compress.

@arturobernalg
Copy link
Member Author

Hi @arturobernalg

I git-hub resolved most of the comment I wrote. I think we should consistently name ALL our new classes either with or without the "ing" word part, either [Compress|Decompress]Entity or [Compressing|Decompressing]Entity. I think "ing" is better and in keeping with our existing other HttpEntity implementations like IncomingHttpEntity and DecompressingEntity (conflicting). I also think that all this new compression/decompression code should live in its own package, maybe org.apache.hc.client5.http.entity.compress.

Please @garydgregory take another look.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arturobernalg
Thank you for the updates! 😄
I added more comments throughout.
We're very close I think 😄
TY!

*/
public String getFormattedName(final String name) {
if (name == null || name.isEmpty()) {
LOG.warn("Compression name is null or empty");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API should not do any logging, especially at such a high level. This is just a string-to-string conversion API, it has no business rasing warnings IMO.

Args.notNull(entity, "Entity");
Args.notNull(contentEncoding, "Content Encoding");
if (!isSupported(contentEncoding, false)) {
LOG.warn("Unsupported decompression type: {}", contentEncoding);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should not do any logging IMO, especially warnings, it should throw an illegal argument exception or let the call site deal with a null result.

private Set<String> fetchAvailableInputProviders() {
final Set<String> inputNames = compressorStreamFactory.getInputStreamCompressorNames();
return inputNames.stream()
.map(String::toLowerCase)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have some odd results depending on the locale context (https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/) if you do not use a Locale like the PR does above using ROOT.

private Set<String> fetchAvailableOutputProviders() {
final Set<String> outputNames = compressorStreamFactory.getOutputStreamCompressorNames();
return outputNames.stream()
.map(String::toLowerCase)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

*
* @return a set of available output stream compression providers in lowercase.
*/
private Set<String> fetchAvailableOutputProviders() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "fetch" method name prefix is unusual and surprising to me, I would follow the "get" naming convention.

@@ -38,7 +38,9 @@
* Common base class for decompressing {@link HttpEntity} implementations.
*
* @since 4.4
* @deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadoc tag is missing its comment to redirect reader.

try {
compressorStream = CompressingFactory.INSTANCE.getCompressorOutputStream(contentEncoding, outStream);
} catch (final CompressorException e) {
throw new IOException("Error initializing decompression stream", e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you mean "Error initializing compression stream"?

// Write compressed data
super.writeTo(compressorStream);
// Close the compressor stream after writing
compressorStream.close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this method be rewritten to use a try-with-resources block?

}

/**
* Returns the formatted name of the provided compression format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this Javadoc means. What's a "formatted" name? Maybe the API is misnamed? Is the job of the method to map from a content type value to a Commons Compress key? Is it something else? I think the comment should at least make it clear, which should guide us to a better API name.

final String formattedName = getFormattedName(name);
return isSupported(formattedName, false)
? createCompressorInputStream(formattedName, inputStream, noWrap)
: inputStream;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadoc and implementation don't match:

  • Javadoc states decompression is performed
  • The code doesn't decompress if the format isn't supported.

Either the Javadoc needs updating, or, more likely, we should throw an exception indicating the operation is not supported for that format type name.

I must say I find the API and Javadoc confusing, we are creating a "Compressor" input stream in order to "decompress" with a "compression" format! What?

  • Maybe calling the API getDecompressorInputStream would help because it "decompresses". I expect a "Compressor" to compress, not the other way around.
  • Maybe using the term "format type name" instead "compression format" would remove the need for the reader to constantly parse "compress" vs. "decompress" in the text.
  • Maybe only using one term "decompress" or "compress" for one method (Javadoc and code) would alleviate the issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @garydgregory . Agreed—the Javadoc and implementation don’t match. We’ll:

  • Throw an exception on unsupported formats instead of returning the original stream, aligning with Javadoc expectations.

  • Rename to getDecompressorInputStream to reflect decompression.

  • Standardize to “format type” instead of “compression format” to clarify.

These changes should make both the API and docs clearer. Thanks for flagging this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @garydgregory
Can you do another pass??
Thank you.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arturobernalg

  • TY for the work 👍
  • Minor: For the Javadoc since tags, I think the next feature release will be 5.5 since the current release is 5.4.1 and the git master POM already has 5.5 in it.
  • Side note: In Commons Compress, CompressorException usually wraps an IOException, so it's too bad that we wrap it in another IOException here ;-) Since compression and decompression is IO, I'll see about making CompressorException extend IOException.

…se 'ing' form (e.g., CompressingEntity) to align with existing conventions like DecompressingEntity. Moved compression/decompression classes to org.apache.hc.client5.http.entity.compress package
@arturobernalg
Copy link
Member Author

Hi @arturobernalg

* TY for the work 👍

* Minor: For the Javadoc `since` tags, I think the next feature release will be `5.5` since the current release is `5.4.1` and the git master POM already has 5.5 in it.

* Side note: In Commons Compress, `CompressorException` usually wraps an `IOException`, so it's too bad that we wrap it in another IOException here ;-) Since compression and decompression is IO, I'll see about making CompressorException extend IOException.

Hi @garydgregory ,

Thanks for the feedback!. I've updated all the @since tags to reflect version 5.5 as suggested.  Regarding the `CompressorException` wrapping `IOException`, I’ll wait for updates on whether it will extend IOException in Commons Compress. This will determine the best course of action here.

Let me know if there’s anything else to address.

@garydgregory
Copy link
Member

Hi @arturobernalg,

What do you think about apache/commons-compress#605

@arturobernalg
Copy link
Member Author

Hi @arturobernalg,

What do you think about apache/commons-compress#605

Hi @arturobernalg,

What do you think about apache/commons-compress#605

Hi @garydgregory
The changes make sense. Thank you.

@garydgregory
Copy link
Member

@arturobernalg

Thank you for the review on apache/commons-compress#605, I merged that PR.

@arturobernalg
Copy link
Member Author

@arturobernalg

Thank you for the review on apache/commons-compress#605, I merged that PR.

All right. Let me know if we need to do another change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants