Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-11482. EC Checksum throws IllegalArgumentException because the buffer limit is negative #7230

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

aswinshakil
Copy link
Member

@aswinshakil aswinshakil commented Sep 23, 2024

What changes were proposed in this pull request?

The parity checksum bytes calculation for the EC stripe checksum should be based on each stripe's first chunk's length. But right now it is calculated only on the first stripe's chunk length which results in incorrect results and, in many cases causes the byte buffer limit to become negative. Further, the checksum calculation has been simplified in this patch.

The checksum calculation for the last stripe is a little tricky, We have two cases for it.

  1. If the last stripe has only one chunk at replica index 1, we can get the chunk length to calculate checksum.
  2. If the last stripe has more than one chunk, Then the last chunk will be partial.

For EC, We store the stripe checksum for the entire stripe in replica index 1 and the parity blocks. We use this stripe checksum to calculate the checksum during reading. The chunks we get during checksum calculation will always be from replica index 1 or parity blocks. And the chunk length will always be the stripe's first chunk's length.

Hence, we don't get the length of the last chunk which may be partial in most of the cases. We calculate the last chunk's size from the block size.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11482

How was this patch tested?

Updated the existing test.
Test fails without the fix:https://github.com/aswinshakil/ozone/actions/runs/11002573679/job/30550687378
Test passed with the fix: https://github.com/aswinshakil/ozone/actions/runs/11002650413/job/30551032116

Copy link
Contributor

@sadanand48 sadanand48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aswinshakil for the fix, Looks correct to me, I also verified the patch against the unit tests that I had written to repro this and it works.

long remainingChunkSize = Math.min(keySize, chunkSize);
while (byteWrap.hasRemaining() && remainingChunkSize > 0) {
final int checksumData = byteWrap.getInt();
blockCrcComposer.update(checksumData, Math.min(bytesPerCrc, remainingChunkSize));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this should handle the case when chunk size is not a multiple of bytes.per.crc right ? as we have removed the code pertaining to this part above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand this section.

Also, is this routine called on a block? If an EC key has multiple blocks, then first the checksums are combined for block 1 in a call to computeMd5Crc and then a subsequent call will be made for block 2 to combine its checksums into the value for block 1? How does the keySize work then across multiple blocks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sodonnel Thanks for the catch. It should be blockSize instead of the keySize as we are calculating checksum per block here.

@aswinshakil
Copy link
Member Author

@sadanand48 @sodonnel Can you take another look at this? Thank you!

@aswinshakil aswinshakil requested review from sadanand48 and sodonnel and removed request for sodonnel September 30, 2024 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants