Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

properly strip trailing null characters (\0) from tag data #636

Conversation

gtiwari333
Copy link

@gtiwari333 gtiwari333 commented Nov 21, 2023

@gtiwari333
Copy link
Author

This PR will fix the extraction of picture control name that's coming up with trailing 0s.

image

@drewnoakes
Copy link
Owner

Thanks for following up here. I intentionally changed your last PR before merging to avoid the approach here though. Let's see if there's a different approach that'll work for you.

When pulling strings from bytes, we generally store StringValue rather than String. This allows the consumer to reinterpret the underlying data in different encodings. If we convert to string, that data is lost.

The problem you're seeing is that StringValue is not trimming trailing null bytes. I wonder if there's a better way to deal with that.

Copy link
Owner

@drewnoakes drewnoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand now. I introduced a bug in your PR — sorry about that! I should have waited for your input before merging, but you said you wanted a release so I pushed ahead.

The values read from this format are null terminated and should be trimmed. There's already a method we can use in that case! Instead of adding getNullTerminatedStringAndSkipToNextPosition, we can call getNullTerminatedStringValue . That will simplify the fix here a bit, and it'll keep the StringValue so the value can be re-encoded by the consumer if needed.

I'll get another release out with your fix when it's ready.

Thanks!

Comment on lines +393 to +406
/**
* Read until the null terminated byte and automatically move the end of the requested position.
* @param maxLengthBytes
* @param charset
* @return
* @throws IOException
*/
public StringValue getNullTerminatedStringAndSkipToNextPosition(int maxLengthBytes, Charset charset) throws IOException {
byte[] bytes = this.getNullTerminatedBytes(maxLengthBytes);
if (bytes.length < maxLengthBytes - 1) {
this.trySkip(maxLengthBytes - bytes.length - 1);
}
return new StringValue(bytes, charset);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to add a new method. We can call getNullTerminatedStringValue instead.

Comment on lines +81 to +83
directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringAndSkipToNextPosition(4, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to preserve the StringValue this can be:

Suggested change
directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringAndSkipToNextPosition(4, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringValue(4, Charsets.UTF_8));
directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringValue(20, Charsets.UTF_8));
directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringValue(20, Charsets.UTF_8));

Comment on lines +89 to +91
directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringAndSkipToNextPosition(4, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto:

Suggested change
directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringAndSkipToNextPosition(4, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringValue(4, Charsets.UTF_8));
directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringValue(20, Charsets.UTF_8));
directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringValue(20, Charsets.UTF_8));

public void testGetNullTerminatedStringCursorPositionTest() throws IOException
{
public void testGetNullTerminatedStringCursorPositionTest() throws IOException {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid reformatting code in PRs that also include functional changes. Formatting changes are appreciated when they change code towards the predominant style in the repo. We don't need the getNullTerminatedStringAndSkipToNextPosition method, nor its unit test. The toString calls below don't seem to do anything, and the formatting changes are unrelated to the PR. I think this file can be exluded from the PR altogether.

@drewnoakes
Copy link
Owner

I ported the fix to .NET as well here: drewnoakes/metadata-extractor-dotnet#349

@drewnoakes
Copy link
Owner

And CI is fixed again. Sorry about that.

@gtiwari333
Copy link
Author

Instead of adding getNullTerminatedStringAndSkipToNextPosition, we can call getNullTerminatedStringValue

There's a problem with getNullTerminatedStringValue. After reading the requested number of bytes from input, it doesn't move the cursor to the end. My test testGetNullTerminatedStringCursorPositionTest explains what's wrong with getNullTerminatedStringValue method. Only way to get around it is to check the size of text returned and manually skip the remaining bytes.

For input AB\0\0\0CD\0ABCDEF

If you ask for getNullTerminatedString(5) it will return AB but keep the cursor position at index 3.

The new method getNullTerminatedStringAndSkipToNextPosition reads the AB and move the cursor at C.

If you compare testGetNullTerminatedStringCursorPositionTest and testGetNullTerminatedStringAndSkipToNextPosition you will see the improvement the new method makes.

@gtiwari333
Copy link
Author

gtiwari333 commented Nov 22, 2023

Did few more testing and found an easy way to deal with it. Let me know what would you prefer?

The getStringValue reads the all 20 bytes and moves the cursor to the end. If we trim the value using String.trim(), it will remove the trailing null characters.

directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getStringValue(20, Charsets.UTF_8).toString())

Change this to:

directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getStringValue(20, Charsets.UTF_8).toString().trim())

@drewnoakes
Copy link
Owner

drewnoakes commented Jan 22, 2024

I ended up fixing this in #647 with an approach that mirrors what we did in the .NET version, so that they're as close as possible (which helps porting other changes in future). It also avoids adding toString calls to reduce allocations and allow consumers to reinterpret the StringValue in other encodings if needed.

@drewnoakes drewnoakes closed this Jan 22, 2024
@drewnoakes
Copy link
Owner

Oops. Thanks!

@Nadahar
Copy link
Contributor

Nadahar commented Jan 22, 2024

@drewnoakes I deleted my comment because I assumed you would edit yours - but ended up making it make no sense at all 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants