properly strip trailing null characters (\0) from tag data #636

gtiwari333 · 2023-11-21T19:33:05Z

A change from https://github.com/drewnoakes/metadata-extractor/pull/634/files#diff-7df436f0258b5156fd15792db94b4cab6119180ad7d605533ffb53213f048ba2R400 got missed in earlier merge.

#635

gtiwari333 · 2023-11-21T19:53:27Z

This PR will fix the extraction of picture control name that's coming up with trailing 0s.

drewnoakes · 2023-11-21T23:46:12Z

Thanks for following up here. I intentionally changed your last PR before merging to avoid the approach here though. Let's see if there's a different approach that'll work for you.

When pulling strings from bytes, we generally store StringValue rather than String. This allows the consumer to reinterpret the underlying data in different encodings. If we convert to string, that data is lost.

The problem you're seeing is that StringValue is not trimming trailing null bytes. I wonder if there's a better way to deal with that.

drewnoakes

I understand now. I introduced a bug in your PR — sorry about that! I should have waited for your input before merging, but you said you wanted a release so I pushed ahead.

The values read from this format are null terminated and should be trimmed. There's already a method we can use in that case! Instead of adding getNullTerminatedStringAndSkipToNextPosition, we can call getNullTerminatedStringValue . That will simplify the fix here a bit, and it'll keep the StringValue so the value can be re-encoded by the consumer if needed.

I'll get another release out with your fix when it's ready.

Thanks!

drewnoakes · 2023-11-22T03:27:55Z

Source/com/drew/lang/SequentialReader.java

+    /**
+     * Read until the null terminated byte and automatically move the end of the requested position.
+     * @param maxLengthBytes
+     * @param charset
+     * @return
+     * @throws IOException
+     */
+    public StringValue getNullTerminatedStringAndSkipToNextPosition(int maxLengthBytes, Charset charset) throws IOException {
+        byte[] bytes = this.getNullTerminatedBytes(maxLengthBytes);
+        if (bytes.length < maxLengthBytes - 1) {
+            this.trySkip(maxLengthBytes - bytes.length - 1);
+        }
+        return new StringValue(bytes, charset);
+    }


We don't need to add a new method. We can call getNullTerminatedStringValue instead.

drewnoakes · 2023-11-22T03:29:56Z

Source/com/drew/metadata/exif/makernotes/NikonPictureControl1Directory.java

+        directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringAndSkipToNextPosition(4, Charsets.UTF_8).toString());
+        directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
+        directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());


In order to preserve the StringValue this can be:

Suggested change

directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringAndSkipToNextPosition(4, Charsets.UTF_8).toString());

directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());

directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());

directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringValue(4, Charsets.UTF_8));

directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringValue(20, Charsets.UTF_8));

directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringValue(20, Charsets.UTF_8));

drewnoakes · 2023-11-22T03:30:31Z

Source/com/drew/metadata/exif/makernotes/NikonPictureControl2Directory.java

+        directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringAndSkipToNextPosition(4, Charsets.UTF_8).toString());
+        directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());
+        directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());


Ditto:

Suggested change

directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringAndSkipToNextPosition(4, Charsets.UTF_8).toString());

directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());

directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringAndSkipToNextPosition(20, Charsets.UTF_8).toString());

directory.setString(TAG_PICTURE_CONTROL_VERSION, reader.getNullTerminatedStringValue(4, Charsets.UTF_8));

directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getNullTerminatedStringValue(20, Charsets.UTF_8));

directory.setString(TAG_PICTURE_CONTROL_BASE, reader.getNullTerminatedStringValue(20, Charsets.UTF_8));

drewnoakes · 2023-11-22T03:36:40Z

Tests/com/drew/lang/SequentialAccessTestBase.java

-    public void testGetNullTerminatedStringCursorPositionTest() throws IOException
-    {
+    public void testGetNullTerminatedStringCursorPositionTest() throws IOException {


Please avoid reformatting code in PRs that also include functional changes. Formatting changes are appreciated when they change code towards the predominant style in the repo. We don't need the getNullTerminatedStringAndSkipToNextPosition method, nor its unit test. The toString calls below don't seem to do anything, and the formatting changes are unrelated to the PR. I think this file can be exluded from the PR altogether.

drewnoakes · 2023-11-22T03:44:22Z

I ported the fix to .NET as well here: drewnoakes/metadata-extractor-dotnet#349

drewnoakes · 2023-11-22T03:44:47Z

And CI is fixed again. Sorry about that.

gtiwari333 · 2023-11-22T15:34:27Z

Instead of adding getNullTerminatedStringAndSkipToNextPosition, we can call getNullTerminatedStringValue

There's a problem with getNullTerminatedStringValue. After reading the requested number of bytes from input, it doesn't move the cursor to the end. My test testGetNullTerminatedStringCursorPositionTest explains what's wrong with getNullTerminatedStringValue method. Only way to get around it is to check the size of text returned and manually skip the remaining bytes.

For input AB\0\0\0CD\0ABCDEF

If you ask for getNullTerminatedString(5) it will return AB but keep the cursor position at index 3.

The new method getNullTerminatedStringAndSkipToNextPosition reads the AB and move the cursor at C.

If you compare testGetNullTerminatedStringCursorPositionTest and testGetNullTerminatedStringAndSkipToNextPosition you will see the improvement the new method makes.

gtiwari333 · 2023-11-22T16:47:35Z

Did few more testing and found an easy way to deal with it. Let me know what would you prefer?

The getStringValue reads the all 20 bytes and moves the cursor to the end. If we trim the value using String.trim(), it will remove the trailing null characters.

directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getStringValue(20, Charsets.UTF_8).toString())

Change this to:

directory.setString(TAG_PICTURE_CONTROL_NAME, reader.getStringValue(20, Charsets.UTF_8).toString().trim())

drewnoakes · 2024-01-22T11:21:37Z

I ended up fixing this in #647 with an approach that mirrors what we did in the .NET version, so that they're as close as possible (which helps porting other changes in future). It also avoids adding toString calls to reduce allocations and allow consumers to reinterpret the StringValue in other encodings if needed.

drewnoakes · 2024-01-22T20:49:04Z

Oops. Thanks!

Nadahar · 2024-01-22T21:49:04Z

@drewnoakes I deleted my comment because I assumed you would edit yours - but ended up making it make no sense at all 😊

fix trailing null characters (\0) from tag

997e515

drewnoakes reviewed Nov 22, 2023

View reviewed changes

gtiwari333 mentioned this pull request Nov 22, 2023

Fix NikonPictureControl1Directory parsing drewnoakes/metadata-extractor-dotnet#349

Merged

drewnoakes closed this Jan 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

properly strip trailing null characters (\0) from tag data #636

properly strip trailing null characters (\0) from tag data #636

gtiwari333 commented Nov 21, 2023 •

edited

Loading

gtiwari333 commented Nov 21, 2023

drewnoakes commented Nov 21, 2023

drewnoakes left a comment

drewnoakes Nov 22, 2023

drewnoakes Nov 22, 2023

drewnoakes Nov 22, 2023

drewnoakes Nov 22, 2023

drewnoakes commented Nov 22, 2023

drewnoakes commented Nov 22, 2023

gtiwari333 commented Nov 22, 2023

gtiwari333 commented Nov 22, 2023 •

edited

Loading

drewnoakes commented Jan 22, 2024 •

edited

Loading

drewnoakes commented Jan 22, 2024

Nadahar commented Jan 22, 2024

properly strip trailing null characters (\0) from tag data #636

properly strip trailing null characters (\0) from tag data #636

Conversation

gtiwari333 commented Nov 21, 2023 • edited Loading

gtiwari333 commented Nov 21, 2023

drewnoakes commented Nov 21, 2023

drewnoakes left a comment

Choose a reason for hiding this comment

drewnoakes Nov 22, 2023

Choose a reason for hiding this comment

drewnoakes Nov 22, 2023

Choose a reason for hiding this comment

drewnoakes Nov 22, 2023

Choose a reason for hiding this comment

drewnoakes Nov 22, 2023

Choose a reason for hiding this comment

drewnoakes commented Nov 22, 2023

drewnoakes commented Nov 22, 2023

gtiwari333 commented Nov 22, 2023

gtiwari333 commented Nov 22, 2023 • edited Loading

drewnoakes commented Jan 22, 2024 • edited Loading

drewnoakes commented Jan 22, 2024

Nadahar commented Jan 22, 2024

gtiwari333 commented Nov 21, 2023 •

edited

Loading

gtiwari333 commented Nov 22, 2023 •

edited

Loading

drewnoakes commented Jan 22, 2024 •

edited

Loading