Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the mini-stream size in the root property #731

Closed
wants to merge 1 commit into from

Conversation

ebourg
Copy link
Member

@ebourg ebourg commented Nov 21, 2024

This is a follow up to #182, which didn't cover all possibles cases of gaps in the allocation table of the mini stream.

This PR contains a unit test generating a file with the following layout:

  • Mini FAT Sector 1 is unallocated
  • Mini FAT Sector 2 has 8 unallocated mini sectors at the beginning
  • Mini FAT Sector 3 has 4 unallocated mini sectors in the middle
  • Mini FAT Sector 4 is unallocated
  • Mini FAT Sector 5 has 32 unallocated mini sectors at the end
  • Mini FAT Sector 6 has 64 unallocated sectors at the beginning and 16 unallocated mini sectors at the end
  • Mini FAT Sector 7 is unallocated
  • Mini FAT Sector 8 is unallocated

With this layout POI 5.3 computes a mini stream size of 29696 bytes, that is the sum of the "occupied" bytes (as per BATBlock.getOccupiedSize()) for each sector : (0 + 128 + 128 + 0 + (128-32) + (128-16) + 0 + 0) x 64.

After signing the file with signtool:

  • The first 2 mini sectors of the first mini FAT sector have been allocated (for the MsiDigitalSignatureEx entry)
  • The empty mini FAT sectors 4, 7 and 8 were not removed
  • The size field of the root entry is 48128 bytes, that's (5 x 128 + (128 - 16)) x 64

So the right method to compute the size is to find the absolute index of the last allocated mini sector, add one and multiply by 64.

@asfgit asfgit closed this in 0d63f9a Nov 25, 2024
@ebourg
Copy link
Member Author

ebourg commented Nov 25, 2024

Thank you!

@pjfanning
Copy link
Contributor

I had to revert this. There were memory issues in our stress tests and this appears to be the most likely cause.

https://ci-builds.apache.org/job/POI/job/POI-DSL-1.8/

There were other failures in other builds too - anyone looking to browse around are welcome to look at build jobs in https://ci-builds.apache.org/job/POI/

@ebourg
Copy link
Member Author

ebourg commented Nov 26, 2024

That's weird, because the modified calculation simply loops through the mini FAT sectors and checks the allocation of each mini sector, that's hardly different from the previous method. I'll investigate.

@pjfanning
Copy link
Contributor

Unfortunately, the poi-integration module and its stress tests are a real pain. They take all the files in test-data and run basic scenarios. Some files are explicitly excluded. Unfortunately, many files are checked in specifically because they are problematic.

The evidence is of failures across many CI runs over the last number of hours - and the CI is much more stable since I reverted the change. There are some edge case CI runs (eg OpenJDK) that are already broken - but I'm talking about CI plans that pass normally.

@ebourg
Copy link
Member Author

ebourg commented Nov 27, 2024

I haven't been able to run the integration tests, Gradle keeps throwing error messages, but looking at the CI log I see two failures:

org.apache.poi.stress.TestAllFiles > handleFile(String, FileHandlerKnown, String, Class, String) #2892 xmldsign/ms-office-2010-signed.pptx XSLF PASSED (2.3s)
org.apache.poi.stress.TestAllFiles > handleFile(String, FileHandlerKnown, String, Class, String) #2382 spreadsheet/StructuredRefs-lots-with-lookups.xlsx XSSF FAILED (1h 3m)

  org.opentest4j.AssertionFailedError: spreadsheet/StructuredRefs-lots-with-lookups.xlsx - failed for handler XSSFFileHandler:  ==> Unexpected exception thrown: java.lang.IllegalArgumentException: Self-suppression not permitted
      at org.apache.poi.stress.TestAllFiles.verify(TestAllFiles.java:298)
  Caused by: java.lang.IllegalArgumentException: Self-suppression not permitted
      at org.apache.poi.stress.TestAllFiles.lambda$handleFile$1(TestAllFiles.java:226)
  Caused by: java.lang.OutOfMemoryError: Java heap space

org.apache.poi.stress.TestAllFiles > handleExtracting(String, FileHandlerKnown, String, Class, String) Extracting - #2382 spreadsheet/StructuredRefs-lots-with-lookups.xlsx XSSF PASSED (1h 22m)
org.apache.poi.stress.TestAllFiles > handleFile(String, FileHandlerKnown, String, Class, String) #2729 spreadsheet/poc-shared-strings.xlsx XSSF PASSED (1h 21m)

and:

org.apache.poi.stress.TestAllFiles > handleAdditional(String, FileHandlerKnown, String, Class, String) Additional - #2781 spreadsheet/styles-3563.xls HSSF PASSED
org.apache.poi.stress.TestAllFiles > handleAdditional(String, FileHandlerKnown, String, Class, String) Additional - #2729 spreadsheet/poc-shared-strings.xlsx XSSF PASSED (1m 7s)
JUnit Jupiter executionError FAILED

  org.junit.platform.commons.JUnitException: TestEngine with ID 'junit-jupiter' failed to execute tests
  Caused by: org.junit.platform.commons.JUnitException: Error executing tests for engine junit-jupiter
  Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError
  Caused by: java.lang.OutOfMemoryError
  Caused by: java.lang.OutOfMemoryError
  Caused by: java.lang.OutOfMemoryError: Java heap space

org.apache.poi.stress.TestAllFiles > handleExtracting(String, FileHandlerKnown, String, Class, String) Extracting - #2383 spreadsheet/StructuredRefs-lots-with-lookups.xlsx OPC SKIPPED
org.apache.poi.stress.TestAllFiles > handleExtracting(String, FileHandlerKnown, String, Class, String) Extracting - #1963 spreadsheet/57893-many-merges.xlsx OPC SKIPPED
org.apache.poi.stress.TestAllFiles > handleExtracting(String, FileHandlerKnown, String, Class, String) Extracting - #2707 spreadsheet/mv-calculator-final-2-20-2013.xlsm XSSF SKIPPED
org.apache.poi.stress.TestAllFiles > handleExtracting(String, FileHandlerKnown, String, Class, String) Extracting - #1728 spreadsheet/51585.xlsx XSSF SKIPPED

The first one is related to an OOXML file (spreadsheet/StructuredRefs-lots-with-lookups.xlsx). It's not clear which file was being processed on the second failure, but the error occurs in the middle of OOXML files.

If I'm not mistaken OOXML files are completely unrelated to the old OLE/COM/CFB files, so I fail to see why the change to the calculation of the size of the root property triggered these failures. Maybe the memory usage changed slightly and the garbage collector kicked in at a different time, which in a tight memory context may lead to an OOM. Is it possible to increase the memory allocated to the integration tests?

@pjfanning
Copy link
Contributor

pjfanning commented Nov 27, 2024

POIFS is used all over the poi-ooxml code base.
I think it is mainly associated in poi-ooxml with encrypting/decrypting password protected files.

@ebourg
Copy link
Member Author

ebourg commented Nov 27, 2024

I've pushed a couple of changes to the PR branch:

  • POIFSMiniStore now reuses BATBlock.getOccupiedSize() to compute the size of the root property. Thus the block array is accessed directly to compute the size instead of going through BATBlock.getValueAt() which does a boundary check on each read. (ebourg@dda38f0)
  • BATBlock.getOccupiedSize() has been slightly optimized, the usedSectors variable was redundant with the loop variable and was removed, that's one less integer allocation per mini FAT sector. (ebourg@a54dcaf)

I don't know if that's enough for the integration tests to pass, but it can't hurt. If the integration tests still fail then the remaining difference with the previous code is either the double loop over the mini FAT sectors, one forward to write them, and one backward to compute the size, or the different size computed which is always larger or equal than previously and may lead to different memory allocations down the road.

@pjfanning
Copy link
Contributor

I added back the original commit and increased the memory in gradle build. The first run went ok and the GitHUB CI was already passing with and without this change. https://ci-builds.apache.org/job/POI/job/POI-DSL-1.8/1171/

@ebourg Can you take any new improvements in this branch and create a new PR?

@ebourg
Copy link
Member Author

ebourg commented Nov 27, 2024

the different size computed which is always larger or equal than previously and may lead to different memory allocations down the road.

One extreme case comes to mind, for example a file with n mini FAT sectors, all unallocated except the last one with only one allocated mini sector at the beginning. In this case the old method computes a size of 1 sector (64 bytes) and the new method computes a size of 128 x (n - 1) + 1 sectors. For a file using 4096 byte sectors and 10 mini FAT sectors, the computed size of the mini FAT would grow from 64 bytes to 576KB.

@ebourg
Copy link
Member Author

ebourg commented Nov 27, 2024

I added back the original commit and increased the memory in gradle build.

Thank you

Can you take any new improvements in this branch and create a new PR?

Sure, see #735

Might be nice to migrate POI to Git btw ;) Integrating GitHub PRs would be easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants