Added feature to save pdfs to ram or files dynamically #1788
Signed-off-by: GitHub Action <[email protected]>
Update translation files
@@ -135,4 +139,20 @@ public Predicate<Path> processOnlyFiles() {
            }
        };
    }

    @Bean
There is no need to introduce a new settings file requirement here.
This code also requires the entry
Currently the PR fails with:
Assertion Failed: Expected application/octet-stream but got application/json. Response content: b'{"timestamp":"2024-09-03T09:09:03.423+00:00","status":500,"error":"Internal Server Error","exception":"java.lang.NullPointerException","trace":"java.lang.NullPointerException: Cannot invoke \\"stirling.software.SPDF.config.memoryConfig$MemorySettings.getRamThresholdGB()\\" because the return value of \\"stirling.software.SPDF.config.memoryConfig.getMemory()\\" is null\\n\\tat stirling.software.SPDF.utils.memoryUtils.shouldUseFileBasedStorage(memoryUtils.java:17)\\n\\tat
Please add this to the actual settings.yml template instead:
https://github.com/Stirling-Tools/Stirling-PDF/blob/main/src/main/resources/settings.yml.template
and to
https://github.com/Stirling-Tools/Stirling-PDF/blob/main/src/main/java/stirling/software/SPDF/model/ApplicationProperties.java
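The NullPointerException in the trace comes from getMemory() returning null when no matching entry is bound. A minimal sketch of one way to avoid that is to give the nested settings group a default instance. All class names, the 4 GB default, and the size comparison below are assumptions for illustration only, not the project's actual ApplicationProperties code:

```java
// Hypothetical sketch: every name and value here is illustrative, not taken from the PR.
public class MemorySettingsSketch {

    /** Nested settings group carrying its own default value. */
    public static class MemorySettings {
        private int ramThresholdGB = 4; // assumed default

        public int getRamThresholdGB() { return ramThresholdGB; }
        public void setRamThresholdGB(int value) { this.ramThresholdGB = value; }
    }

    /** Stand-in for a properties holder such as ApplicationProperties. */
    public static class AppProps {
        // Initialised eagerly, so getMemory() is never null even if the key is absent.
        private MemorySettings memory = new MemorySettings();

        public MemorySettings getMemory() { return memory; }

        public void setMemory(MemorySettings memory) {
            // A null binding falls back to the defaults instead of overwriting them.
            this.memory = (memory != null) ? memory : new MemorySettings();
        }
    }

    /** Assumed semantics: use file storage when the input exceeds the RAM threshold. */
    public static boolean shouldUseFileBasedStorage(AppProps props, long fileSizeBytes) {
        long thresholdBytes = props.getMemory().getRamThresholdGB() * 1024L * 1024L * 1024L;
        return fileSizeBytes > thresholdBytes;
    }
}
```

With a default in place, a missing settings entry degrades to the default threshold rather than a 500 response.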
Sure, thank you.
@@ -63,7 +60,7 @@ public ResponseEntity<byte[]> extractImages(@ModelAttribute PDFExtractImagesRequ
            throws IOException, InterruptedException, ExecutionException {
        MultipartFile file = request.getFileInput();
        String format = request.getFormat();
        boolean allowDuplicates = request.isAllowDuplicates();
Please don't remove!
OK.
The multithreading approach for image extraction was revised to improve thread safety, especially after errors were encountered during testing once the new feature (saving PDFs to temp files) was added. Despite these improvements, the output is still inconsistent: a variable number of images (between 60 and 68) is extracted from the same test PDF.
Looking at the code, one issue is that imageIndex needs to be an AtomicInteger to be thread safe.
Also, instead of the synchronized block, maybe just use a concurrent hash set? (ConcurrentHashMap.newKeySet();)
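As a rough sketch of that suggestion (the class and method names are made up for illustration and are not from the PR), an AtomicInteger hands out unique indices across worker threads, and ConcurrentHashMap.newKeySet() provides a thread-safe set without an explicit synchronized block:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the controller code from the PR.
public class ImageIndexSketch {

    /** Runs `tasks` jobs on a pool; each job claims one index. Returns every index handed out. */
    public static Set<Integer> assignIndices(int tasks, int threads) {
        AtomicInteger imageIndex = new AtomicInteger(1);   // shared counter, safe across threads
        Set<Integer> seen = ConcurrentHashMap.newKeySet(); // concurrent set, no synchronized block
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            // getAndIncrement() is atomic, so no two tasks can receive the same index.
            pool.execute(() -> seen.add(imageIndex.getAndIncrement()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

With a plain `int imageIndex++` in place of the atomic counter, two threads could read the same value and overwrite each other's output file, which would make the final image count vary between runs.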
I have made imageIndex an AtomicInteger. I will include the second change too.
Nice! Let me know if it helps at all.
Despite the addition of the atomic counter and the use of a concurrent hash set for image extraction, the inconsistent image output persists. The number of extracted images continues to vary, ranging from 65 to 68. I am not sure whether this is really a thread-safety issue or a formatting issue.
I am seeing the following output for skipped images:
14:13:57.276 [pool-3-thread-2] ERROR s.s.S.c.a.m.ExtractImagesController - Error processing image from page 2
java.io.IOException: ICCBased colorspace array must have a stream as the second element
…newKeySet();) and removed the synchronised block for processed images

@RestController
@RequestMapping("/api/v1/misc")
@Tag(name = "Misc", description = "Miscellaneous APIs")
This has also been removed.
if (processedImages.stream()
        .anyMatch(hash -> Arrays.equals(hash, imageHash))) {
try {
    if (!allowDuplicates) {
The duplicate logic doesn't seem to work anymore. The GitHub PR runs show that the extract test extracts 5 images instead of 2 (3 of them are duplicates and should be removed).
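One possible cause, offered as a guess rather than a diagnosis: a set of raw `byte[]` hashes compares elements by identity, and a separate "check, then add" step is not atomic across threads. A minimal sketch of a duplicate filter that avoids both problems follows. The class name, the SHA-256 digest, and the ByteBuffer key are assumptions, not what the PR uses:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the controller code from the PR.
public class DuplicateFilterSketch {
    // ByteBuffer has content-based equals/hashCode, unlike a raw byte[].
    private final Set<ByteBuffer> processedImages = ConcurrentHashMap.newKeySet();
    private final boolean allowDuplicates;

    public DuplicateFilterSketch(boolean allowDuplicates) {
        this.allowDuplicates = allowDuplicates;
    }

    /** Returns true if the image should be written out. */
    public boolean shouldKeep(byte[] imageBytes) {
        if (allowDuplicates) {
            return true;
        }
        ByteBuffer key = ByteBuffer.wrap(digest(imageBytes));
        // add() is an atomic check-and-insert: it returns false if the key was already present.
        return processedImages.add(key);
    }

    private static byte[] digest(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }
}
```

Using the return value of add() replaces the stream().anyMatch(...) lookup followed by a separate insert, so two threads holding the same image cannot both conclude that it is new.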
Sure. I will check, test, and update.
This still doesn't pass the tests. Please either resolve the issue or close the PR; remerging is just taking up CPU cycles.
Description
I tried implementing the feature for a single Java file: ExtractImagesController.
Closes #(1775)
Checklist: