Added feature to save PDFs to RAM or files dynamically #1788

HimaGirija99 · 2024-09-02T07:22:28Z

Description

Added Memory Limit Configuration: Integrated memory limit settings into the custom_settings.yml file in the configs directory.
Configured Memory Settings Loader: Loaded the memory settings into the memoryConfig class using a YAML parser.
Implemented Logic in MemoryUtils: Added logic to the memoryUtils class that decides between file-based storage and in-memory processing, based on the configured memory thresholds and the available system resources.
Utilized MemoryUtils in PDF Processing: Applied the memoryUtils logic in the Java class that saves PDFs, to keep memory usage under control during file processing.
Enhanced Exception Handling: Added exception handling to the image extraction method so that errors during PDF processing are caught and logged.
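A minimal sketch of the kind of check the memoryUtils bullet describes, assuming the decision is made per upload from the file size, a configured RAM threshold, and the JVM's current heap headroom. The class name MemoryUtilsSketch and both threshold parameters are hypothetical; only the method name shouldUseFileBasedStorage comes from the stack trace later in this thread.

```java
// Hypothetical sketch: decide whether a PDF should be spooled to a temp file instead of held in RAM.
// Parameter names and thresholds are assumptions; only the method name comes from the PR's stack trace.
final class MemoryUtilsSketch {

    private MemoryUtilsSketch() {}

    /**
     * @param fileSizeBytes     size of the uploaded PDF in bytes
     * @param ramThresholdBytes largest file to process in memory (hypothetical setting)
     * @param minFreeHeapBytes  heap headroom that must remain after loading (hypothetical setting)
     * @return true if file-based storage should be used
     */
    static boolean shouldUseFileBasedStorage(
            long fileSizeBytes, long ramThresholdBytes, long minFreeHeapBytes) {
        if (fileSizeBytes > ramThresholdBytes) {
            return true; // over the configured in-memory cap
        }
        Runtime rt = Runtime.getRuntime();
        long usedHeap = rt.totalMemory() - rt.freeMemory();
        // Heap this JVM can still use: the maximum heap minus what is already in use.
        long availableHeap = rt.maxMemory() - usedHeap;
        // Use a temp file if loading the PDF would eat into the required headroom.
        return availableHeap - fileSizeBytes < minFreeHeapBytes;
    }
}
```

Reading the heap figures at request time means the same file can go to RAM on an idle server and to disk under load, which matches the "available system resources" part of the description.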

So far I have implemented the feature for a single Java file: ExtractImagesController.

Closes #1775

Checklist:

[x] I have read the Contribution Guidelines
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] My changes generate no new warnings

Frooodle · 2024-09-04T19:25:12Z

src/main/java/stirling/software/SPDF/config/AppConfig.java

@@ -135,4 +139,20 @@ public Predicate<Path> processOnlyFiles() {
            }
        };
    }
+
+    @Bean


There is no need to add a new settings file requirement here. This code also requires the entry to be present; currently the PR fails with:

Assertion Failed: Expected application/octet-stream but got application/json. Response content: b'{"timestamp":"2024-09-03T09:09:03.423+00:00","status":500,"error":"Internal Server Error","exception":"java.lang.NullPointerException","trace":"java.lang.NullPointerException: Cannot invoke \\"stirling.software.SPDF.config.memoryConfig$MemorySettings.getRamThresholdGB()\\" because the return value of \\"stirling.software.SPDF.config.memoryConfig.getMemory()\\" is null\\n\\tat stirling.software.SPDF.utils.memoryUtils.shouldUseFileBasedStorage(memoryUtils.java:17)\\n\\tat

Please add this to the actual settings.yml instead
https://github.com/Stirling-Tools/Stirling-PDF/blob/main/src/main/resources/settings.yml.template
and
https://github.com/Stirling-Tools/Stirling-PDF/blob/main/src/main/java/stirling/software/SPDF/model/ApplicationProperties.java
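The NullPointerException above comes from getMemory() returning null when the memory section is missing. A minimal sketch of a null-safe nested settings class, assuming JavaBean-style binding (such as Spring Boot's @ConfigurationProperties) that leaves a field at its initial value when the YAML has no matching entry. ApplicationPropertiesSketch, Memory, and the 4.0 GB default are hypothetical, not the project's actual properties.

```java
// Hypothetical sketch of a nested settings section that can never be null.
// Class names and the default value are assumptions, not Stirling-PDF's real properties.
class ApplicationPropertiesSketch {

    // Eagerly initialised: with JavaBean-style binding, a missing section in settings.yml
    // leaves this default in place instead of null.
    private Memory memory = new Memory();

    public Memory getMemory() {
        return memory;
    }

    public void setMemory(Memory memory) {
        // Also guard against an explicit null being bound or set.
        this.memory = (memory != null) ? memory : new Memory();
    }

    public static class Memory {
        private double ramThresholdGB = 4.0; // assumed default

        public double getRamThresholdGB() {
            return ramThresholdGB;
        }

        public void setRamThresholdGB(double ramThresholdGB) {
            this.ramThresholdGB = ramThresholdGB;
        }
    }
}
```

With defaults living in the properties class, callers such as shouldUseFileBasedStorage no longer need a separate settings file to exist before they can run.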

HimaGirija99 · 2024-09-05T01:17:19Z

Sure, thank you.

Frooodle · 2024-09-07T18:29:18Z

src/main/java/stirling/software/SPDF/controller/api/misc/ExtractImagesController.java

@@ -63,7 +60,7 @@ public ResponseEntity<byte[]> extractImages(@ModelAttribute PDFExtractImagesRequ
            throws IOException, InterruptedException, ExecutionException {
        MultipartFile file = request.getFileInput();
        String format = request.getFormat();
-        boolean allowDuplicates = request.isAllowDuplicates();


Please don't remove this!

The multithreading approach for image extraction was revised to improve thread safety after errors were encountered while testing the new feature (saving PDFs to temp files). Despite these improvements, the output is still inconsistent: the number of images extracted from the same test PDF varies between 60 and 68.

Looking at the code, one issue is that imageIndex needs to be an AtomicInteger to be thread-safe.

Also, instead of the synchronized block, maybe just use a concurrent set (ConcurrentHashMap.newKeySet())?
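A minimal sketch of both suggestions, assuming pages are processed in parallel and every extracted image is named from a shared counter. The per-page image counts and the image_N.png naming are hypothetical stand-ins for the real extraction; the point is that AtomicInteger.getAndIncrement() hands out each index exactly once, and the set from ConcurrentHashMap.newKeySet() is safe to add to from several threads without a synchronized block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: pages are processed in parallel and every extracted image gets a unique index.
// imagesPerPage and the "image_N.png" names stand in for the real per-page extraction.
final class ImageIndexSketch {

    private ImageIndexSketch() {}

    static Set<String> nameImages(int[] imagesPerPage, int threads) {
        AtomicInteger imageIndex = new AtomicInteger(1);
        // Thread-safe set; replaces a synchronized block around a plain HashSet.
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int count : imagesPerPage) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < count; i++) {
                        // getAndIncrement() is atomic, so no two threads receive the same index
                        // (++ on a shared plain int field could hand out duplicates).
                        names.add("image_" + imageIndex.getAndIncrement() + ".png");
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for each page and surface any worker exception
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return names;
    }
}
```

The indices are unique but not ordered by page, since threads interleave; if stable file names per page matter, the index would need to be derived from the page number instead.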

I have made imageIndex an AtomicInteger. I will include the second change too.

Nice! Let me know if it helps at all.

Despite making imageIndex an AtomicInteger and using a concurrent set (ConcurrentHashMap.newKeySet()) for image extraction, the inconsistent output persists: the number of extracted images still varies between 65 and 68. I am not sure whether this is really a thread-safety issue or a formatting issue.
I am seeing the following output for skipped images:
14:13:57.276 [pool-3-thread-2] ERROR s.s.S.c.a.m.ExtractImagesController - Error processing image from page 2
java.io.IOException: ICCBased colorspace array must have a stream as the second element
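One way to separate a thread-safety problem from a per-image decoding failure is to account for every image: count each attempt as either extracted or failed, and keep the error messages. If the failure count for the same file changes from run to run, the failures themselves are non-deterministic, which points at shared state rather than at the file's formatting; the PDFBox FAQ states that a PDDocument should not be shared between threads, so concurrent access to one document is a plausible suspect, though nothing in this thread confirms it. A minimal JDK-only sketch, where ExtractionTally and ImageTask are hypothetical stand-ins for "decode and save one image":

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical diagnostic sketch: every image attempt ends up counted as either extracted or failed.
// ImageTask stands in for "decode and save one image"; it is not a Stirling-PDF or PDFBox type.
final class ExtractionTally {

    interface ImageTask {
        void run() throws Exception;
    }

    final AtomicInteger extracted = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();
    final Queue<String> errors = new ConcurrentLinkedQueue<>();

    void runAll(List<ImageTask> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks.size(); i++) {
            final int id = i;
            final ImageTask task = tasks.get(i);
            pool.execute(() -> {
                try {
                    task.run();
                    extracted.incrementAndGet();
                } catch (Exception e) {
                    // Record the failure instead of silently dropping the image.
                    failed.incrementAndGet();
                    errors.add("image " + id + ": " + e.getMessage());
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** True when no attempt went unaccounted for. */
    boolean balanced(int attempted) {
        return extracted.get() + failed.get() == attempted;
    }
}
```

Running the same PDF once with one thread and once with several, and comparing the failed counts, is a cheap way to test the thread-safety hypothesis.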

Frooodle · 2024-09-08T17:21:46Z

src/main/java/stirling/software/SPDF/controller/api/misc/ExtractImagesController.java


 @RestController
 @RequestMapping("/api/v1/misc")
-@Tag(name = "Misc", description = "Miscellaneous APIs")


This has also been removed.

Frooodle · 2024-09-12T11:49:54Z

src/main/java/stirling/software/SPDF/controller/api/misc/ExtractImagesController.java

-                        if (processedImages.stream()
-                                .anyMatch(hash -> Arrays.equals(hash, imageHash))) {
+                try {
+                    if (!allowDuplicates) {


The duplicate-detection logic doesn't seem to work anymore: the GitHub PR runs show that the extract test extracts 5 images instead of 2 (3 of them are duplicates and should be removed).

You can test with https://github.com/Stirling-Tools/Stirling-PDF/blob/main/cucumber/exampleFiles/images.pdf
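A likely cause, assuming processedImages now holds the raw byte[] hashes: Java arrays inherit identity-based equals() and hashCode(), so a Set<byte[]> (including one from ConcurrentHashMap.newKeySet()) never treats two equal hashes as the same element, and every image is kept. The removed lines in the diff above compared hashes with Arrays.equals, which is content-based. A minimal sketch that keys the set on an encoded SHA-256 digest instead; DuplicateFilter and firstOccurrence are hypothetical names, and SHA-256 is an assumption about the hash in use.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: content-based duplicate detection that stays correct across threads.
// Keys are Base64 strings of a SHA-256 digest, because String.equals compares content
// whereas byte[] keys would be compared by identity.
final class DuplicateFilter {

    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns true the first time these image bytes are seen, false for any later duplicate. */
    boolean firstOccurrence(byte[] imageBytes) {
        String key = Base64.getEncoder().encodeToString(sha256(imageBytes));
        // add() checks and inserts atomically, so there is no contains()-then-add() race.
        return seen.add(key);
    }

    private static byte[] sha256(byte[] data) {
        try {
            // MessageDigest instances are not thread-safe, so create one per call.
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

In the controller, an image would then be kept only when allowDuplicates is true or firstOccurrence returns true, with no separate contains() call for two threads to race past.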

Sure. I will check, test, and update.

Frooodle · 2024-09-23T10:49:12Z

This still doesn't pass the tests. Please either resolve the issue or close the PR; re-merging main is just taking up CPU cycles.

HimaGirija99 added 9 commits August 7, 2024 14:40

Added multithreaded feature for image extraction

8a16c5d

added memory settings

5b1e828

error corrected in path

c60a906

error corrected in path

6c7166a

tried to put memory related config but haven't changed anything

884c3fb

created memory utils

2fc7fab

rearranged

ad3cf29

saved pdf to files

6850913

removed unused variables

e4e8ae9

HimaGirija99 requested a review from Frooodle as a code owner September 2, 2024 07:22

github-actions bot added Java Pull requests that update Java code API API-related issues or pull requests labels Sep 2, 2024

HimaGirija99 and others added 7 commits September 3, 2024 10:52

Merge branch 'main' into svMem

7fa0460

Update translation files

1c75470

Signed-off-by: GitHub Action <[email protected]>

last commit

3a4587c

last commit

63b3aae

Describe your changes here

1cb61aa

commit

2761260

Merge pull request #1 from HimaGirija99/update_translation_files

5355a4a

Update translation files

github-actions bot added the Translation label Sep 3, 2024

Frooodle requested changes Sep 4, 2024

HimaGirija99 and others added 7 commits September 5, 2024 09:21

Merge branch 'main' into svMem

cd956f4

replaced yaml mapper

5267b5f

settings.yml

0d2e5be

commit

380c091

commit l

0a84374

removed unused variables

b7f0286

removed unused variables

a5ac4f8

HimaGirija99 and others added 4 commits September 7, 2024 23:43

commit 2 slight modification

a89ccbc

commit 3 slight modification

11b39ba

Merge branch 'svMem' of https://github.com/HimaGirija99/Stirling-PDF …

0db294f

…into svMem

Merge branch 'main' into svMem

f965057

Frooodle requested changes Sep 7, 2024

HimaGirija99 and others added 11 commits September 8, 2024 00:25

didn't remove kept as it is

908ed9b

---resumed

44941ff

Merge branch 'svMem' of https://github.com/HimaGirija99/Stirling-PDF …

3d7325f

…into svMem

Merge branch 'main' into svMem

06c6634

modified the code by including AtomicInteger-imageIndex as parameter

eac1512

Merge branch 'svMem' of https://github.com/HimaGirija99/Stirling-PDF …

9a34743

…into svMem

Merge branch 'main' into svMem

9751c52

changed concurrent hashMap to ConcurrentHashSet ? (ConcurrentHashMap.…

8e27e18

…newKeySet();) and removed the synchronised block for processed images

Merge branch 'svMem' of https://github.com/HimaGirija99/Stirling-PDF …

8e9e12c

…into svMem

Merge branch 'main' into svMem

39786e2

Merge branch 'main' into svMem

caa527c

Frooodle requested changes Sep 12, 2024

HimaGirija99 added 2 commits September 12, 2024 16:51

added @tag

5a78e15

Merge branch 'svMem' of https://github.com/HimaGirija99/Stirling-PDF …

616b98c

…into svMem

Frooodle requested changes Sep 12, 2024

HimaGirija99 and others added 7 commits September 12, 2024 17:36

Merge branch 'main' into svMem

fb4a92b

Merge branch 'main' into svMem

e2f48f0

added a line of code processedImages.add(imageHash);

d754080

Merge branch 'svMem' of https://github.com/HimaGirija99/Stirling-PDF …

9404053

…into svMem

Merge branch 'main' into svMem

75a9416

Merge branch 'main' into svMem

7876a72

Merge branch 'main' into svMem

b669765

HimaGirija99 closed this Sep 23, 2024

HimaGirija99 deleted the svMem branch September 23, 2024 11:22
