Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Files in "tmp" should not be too large resp. removed #122

Closed
dr0i opened this issue Aug 22, 2023 · 20 comments
Closed

Files in "tmp" should not be too large resp. removed #122

dr0i opened this issue Aug 22, 2023 · 20 comments
Assignees
Labels
bug Something isn't working

Comments

@dr0i
Copy link
Member

dr0i commented Aug 22, 2023

There is one really big file in tmp:

~/git/metafacture-playground/tmp$ l -Shal |head
total 39G
-rw-r--r--  1 sol sol  39G Aug 22 09:55 metafix1405254507378886918.txt

We should somehow restrict log files to be too big.

@katauber
Copy link
Member

This is no log file. It's result file of a workflow. But I don't understand why this isn't deleted after returning the result to the client. I will have a look what is going on there.

@katauber
Copy link
Member

functional review: @dr0i
code review: @fsteeg

@katauber katauber moved this from Backlog to Ready in Metafacture Sep 21, 2023
@katauber katauber added the bug Something isn't working label Sep 21, 2023
@katauber
Copy link
Member

The files are not deleted at the moment.

@dr0i and I discussed offline, that we first need a mechanism to delete these files after a successful processed workflow and a configurable limit how big a file may be.

@Phu2
Copy link
Contributor

Phu2 commented Sep 28, 2023

The size of these result files depends on the size of the input, right?
I think we need to set a limit on the input files. too.

@katauber katauber moved this from Ready to Selected in Metafacture Nov 29, 2023
@katauber katauber moved this from Selected to Working in Metafacture Nov 29, 2023
@blackwinter
Copy link
Member

Regarding the input file not being deleted: Flux.main() opens a FileInputStream via ResourceUtil.getStream() that isn't closed afterwards. Could this be it?

Possible fix:

diff --git metafacture-runner/src/main/java/org/metafacture/runner/Flux.java metafacture-runner/src/main/java/org/metafacture/runner/Flux.java
index 9d889b45..5acdab6f 100644
--- metafacture-runner/src/main/java/org/metafacture/runner/Flux.java
+++ metafacture-runner/src/main/java/org/metafacture/runner/Flux.java
@@ -25,6 +25,7 @@
 
 import java.io.File;
 import java.io.IOException;
+import java.io.InputStream;
 import java.util.HashMap;
 import java.util.Map;
 import java.util.regex.Matcher;
@@ -82,7 +83,9 @@ public static void main(final String[] args) throws IOException, RecognitionExce
             }
 
             // run parser and builder
-            FluxCompiler.compile(ResourceUtil.getStream(fluxFile), vars).start();
+            try (InputStream inputStream = ResourceUtil.getStream(fluxFile)) {
+                FluxCompiler.compile(inputStream, vars).start();
+            }
         }
     }

@fsteeg
Copy link
Member

fsteeg commented Dec 11, 2023

Hm, but that stream is for the Flux file, not the input file, right?

Still looks like a good change fixing a resource leak.

@blackwinter
Copy link
Member

Hm, but that stream is for the Flux file, not the input file, right?

D'oh :) Sorry, you're absolutely right!

@blackwinter
Copy link
Member

blackwinter commented Dec 11, 2023

Then it's the FileOpener?

diff --git metafacture-io/src/main/java/org/metafacture/io/FileOpener.java metafacture-io/src/main/java/org/metafacture/io/FileOpener.java
index b88b9aab..a2a8eeac 100644
--- metafacture-io/src/main/java/org/metafacture/io/FileOpener.java
+++ metafacture-io/src/main/java/org/metafacture/io/FileOpener.java
@@ -154,8 +154,8 @@ public Reader open(final InputStream stream) throws IOException {
 
     @Override
     public void process(final String file) {
-        try {
-            getReceiver().process(open(file));
+        try (Reader reader = open(file)) {
+            getReceiver().process(reader);
         }
         catch (final IOException e) {
             throw new MetafactureException(e);

(Breaks some tests, though, since they try to read from the reader a second time in order to verify its contents.)

@fsteeg
Copy link
Member

fsteeg commented Dec 11, 2023

Oh, yes, that looks good!

(Seems the FileInputStream is only closed if something goes wrong.)

@katauber
Copy link
Member

I move this ticket back to backlog and it becomes ready, when the problem with the FileInputStream is fixed in Fix

@katauber katauber moved this from Working to Backlog in Metafacture Dec 13, 2023
@blackwinter
Copy link
Member

blackwinter commented Dec 13, 2023

But can you confirm that this is indeed the issue here? We've identified two unclosed resources, one of which still allows deleting the file (FileInputStream on Flux file) and the other one - apparently - doesn't (Reader on input file).

@katauber
Copy link
Member

I think I can confirm it (or not), when these unclosed resources are closed. I think the probability is very high that one of these two is the reason that I can't delete the input file, isn't it?

@blackwinter
Copy link
Member

Well, let me put it this way: If you could confirm beforehand that properly closing the Reader on the input file would indeed allow it to be deleted, then we could justify investing the effort into fixing the test cases for this change.

So would it be possible for you to apply the fix locally and observe its effect in the Playground?

@katauber
Copy link
Member

I'll try 👍

@katauber
Copy link
Member

katauber commented Dec 18, 2023

It works with the fix in Fix :)

Image

@blackwinter
Copy link
Member

Great, thank you for confirming! Then we'll have to see how we can fix it (in -core, not -fix) while still preserving testability.

@blackwinter
Copy link
Member

I've opened metafacture/metafacture-core#514 to address the unclosed resources.

@dr0i dr0i moved this from Backlog to Review in Metafacture Jan 5, 2024
@katauber katauber moved this from Review to Backlog in Metafacture Jan 10, 2024
@blackwinter
Copy link
Member

Resources are now closed in metafacture-core; should be released with version 6.0 (see metafacture/metafacture-core#515).

@dr0i dr0i moved this from Backlog to Ready in Metafacture Feb 5, 2024
@dr0i dr0i moved this from Ready to Selected in Metafacture Feb 5, 2024
@katauber katauber moved this from Selected to Ready in Metafacture Feb 5, 2024
@katauber
Copy link
Member

katauber commented Feb 5, 2024

I would like to split this issue in two:

  1. Remove every tmp file after a succesful answer to the client Remove tmp files #158
  2. Restrict the size of tmp files in general Restrict file size of tmp files #159

@dr0i
Copy link
Member Author

dr0i commented Mar 12, 2024

Resources are now closed in metafacture-core; should be released with version 6.0 (see metafacture/metafacture-core#515).

metafacture-core 6.0.0 is released as is metafacture-playground 1.0.0 (which is based on the former). Also the latter is deployed in it's newest version.

Subissues are opend, so this issue can be closed..

@dr0i dr0i closed this as completed Mar 12, 2024
@github-project-automation github-project-automation bot moved this from Ready to Done in Metafacture Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Status: Backlog
Development

No branches or pull requests

5 participants