This repository has been archived by the owner on Jan 21, 2020. It is now read-only.

Fine-grained skipUnchanged: consider <#include and <#import #11

Open
vlsi opened this issue Dec 15, 2014 · 13 comments

@vlsi

vlsi commented Dec 15, 2014

  1. fmpp can check whether the produced content is identical to the existing destination file. If it is, fmpp can skip rewriting the destination file, which saves time for the consumers. The next build steps would see that no modification was performed, so recompilation/repackaging is not required.
    This does not require <#include/<#import analysis, and it will speed up builds that include fmpp steps.

  2. fmpp can record the set of imported files and their timestamps when processing the template. On a subsequent build, it could then check the actual timestamps and reprocess the file if an imported file has been updated.

Of course, both approaches would fail in the face of templates that create/modify/delete files when they execute. However, I believe it is sad that fmpp can't handle the typical cases gracefully.
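Point 2 could be sketched roughly as follows. This is a minimal illustration, not FMPP code: the class `DependencySnapshot` and its methods are hypothetical, and it assumes the caller already knows which files a template included or imported (how FMPP would collect that set is not shown). Persisting the snapshot between builds is also left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers the modification times of the files a
// template depended on, so a later run can tell whether any of them changed.
final class DependencySnapshot {
    private final Map<Path, Long> timestamps = new HashMap<>();

    // Records the current last-modified time of every dependency.
    static DependencySnapshot record(Collection<Path> dependencies) throws IOException {
        DependencySnapshot snapshot = new DependencySnapshot();
        for (Path dependency : dependencies) {
            snapshot.timestamps.put(dependency, Files.getLastModifiedTime(dependency).toMillis());
        }
        return snapshot;
    }

    // Returns true only if every recorded file still has its recorded timestamp.
    boolean isUpToDate() {
        for (Map.Entry<Path, Long> entry : timestamps.entrySet()) {
            try {
                long current = Files.getLastModifiedTime(entry.getKey()).toMillis();
                if (current != entry.getValue()) {
                    return false;
                }
            } catch (IOException e) {
                // A missing or unreadable dependency means we must reprocess.
                return false;
            }
        }
        return true;
    }
}
```

A caller would reprocess the template whenever `isUpToDate()` returns false, or whenever the template itself is newer than its output.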

@ddekany
Contributor

ddekany commented Dec 16, 2014

I don't understand point 1. Could you rephrase it?

Point 2 is in the TODO.txt, so, yes, it could be smarter.

@vlsi
Author

vlsi commented Dec 16, 2014

1 is as follows:
Problem. Suppose you have an input file. It might include lots of files and you basically have no idea if any of the required files is modified.

Solution.

  1. You perform template rendering as usual (in memory or into some temporary file), and when it is done you compare the resulting file with the result of the previous fmpp execution.
  2. If the result is identical (i.e. the new result is bytewise equal to the existing file), then you just throw away the computation.
  3. If the newly computed file is different for some reason, you overwrite the destination.
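The three steps above can be sketched in a few lines of Java. This is only an illustration under the assumption that the rendered output is already available as a byte array; `WriteIfChanged` is a hypothetical name and not part of FMPP.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not an FMPP class.
final class WriteIfChanged {
    // Writes newContent to destination only if it differs bytewise from the
    // existing file. Returns true if the file was (re)written.
    static boolean write(Path destination, byte[] newContent) throws IOException {
        if (Files.isRegularFile(destination)
                && Files.size(destination) == newContent.length
                && Arrays.equals(Files.readAllBytes(destination), newContent)) {
            // Identical output: leave the file and its modification time untouched.
            return false;
        }
        Path parent = destination.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(destination, newContent);
        return true;
    }
}
```

Because an unchanged file keeps its old modification time, timestamp-based build steps that consume it can treat it as up to date.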

Does that make sense?

@ddekany
Contributor

ddekany commented Dec 17, 2014

So, the point of doing it like that would be that the modification time of the earlier, identical output file stays the same, and that we save a storage write at the expense of a potentially needless storage read (and buffering), right?

@vlsi
Author

vlsi commented Dec 17, 2014

The main point is to avoid rerunning subsequent build steps (the steps that consume the result of fmpp processing).

For instance, in Calcite we use fmpp to generate a JavaCC parser (https://github.com/apache/incubator-calcite/blob/master/core/src/main/codegen/config.fmpp, https://github.com/apache/incubator-calcite/blob/master/core/src/main/codegen/templates/Parser.jj). Since fmpp overwrites the destination file, the output goes through javacc (which is slow) and javac, and is then repackaged into a new jar.

So the plan is to spend some (little) time reading the existing file, and win lots of time in javacc, javac, jar and similar steps.

@ddekany
Contributor

ddekany commented Dec 17, 2014

I see. I have added this to the TODO.txt, so it won't be forgotten. I don't know when it will be done though...

@vlsi
Author

vlsi commented Dec 17, 2014

I've filed a similar pull request for maven-remote-resources-plugin recently: apache/maven-plugins#40

Do you think something similar could be committed to fmpp?

Should commons-io's DeferredFileOutputStream be added to fmpp, or should it just cache the full file in memory?

@ddekany
Contributor

ddekany commented Dec 17, 2014

Ideally, it collects the output into a memory buffer, and once that exceeds a threshold of a few MB, it flushes the buffer into a temporary file. At the end, it moves the temporary file (if there is one) to the destination and appends the buffer to the end of it. And the whole feature should be opt-in (I suppose).

Adding anything to FMPP requires a Contributor License Agreement (sent via traditional mail); otherwise, it's surely possible to contribute this. (Some test coverage and such is expected, of course.)

@vlsi
Author

vlsi commented Dec 17, 2014

Do you think the temporary file is important?

  1. It is harder to implement, since it requires picking a temporary file name and making sure it won't clash with existing files/directories.
  2. It is unsafe to use File.createTempFile, since the temporary directory is usually not configured by end users, so the whole thing can fail with an "out of space" error.
  3. I think processing multi-megabyte files should be a rare case.

Generally, it should be fine to keep files in memory up to 5-10 megabytes (the limit should be configurable), and if the buffer overflows, the output is written to the destination file directly. That is basically "fall back to the old mode if the file is too big".

If someone requires the optimization for big files, the in-memory buffer size can be increased.
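The "buffer in memory, fall back to the old mode on overflow" idea could look roughly like this. It is a sketch under stated assumptions, not FMPP's implementation: `SkipUnchangedOutputStream` and `maxBufferedBytes` are hypothetical names, and the comparison is a plain bytewise check against the existing file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical stream: buffers output in memory and, on close, rewrites the
// destination only if the content changed. If the buffer limit is exceeded,
// it falls back to writing straight to the destination (the "old mode").
final class SkipUnchangedOutputStream extends OutputStream {
    private final Path destination;
    private final int maxBufferedBytes;
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private OutputStream direct; // non-null once we have fallen back
    private boolean closed;

    SkipUnchangedOutputStream(Path destination, int maxBufferedBytes) {
        this.destination = destination;
        this.maxBufferedBytes = maxBufferedBytes;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (direct == null && (long) buffer.size() + len > maxBufferedBytes) {
            // Too big to buffer: overwrite the destination unconditionally.
            direct = Files.newOutputStream(destination);
            buffer.writeTo(direct);
            buffer = null;
        }
        if (direct != null) {
            direct.write(b, off, len);
        } else {
            buffer.write(b, off, len);
        }
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (direct != null) {
            direct.close();
            return;
        }
        byte[] content = buffer.toByteArray();
        boolean same = Files.isRegularFile(destination)
                && Files.size(destination) == content.length
                && Arrays.equals(Files.readAllBytes(destination), content);
        if (!same) {
            Files.write(destination, content);
        }
    }

    // True if the size limit was exceeded and output went directly to the file.
    boolean fellBack() {
        return direct != null;
    }
}
```

With this shape, the skip-unchanged behavior silently degrades to a normal overwrite for large outputs, which matches the trade-off described above.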

@ddekany
Contributor

ddekany commented Dec 17, 2014

Sure, doing it without temporary files is acceptable. With temporary files it keeps the expected semantics in all circumstances, but as long as it's just about performance, it's no big deal. (BTW, File.createTempFile certainly shouldn't be used, as it often uses a different partition than the final destination file, and then moving the file can be expensive. Finding a reasonably safe temporary destination file name isn't a real problem IMO.)
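One way to keep the temporary file on the same partition as the destination is to create it in the destination's own directory. The sketch below is an assumption about how that could be done, not something taken from FMPP; `SiblingTempFile` is a hypothetical name, and it uses `java.nio.file.Files.createTempFile` with an explicit directory, which is a different call from the default-directory `File.createTempFile` discussed above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not an FMPP class.
final class SiblingTempFile {
    // Creates a uniquely named temporary file next to the destination,
    // so a later move does not have to cross partitions.
    static Path create(Path destination) throws IOException {
        Path dir = destination.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        return Files.createTempFile(dir, destination.getFileName().toString() + ".", ".tmp");
    }

    // Moves the finished temporary file over the destination.
    static void commit(Path temp, Path destination) throws IOException {
        Files.move(temp, destination, StandardCopyOption.REPLACE_EXISTING);
    }
}
```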

@vlsi
Author

vlsi commented Sep 15, 2018

Just a note: org.apache.drill.tools:drill-fmpp-maven-plugin generates files into a temporary folder and copies the result only if it is modified (see https://github.com/apache/drill/tree/master/tools/fmpp).
It is sad that FMPP does not have that by default.

@ddekany
Contributor

ddekany commented Sep 15, 2018

I haven't checked what exactly they are doing and how... but as it builds on FMPP, is there any technical reason why they don't contribute the feature to FMPP and instead solve it locally?

@vlsi
Author

vlsi commented Sep 15, 2018

Well, the whole thing is 150 lines (see from line 100): https://github.com/apache/drill/blob/master/tools/fmpp/src/main/java/org/apache/drill/fmpp/mojo/FMPPMojo.java
I guess it was easier to just solve it on the Maven plugin side than to study the FMPP code.

Note: there's no "canonical" FMPP Maven plugin, so the Drill project might want to create its own plugin.

That is yet another issue.

@ddekany
Contributor

ddekany commented Sep 16, 2018

Most of FMPP was developed (and used by me) before Maven became widely known. FMPP has been maintained, but it hasn't been my focus since then, so, yeah, there are many missing things.

BTW, it seems that FreeMarker itself will have an official Maven plugin (not related to FMPP), as somebody decided to contribute what they have used internally for code generation. How feature-rich it can grow, though, will depend on further contributions. Maybe I will not be able to resist applying some ideas and lessons learned from FMPP...
