Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gomplate produce output file when parsing fails #578

Open
valepert opened this issue Aug 5, 2019 · 17 comments
Open

Gomplate produce output file when parsing fails #578

valepert opened this issue Aug 5, 2019 · 17 comments
Assignees
Milestone

Comments

@valepert
Copy link

valepert commented Aug 5, 2019

Subject of the issue

gomplate using an input template with reference to an environment variable that doesn't exist, still writes an output file.

Your environment

  • gomplate version 3.5.0

Steps to reproduce

$ gomplate -i 'fail: {{ .Env.BOGUS }}' -o fail.yml
template: <arg>:1:13: executing "<arg>" at <.Env.BOGUS>: map has no entry for key "BOGUS"

Expected behaviour

$ stat -f "%N" fail.yml
stat: fail.yml: stat: No such file or directory
$ test -f fail.yml || echo $?
1

Actual behaviour

$ stat -f "%N" fail.yml
fail.yml
$ test -f fail.yml && echo $?
0
@hairyhenderson
Copy link
Owner

Hi @valepert, thanks for logging this, and sorry for the slow response - I've been on vacation 😉

This is the expected behaviour. Because the template is parsed and interpreted simultaneously, gomplate creates the output file before it starts to render the template.

Also, this is a useful behaviour when troubleshooting template errors, especially with larger templates, since the content that successfully rendered will all be present in the output.

One way to accomplish what you're looking for is to delete the file on failure:

$ gomplate -i 'fail: {{ .Env.BOGUS }}' -o fail.yml || rm -f fail.yml

This way, fail.yml will never be present when gomplate fails for any reason.

A slightly different approach, without requiring a shell, would be to try to resolve environment variables up front, and then use the GOMPLATE_SUPPRESS_EMPTY environment variable to cause the output file to not be written on failures:

$ GOMPLATE_SUPPRESS_EMPTY=true gomplate -i '{{ $bogus := .Env.BOGUS -}}
fail: {{ $bogus }}' -o fail.yml
template: <arg>:1:17: executing "<arg>" at <.Env.BOGUS>: map has no entry for key "BOGUS"
$ ls fail.yml
ls: cannot access 'fail.yml': No such file or directory

This works because the template fails before any output has been rendered.

I'm going to close this issue now, since this is working as designed. Feel free to re-open though, if you feel strongly about this, or if the suggested workarounds don't work 😉

@ProvoK
Copy link

ProvoK commented Aug 14, 2019

Hi @hairyhenderson
I think your approach is solid and maybe this conversation could be used to improve the actual documentation.

I still think the issue persist because the output (file or stdout) is a stream. This is good from a Unix philosophy, don't get me wrong, but this make quite difficult to use gomplate with pipes.

For instance:

# file: fail.tpl
Hello {{ .Env.USER }}
Today is {{ .Env.BOGUS }}

the partially rendered output will be passed to the pipe, regardless it fails or not

$ gomplate -f fail.tpl 2>/dev/null| xargs echo

Hello vic Today is

In one side project of mine, I'm using gomplate to render a config file for my application because it may contains secrets (access keys, apikeys etc).
In this case, I'd like to don't write sensible information into a file (so the "--" approach does not work for me) and my application consumes the config from stdin (it has also a flag for the config path, of course, is security is not a concern).

The workaround I'm using is the following

$ rendered_template=$(gomplate -f template.tpl) && echo "$rendered_template" | my_binary

It works, and it's fine.
It's kinda verbose though, and difficult to read.
I'm not sure about the flag name, but something like the following may be more clean and easy to reason about

$ gomplate -f template.tpl -stdin -- my_binary

where stdin stands for "pass as input to the command I'll execute on success"

It's not a blocking issue, and there are plenty of workarounds.
But I still think that is something quite important and will come often out in the future.

I could try to do a PR to extend the documentation, if you like

EDIT: After some thinking, I realized that the fact that the gomplate "stream" could be incomplete may be not very adherent to Unix philosophy

@hairyhenderson
Copy link
Owner

This is good from a Unix philosophy, don't get me wrong, but this make quite difficult to use gomplate with pipes.
[...]
EDIT: After some thinking, I realized that the fact that the gomplate "stream" could be incomplete may be not very adherent to Unix philosophy

Though I'm generally a fan of the Unix philosophy, I've never made any special effort to make gomplate with it... So any similarities are purely coincidental and unintentional 😉

What may be possible is a feature like the GOMPLATE_SUPPRESS_EMPTY one, where you could set an environment variable to change its behaviour to buffer rendering and not write to the output stream when errors are encountered. The challenge is what to do when outputting multiple files - I suppose it would have to output file-by-file...

Or maybe the behaviour could be changed in the next major version of gomplate (v4) 🤔...

I'm going to re-open this since I'm now thinking this may be a good change in default behaviour, though until the next major release it would be opt-in. The tradeoff (slower time to any data being written) is probably worth it!

@hairyhenderson
Copy link
Owner

I could try to do a PR to extend the documentation, if you like

@ProvoK that would be great! perhaps it would be good to also mention that the behaviour will change in future versions.

@hairyhenderson
Copy link
Owner

I'm not sure about the flag name, but something like the following may be more clean and easy to reason about

$ gomplate -f template.tpl -stdin -- my_binary

where stdin stands for "pass as input to the command I'll execute on success"

I'd thought about that when I wrote that feature, and opted to ignore it at that point - but I'm definitely open to it! Can you open a separate issue for that? 🙂

@ProvoK
Copy link

ProvoK commented Aug 15, 2019

I could try to do a PR to extend the documentation, if you like

@ProvoK that would be great! perhaps it would be good to also mention that the behaviour will change in future versions.

I'm gonna do it 😄 Probably next week

@danielkza
Copy link

I was surprised by this behaviour. I'm using Gomplate to generate config files for services. If there is an error, Gomplate will clobber the (previously working) config file with an invalid one.

I expect instead - at least with a config option - that the content can be written to a temporary file and atomically renamed to the existing file to avoid such surprises.

@hairyhenderson
Copy link
Owner

Sorry you were surprised @danielkza - I was expecting a documentation PR to help make the behaviour more clear, but as you see that hasn't happened 😞

Based on memory, and the above conversation, my plan is to add an optional behaviour such that output is buffered, and nothing is written if an error is encountered. Multi-file output is still an unsolved problem though: what if the template is executed on 10 files and the first 9 succeed? Is it OK to have 9/10 files updated, but not the last?

The other hitch is memory use - buffering output will use more memory, which probably is a non-issue on small templates, but for very large templates it will make a significant difference.

Personally the way I've avoided the issue (and the reason this issue is still open after all these years) is to use GOMPLATE_SUPPRESS_EMPTY and always validate my input data up-front in the template (see the test namespace).

@hairyhenderson
Copy link
Owner

Oh, I meant to address this as well:

the content can be written to a temporary file and atomically renamed to the existing file

Using temp files is an option, but not one I'm particularly excited about. I've seen that approach result in lots of extra files being written, filling up disk in unexpected ways. It would solve the multi-file output problem though.

@AndrewSav
Copy link
Contributor

Since we get a bot staling (and presumably then closing) issues that do not have activity, I would like to let you know that I'm still pretty interested in this one. I can re-post this every 60 days, if that's what required to keep this open.

@hairyhenderson
Copy link
Owner

Thanks for that @AndrewSav - this is specifically why I added the stale workflow - it's very hard to know which issues/PRs people are still interested in 😉

There's some feedback I'm still looking for, though.

From last November:

Based on memory, and the above conversation, my plan is to add an optional behaviour such that output is buffered, and nothing is written if an error is encountered.

IMO this is still reasonable - however:

Multi-file output is still an unsolved problem though: what if the template is executed on 10 files and the first 9 succeed? Is it OK to have 9/10 files updated, but not the last?

I'd like to hear feedback on this. Personally I'm OK with this but I don't know if that's what other people are thinking.

The other hitch is memory use - buffering output will use more memory, which probably is a non-issue on small templates, but for very large templates it will make a significant difference.

This is still an important consideration. Especially given that it's straightforward to validate inputs up-front, I wouldn't want to increase memory use for the default case.

@github-actions
Copy link

This issue is stale because it has been open for 60 days with no activity. Remove stale label or comment or this will be automatically closed in a few days.

@github-actions github-actions bot added the Stale label Jun 12, 2023
@AndrewSav
Copy link
Contributor

Using temp files is an option, but not one I'm particularly excited about. I've seen that approach result in lots of extra files being written, filling up disk in unexpected ways.

Just look at the Windows temp folder, which is, unlike in Linux is not cleaned up on reboot. Some programs, editors use a behaviour like this:

  • Write to a temp file
  • On error delete the temp file
  • On success remove the target file if any, and rename the temp file. In case of stdout, etc you might need to copy / delete instead of renaming

Will this work?

@hairyhenderson
Copy link
Owner

@AndrewSav that's an interesting approach - some programs call this a "scratch file".

This doesn't add anything to the original plan of buffering output in memory though, which I prefer in general, since it's atomic with respect to the filesystem. It would avoid unbounded memory growth - perhaps a hybrid approach where a template is rendered in-memory until it reaches a certain size (maybe 1MB?), at which point the file is flushed to a scratch file on disk and writes continue there?

I'm interested in hearing any ideas/opinions you have about the multi-file edge-case, too.

One option for those could involve simply rendering all outputs to memory, and flushing to disk only when all are successful...

@AndrewSav
Copy link
Contributor

AndrewSav commented Jun 15, 2023

This doesn't add anything to the original plan of buffering output in memory though

It does not? I thought that with this approach buffering is no longer required? You had an argument against the buffering that it won't be suitable for big files, or for multiple files, it seems that this can solve it.

which I prefer in general, since it's atomic with respect to the filesystem.

I guess one has to be practical here. It is quite possible to have files that do not fit in memory, and also some people will consider using up all available memory not reasonable for a tool like that.

It would avoid unbounded memory growth

That's the idea, if you mean what was suggested above. If you mean buffering, I do not see how

perhaps a hybrid approach where a template is rendered in-memory until it reaches a certain size (maybe 1MB?), at which point the file is flushed to a scratch file on disk and writes continue there?

What would the advantage be? You lose the atomicity anyway, what do you gain by buffering? In general 1MB is small enough to be an implementation detail, if from the implementation perspective this makes more sense (e.g performance is better, code is better structured and easier to understand and no side effects) - go for it.

I'm interested in hearing any ideas/opinions you have about the multi-file edge-case, too.

If the question is: should we delete already finished complete files if a error is discovered during a subsequent file, I do not really have an opinion, perhaps make it tuneable with a command line switch, but since streaming is out of the picture here either way I think people will be surprised less as long as there are no partial output files written. If the question is different, please let me know what it is.

One option for those could involve simply rendering all outputs to memory, and flushing to disk only when all are successful...

I'm apprehensive of the memory buffering approach without restriction on memory use, as I alluded to above. While it is true, that for most use cases when files are small it does not matter, I would be uncomfortable with inconsistency here between small and large files because it exposes a leaky abstraction. In languages like PowerShell there is a clear difference between (in pseudo-code):

$a = Get-Content
$b = DoSomething $a
$b | Set-Content

and

Get-Content | DoSomething | Set-Content

One can clearly see that in the former case you read everything into memory, while in the latter you are streaming it as it goes, and the trade-offs are clear to you.

We are discussing going into a territory where this line is blurred.
Note: with streaming PowerShell will leave behind partial files, this is not the point of this example

@github-actions github-actions bot removed the Stale label Jun 19, 2023
@hairyhenderson hairyhenderson added this to the v4.0.0 milestone Jun 25, 2023
@tavlima
Copy link

tavlima commented Nov 30, 2023

I'm interested in this feature as well, for the same reasons others mentioned.

Buffering into memory is pretty common, even when piping. It all depends on the nature of the tool: jq will buffer in order to parse, sort/uniq/wc must do the same (for obvious reasons), etc. gomplate doing the same is fine, but I'm not so sure it should be the default behavior (an opt-in flag would be just as good, at least for me).

Regarding the multiple input/output issue, I think the result should be consistent with multiple independent invocations of gomplate with a single input/output, as long as final exit status correctly reflects any intermediate failures.

@hairyhenderson hairyhenderson removed this from the v4.0.0 milestone Jun 16, 2024
@hairyhenderson hairyhenderson added this to the v4.1.0 milestone Jun 16, 2024
@hairyhenderson
Copy link
Owner

I've (obviously) not had much time lately to work on this feature, but I want to do a quick brain-dump of what I think the right approach is - mainly for future-me, or anyone else who wants to implement it (hint, hint!)...

  1. this should not change existing behaviour (i.e. no buffering) - it should be controlled by a flag like --buffer-output
  2. buffering is done purely in memory (no scratch files, yet)
  3. buffering is per-file - i.e. if the first template succeeds and the second one fails, the first output will still be written

Given that this will not break behaviour, I'm going to target this to v4.1 instead of v4.0. That'll let me get 4.0 released sooner 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants