Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[batch] Omit empty command strings when constructing shell scripts #14700

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jmarshall
Copy link
Contributor

No description provided.

@jmarshall
Copy link
Contributor Author

jmarshall commented Sep 24, 2024

We recently encountered jobs that failed due to syntax errors in the shell script generated by Hail, stemming from code such as

job.command('touch before')
job.command('\n'.join(f'echo {shlex.quote(msg)}' for msg in messages))
job.command('touch after')

Occasionally messages is an empty list, so this evaluates to job.command('') and the eventual shell script submitted by Hail contains

…
{
touch before
}
{

}
{
touch after
}
…

Shell compound commands like { … } must contain at least one command, so this is a syntax error.

Empty commands could be rewritten to generate e.g. : such as

…
{
:
}
…

but it seems easier and probably less surprising to just omit them.

@jmarshall
Copy link
Contributor Author

(We've added if messages: … to the code in question to prevent this, but it would be good to fix the issue from the Hail end as well.)

@cjllanwarne
Copy link
Collaborator

@jmarshall another way to do this might be to ignore/do nothing during the initial call of job.command('') when the argument is empty. Assuming we never want to do anything for empty/None commands we might as well eliminate them at source. That way you should only have to handle the situation in one place, and not have to handle the invalid data structure in multiple downstream places.

I do like the idea of fixing this in the hail library, but you could also consider altering your upstream code so that it never makes an invalid "job.command('')" call in the first place. Eg by doing something like

job.command('touch before')
for msg in messages:
  job.command(f'echo {shlex.quote(msg)})
job.command('touch after')

(assuming it doesn't matter to you whether the messages are handled in the same section of the resulting command or not)

What do you think?

@jmarshall
Copy link
Contributor Author

As I said, we have already worked around the particular instance of this. But I've raised it here so that we or other Hail users won't need to do so in future.

another way to do this might be to ignore/do nothing during the initial call of job.command('') when the argument is empty.

🤷 I'm not sure that the presence of empty items in BashJob._command is intrinsically invalid, so whether to ignore such calls (by having BashJob.command() do nothing when its argument contains no non-whitespace characters, and verifying that calling command() is the only way to get entries into this list) or to prevent the actual invalid syntax that results later (either by omitting or emitting : or similar) is up to the maintainers.

@cjllanwarne
Copy link
Collaborator

@jmarshall it sounded when we caught up in person like you agreed that skipping empty command strings at source (and emitting a warning) during the original job.command call would be the way forward here... were you planning to update this PR with that change?

Using job.command('') (or more likely something more complex
that sometimes evaluates to an empty string) led to shell snippets
containing `... { } ...`, which is a shell syntax error. Prevent
this by not adding such strings to self._command and warning.
@jmarshall
Copy link
Contributor Author

I've updated the PR accordingly — feel free to wordsmith the warning message.

@ehigham
Copy link
Member

ehigham commented Oct 7, 2024

@jmarshall, thanks doing this - would you mind adding a simple unit test to lock down the behaviour?

@jmarshall
Copy link
Contributor Author

I am unfamiliar with Hail's test infrastructure so it would be more time efficient for the maintainers to add a test themselves.

@ehigham
Copy link
Member

ehigham commented Oct 7, 2024

@jmarshall - PR populationgenomics#349 for your consideration

@jmarshall
Copy link
Contributor Author

Thanks, added with one tweak. Sadly I don't know how to convince your code analyser that using randint to make test cases in test code is not a security issue…

Feel free to push to PR branches directly, or just to add things while merging. You folks are the Hail maintainers after all!

@ehigham
Copy link
Member

ehigham commented Oct 9, 2024

Thanks, added with one tweak. Sadly I don't know how to convince your code analyser that using randint to make test cases in test code is not a security issue…

Me neither 🤷

Feel free to push to PR branches directly, or just to add things while merging.

I don't have write access to the populationgenomics fork, hence the PR :)

@jmarshall
Copy link
Contributor Author

TIL “Allow edits from maintainers” applies only to PRs from personal forks, not organisation forks. Annoying!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants