This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Prevent scan hanging if command exceeds time limit #504

Open
kdelee opened this issue Dec 1, 2017 · 4 comments

Comments

kdelee commented Dec 1, 2017

Specify type:

  • Enhancement

Priority:

  • High

Description:

While individual instances of this bug have been encountered and resolved, for example the pagination of systemctl output, I think this has the potential of coming up over and over again, especially on alternate OSes and login shells that we are not targeting in our test scenarios.

While our main concern is RHEL-flavored Linux systems, we will inevitably run into oddballs with strange shells and unconventional setups. We need to be resilient to these cases, so that encountering such a system does not prevent us from gleaning information from the systems we can properly access.

An easy example to think of: a command that is not available on a system is executed, but instead of just giving us a non-zero exit code, the shell prompts us and says, "A package is available that provides that command, would you like to install it [N/y]?"


Bug Report

Version of rho:

[ 0.0.28, 0.0.29, 0.0.30, 0.0.31 ]

Expected behavior:

I expect all tasks will time out after some period of time and not hang indefinitely.

Actual behavior:

It is entirely possible for a task to hang indefinitely, and we have been seeing many permutations of this.

Steps to reproduce:

Since we have seen this arise in various scenarios, and I also want to prevent future occurrences, I recommend reproducing this by creating a task that will intentionally take a very long time on your test machine. For example:

For RHEL 5/6:

- name: this will take forever
  command: "tail -f /var/log/messages"

For RHEL 7/recent Fedora:

- name: this will take forever
  command: "journalctl -f"
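
For a quick check outside of Ansible, the expected behavior (a never-ending command gets cut off after a fixed period) can be sketched with the GNU coreutils `timeout` command. This is only an illustration, assuming `timeout` is installed; `tail -f /dev/null` stands in for the `tail -f /var/log/messages` task above, and the 2-second limit is an arbitrary value chosen for the demo.

```shell
# Run a command that never exits on its own, but cap it at 2 seconds.
# `tail -f /dev/null` is a stand-in for the long-running tasks above.
status=0
timeout 2 tail -f /dev/null || status=$?

# GNU timeout exits with status 124 when it had to stop the command.
echo "exit status: $status"
```

A status of 124 lets the caller tell "timed out" apart from an ordinary command failure.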

Possible solution:

As I was researching the problem of the scan hanging if a host is lost in the middle of a long-running task, I came across the async/poll feature of Ansible. See this issue comment.

kdelee commented Dec 1, 2017

@mdvickst this is one of the issues I'm filing in response to our recent conversation about possible causes of a scan hanging.

mdvickst commented Dec 2, 2017

I actually witnessed this happening on a troubleshooting call this afternoon. It was the virt.num_guests fact that was hanging on CentOS bare metal servers, which the customer said were hosting Docker containers.

It appears this is a pretty common issue as described here. I'm sure there are other examples but this one happened just today.

noahl commented Dec 13, 2017

This issue is quite tricky to solve. It would be nice to solve it at the ssh layer, but as far as I can tell, ssh doesn't support a per-command timeout (based on https://linux.die.net/man/5/ssh_config).

Next option would be to have Ansible provide the timeout, but Ansible doesn't support a timeout in the raw module, which is what we use (based on http://docs.ansible.com/ansible/latest/raw_module.html).

We can't do it in rho, because rho doesn't have enough hooks into Ansible to cancel a single command and move on to the next one without cancelling the whole playbook.

One option which might conceivably work would be setting Ansible's ssh_executable (http://docs.ansible.com/ansible/latest/intro_configuration.html#ssh-executable) configuration to a bash script that did timeout 10 ssh $@ or something like that, but this is extremely hacky. (It would also require Ansible 2.2 or greater.)
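
A rough sketch of what such a wrapper might look like follows. It is not something rho ships and has not been tried against a real scan. The names `ssh_with_timeout`, `SSH_CMD_TIMEOUT`, and `REAL_SSH` are made up for illustration, the 10-second default is simply the value from the comment above, and GNU coreutils `timeout` is assumed to be installed on the machine running Ansible. `"$@"` is quoted so that arguments containing spaces are passed through unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper script that Ansible's ssh_executable could point at.
# SSH_CMD_TIMEOUT and REAL_SSH are illustrative names, not Ansible or rho settings.

ssh_with_timeout() {
    # Give up on the whole ssh invocation after SSH_CMD_TIMEOUT seconds
    # (default 10). timeout exits with 124 if the limit was hit.
    timeout "${SSH_CMD_TIMEOUT:-10}" "${REAL_SSH:-ssh}" "$@"
}

# When called with arguments (as Ansible would do), forward them to ssh
# and exit with the resulting status.
if [ "$#" -gt 0 ]; then
    ssh_with_timeout "$@"
    exit $?
fi
```

The limit applies to each ssh invocation as a whole, so it would have to be longer than the slowest legitimate command in the scan.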

noahl commented Dec 13, 2017

To be clear, I think the best thing to do for the long run is to make the fix in Ansible. But until that happens, the hack above might possibly work.
