Lifecycle hooks can make the agent unresponsive #337

Open
ionphractal opened this issue Nov 4, 2024 · 1 comment

Comments

@ionphractal

The bosh-agent already runs with a higher priority than BOSH/monit jobs, so that CPU-intensive workloads do not block agent <-> director communication; see cloudfoundry/bosh-linux-stemcell-builder@00054bd.

However, lifecycle hooks such as pre-start scripts appear to have the same negative effect on communication with the director, because they are started by the bosh-agent itself and therefore run with the same priority. This is an assumption on my part: I could not find any code that lowers that priority, and inspecting a VM while it runs a pre-start shows the pre-start script and all of its sub-processes running with the same priority as the agent.
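
A minimal sketch of the mechanism behind this observation, assuming a Linux/Unix VM: a child process inherits its parent's nice value unless something changes it explicitly. Python is used purely for illustration (the agent itself is written in Go), and the sleeping child is a hypothetical stand-in for a pre-start script, not anything the agent actually runs.

```python
import os
import subprocess
import sys


def niceness(pid):
    """Return the nice value of a process (Unix only)."""
    return os.getpriority(os.PRIO_PROCESS, pid)


def spawn_and_compare():
    """Start a child without touching its priority and compare nice values."""
    # Hypothetical stand-in for a lifecycle hook: a child that sleeps briefly.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(1)"]
    )
    try:
        return niceness(os.getpid()), niceness(child.pid)
    finally:
        # Clean up the stand-in process once it has been inspected.
        child.kill()
        child.wait()


parent_nice, child_nice = spawn_and_compare()
print(f"parent nice={parent_nice}, child nice={child_nice}")
```

Both values should match, which is consistent with what the VM shows during a pre-start.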

In our case, cloning a large amount of data from the rest of a BOSH-managed PostgreSQL cluster triggers this issue intermittently. In extreme situations this extends downtime unnecessarily, because the bosh task fails with an agent timeout and the pre-start has to run again from scratch.

As a quick mitigation we could, of course, renice our pre-start script ourselves. Still, for consistency and predictability, I would see a benefit in the bosh-agent starting external scripts and binaries with a lower priority than itself.
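
To illustrate the proposal, here is a rough sketch of a launcher that starts an external script at a lower priority than itself. It is not the agent's code: `run_hook_niced` and `NICE_INCREMENT` are hypothetical names, the increment of 10 is an arbitrary choice, and Python stands in for whatever the agent would do in Go.

```python
import os
import subprocess
import sys

# Hypothetical increment; a real value would need to be chosen by the maintainers.
NICE_INCREMENT = 10


def run_hook_niced(argv, increment=NICE_INCREMENT):
    """Start argv with its niceness raised by `increment` relative to the caller.

    os.nice runs in the child between fork and exec, so the hook and every
    sub-process it spawns start at the lowered priority. Python's docs warn
    that preexec_fn is not safe in multi-threaded programs.
    """
    return subprocess.Popen(argv, preexec_fn=lambda: os.nice(increment))


# Hypothetical stand-in for a pre-start script.
hook = run_hook_niced([sys.executable, "-c", "import time; time.sleep(1)"])
caller_nice = os.getpriority(os.PRIO_PROCESS, os.getpid())
hook_nice = os.getpriority(os.PRIO_PROCESS, hook.pid)
hook.kill()
hook.wait()
print(f"caller nice={caller_nice}, hook nice={hook_nice}")
```

The script-side mitigation mentioned above would be the same idea applied from inside the hook, for example a `renice -n 10 -p $$` near the top of a shell pre-start script. Doing it in the launcher instead means every release gets the behaviour without having to remember it.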

@rkoster
Contributor

rkoster commented Nov 7, 2024

@ionphractal this seems like a good idea! Happy to review a PR.

@rkoster rkoster moved this from Inbox to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Nov 7, 2024