Retry osquery instance launch until successful or shutdown requested #1952

Merged

Conversation

RebeccaMahany
Contributor

@RebeccaMahany RebeccaMahany commented Nov 12, 2024

Closes #1937

When the osquery runner cannot launch an osquery instance, we currently return an error, which shuts down launcher entirely.

Looking over the logs and past issues we've investigated, I saw two primary errors: 1) "timeout waiting for osqueryd to create socket", indicating the osquery process did not start up, and 2) "could not create an extension client", where the socket file does not exist or the connection is refused.

In both of these cases, restarting launcher is overkill, and can even be detrimental to solving the issue. We sometimes see these errors when the current osquery version is old and not compatible with the current database; restarting launcher in this case is actively harmful because it resets the autoupdate delay, preventing a newer osquery version from being downloaded.

Instead, this PR implements retrying osquery instance launch until successful (or shutdown is requested).

I originally also wanted to implement "If osquery instance launch fails, also consider triggering an autoupdate check for osquery." However, I realized this part of the criteria will be significantly easier to implement once #1412 is complete, so I'm putting it off for now.

@RebeccaMahany RebeccaMahany marked this pull request as ready for review November 12, 2024 19:38
@RebeccaMahany RebeccaMahany changed the title from "Retry osquery launch until successful or shutdown requested" to "Retry osquery instance launch until successful or shutdown requested" Nov 12, 2024
@RebeccaMahany RebeccaMahany added this pull request to the merge queue Nov 12, 2024
Merged via the queue into kolide:main with commit 34bfb61 Nov 12, 2024
32 checks passed
@RebeccaMahany RebeccaMahany deleted the becca/retry-instance-launch branch November 12, 2024 21:01
@RebeccaMahany RebeccaMahany added the features-improvements Features and Improvements label Nov 13, 2024
Labels
features-improvements Features and Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry launching osquery instance on failure
3 participants