Make osq runner responsive to registration updates #2007

Draft: wants to merge 11 commits into main


zackattack01 (Contributor) commented Dec 17, 2024

  • reworks interfaces for runner change detection
    • knapsack's querier is currently implemented by our runner, but we need the runner to do more here. This adds an OsqRunner interface, which encompasses the previous InstanceQuerier interface and adds our new RegistrationChangeHandler requirements
    • adds an UpdateRegistrationIDs method to the runner which will detect any changes and restart the instances accordingly
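The interface split described above could look roughly like the Go sketch below. It only illustrates the composition pattern: the `Query` method on InstanceQuerier, the exact `UpdateRegistrationIDs` signature, and the `fakeRunner` type are all hypothetical and are not taken from the PR. The point is that OsqRunner embeds both interfaces, so existing consumers of the querier keep working, and the runner only restarts when the incoming registration IDs actually differ.

```go
package main

import (
	"fmt"
	"slices"
)

// InstanceQuerier stands in for the existing querier interface. This method
// set is hypothetical; the real interface in the codebase may differ.
type InstanceQuerier interface {
	Query(query string) ([]map[string]string, error)
}

// RegistrationChangeHandler is a hypothetical shape for the new requirement:
// the runner is handed the current registration IDs and decides whether its
// osquery instances need to be restarted.
type RegistrationChangeHandler interface {
	UpdateRegistrationIDs(newRegistrationIDs []string) error
}

// OsqRunner embeds both interfaces, so code that only needs a querier can keep
// accepting an InstanceQuerier while the runner satisfies both.
type OsqRunner interface {
	InstanceQuerier
	RegistrationChangeHandler
}

// fakeRunner is a toy implementation used only to exercise the interfaces.
type fakeRunner struct {
	registrationIDs []string
	restarts        int
}

func (f *fakeRunner) Query(query string) ([]map[string]string, error) {
	return nil, nil
}

// UpdateRegistrationIDs restarts only when the incoming IDs differ from the
// ones already known (compared in order).
func (f *fakeRunner) UpdateRegistrationIDs(newRegistrationIDs []string) error {
	if slices.Equal(f.registrationIDs, newRegistrationIDs) {
		return nil // no change detected, nothing to restart
	}
	f.registrationIDs = slices.Clone(newRegistrationIDs)
	f.restarts++ // placeholder for restarting all instances
	return nil
}

// Compile-time check that fakeRunner implements OsqRunner.
var _ OsqRunner = (*fakeRunner)(nil)

func main() {
	r := &fakeRunner{registrationIDs: []string{"default"}}
	_ = r.UpdateRegistrationIDs([]string{"default"})          // unchanged
	_ = r.UpdateRegistrationIDs([]string{"default", "other"}) // changed
	fmt.Println("restarts:", r.restarts)                       // restarts: 1
}
```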

To allow for a graceful restart of all known instances after registration IDs have been updated, a few tricky changes were required that I'd like more eyes on. Context:

  • Within our Shutdown method, we currently close the shutdown channel to signal our shutdown intent. This works well, but it isn't compatible with needing to read from that channel again when the runner should stay up (e.g. restarts after a registration ID update). Resetting the channel did not feel correct, and it introduces data races due to the way we need to run goroutines and use wait groups here
  • I initially tried reworking things to have separate shutdown patterns for reloading/restarting vs. actually shutting down, and adding a secondary reload channel, but we would still need the shutdown channel to stay open
  • Updating to just send on the shutdown channel and leave it open breaks all shutdown functionality, because there is no way to synchronize the timing required here across multiple instances. A send on an unbuffered channel blocks until it is received, but depending on the state of any given instance when shutdown is called, that select block may not be open yet (we block on instance.Exited() above it)
  • So we need r.shutdown to hold that message. The reason this worked as expected with close(r.shutdown) is that a receive on a closed channel returns the zero value immediately whenever it next happens, even if the select wasn't open when close was called
  • The only way I could think of to get around this was to make it a buffered channel instead, and to ensure we send one shutdown message per instance so that each instance sees one
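The channel behavior the bullets above rely on can be checked in isolation with the small Go program below. It is a toy model, not the runner's code: `trySend`, `signalShutdown`, and `runInstances` are hypothetical helpers, and the per-instance `time.Sleep` only stands in for an instance being blocked on `instance.Exited()` before it reaches its select. It shows three things: a send on an unbuffered channel with no waiting receiver cannot complete, a closed channel satisfies every later receive immediately, and a buffered channel with one slot per instance holds the shutdown messages until each instance reads one while staying open for the next restart cycle.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// trySend reports whether a send on ch completes immediately. On an
// unbuffered channel with no receiver waiting, it returns false, which is
// the situation where a plain send would block.
func trySend(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

// signalShutdown sends one message per instance. If cap(ch) >= n and the
// buffer is empty, none of these sends block, even with no receiver ready.
func signalShutdown(ch chan struct{}, n int) {
	for i := 0; i < n; i++ {
		ch <- struct{}{}
	}
}

// runInstances starts n toy "instances". Each is busy for a while (standing
// in for blocking on instance.Exited()) before it reads from shutdown, and
// then returns. It reports how many instances observed a shutdown message.
func runInstances(n int, shutdown chan struct{}) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	seen := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(delay time.Duration) {
			defer wg.Done()
			time.Sleep(delay)
			<-shutdown // consumes exactly one buffered message
			mu.Lock()
			seen++
			mu.Unlock()
		}(time.Duration(i) * 5 * time.Millisecond)
	}
	wg.Wait()
	return seen
}

func main() {
	const numInstances = 3

	unbuffered := make(chan struct{})
	fmt.Println("unbuffered send completes:", trySend(unbuffered)) // false

	// A closed channel can be received from any number of times; each receive
	// returns the zero value immediately with ok == false.
	closed := make(chan struct{})
	close(closed)
	_, ok1 := <-closed
	_, ok2 := <-closed
	fmt.Println("closed receives ok:", ok1, ok2) // false false

	// The buffered channel stays open, so it can be reused for the next
	// restart cycle instead of being closed and recreated.
	shutdown := make(chan struct{}, numInstances)
	for cycle := 1; cycle <= 2; cycle++ {
		signalShutdown(shutdown, numInstances) // sent before anyone listens
		fmt.Println("cycle", cycle, "stopped:", runInstances(numInstances, shutdown))
	}
}
```

One property of this approach worth keeping in mind: if fewer instances read from the channel than messages were sent (for example, an instance that already exited), the leftover messages stay buffered and would be picked up at the start of the next cycle, so the send count needs to match the number of instances that will actually read.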

If anyone has alternate suggestions please let me know!

zackattack01 force-pushed the zack/runner_registration_ids branch from a1d2603 to 01e4063 (December 19, 2024 17:35)
zackattack01 force-pushed the zack/runner_registration_ids branch from 01e4063 to 2d74749 (December 19, 2024 18:27)
@@ -49,6 +53,31 @@ func New(k types.Knapsack, serviceClient service.KolideService, opts ...OsqueryI
}

func (r *Runner) Run() error {
	for {
		// if our instances ever exit unexpectedly, return immediately
Contributor

I think this comment isn't accurate? I think runRegisteredInstances only returns if a shutdown was requested, and it only returns an error if a) shutdown was requested, b) we were trying to restart one or more instances during that time, and c) we hadn't successfully restarted one of them yet.

Either way, why would we return the error here instead of checking rerunRequired first?

Contributor Author

That's a good point, thank you. We'd probably want to rerun regardless if required. I will update that comment and get this fixed up!

)

// we know there are changes, safe to update the internal registrationIDs now
r.registrationIds = newRegistrationIDs
Contributor

Do we need a lock on r.registrationIds to avoid a data race?

Contributor Author

That seems like a good idea, will do!
