[PROPOSAL] Pair TransportService / ActionListener initialization to allow shutdown/restart hot plugging #174

Closed
dbwiddis opened this issue Oct 10, 2022 · 5 comments

Comments

@dbwiddis
Member

What/Why

What are you proposing?

Establish the capability to:

  • Add a new extension without restarting OpenSearch
  • Remove an active extension (and its dependencies) without restarting OpenSearch
  • Auto-reboot extensions to handle transport failures (or upon request by a user)

What users have asked for this feature?

In PR #172 (based on #171), we pulled test code out of the ExtensionsRunner. This exposed a flaw with our current "initialize and leave it running forever" setup and the need to include TransportService.stop() somewhere in the API.

What problems are you trying to solve?

While adding a stop/start method is easy enough, the bigger question is, when will that be called? Testing as in #172 is only one use case. But we can also integrate this with our longer term goal of hot plugging extensions.

As part of that effort we are dealing with the sequencing of initializing Extensions, and knowing when they are initialized. See issues #65 and #94, and possible usage in #149 and #151.

During some of my REST handler testing, I've ended up with a bug causing Transport to fail, and the only fix was restarting the extension (and restarting OpenSearch). We want a future where we do not have to restart OpenSearch and can dynamically add or remove extensions on the fly.

What is the developer experience going to be?

  • To add a new extension on the fly, start it up and then make a REST request to connect it: PUT /_extensions/add/uniqueID
  • To remove an extension on the fly, make a REST request to disconnect it: PUT /_extensions/remove/uniqueID
  • To (attempt to) reconnect to a problematic extension, make a REST request that executes the above two steps and signals the extension to restart itself: PUT /_extensions/restart/uniqueID
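The three requests above could be dispatched roughly as in the following sketch. Only the three PUT paths come from this proposal; the class name `ExtensionHotPlugRouter`, the in-memory set of active extensions, and the returned status codes are hypothetical and stand in for the real transport connect/disconnect work.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not an existing OpenSearch or SDK class. */
final class ExtensionHotPlugRouter {
    private static final String PREFIX = "/_extensions/";

    // Stand-in for real connection state: IDs of currently connected extensions.
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    /** Returns an HTTP-style status code (the specific codes are an assumption). */
    int handle(String method, String path) {
        if (!"PUT".equals(method)) {
            return 405;
        }
        if (!path.startsWith(PREFIX)) {
            return 404;
        }
        String[] parts = path.substring(PREFIX.length()).split("/");
        if (parts.length != 2 || parts[1].isEmpty()) {
            return 400;
        }
        String action = parts[0];
        String uniqueId = parts[1];
        switch (action) {
            case "add":
                // Already connected extensions are reported as a conflict.
                return active.add(uniqueId) ? 200 : 409;
            case "remove":
                return active.remove(uniqueId) ? 200 : 404;
            case "restart":
                // Restart is remove followed by add; a real implementation
                // would also signal the extension to restart itself here.
                if (!active.remove(uniqueId)) {
                    return 404;
                }
                active.add(uniqueId);
                return 200;
            default:
                return 400;
        }
    }

    boolean isActive(String uniqueId) {
        return active.contains(uniqueId);
    }
}
```

Treating restart as remove plus add keeps a single code path for connecting and disconnecting, which is the pairing this proposal is after.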

Are there any security considerations?

Extensions should require that any shutdown/reboot request comes from an OpenSearch instance they are already (or have previously been) communicating with.
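A minimal sketch of that check on the extension side, assuming the extension records an identifier for each OpenSearch node it has completed a handshake with. `LifecycleRequestGuard` and its method names are hypothetical, and a real implementation would need an authenticated identity rather than a self-reported ID.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not an existing OpenSearch or SDK class. */
final class LifecycleRequestGuard {
    // IDs of OpenSearch nodes this extension has already communicated with.
    private final Set<String> knownNodeIds = ConcurrentHashMap.newKeySet();

    /** Called once a connection with an OpenSearch node is established. */
    void recordHandshake(String nodeId) {
        knownNodeIds.add(nodeId);
    }

    /** Shutdown/restart requests are honored only for previously seen nodes. */
    boolean mayControlLifecycle(String nodeId) {
        return nodeId != null && knownNodeIds.contains(nodeId);
    }
}
```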

What is the user experience going to be?

A user can spin up a new extension on a new node (or same node) and issue a single REST request to activate it (and its dependencies).

A user can remove an extension, or restart it to upgrade to a new software version.

Why should it be built? Any reason not to?

There may be other ways to handle this, but connecting REST handlers to the transport service start/stop seems like a great way to enable it.

What will it take to execute?

We'll need to create a design/sequence diagram that clearly identifies the complete set of initialization steps and what is required to "uninitialize".
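One way to keep every initialization step paired with its "uninitialize" counterpart is to register both together and unwind in reverse order. The sketch below only illustrates that idea; `PairedLifecycle` is a hypothetical name, and the actual steps would come from the design/sequence diagram.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; not an existing OpenSearch or SDK class. */
final class PairedLifecycle {
    // Teardown actions, with the most recently initialized step on top.
    private final Deque<Runnable> teardown = new ArrayDeque<>();

    /** Runs an initialization step and remembers how to undo it. */
    void step(Runnable initialize, Runnable uninitialize) {
        initialize.run();
        teardown.push(uninitialize);
    }

    /** Undoes all completed steps, last initialized first. */
    void shutdown() {
        while (!teardown.isEmpty()) {
            teardown.pop().run();
        }
    }
}
```

Because a step's teardown is registered only after its initialization succeeds, a failure partway through startup can still call `shutdown()` to undo exactly the steps that completed.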

Any remaining open questions?

Dependencies are a big unknown here. This proposal primarily deals with extensions without dependencies.

@peterzhuamazon
Member

If this file exists on the OS cluster, would it also track which plugin version it can support?
Or vice versa?

Thanks.

@dbwiddis
Member Author

If this file exists on the OS cluster, would it also track which plugin version it can support? Or vice versa?

I think it goes the other way around: an extension may require OpenSearch version X (or greater) which defines the SDK and capabilities that it has integrated.

I think the concept is that a file would establish an initial/baseline configuration but that configuration could be updated via REST requests (including adding/removing extensions). The JSON for the request would include the necessary settings such as version compatibility (for example, this plugin requires 2.4+, so could be a version range like [2.4.0,3.0.0).)
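As an illustration of how a range such as `[2.4.0,3.0.0)` could be evaluated, here is a small sketch using interval notation (`[` or `]` inclusive, `(` or `)` exclusive) and three-part numeric versions. The `VersionRange` class is hypothetical and is not the SDK's actual compatibility check; qualifiers such as `-SNAPSHOT` are not handled.

```java
/** Hypothetical sketch; not an existing OpenSearch or SDK class. */
final class VersionRange {
    private final int[] lower;
    private final int[] upper;
    private final boolean lowerInclusive;
    private final boolean upperInclusive;

    private VersionRange(int[] lower, boolean lowerInclusive, int[] upper, boolean upperInclusive) {
        this.lower = lower;
        this.lowerInclusive = lowerInclusive;
        this.upper = upper;
        this.upperInclusive = upperInclusive;
    }

    /** Parses interval notation such as "[2.4.0,3.0.0)". */
    static VersionRange parse(String text) {
        String t = text.trim();
        if (t.length() < 2) {
            throw new IllegalArgumentException("Not a version range: " + text);
        }
        char open = t.charAt(0);
        char close = t.charAt(t.length() - 1);
        if ((open != '[' && open != '(') || (close != ']' && close != ')')) {
            throw new IllegalArgumentException("Not a version range: " + text);
        }
        String[] bounds = t.substring(1, t.length() - 1).split(",");
        if (bounds.length != 2) {
            throw new IllegalArgumentException("Expected two bounds: " + text);
        }
        return new VersionRange(parseVersion(bounds[0]), open == '[', parseVersion(bounds[1]), close == ']');
    }

    /** Parses "major.minor.patch" into three integers. */
    private static int[] parseVersion(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected major.minor.patch: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < 3; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Compares numerically, component by component, so 2.10.0 > 2.4.0.
    private static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    boolean contains(String version) {
        int[] v = parseVersion(version);
        int lo = compare(v, lower);
        int hi = compare(v, upper);
        return (lowerInclusive ? lo >= 0 : lo > 0) && (upperInclusive ? hi <= 0 : hi < 0);
    }
}
```

With this sketch, `VersionRange.parse("[2.4.0,3.0.0)").contains("2.4.0")` is true and the same range rejects `3.0.0`, matching the "requires 2.4+" example.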

@dbwiddis
Member Author

@peterzhuamazon also see #65 which directly addresses what goes into the yml file. This issue discusses updating that on the fly but #65 would be the baseline and the necessary fields to set.

@peterzhuamazon
Member

@peterzhuamazon also see #65 which directly addresses what goes into the yml file. This issue discusses updating that on the fly but #65 would be the baseline and the necessary fields to set.

Thanks @dbwiddis for the explanation. I will keep my eyes on that issue.

@dbwiddis
Member Author

Closing this and tracking in #356
