Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create LocalProcessLauncher and ExceptionHandlingLauncher #164

Merged
merged 6 commits into from
Mar 22, 2022

Conversation

engechas
Copy link
Contributor

@engechas engechas commented Mar 18, 2022

Description

Migrates the ProcessLauncher to a LocalProcessLauncher. Adds an ExceptionHandlingLauncher to decorate the Launcher implementations with exception handling.

I initially tried to generalize the LocalProcessLauncher to use shell commands instead of helpers such as io or os modules. This would have allowed the implementation to be used by non-local providers. However, the telemetry devices and java_resolver are tightly coupled to evaluating the local machine as well and would have required significant refactoring.

Rather than going this route, I decided that a separate implementation for ExternalProcessLauncher could be created to handle the generalized flow of this class without coupling the logic to the local machine. This seems to the be quickest and cleanest path forwards. If there is shared logic between these implementations, it will be broken out to avoid duplication.

The LocalProcessLauncher is mostly unchanged from the original ProcessLauncher, with the exception of using the executor instead of the previous subprocess.Popen call. Additionally, I refactored a few of the methods to make them more modular.

The LocalProcessLauncher was tested by migrating the unit tests from the original ProcessLauncher. Additionally, I wired it into the builder system and ran the test_tar_distributions and test_sources integration tests.

Issues Resolved

#134

Check List

  • New functionality includes testing
    • All unit and integration tests pass
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

def __init__(self, pci, shell_executor, metrics_store, clock=time.Clock):
super().__init__(shell_executor)
self.logger = logging.getLogger(__name__)
self.pci = pci
Copy link
Collaborator

@IanHoang IanHoang Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you chose pci over provision_config_instance? I think the latter is more descriptive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone back and forth on this in my mind a couple times. I settled on pci for brevity since many of the accesses to this object will be something like self.pci.variables["abcd"]["efgh"].

But ultimately clarity > brevity so I will refactor. I have used pci in a few of the other PRs as well so I will create a followup PR after this one to replace all of the references in 1 go

if not env.get("OPENSEARCH_JAVA_OPTS"):
env["OPENSEARCH_JAVA_OPTS"] = "-XX:+ExitOnOutOfMemoryError"

# we just blindly trust telemetry here...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a note or will this be a TODO to refactor this portion so that we don't blindly trust telemetry in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an inherited comment. I don't think it's a TODO so much as calling out that the telemetry devices can set environment variables that may break the launching of OpenSearch. Essentially loosely passing the burden of validating the env vars that a telemetry device specifies to the telemetry device itself.

A better solution would probably be an allowlist, but I don't have enough familiarity with the scope of telemetry environment variables to propose one.


def _get_pid_from_file(self, pid_file_name):
with open(pid_file_name, "rb") as f:
buf = f.read()
Copy link
Collaborator

@IanHoang IanHoang Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: what does buf represent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the byte buffer read from the file

Copy link
Collaborator

@IanHoang IanHoang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few more comments to get some clarity but otherwise, LGTM!

Signed-off-by: Chase Engelbrecht <[email protected]>
…og statements to be more specific

Signed-off-by: Chase Engelbrecht <[email protected]>
try:
self.launcher.stop(host, nodes)
except Exception as e:
raise LaunchError("Launching node(s) on host \"{}\" failed".format(host), e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to change the log statement here to suggest its a failure during stop process

@engechas engechas merged commit a4763e6 into opensearch-project:main Mar 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants