[WIP][vpj] Add a way to run DataWriter jobs in an isolated environment #1265

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from


@nisargthakkar (Contributor) commented Oct 28, 2024

Add a way to run DataWriter jobs in an isolated environment

Currently, VPJ is written so that it launches the compute tasks from its local environment. If an environment only allows running Spark jobs via spark-submit, the entire VPJ logic has to run on the Spark driver, which makes the driver non-idempotent.

With this change, we can separate the VPJ driver logic from the Spark compute environment and launch only the compute tasks on Spark. The driver on Spark becomes a very thin wrapper that only parses CLI args and launches the compute jobs. This also makes the Spark compute job idempotent, which can improve resiliency.
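Because the Spark-side driver only needs to turn CLI args back into job properties, the hand-off can stay very small. The sketch below is illustrative only. It assumes a hypothetical `--prop=key=value` encoding and a hypothetical `CliArgCodec` class, since this PR does not describe the actual argument format. The VPJ side builds the argument list for spark-submit, and the isolated driver's `main` rebuilds a `java.util.Properties` from it before configuring the compute job.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical codec for passing job properties to an isolated Spark driver
 * as CLI arguments. The "--prop=key=value" format is an assumption for
 * illustration, not the encoding used by this PR.
 */
final class CliArgCodec {
  private static final String PREFIX = "--prop=";

  /** VPJ side: turn job properties into CLI args for spark-submit. */
  static List<String> toArgs(Properties props) {
    List<String> keys = new ArrayList<>(props.stringPropertyNames());
    // Sort keys so the generated command line is deterministic.
    Collections.sort(keys);
    List<String> args = new ArrayList<>();
    for (String key : keys) {
      args.add(PREFIX + key + "=" + props.getProperty(key));
    }
    return args;
  }

  /** Isolated side: rebuild the properties from the CLI args. */
  static Properties fromArgs(String[] args) {
    Properties props = new Properties();
    for (String arg : args) {
      if (!arg.startsWith(PREFIX)) {
        throw new IllegalArgumentException("Unexpected argument: " + arg);
      }
      String pair = arg.substring(PREFIX.length());
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Malformed property: " + arg);
      }
      // Split on the first '=' only: values may contain '=', keys must not.
      props.setProperty(pair.substring(0, eq), pair.substring(eq + 1));
    }
    return props;
  }

  /** Thin driver entry point: parse args, then hand off to the compute job. */
  public static void main(String[] args) {
    Properties jobProps = fromArgs(args);
    // A real driver would configure and launch the compute job here.
    System.out.println("Parsed " + jobProps.size() + " properties");
  }
}
```

Keeping the Spark-side `main` limited to parsing means rerunning it with the same arguments reconstructs the same job configuration, which is the property the thin-wrapper design relies on.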

Another benefit is a better debugging experience: the logs can now be viewed in the environment where the user's job was triggered, so users don't have to check the logs on the Spark driver.

This change adds a new implementation of the DataWriterComputeJob interface, and all interactions with external systems are contained within that implementation. It serializes the job properties and job configs as CLI args and passes them to the isolated environment, where a main class parses these CLI args and configures the actual compute job.

At the end of the compute job, the driver program in the isolated environment serializes the DataWriterTaskTracker to HDFS. The VPJ driver program then reads that file from HDFS and returns the tracker to VPJ for further validation and job polling.
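The tracker hand-off amounts to a write-then-read of a small snapshot file. The sketch below is a simplified stand-in, not this PR's code: a local `java.nio.file.Path` replaces the HDFS location, a plain counter map replaces DataWriterTaskTracker, and the `TrackerHandoff` class and counter names are hypothetical. The isolated driver persists the counters when the compute job finishes, and the VPJ side loads them back for validation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical hand-off of task-tracker counters between the isolated driver
 * and VPJ. A local Path stands in for HDFS, and a counter map stands in for
 * DataWriterTaskTracker.
 */
final class TrackerHandoff {
  /** Isolated driver: persist the final counters once the compute job ends. */
  static void write(Map<String, Long> counters, Path file) throws IOException {
    Properties out = new Properties();
    counters.forEach((name, value) -> out.setProperty(name, Long.toString(value)));
    try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
      out.store(w, "task tracker snapshot");
    }
  }

  /** VPJ side: read the counters back for validation and job polling. */
  static Map<String, Long> read(Path file) throws IOException {
    Properties in = new Properties();
    try (Reader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      in.load(r);
    }
    Map<String, Long> counters = new TreeMap<>();
    for (String name : in.stringPropertyNames()) {
      counters.put(name, Long.parseLong(in.getProperty(name)));
    }
    return counters;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("task-tracker", ".properties");
    Map<String, Long> counters = new TreeMap<>();
    counters.put("records.written", 1000L); // hypothetical counter name
    counters.put("records.failed", 0L);     // hypothetical counter name
    write(counters, file);
    System.out.println(read(file));
    Files.deleteIfExists(file);
  }
}
```

A text format keeps the snapshot easy to inspect when debugging. A production version writing to HDFS would also want to guard against VPJ reading a partially written file, for example by writing to a temporary path and renaming it once complete.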

How was this PR tested?

Tested manually and in integration tests. More testing is in progress, and unit tests still need to be added.

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.
