[BUG] Handle unsupported runtimes for platform and fallback to default #1420

Open
parthosa opened this issue Nov 12, 2024 · 4 comments · May be fixed by #1421
Labels
bug Something isn't working core_tools Scope the core module (scala)

Comments

@parthosa
Collaborator

parthosa commented Nov 12, 2024

Describe the bug

In #1413, we added functionality to store the runtime type (SPARK, SPARK_RAPID, PHOTON) for each application by parsing event logs.

However, users might provide an event log whose runtime type is unsupported by the specified platform. For example, a user could provide a Photon event log with --platform onprem, a platform that may not support it.

Current Result
For the above scenario, we would generate the runtime as PHOTON.

Expected Outcome
We should fall back to the default runtime for the given platform, i.e. SPARK for onprem.

Proposed Solution

We need to maintain a list of supported runtimes for each platform. If the parsed runtime is unsupported, we should:

  1. Generate a warning about the platform mismatch.
  2. Use the default runtime for that platform as a fallback.

Since the runtime is calculated in the Scala code, the decision to fall back to the default runtime should also be made in the Scala code. This will ensure consistent usage by the downstream Python CLI and QualX.
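
To make the proposal concrete, here is a minimal sketch of the lookup and fallback. Everything in it is hypothetical: the object and method names are invented for illustration, "databricks-aws" is an assumed platform name, and the supported-runtime sets are assumptions rather than the tools' actual configuration.

```scala
// Hypothetical sketch: none of these names come from the tools' codebase.
object RuntimeType extends Enumeration {
  val SPARK, SPARK_RAPID, PHOTON = Value
}

object PlatformRuntimes {
  import RuntimeType._

  // Default runtime plus the set of runtimes a platform accepts.
  final case class PlatformSpec(default: RuntimeType.Value, supported: Set[RuntimeType.Value])

  // Assumed table for illustration only; the real supported sets may differ.
  private val specs: Map[String, PlatformSpec] = Map(
    "onprem" -> PlatformSpec(SPARK, Set(SPARK, SPARK_RAPID)),
    "databricks-aws" -> PlatformSpec(SPARK, Set(SPARK, SPARK_RAPID, PHOTON))
  )

  // Assumption: an unknown platform is treated as supporting only SPARK.
  private val unknownPlatformSpec = PlatformSpec(SPARK, Set(SPARK))

  /** Returns the runtime to report and, on a mismatch, a warning message. */
  def resolve(platform: String, parsed: RuntimeType.Value): (RuntimeType.Value, Option[String]) = {
    val spec = specs.getOrElse(platform, unknownPlatformSpec)
    if (spec.supported.contains(parsed)) {
      (parsed, None)
    } else {
      val warning = s"Runtime $parsed is not supported on platform '$platform'; " +
        s"falling back to ${spec.default}"
      (spec.default, Some(warning))
    }
  }
}
```

With this table, PlatformRuntimes.resolve("onprem", RuntimeType.PHOTON) yields SPARK together with a warning, while a supported combination is returned unchanged with no warning.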

@amahussein
Collaborator

This is more complicated. The tools would actually be wrong if the output reports a Photon eventlog as "SPARK". So I don't agree with that behavior, because the field is supposed to be raw output that should not change based on the user's input.

In Scala, the platform argument was used to decide which speedFactor file to load.
Since we dropped the speedup factor, I wonder if we should consider getting rid of the argument and letting the tools detect the platform based on the eventlog!

The other side of the story is the Python wrapper. We needed the platform argument basically because the Python wrapper was using the CSP SDK.
We are moving away from that requirement by running in local mode. Even for distributed mode, the runtime of the Tools does not have to reflect the platform of the eventlogs.
This means that the platform argument will have to be revisited anyway.

@parthosa
Collaborator Author

That is an interesting point.

Alternative Behaviour:

  • Instead of silently falling back to SPARK, we report a failure for incompatible apps.
  • One drawback is that we would spend time processing these apps only to report them as failures.

@mattahrens
Collaborator

A couple of points...

  1. I am generally in favor of letting the tools detect the platform based on the eventlog, but what about logs in S3? Are we able to distinguish between EMR and Databricks AWS and even AWS EC2/EKS?
  2. Thinking more about this, I'd rather just fail the job if we are given PHOTON runtime logs and the user specifies the platform as on-prem, so that we educate the user about the wrong platform.

@parthosa
Collaborator Author

parthosa commented Dec 9, 2024

  • For the scope of this issue, it seems reasonable to fail the job if the user provides event logs with a runtime that is not supported by the platform.
  • I agree, we should be educating the user to use the correct platform. If the user does not provide one and we default to onprem, the speedups provided by QualX based on the onprem platform will be inaccurate. This issue also exists when using Photon or no-Photon configurations.
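
A rough sketch of the fail-fast variant, for comparison with the fallback idea. It leaves the parsed runtime untouched and returns an error for the user instead. All names here are hypothetical, "databricks-aws" is an assumed platform name, and the supported-runtime table is an assumption for illustration only.

```scala
// Hypothetical sketch; names and the supported-runtime table are assumptions.
object RuntimeValidation {
  sealed trait Runtime extends Product with Serializable
  case object SPARK extends Runtime
  case object SPARK_RAPID extends Runtime
  case object PHOTON extends Runtime

  // Assumed table for illustration only; the real supported sets may differ.
  private val supported: Map[String, Set[Runtime]] = Map(
    "onprem" -> Set[Runtime](SPARK, SPARK_RAPID),
    "databricks-aws" -> Set[Runtime](SPARK, SPARK_RAPID, PHOTON)
  )

  /** Left carries an error for the user; Right carries the unchanged runtime. */
  def validate(platform: String, parsed: Runtime): Either[String, Runtime] =
    supported.get(platform) match {
      case None =>
        Left(s"Unknown platform '$platform'")
      case Some(allowed) if allowed.contains(parsed) =>
        Right(parsed)
      case Some(allowed) =>
        Left(s"Event log runtime $parsed is not supported on platform '$platform' " +
          s"(supported: ${allowed.mkString(", ")}). Re-run with the correct --platform.")
    }
}
```

Here RuntimeValidation.validate("onprem", RuntimeValidation.PHOTON) returns a Left with a message pointing the user at the platform argument, so the reported runtime is never rewritten.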

3 participants