Add sparkRuntime property to capture runtime type in application_information #1414

Merged (2 commits) on Nov 8, 2024

@parthosa (Collaborator) commented Nov 7, 2024

Fixes #1413

This PR adds a new getSparkRuntime method that captures the Spark runtime type (SPARK, PHOTON, or SPARK_RAPIDS) and stores it in the new sparkRuntime column of application_information.csv.

Changes

Profiling Enhancements:

  • Added sparkRuntime property to AppInfoProfileResults to capture the runtime environment and updated the outputHeaders and convertToSeq methods to include this new property.
  • Updated AppInformationViewTrait to map the new sparkRuntime property when creating AppInfoProfileResults instances.
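
As a rough illustration of the profiling change, the sketch below shows how a result row might carry the new column through its header and value sequences. AppInfoRow is a hypothetical, simplified stand-in for AppInfoProfileResults; only the names outputHeaders, convertToSeq, and the column order (taken from the sample output in this PR) come from the description, and everything else is an assumption.

```scala
// Hypothetical, simplified stand-in for AppInfoProfileResults (not the merged code).
final case class AppInfoRow(
    appIndex: Int,
    appName: String,
    appId: String,
    sparkUser: String,
    startTime: Long,
    endTime: Long,
    duration: Long,
    durationStr: String,
    sparkRuntime: String, // new column added by this PR
    sparkVersion: String,
    pluginEnabled: Boolean) {

  // Values must be emitted in the same order as AppInfoRow.outputHeaders.
  def convertToSeq: Seq[String] = Seq(
    appIndex.toString, appName, appId, sparkUser,
    startTime.toString, endTime.toString, duration.toString, durationStr,
    sparkRuntime, sparkVersion, pluginEnabled.toString)
}

object AppInfoRow {
  // Column order copied from the application_information.csv samples in this PR.
  val outputHeaders: Seq[String] = Seq(
    "appIndex", "appName", "appId", "sparkUser",
    "startTime", "endTime", "duration", "durationStr",
    "sparkRuntime", "sparkVersion", "pluginEnabled")
}
```

Keeping the header list and the value list side by side makes it easier to spot a column that was added to one but not the other.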

Runtime Handling:

  • Introduced SparkRuntime enumeration to represent different Spark runtimes (SPARK, PHOTON, SPARK_RAPIDS).
  • Added getSparkRuntime method to CacheablePropsHandler.
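
A minimal sketch of what the enumeration and the lookup could look like is shown here. The three value names come from this PR; the RuntimeProps trait, the RuntimeDemo helper, and the detection rules (a "photon" substring in the version string, then the plugin flag) are assumptions chosen to match the sample output, not the logic merged into CacheablePropsHandler.

```scala
// Illustrative sketch only; not the code merged in this PR.
object SparkRuntime extends Enumeration {
  type SparkRuntime = Value
  val SPARK, PHOTON, SPARK_RAPIDS = Value
}

// Hypothetical stand-in for a handler that exposes cached application properties.
trait RuntimeProps {
  def sparkVersion: String   // e.g. "13.3.x-aarch64-photon-scala2.12"
  def pluginEnabled: Boolean // true when the RAPIDS plugin is active

  // Assumed detection order: Photon first, then RAPIDS, otherwise plain Spark.
  def getSparkRuntime: SparkRuntime.SparkRuntime = {
    if (sparkVersion.toLowerCase.contains("photon")) SparkRuntime.PHOTON
    else if (pluginEnabled) SparkRuntime.SPARK_RAPIDS
    else SparkRuntime.SPARK
  }
}

// Hypothetical helper for trying the rules on raw values.
object RuntimeDemo {
  def classify(version: String, plugin: Boolean): SparkRuntime.SparkRuntime =
    new RuntimeProps {
      val sparkVersion: String = version
      val pluginEnabled: Boolean = plugin
    }.getSparkRuntime
}
```

For example, RuntimeDemo.classify("13.3.x-gpu-ml-scala2.12", true) yields SparkRuntime.SPARK_RAPIDS under these assumed rules.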

Testing:

  • Added test cases in ApplicationInfoSuite to validate the spark runtime value for different event logs.

Output

File: application_information.csv

SPARK Runtime:

appIndex,appName,appId,sparkUser,startTime,endTime,duration,durationStr,sparkRuntime,sparkVersion,pluginEnabled
1,"Databricks Shell","app-20240827220408-0000","root",1724796242014,1724799713682,3471668,"58 min","SPARK","13.3.x-aarch64-scala2.12",false

SPARK_RAPIDS Runtime:

appIndex,appName,appId,sparkUser,startTime,endTime,duration,durationStr,sparkRuntime,sparkVersion,pluginEnabled
1,"Databricks Shell","app-20240827233829-0000","root",1724801903175,1724802355703,452528,"7.5 min","SPARK_RAPIDS","13.3.x-gpu-ml-scala2.12",true

PHOTON Runtime:

appIndex,appName,appId,sparkUser,startTime,endTime,duration,durationStr,sparkRuntime,sparkVersion,pluginEnabled
1,"Databricks Shell","app-20240818062343-0000","root",1723962217320,1723962595796,378476,"6.3 min","PHOTON","13.3.x-aarch64-photon-scala2.12",false
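
For consumers of the file, the new column can be read by header name rather than by position. The sketch below is a hypothetical helper (AppInfoCsv, parseLine, and runtimeOf are not part of the tools) and uses a naive comma split, which assumes that no field contains an embedded comma or escaped quote, as is the case in the samples above.

```scala
// Hypothetical helper for reading sparkRuntime out of application_information.csv lines.
object AppInfoCsv {
  // Naive split: assumes no field contains an embedded comma or escaped quote.
  def parseLine(line: String): Seq[String] =
    line.split(",", -1).toSeq.map(_.trim.stripPrefix("\"").stripSuffix("\""))

  // Looks up the sparkRuntime column by header name, so column reordering is tolerated.
  def runtimeOf(header: String, row: String): Option[String] = {
    val idx = parseLine(header).indexOf("sparkRuntime")
    val cols = parseLine(row)
    if (idx >= 0 && idx < cols.length) Some(cols(idx)) else None
  }
}
```

A file written before this change has no sparkRuntime header, in which case runtimeOf returns None instead of reading the wrong column.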

cc: @leewyang

@parthosa added the core_tools label ("Scope the core module (scala)") on Nov 7, 2024
@parthosa self-assigned this on Nov 7, 2024
@parthosa marked this pull request as ready for review on November 7, 2024 05:24
@cindyyuanjiang (Collaborator) previously approved these changes on Nov 7, 2024 and left a comment:

Thanks @parthosa! A minor nit.

@nartal1 (Collaborator) previously approved these changes on Nov 8, 2024 and left a comment:

LGTM. Thanks @parthosa !

@amahussein (Collaborator) left a comment:

Thanks @parthosa
Only need more comments/description to the new class/method

@amahussein added the affect-output label ("A change that modifies the output (add/remove/rename files, add/remove/rename columns)") on Nov 8, 2024
@parthosa dismissed stale reviews from nartal1 and cindyyuanjiang via 2baefe2 on November 8, 2024 18:04
@amahussein (Collaborator) left a comment:

Thanks @parthosa

@cindyyuanjiang (Collaborator) left a comment:

thanks @parthosa! LGTM.

@parthosa merged commit 4e783d9 into NVIDIA:dev on Nov 8, 2024 (14 checks passed)
@parthosa deleted the spark-rapids-tools-1413 branch on November 8, 2024 20:31
Development

Successfully merging this pull request may close these issues.

[FEA] Qualification/Profiling Tool: Store spark runtime for different application type
4 participants