
Add qualification support for Photon jobs in the Python Tool #1409

Merged

Conversation


@parthosa parthosa commented Nov 2, 2024

Issue #251.

This PR introduces support for recommending Photon applications, using a separate strategy for categorizing them:

  • Spark Runtime: Recommend apps with a speedup greater than 1.3x.
  • Photon Runtime: Recommend apps with a speedup greater than 1x.

Additionally, the Small category for Photon applications is different from that of Spark-based applications:

  • Spark Runtime: Apps with a speedup in the range of 1.3x to 2x are categorized as Small.
  • Photon Runtime: Apps with a speedup in the range of 1x to 2x are categorized as Small.
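
The per-runtime thresholds above can be sketched as follows. This is a minimal illustration, not the actual implementation: the real thresholds live in qualification-conf.yaml and are applied by the SpeedupStrategy class in speedup_category.py; the field names, the `categorize` helper, the boundary handling, and the "Medium/Large" placeholder label are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class SpeedupStrategy:
    # Hypothetical fields mirroring the thresholds in the PR description.
    recommend_min: float   # minimum speedup for an app to be recommended at all
    small_max: float       # upper bound of the "Small" category

# Spark apps need > 1.3x to be recommended; Photon apps only need > 1x.
STRATEGIES = {
    'SPARK': SpeedupStrategy(recommend_min=1.3, small_max=2.0),
    'PHOTON': SpeedupStrategy(recommend_min=1.0, small_max=2.0),
}

def categorize(runtime: str, speedup: float) -> str:
    """Return a speedup category for a single app based on its runtime."""
    strategy = STRATEGIES[runtime]
    if speedup <= strategy.recommend_min:
        return 'Not Recommended'
    if speedup <= strategy.small_max:
        return 'Small'
    return 'Medium/Large'  # placeholder for the larger categories
```

Note how the same 1.2x speedup is "Not Recommended" for a Spark app but "Small" (i.e., recommended) for a Photon app.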

Note

  • Speedup Strategy is assigned on a per-app basis, enabling support for heterogeneous cases.
  • Hence, if a user provides both Photon and Spark event logs, the Python Tool will apply a separate strategy to each app based on its execution engine (Spark or Photon).

Output

  • As this is a metadata property, an entry sparkRuntime is included for each app in app_metadata.json:
  {
    "appId": "app-20240818062343-0000",
    "appName": "Databricks Shell",
    "eventLog": "file:/path/to/log/photon_eventlog",
    "sparkRuntime": "PHOTON",
    "estimatedGpuSpeedupCategory": "Not Recommended"
  }
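
A downstream consumer could use this new field to split apps by runtime before applying the per-runtime strategies. The sketch below is illustrative only: it assumes app_metadata.json is a JSON array of entries shaped like the sample above, and the `group_apps_by_runtime` helper is hypothetical.

```python
import json

def group_apps_by_runtime(metadata_json: str) -> dict:
    """Group appIds by their sparkRuntime field; default to SPARK if absent."""
    apps = json.loads(metadata_json)
    grouped: dict = {}
    for app in apps:
        grouped.setdefault(app.get('sparkRuntime', 'SPARK'), []).append(app['appId'])
    return grouped

# Hypothetical two-app metadata file mixing Photon and Spark event logs.
sample = '''[
  {"appId": "app-20240818062343-0000", "appName": "Databricks Shell",
   "sparkRuntime": "PHOTON", "estimatedGpuSpeedupCategory": "Not Recommended"},
  {"appId": "app-20240818062343-0001", "appName": "Spark Shell",
   "sparkRuntime": "SPARK", "estimatedGpuSpeedupCategory": "Small"}
]'''
```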

Changes

Enhancements and New Features:

  • tool_ctxt.py: Introduced a new method get_metrics_output_folder to fetch the metrics output directory.
  • qualification-conf.yaml: Updated configuration to include new metrics subfolder and execution engine settings. [1] [2] [3] [4]
  • enums.py: Added a new ExecutionEngine class to represent different execution engines.
  • speedup_category.py: Introduced SpeedupStrategy class and refactored methods to accommodate execution engine-specific speedup strategies. [1] [2] [3] [4]

Refactoring and Utility Improvements:

  • qualification.py: Added a helper method _read_qualification_metric_file to read metric files and _assign_execution_engine_to_apps to assign execution engines to applications.
  • util.py: Added a utility method convert_df_to_dict to convert DataFrames to dictionaries.
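
As a rough idea of what such a utility might look like, here is a minimal DataFrame-to-dict helper. The actual convert_df_to_dict in util.py may have a different signature and semantics; the column-pair mapping shown here is an assumption.

```python
import pandas as pd

def convert_df_to_dict(df: pd.DataFrame, key_col: str, value_col: str) -> dict:
    """Map each value in key_col to the corresponding value in value_col."""
    return dict(zip(df[key_col], df[value_col]))
```

For example, mapping appId to sparkRuntime: `convert_df_to_dict(df, 'appId', 'sparkRuntime')`.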

Tests:

  • event_log_processing.feature: Added new test scenarios to validate the execution engine assignment.
  • e2e_utils.py and test_steps.py: Updated end-to-end test utilities to support new features. [1] [2] [3]


@parthosa parthosa added feature request New feature or request user_tools Scope the wrapper module running CSP, QualX, and reports (python) labels Nov 2, 2024
@parthosa parthosa self-assigned this Nov 2, 2024
Signed-off-by: Partho Sarthi <[email protected]>
@parthosa parthosa marked this pull request as ready for review November 4, 2024 20:19
@parthosa parthosa added the affect-output A change that modifies the output (add/remove/rename files, add/remove/rename columns) label Nov 4, 2024
@amahussein amahussein left a comment


Thanks @parthosa!
Just for the sake of confirmation:

  • Is there another follow-up PR to change the QualX module to read app_meta.json to decide whether an app is Photon or not? In that case, the PR description is not accurate because it gives the impression that it adds support end-to-end.
  • I am concerned about how we can troubleshoot and validate app_meta.json. The wrapper reads the AutoTuner's output and copies some of the fields to that file at the upper level. With this PR, we are adding a new field derived from Python logic. Later, we will hit the question "Where does each field come from?" (this becomes even more challenging if fields can be overridden by the Python wrapper). CC: @tgravescs

upperBound: 1000000.0
- columnName: 'Unsupported Operators Stage Duration Percent'
lowerBound: 0.0
upperBound: 25.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs some thinking about its design impact.
It introduces a platform configuration inside the tool's conf, while, on the other hand, we already have a configuration file per platform.

Collaborator Author

I was thinking that, since all platforms would have the same value for the Spark case, we would be duplicating the configuration in each platform. In the future, if we have different values for different platforms, we could move these into separate platform config files.

Collaborator

It is a valid point that there are some common settings between platforms.
In the future, we can improve our config structure to have a common parent or something shared between all the platforms.
The other way around, specifying platform behavior inside the tool's config, will create a design inconsistency moving forward, especially with every contributor's preference on where a newly added config should go.

Collaborator

I guess it is okay for now to keep that in order to unblock the photon feature.
Later, we can revisit this.

user_tools/src/spark_rapids_tools/enums.py (outdated, resolved)
user_tools/src/spark_rapids_tools/utils/util.py (outdated, resolved)

parthosa commented Nov 6, 2024

Following offline discussions with @amahussein and @leewyang, we are moving the detection of the runtime (Spark/Photon/Velox) to Scala.

This PR will be refactored afterwards.

@parthosa parthosa marked this pull request as draft November 6, 2024 23:12
@parthosa parthosa marked this pull request as ready for review November 12, 2024 22:38
@parthosa (Collaborator Author)

@amahussein

Is there another followup PR to change the QualX module to read the app_meta.json to decide whether this app is photon or not?

I am concerned about how we can troubleshoot and validate app_meta.json. The wrapper reads the AutoTuner's output and copies some of the fields to that file at the upper level. With this PR, we are adding a new field derived from Python logic. Later, we will hit the question "Where does each field come from?" (this becomes even more challenging if fields can be overridden by the Python wrapper).

  • Similarly, all values in app_meta.json will now be derived from Scala logic, with no Python logic involved.

cindyyuanjiang previously approved these changes Nov 14, 2024

@cindyyuanjiang cindyyuanjiang left a comment


Thanks @parthosa!


@amahussein amahussein left a comment


Thanks @parthosa

Just add a comment in the config file to explain why we picked those new thresholds for the Photon categories.

user_tools/src/spark_rapids_tools/storagelib/csppath.py (outdated, resolved)


@amahussein amahussein left a comment


Thanks @parthosa


@cindyyuanjiang cindyyuanjiang left a comment


Thanks @parthosa! LGTM.

@parthosa parthosa merged commit 43825d8 into NVIDIA:dev Nov 14, 2024
14 checks passed
@parthosa parthosa deleted the spark-rapids-tools-251-support-photon-in-python branch November 14, 2024 19:37