Merge pull request #58 from Pennycook/multi-component-example
Add draft of per-component example
Pennycook authored Sep 3, 2024
2 parents ef826b2 + 1733fdf commit def324f
Showing 3 changed files with 270 additions and 0 deletions.
2 changes: 2 additions & 0 deletions examples/metrics/application_efficiency.py
@@ -2,6 +2,8 @@
# Copyright (c) 2024 Intel Corporation
# SPDX-License-Identifier: 0BSD
"""
.. _working_with_app_efficiency:

Working with Application Efficiency
===================================
266 changes: 266 additions & 0 deletions examples/metrics/multiple_components.py
@@ -0,0 +1,266 @@
#!/usr/bin/env python3
# Copyright (c) 2024 Intel Corporation
# SPDX-License-Identifier: 0BSD
"""
Handling Software with Multiple Components
==========================================
Viewing applications as composites.
When working with very large and complex pieces of software, reporting
performance using a single number (e.g., total time-to-solution) obscures
details about the performance of different software components. Using such
totals during P3 analysis therefore prevents us from understanding how
different software components behave on different platforms.
Identifying which software components have poor P3 characteristics is necessary
to understand what action(s) we can take to improve the P3 characteristics of
a software package as a whole. Although accounting for multiple components can
make data collection and analysis slightly more complicated, the additional
insight it provides is very valuable.
.. tip::
This approach can be readily applied to parallel software written to
heterogeneous programming frameworks (e.g., CUDA, OpenCL, SYCL, Kokkos),
where distinct "kernel"s can be identified and profiled easily. For a
real-life example of this approach in practice, see "`A
Performance-Portable SYCL Implementation of CRK-HACC for Exascale
<https://dl.acm.org/doi/10.1145/3624062.3624187>`_.

Data Preparation
----------------

To keep things simple, let's imagine that our software package consists of just
two components, and that each component has two different implementations that
can both be run on two different machines:

.. list-table::
    :widths: 20 20 20 20
    :header-rows: 1

    * - component
      - implementation
      - machine
      - fom
    * - Component 1
      - Implementation 1
      - Cluster 1
      - 2.0
    * - Component 2
      - Implementation 1
      - Cluster 1
      - 5.0
    * - Component 1
      - Implementation 2
      - Cluster 1
      - 3.0
    * - Component 2
      - Implementation 2
      - Cluster 1
      - 4.0
    * - Component 1
      - Implementation 1
      - Cluster 2
      - 1.0
    * - Component 2
      - Implementation 1
      - Cluster 2
      - 2.5
    * - Component 1
      - Implementation 2
      - Cluster 2
      - 0.5
    * - Component 2
      - Implementation 2
      - Cluster 2
      - 3.0

Our first step is to project this data onto P3 definitions,
treating the functionality provided by each component as a
separate problem to be solved:
"""

# sphinx_gallery_start_ignore
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import p3

data = {
    "component": ["Component 1", "Component 2"] * 4,
    "implementation": (["Implementation 1"] * 2 + ["Implementation 2"] * 2) * 2,
    "machine": ["Cluster 1"] * 4 + ["Cluster 2"] * 4,
    "fom": [2.0, 5.0, 3.0, 4.0, 1.0, 2.5, 0.5, 3.0],
}

df = pd.DataFrame(data)
# sphinx_gallery_end_ignore

proj = p3.data.projection(
    df,
    problem=["component"],
    application=["implementation"],
    platform=["machine"],
)
print(proj)

# %%
# .. note::
# See ":ref:`Understanding Data Projection <understanding_projection>`" for
# more information about projection.
#
# Application Efficiency per Component
# ------------------------------------
#
# Having projected the performance data onto P3 definitions, we can now compute
# the application efficiency for each component:

effs = p3.metrics.application_efficiency(proj)
print(effs)

# %%
# .. note::
# See ":ref:`Working with Application Efficiency
# <working_with_app_efficiency>`" for more information about application
# efficiency.
#
# Plotting a graph for each platform separately is a good way to visualize and
# compare the application efficiency of each component:

cluster1 = effs[effs["platform"] == "Cluster 1"]
pivot = cluster1.pivot(index="application", columns=["problem"])["app eff"]
pivot.plot(
    kind="bar",
    xlabel="Component",
    ylabel="Application Efficiency",
    title="Cluster 1",
)
plt.savefig("cluster1_application_efficiency_bars.png")

cluster2 = effs[effs["platform"] == "Cluster 2"]
pivot = cluster2.pivot(index="application", columns=["problem"])["app eff"]
pivot.plot(
    kind="bar",
    xlabel="Component",
    ylabel="Application Efficiency",
    title="Cluster 2",
)
plt.savefig("cluster2_application_efficiency_bars.png")

# %%
# On Cluster 1, Implementation 1 delivers the best performance for Component 1,
# but Implementation 2 delivers the best performance for Component 2. On
# Cluster 2, that trend is reversed. Clearly, there is no single implementation
# that delivers the best performance everywhere.
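#
# Reading the numbers directly: on Cluster 1, Implementation 1 achieves an
# application efficiency of 1.0 for Component 1 but only 4.0/5.0 = 0.8 for
# Component 2, whereas Implementation 2 achieves roughly 0.67 (2.0/3.0) for
# Component 1 and 1.0 for Component 2.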
#
# Overall Application Efficiency
# ------------------------------
#
# Computing the application efficiency of the software package as a whole
# requires a few more steps.
#
# First, we need to compute the total time taken by each application on each
# platform:

package = proj.groupby(["platform", "application"], as_index=False)["fom"].sum()
package["problem"] = "Package"
print(package)

# %%
# Then, we can use this data to compute application efficiency, as below:

effs = p3.metrics.application_efficiency(package)
print(effs)

# %%
# These latest results suggest that Implementation 1 and Implementation 2 are
# both achieving the best-known performance when running the package as a
# whole. This isn't *strictly* incorrect, since the values of their combined
# figure of merit *are* the same, but we know from our earlier per-component
# analysis that better performance should be possible.
#
# Specifically, our per-component analysis shows us that an application that
# could pick and choose the best implementation of different components for
# different platforms would achieve better overall performance.
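#
# For example, on Cluster 1 both implementations have a combined figure of
# merit of 7.0 (2.0 + 5.0 and 3.0 + 4.0, respectively), whereas combining
# Implementation 1 for Component 1 with Implementation 2 for Component 2
# would give 2.0 + 4.0 = 6.0. Similarly, on Cluster 2 both implementations
# total 3.5, but the best combination would give 0.5 + 2.5 = 3.0.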
#
# .. important::
#    Combining component implementations in this way is purely hypothetical,
#    and there may be very good reasons (e.g., incompatible data structures)
#    why an application is unable to use certain combinations. Although
#    removing such invalid combinations would result in a tighter upper
#    bound, it is much simpler to leave them in place. Including all
#    combinations may even identify potential opportunities to combine
#    approaches that initially appeared incompatible (e.g., by writing
#    routines to convert between data structures).
#
# We can fold that observation into our P3 analysis by creating an entry in our
# dataset that represents the results from a hypothetical application:

hypothetical_components = proj.groupby(["problem", "platform"], as_index=False)[
    "fom"
].min()
hypothetical_components["application"] = "Hypothetical"
print(hypothetical_components)

# %%

# Calculate the combined figure of merit for both components
hypothetical_package = hypothetical_components.groupby(
    ["platform", "application"], as_index=False,
)["fom"].sum()
hypothetical_package["problem"] = "Package"

# Append the hypothetical package data to our previous results
package = pd.concat([package, hypothetical_package], ignore_index=True)
print(package)

# %%
# As expected, our new hypothetical application achieves better performance
# by mixing and matching different implementations. And if we now re-compute
# application efficiency with this data included:

effs = p3.metrics.application_efficiency(package)
print(effs)

# %%
# ... we see that the application efficiency of Implementation 1 and
# Implementation 2 has been reduced accordingly. Including hypothetical
# upper bounds on performance in our dataset can therefore be a simple and
# effective way to improve the accuracy of our P3 analysis, even if a
# true theoretical upper bound (e.g., from a performance model) is unknown.
#
# .. note::
#    The two implementations still have the *same* efficiency, even after
#    introducing the hypothetical implementation. Per-component analysis is
#    still required to understand how each component contributes to the
#    overall efficiency, and to identify which component(s) should be improved
#    on which platform(s).
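
# %%
# As a quick check (a minimal sketch reusing the objects defined above), we
# can recompute the per-component application efficiency with the
# hypothetical application included, to see exactly where each implementation
# falls behind:

per_component = pd.concat([proj, hypothetical_components], ignore_index=True)
print(p3.metrics.application_efficiency(per_component))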

# %%
# Further Analysis
# ----------------
#
# Computing application efficiency is often simply the first step of a
# more detailed P3 analysis.
#
# The examples below show how we can use the visualization capabilities
# of the P3 Analysis Library to compare the efficiency of different
# applications running across the same platform set, or to gain insight
# into how an application's efficiency relates to the code it uses on each
# platform.
#
# .. minigallery::
#    :add-heading: Examples
#
#    ../../examples/cascade/plot_simple_cascade.py
#    ../../examples/navchart/plot_simple_navchart.py
2 changes: 2 additions & 0 deletions examples/metrics/projection.py
@@ -2,6 +2,8 @@
# Copyright (c) 2024 Intel Corporation
# SPDX-License-Identifier: 0BSD
"""
.. _understanding_projection:

Understanding Data Projection
=============================
