Ensure application order is respected in cascade #63

Closed
wants to merge 3 commits into from
18 changes: 18 additions & 0 deletions p3/_utils.py
@@ -30,3 +30,21 @@ def _require_numeric(df, columns):
except Exception:
msg = "Column '%s' must contain only numeric values."
raise TypeError(msg % (column))


def _sort_by_app_order(df, app_order):
Contributor

This function seems really convoluted, and I might just be missing the reason why because I don't have a clear mental model of what the DataFrame might look like. Could you give a really small example of the desired input/output?

I've also never sorted by strings this way before, so it might just be what's required.

Contributor Author


You're right to question it. I struggle to make sense of the pandas interface at the best of times, and so a lot of the solutions I come up with are not necessarily the best ones.

If we initially have a DataFrame like this:

Application FOM
A 0.5
B 0.4
B 0.9
A 0.7

...then df["application"].unique() returns ["A", "B"], because unique() lists values in the order they first appear.

If we filter the original DataFrame such that only the best FOMs remain, then the DataFrame looks like this:

Application FOM
B 0.9
A 0.7

...and df["application"].unique() returns ["B", "A"].

So what this function does is accept the second (post-filtering) DataFrame and a list denoting the desired application order (i.e., ["A", "B"]), and then sorts the DataFrame such that the results appear in the same order:

Application FOM
A 0.7
B 0.9

Note that this isn't the same as sorting the application names. What we're trying to do is ensure that we don't alter the order in which applications appear in the DataFrame, in case that order is meaningful to the user.
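To make the tables above concrete, here is a runnable sketch that reproduces them, with the PR's _sort_by_app_order copied inline so that p3 does not need to be installed. Assumptions: lowercase column names application and fom, and a "keep only the best" step built from sort_values plus drop_duplicates. That step is only one way to get the scrambled order; p3 itself uses groupby(...).agg("max") for it.

```python
import pandas as pd


# Copy of _sort_by_app_order from this PR, so the sketch runs without p3.
def _sort_by_app_order(df, app_order):
    def index_function(row):
        return app_order.tolist().index(row["application"])

    sort_index = df.apply(index_function, axis=1)
    sort_index.name = "sort_index"

    order = df.join(sort_index).sort_values(by=["sort_index"]).index
    df = df.loc[order]
    df.reset_index(inplace=True, drop=True)
    return df


# Input from the first table (column names lowercased to match the code).
df = pd.DataFrame(
    {"application": ["A", "B", "B", "A"], "fom": [0.5, 0.4, 0.9, 0.7]}
)

# unique() lists values in order of first appearance: A, then B.
app_order = df["application"].unique()

# Keep only the best FOM per application. Sorting by FOM first (one possible
# filter) leaves B ahead of A, as in the second table.
best = df.sort_values("fom", ascending=False).drop_duplicates("application")

# Restore the original application order, as in the third table.
restored = _sort_by_app_order(best, app_order)
print(restored)
```

The result holds A 0.7 followed by B 0.9 with a fresh 0..n-1 index, matching the third table.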

As a concrete example, imagine that the results in the DataFrame are all sorted by collection date, and that a user is appending new results to the DataFrame as they're collected. Without this convoluted sorting, adding a new (better) result to the bottom of the DataFrame could change the order in which applications appear, resulting in a different legend the next time the graph is plotted.

Does that make sense?

Contributor Author


Inspired by our conversation about "doing one thing well" on another pull request, I'm coming around to the view that #22 was probably a mistake. I'll sketch out an alternative fix and open it as another pull request, so we can compare the two possible behaviors.

"""
Sort the DataFrame such that the order of applications matches that
specified in app_order.
"""

def index_function(row):
return app_order.tolist().index(row["application"])

sort_index = df.apply(index_function, axis=1)
sort_index.name = "sort_index"

order = df.join(sort_index).sort_values(by=["sort_index"]).index
df = df.loc[order]
df.reset_index(inplace=True, drop=True) # add style change
return df
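For comparison, a sketch of a vectorized alternative. It is not the code in this PR, and the name sort_by_app_order is hypothetical. The PR version calls app_order.tolist().index(...) once per row via df.apply(..., axis=1), and DataFrame.sort_values on a single column defaults to quicksort, which is not guaranteed to be stable, so rows of the same application could in principle be reshuffled. This variant does one vectorized lookup and uses a stable sort. One behavioral difference to note: a name missing from app_order sorts last here instead of raising ValueError.

```python
import numpy as np
import pandas as pd


def sort_by_app_order(df, app_order):
    """Hypothetical vectorized variant of _sort_by_app_order (not p3 code).

    Reorders rows so that applications follow app_order; rows belonging to
    the same application keep their current relative order.
    """
    # Lookup table from application name to its rank in app_order.
    rank = pd.Series(np.arange(len(app_order)), index=list(app_order))
    # One vectorized lookup instead of list.index() per row. Names missing
    # from app_order become NaN, and argsort places NaN last.
    keys = df["application"].map(rank).to_numpy()
    # A stable sort keeps ties (same application) in their existing order.
    order = np.argsort(keys, kind="stable")
    # Positional indexing avoids surprises if df has duplicate index labels.
    return df.iloc[order].reset_index(drop=True)


df = pd.DataFrame(
    {"application": ["B", "A", "B", "A"], "fom": [0.9, 0.7, 0.1, 0.2]}
)
out = sort_by_app_order(df, ["A", "B"])
print(out)
```

Both A rows come first (0.7, then 0.2), followed by both B rows (0.9, then 0.1), so the within-application order is preserved. The function accepts either a list or the NumPy array returned by unique().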
7 changes: 6 additions & 1 deletion p3/metrics/_pp.py
@@ -6,7 +6,7 @@

import pandas as pd

from p3._utils import _require_columns, _require_numeric
from p3._utils import _require_columns, _require_numeric, _sort_by_app_order


def _hmean(series):
@@ -83,6 +83,8 @@ def pp(df):
if not df[eff].fillna(0).between(0, 1).all():
raise ValueError(f"{eff} must in range [0, 1]")

app_order = df["application"].unique()

# Keep only the most efficient (application, platform) results.
key = ["problem", "platform", "application"]
groups = df[key + efficiencies].groupby(key)
@@ -124,4 +126,7 @@ def pp(df):
pp.rename(columns={eff: new_column}, inplace=True)
pp = pp.astype({new_column: "float64"})

# Sort the final DataFrame to match the original application order
pp = _sort_by_app_order(pp, app_order)

return pp
5 changes: 4 additions & 1 deletion p3/plot/backend/matplotlib.py
@@ -15,7 +15,7 @@
from matplotlib.path import Path

import p3.metrics
from p3._utils import _require_numeric
from p3._utils import _require_numeric, _sort_by_app_order
from p3.plot._common import ApplicationStyle, Legend, PlatformStyle
from p3.plot.backend import CascadePlot, NavChart

@@ -149,10 +149,13 @@ def __init__(self, df, eff=None, size=None, fig=None, axes=None, **kwargs):
size = (6, 5)

# Keep only the most efficient (application, platform) results.
# Ensure that the order of applications is unchanged, for the legend.
app_order = df["application"].unique()
key = ["problem", "platform", "application"]
groups = df[key + [eff_column]].groupby(key)
df = groups.agg("max")
df.reset_index(inplace=True)
df = _sort_by_app_order(df, app_order)

platforms = df["platform"].unique()
applications = df["application"].unique()
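The backend needs app_order because groupby sorts its group keys by default: after agg("max"), applications come out alphabetically within each (problem, platform) pair rather than in the order the user supplied them. A small sketch with invented data; the column name eff stands in for whichever efficiency column is passed.

```python
import pandas as pd

# Hypothetical results, listed in the order the user collected them.
df = pd.DataFrame(
    {
        "problem": ["p"] * 4,
        "platform": ["X", "X", "Y", "Y"],
        "application": ["zeta", "alpha", "zeta", "alpha"],
        "eff": [0.5, 0.8, 0.6, 0.9],
    }
)
app_order = df["application"].unique()  # first appearance: zeta, alpha

key = ["problem", "platform", "application"]

# Default groupby(sort=True): keys come back sorted, so "alpha" leads.
best = df[key + ["eff"]].groupby(key).agg("max").reset_index()

# groupby(sort=False): groups appear in first-appearance order instead.
best_unsorted = (
    df[key + ["eff"]].groupby(key, sort=False).agg("max").reset_index()
)

print(best["application"].unique())           # ['alpha' 'zeta']
print(best_unsorted["application"].unique())  # ['zeta' 'alpha']
```

With sort=False, the first group containing each application is the one holding that application's first row, so unique() on the result matches app_order. That is one possible alternative to re-sorting afterwards; this PR does not use it.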
20 changes: 1 addition & 19 deletions p3/report/_snapshot.py
@@ -8,7 +8,7 @@

import p3.metrics
import p3.plot
from p3._utils import _require_columns
from p3._utils import _require_columns, _sort_by_app_order
from p3.metrics._divergence import _coverage_string_to_json


@@ -34,24 +34,6 @@ def _block_symlinks(path):
raise PermissionError("Refusing to create files via symbolic link.")


def _sort_by_app_order(df, app_order):
"""
Sort the DataFrame such that the order of applications matches that
specified in app_order.
"""

def index_function(row):
return app_order.tolist().index(row["application"])

sort_index = df.apply(index_function, axis=1)
sort_index.name = "sort_index"

order = df.join(sort_index).sort_values(by=["sort_index"]).index
df = df.loc[order]
df.reset_index(inplace=True, drop=True) # add style change
return df


def snapshot(df, cov=None, directory=None):
"""
Generate an HTML report representing a snapshot of P3 characteristics.
8 changes: 5 additions & 3 deletions tests/metrics/test_pp.py
@@ -1,9 +1,11 @@
# Copyright (C) 2022-2023 Intel Corporation
# SPDX-License-Identifier: MIT

import unittest

import pandas as pd

from p3.metrics import pp
import unittest


class TestPP(unittest.TestCase):
@@ -90,8 +92,8 @@ def test_pp(self):

expected_data = {
"problem": ["test"] * 3,
"application": ["best", "dummy", "latest"],
"app pp": [1.0, 0.0, 0.4878],
"application": ["latest", "best", "dummy"],
"app pp": [0.4878, 1.0, 0.0],
"arch pp": [0.0] * 3,
}
expected_df = pd.DataFrame(expected_data)