Ensure application order is respected in cascade #63

Pennycook · 2024-08-14T13:39:50Z

Dropping results from the DataFrame can cause df["application"].unique() to return the applications in a different order. The order of applications in this list determines the order in which applications are plotted.

In order to respect the order of applications in the original DataFrame, and thereby ensure that cascade and navchart plots produced from the same projection use the same legend, we need to ensure that we do not change the application order.

Related issues

This was broken by #22. We didn't pick it up until now because the graphs we've been producing aren't strictly wrong -- it's hard to see the issue without plotting a cascade plot and navchart side-by-side.

@swright87, I don't know if the PGFplots backend already handles this correctly, or if it requires an equivalent fix. Please take a look.

Proposed changes

Make note of the application order in the DataFrame before we perform any operations which might modify it.
Ensure that the application order in the DataFrame matches the original before plotting anything.

There might be a more efficient way to do this that doesn't require sorting the DataFrame, but the _sort_by_app_order function already existed elsewhere in the code base, making this an expedient fix.

It may also be worth discussing if this is actually the fix we want, or if we applied #22 too hastily. If we instead threw an error whenever there were duplicate results in a DataFrame (and required the user to remove them somehow) there would be no chance of this happening.
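For comparison, the stricter alternative could look roughly like the sketch below. This is not the implementation in this PR or in #65; the helper name `reject_duplicates`, the `key_columns` parameter, and the choice of which columns identify a result are all assumptions made for illustration.

```python
import pandas as pd


def reject_duplicates(df, key_columns):
    """Raise if any two rows share the same values in key_columns.

    Hypothetical helper: key_columns is an assumption about which
    columns identify a single result.
    """
    # keep=False marks every member of a duplicated group, not just the repeats
    duplicated = df.duplicated(subset=key_columns, keep=False)
    if duplicated.any():
        msg = "DataFrame contains %d duplicate results for columns %s."
        raise ValueError(msg % (int(duplicated.sum()), key_columns))
    return df
```

Because nothing is dropped, the library itself never changes the order returned by df["application"].unique(); the user decides which results to keep.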


laserkelvin · 2024-08-14T15:10:03Z

p3/_utils.py

@@ -30,3 +30,21 @@ def _require_numeric(df, columns):
        except Exception:
            msg = "Column '%s' must contain only numeric values."
            raise TypeError(msg % (column))
+
+
+def _sort_by_app_order(df, app_order):


This function seems really convoluted, and I might just be missing the reason why because I don't have a clear mental model of what the DataFrame might look like. Could you give a really small example of the desired input/output?

I've also never sorted by strings this way before, so it might just be what's required.

You're right to question it. I struggle to make sense of the pandas interface at the best of times, and so a lot of the solutions I come up with are not necessarily the best ones.

If we initially have a DataFrame like this:

| Application | FOM |
|-------------|-----|
| A           | 0.5 |
| B           | 0.4 |
| B           | 0.9 |
| A           | 0.7 |

...then df["application"].unique() returns ["A", "B"].

If we filter the original DataFrame such that only the best FOMs remain, then the DataFrame looks like this:

| Application | FOM |
|-------------|-----|
| B           | 0.9 |
| A           | 0.7 |

...and df["application"].unique() returns ["B", "A"].

So what this function does is accept the second (post-filtering) DataFrame and a list denoting the desired application order (i.e., ["A", "B"]), and then sorts the DataFrame such that the results appear in the same order:

| Application | FOM |
|-------------|-----|
| A           | 0.7 |
| B           | 0.9 |

Note that this isn't the same as sorting the application names. What we're trying to do is ensure that we don't alter the order in which applications appear in the DataFrame, in case that order is meaningful to the user.
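The small example above can be reproduced with the sketch below. It is only an illustration: the body of `_sort_by_app_order` is not shown in this thread, so `sort_by_app_order` here is a hypothetical stand-in, and the lowercase column names `application` and `fom` are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"application": ["A", "B", "B", "A"], "fom": [0.5, 0.4, 0.9, 0.7]}
)

# Record the application order before anything can disturb it.
app_order = df["application"].unique().tolist()

# Keep only the best FOM per application (rows keep their original positions).
best = df[df["fom"] == df.groupby("application")["fom"].transform("max")]
filtered_order = best["application"].unique().tolist()


def sort_by_app_order(df, app_order):
    """Hypothetical stand-in: reorder rows to follow app_order."""
    rank = {app: i for i, app in enumerate(app_order)}
    # Sort on each application's position in app_order, not on its name;
    # a stable sort keeps rows of the same application in their existing order.
    return df.sort_values(
        by="application", key=lambda col: col.map(rank), kind="stable"
    )


restored = sort_by_app_order(best, app_order)
restored_order = restored["application"].unique().tolist()
```

Here `filtered_order` comes out as ["B", "A"], while `restored_order` is ["A", "B"] again, matching the tables above.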

As a concrete example... Imagine that the results in the DataFrame are all sorted by collection date, and that a user is appending new results to the DataFrame as they're collected. Without this convoluted sorting, adding a new (better) result to the bottom of the DataFrame could change the order in which applications appear, resulting in a different legend the next time the graph is plotted.
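A minimal sketch of that scenario, assuming lowercase column names `application` and `fom` and a hypothetical `best_per_app` filter standing in for whatever drops the duplicate results:

```python
import pandas as pd


def best_per_app(df):
    """Hypothetical filter: keep the highest-FOM row for each application."""
    idx = df.groupby("application")["fom"].idxmax()
    # Sort the selected row labels so rows stay in their original order.
    return df.loc[idx.sort_values()]


# Results in collection order.
results = pd.DataFrame({"application": ["A", "B"], "fom": [0.5, 0.4]})
order_before = best_per_app(results)["application"].unique().tolist()

# A new, better result for A is appended at the bottom.
new_result = pd.DataFrame({"application": ["A"], "fom": [0.7]})
results = pd.concat([results, new_result], ignore_index=True)
order_after = best_per_app(results)["application"].unique().tolist()
```

With these inputs `order_before` is ["A", "B"] and `order_after` is ["B", "A"], so a single appended row is enough to flip the legend unless the original order is restored.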

Does that make sense?

Inspired by our conversation about "doing one thing well" on another pull request, I think I'm coming around to the view that #22 probably was a mistake. I'll sketch out an alternative fix and open that as another pull request, so we can compare the two possible behaviors.

Plotting a NavChart requires a calculation of performance portability, which may adjust the order of applications in the DataFrame. Signed-off-by: John Pennycook <[email protected]>

The previous "expected" test result had actually been chosen based on the empirical behavior of the library. If we expect the output of the DataFrame to remain sorted by application, we should test for that. Signed-off-by: John Pennycook <[email protected]>

Pennycook · 2024-08-28T15:57:41Z

Closed in favor of #65.

Pennycook added the bug label Aug 14, 2024

Pennycook added this to the 1.0.0 milestone Aug 14, 2024

Pennycook requested review from swright87 and laserkelvin August 14, 2024 13:39

Pennycook marked this pull request as ready for review August 14, 2024 13:39

laserkelvin reviewed Aug 14, 2024


Pennycook added 2 commits August 16, 2024 15:13

Ensure application order is respected in navchart

59ccc5a


Pennycook mentioned this pull request Aug 22, 2024

Reject duplicate results when handling efficiencies #65

Merged

Pennycook closed this Aug 28, 2024
