Reject duplicate results when handling efficiencies #65

Merged: 6 commits, Aug 29, 2024

Commits on Aug 22, 2024

  1. Refactor eff column handling for CascadePlot

    Previously, the validity of the efficiency column was being checked by
    the matplotlib and pgfplots backends instead of by the dispatch code.
    
    Centralizing the checks will ensure they stay in sync.
    
    Signed-off-by: John Pennycook <[email protected]>
    Pennycook committed Aug 22, 2024
    8331f4c
  2. Reject duplicate (application, platform) pairs

    Our previous attempt at solving this problem (see intel#22) had unexpected
    knock-on effects, since removing data from a user-supplied DataFrame
    might impact certain properties of the data (e.g., the order in which
    applications, platforms, and/or problems appear).
    
    Rather than complicate our implementation with workarounds that might
    not address every possible use-case, we can simply detect and reject
    problematic data.
    
    This change slightly complicates the process of working with large data,
    but ensures that users are always in control of which data is plotted.
    
    Signed-off-by: John Pennycook <[email protected]>
    Pennycook committed Aug 22, 2024
    5633db0
  3. Modify PP duplicates test

    Since we're changing the way that duplicates are handled by PP, the
    newest version fails the original test.
    
    Signed-off-by: John Pennycook <[email protected]>
    Pennycook committed Aug 22, 2024
    3715d7c
  4. Prevent pp from sorting during calculation

    By default, groupby sorts the DataFrame by its group keys. This leads to
    unexpected reordering effects when pp is used in conjunction with cascade
    plots and navcharts.
    
    Signed-off-by: John Pennycook <[email protected]>
    Pennycook committed Aug 22, 2024
    666d81a
  5. Update pp test to match new sorting behavior

    The previous "expected" test result had actually been chosen based on
    the empirical behavior of the library. If we expect the output DataFrame
    to remain unsorted, we should test for that.
    
    Signed-off-by: John Pennycook <[email protected]>
    Pennycook committed Aug 22, 2024
    47d7110
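
The sorting behavior described in commits 4 and 5 can be reproduced with plain pandas. The following is a minimal sketch, not the library's code: the column names (`application`, `platform`, `eff`) and the harmonic-mean aggregate are assumptions chosen for illustration. It shows only that `groupby` orders its result by group key unless `sort=False` is passed, in which case groups keep their order of first appearance.

```python
import pandas as pd

# Hypothetical input: "zeta" deliberately appears before "alpha".
df = pd.DataFrame(
    {
        "application": ["zeta", "zeta", "alpha", "alpha"],
        "platform": ["X", "Y", "X", "Y"],
        "eff": [0.5, 1.0, 1.0, 0.25],
    }
)


def harmonic_mean(series: pd.Series) -> float:
    """Illustrative aggregate over one application's efficiencies."""
    return len(series) / (1.0 / series).sum()


# Default behavior: the result is ordered by the sorted group keys.
default_order = (
    df.groupby("application")["eff"].agg(harmonic_mean).index.tolist()
)

# With sort=False, groups keep their order of first appearance in df.
unsorted_order = (
    df.groupby("application", sort=False)["eff"]
    .agg(harmonic_mean)
    .index.tolist()
)

print(default_order)   # alphabetical order of applications
print(unsorted_order)  # order in which applications appear in df
```

Preserving the input order matters here because downstream plots take their ordering from the DataFrame, so a silent sort would rearrange what the user supplied.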

Commits on Aug 29, 2024

  1. Explain why we reject duplicate (app, plat) pairs

    Signed-off-by: John Pennycook <[email protected]>
    Pennycook committed Aug 29, 2024
    9db17ae
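
The "detect and reject" approach to duplicate (application, platform) pairs can be sketched as follows. This is an illustration under stated assumptions rather than the code merged in this PR: the function name `reject_duplicate_pairs`, the column names, and the error message are all hypothetical.

```python
import pandas as pd


def reject_duplicate_pairs(df, keys=("application", "platform")):
    """Raise ValueError if any combination of `keys` occurs more than once.

    Hypothetical helper: the DataFrame is never modified, so the order of
    the user's rows is left untouched.
    """
    keys = list(keys)
    # keep=False marks every member of a duplicated group, not just repeats.
    dup_mask = df.duplicated(subset=keys, keep=False)
    if dup_mask.any():
        offending = df.loc[dup_mask, keys].drop_duplicates()
        raise ValueError(
            "Found duplicate (application, platform) pairs: "
            f"{offending.to_dict('records')}"
        )


# Usage: the second row repeats the pair ("app", "X").
data = pd.DataFrame(
    {
        "application": ["app", "app", "app"],
        "platform": ["X", "X", "Y"],
        "eff": [1.0, 0.5, 0.25],
    }
)

try:
    reject_duplicate_pairs(data)
except ValueError as err:
    print(err)
```

Raising an error instead of dropping rows leaves the choice of which result to keep with the caller, who can filter or aggregate the data before plotting.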