Reject duplicate results when handling efficiencies #65

Previously, the validity of the efficiency column was being checked by the matplotlib and pgfplots backends instead of by the dispatch code. Centralizing the checks will ensure they stay in sync. Signed-off-by: John Pennycook <[email protected]>
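As a rough illustration of why centralizing helps, the sketch below validates once in a dispatch function before handing off to a backend. Every name here (validate_efficiencies, plot, BACKENDS, the stub backends) is hypothetical, and the assumption that a valid efficiency is a number in [0, 1] is ours, not something stated in this PR.

```python
def validate_efficiencies(values):
    """Assumed rule: every efficiency must be a number in [0, 1]."""
    for v in values:
        if not isinstance(v, (int, float)) or not 0.0 <= v <= 1.0:
            raise ValueError(f"invalid efficiency: {v!r}")


# Stub backends: they no longer carry their own copy of the check.
def plot_matplotlib(values):
    return f"matplotlib:{len(values)}"


def plot_pgfplots(values):
    return f"pgfplots:{len(values)}"


BACKENDS = {"matplotlib": plot_matplotlib, "pgfplots": plot_pgfplots}


def plot(values, backend="matplotlib"):
    """Hypothetical dispatch function: validate once, then pick a backend."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    validate_efficiencies(values)
    return BACKENDS[backend](values)
```

With the check in one place, a new backend inherits it automatically and the backends cannot drift apart.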

Our previous attempt at solving this problem (see intel#22) had unexpected knock-on effects, since removing data from a user-supplied DataFrame might impact certain properties of the data (e.g., the order in which applications, platforms, and/or problems appear). Rather than complicate our implementation with workarounds that might not address every possible use-case, we can simply detect and reject problematic data. This change slightly complicates the process of working with large datasets, but ensures that users are always in control of which data is plotted. Signed-off-by: John Pennycook <[email protected]>
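A minimal sketch of the "detect and reject" approach with pandas is shown below. The helper name reject_duplicates and the column names are illustrative assumptions, not the library's actual code; the point is only that the input DataFrame is never modified.

```python
import pandas as pd


def reject_duplicates(df, key_columns):
    """Raise ValueError if any combination of key_columns appears more than once.

    Hypothetical helper: the input is returned unchanged when it is clean.
    """
    # keep=False marks every member of a duplicated group, not just the repeats.
    duplicated = df.duplicated(subset=key_columns, keep=False)
    if duplicated.any():
        offending = df.loc[duplicated, key_columns].drop_duplicates()
        raise ValueError(
            "Found duplicate results for:\n" + offending.to_string(index=False)
        )
    return df


# Illustrative data: rows 0 and 2 describe the same (problem, application, platform).
keys = ["problem", "application", "platform"]
df = pd.DataFrame(
    {
        "problem": ["p1", "p1", "p1"],
        "application": ["a1", "a1", "a1"],
        "platform": ["x", "y", "x"],
        "efficiency": [0.5, 0.8, 0.6],
    }
)

try:
    reject_duplicates(df, keys)
    rejected = False
except ValueError:
    rejected = True
```

The caller then decides which of the conflicting rows to keep (for example, by filtering before the call), which is the sense in which users stay in control of what is plotted.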

Since we're changing the way that duplicates are handled by pp, the newest version fails the original test. Signed-off-by: John Pennycook <[email protected]>

By default, groupby sorts its result by group key (sort=True). This leads to unexpected reordering when pp is used in conjunction with cascade plots and navcharts. Signed-off-by: John Pennycook <[email protected]>
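The reordering is easy to reproduce in isolation. The sketch below uses made-up column names and a mean purely as a stand-in aggregate (it is not how pp is computed); it only demonstrates the effect of the sort argument of pandas groupby.

```python
import pandas as pd

# Illustrative data in which application "b" appears before "a".
df = pd.DataFrame(
    {
        "application": ["b", "a", "b", "a"],
        "platform": ["x", "x", "y", "y"],
        "efficiency": [0.9, 0.5, 0.7, 0.6],
    }
)

# Default behavior (sort=True): groups come back ordered by key.
sorted_keys = list(df.groupby("application")["efficiency"].mean().index)

# sort=False: groups come back in order of first appearance in the input.
original_order = list(
    df.groupby("application", sort=False)["efficiency"].mean().index
)
```

Passing sort=False keeps applications in the order the user supplied them, so downstream plots that rely on row order are not silently rearranged.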

The previous "expected" test result had actually been chosen based on the empirical behavior of the library. If we expect the output DataFrame to remain unsorted, we should test for that. Signed-off-by: John Pennycook <[email protected]>

Signed-off-by: John Pennycook <[email protected]>

Commits on Aug 22, 2024

Commits on Aug 29, 2024