add relative standard deviation to aggregated test execution metrics #681

Merged
merged 4 commits into from
Oct 25, 2024
Changes from 1 commit
8 changes: 4 additions & 4 deletions osbenchmark/aggregator.py
@@ -133,13 +133,13 @@ def build_aggregated_results(self):
# Calculate RSD for the mean values across all test executions
# We use mean here as it's more sensitive to outliers, which is desirable for assessing variability
mean_values = [v['mean'] for v in task_metrics[metric]]
- rsd = self.calculate_rsd(mean_values)
+ rsd = self.calculate_rsd(mean_values, f"{task}.{metric}.mean")
op_metric[metric]['mean_rsd'] = rsd

# Handle derived metrics (like error_rate, duration) which are stored as simple values
else:
# Calculate RSD directly from the metric values across all test executions
- rsd = self.calculate_rsd(task_metrics[metric])
+ rsd = self.calculate_rsd(task_metrics[metric], f"{task}.{metric}")
op_metric[f"{metric}_rsd"] = rsd

aggregated_results["op_metrics"].append(op_metric)
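
The comment in this hunk says the mean is used because it is more sensitive to outliers. A minimal sketch of that claim follows, using made-up latency samples (the numbers and variable names are illustrative assumptions, not data from this PR):

```python
import statistics

# Hypothetical latency samples (ms); not taken from any real test execution.
samples = [10.0, 11.0, 9.0, 10.0, 10.0]
with_outlier = samples + [100.0]

# How far each summary statistic moves when a single outlier is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(samples)
median_shift = statistics.median(with_outlier) - statistics.median(samples)

print(f"mean shift:   {mean_shift}")
print(f"median shift: {median_shift}")
```

With these samples the mean moves while the median stays put, which is why an RSD computed over per-execution means would surface an outlier run that a median-based figure could hide.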
@@ -214,9 +214,9 @@ def calculate_weighted_average(self, task_metrics: Dict[str, List[Any]], iterati

return weighted_metrics

- def calculate_rsd(self, values):
+ def calculate_rsd(self, values, metric_name: str):
Collaborator review comment:

Nit: We should also include a type hint for values if we are providing one for metric_name.

if not values:
- raise ValueError("Cannot calculate RSD for an empty list of values")
+ raise ValueError(f"Cannot calculate RSD for metric '{metric_name}': empty list of values")
if len(values) == 1:
return "NA" # RSD is not applicable for a single value
mean = statistics.mean(values)
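
For reference, here is a standalone sketch of how the function might look with the reviewer's nit applied (type hints on both parameters). The diff is truncated after the mean is computed, so everything past that point is an assumption: the sample standard deviation, the percentage scaling, and the zero-mean handling are illustrative choices, not confirmed by the PR.

```python
import statistics
from typing import List, Union


def calculate_rsd(values: List[Union[int, float]], metric_name: str) -> Union[float, str]:
    """Relative standard deviation of `values`, as a percentage.

    `metric_name` is only used to make the error message easier to trace.
    """
    if not values:
        raise ValueError(
            f"Cannot calculate RSD for metric '{metric_name}': empty list of values"
        )
    if len(values) == 1:
        return "NA"  # RSD is not applicable for a single value

    mean = statistics.mean(values)
    # Assumption: the part below is not visible in the diff.
    if mean == 0:
        return float("inf")  # avoid division by zero; hypothetical choice
    std_dev = statistics.stdev(values)  # sample standard deviation
    return (std_dev / mean) * 100


# Usage with a label built the same way as in build_aggregated_results
print(calculate_rsd([90, 110], "index-append.throughput.mean"))
```

Passing the label in means a failure deep inside an aggregation run names the task and metric that had no values, rather than reporting only that some list was empty.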