add a sample script to validate Metric Hub definitions #527

bochocki · 2024-07-18T22:14:29Z

I used a script similar to this to validate definition changes in #483 and #523 after merging. @jmsilverman suggested it could be helpful to check the script into the repo.

Open to a discussion about whether this is useful and, if not, whether there's something we can do instead.

jmsilverman

Requesting a couple small code changes via inline comments.

As Brad mentioned, I think this script is useful to have in the repo. I ran it in Colab and it worked great!

I'm not totally sure if it belongs in the README vs. as a standalone script/file, so I'm curious to hear other people's opinions on that.

danielkberry

I think this is a great idea; just a few suggestions.

danielkberry · 2024-07-22T14:40:06Z

definitions/README.md

+    # Modify the metric source table string so that it formats nicely in the query.
+    from_expression = metric.data_source._from_expr.replace("\n", "\n" + " " * 15)
+
+    query = dedent(


What about adding the query builder as a method on DataSource, something like DataSource.build_validation_query(start_date: str, end_date: str) -> str? Then we can expose it on Metric with something like:

    # in Metric class
    def build_validation_query(self, start_date: str, end_date: str) -> str:
        return self.data_source.build_validation_query(start_date, end_date)

In the DataSource method, we can modify the from expression the same way build_query does. That also gives us a separate method to unit test; right now, the logic is hidden within this larger function.

If we do that, this function can be simplified to:

    metric = ConfigLoader.get_metric(metric_slug=metric_slug, app_name=app_name)
    query = metric.build_validation_query(start_date, end_date)
    df = bigquery.Client(project=bq_project).query(query).to_dataframe()
    df[submission_date_column] = pd.to_datetime(df[submission_date_column]).dt.date
    return df

I really, really like the idea of having an attribute on Metric that provides a query. I don't know if it's difficult to do in the general case, but it's something I think could be useful outside of this case. For example, there's a class in the forecasting code whose main goal is to build a query from Metric Hub components.
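To make the proposal concrete, here is a minimal sketch of how the two methods could fit together. SketchDataSource and SketchMetric are hypothetical stand-ins, not the real Metric Hub classes, and their field names are assumptions. The sketch also passes select_expr into the data source method, since a data source on its own would not know the metric's expression; the signature suggested above omits that parameter.

```python
from dataclasses import dataclass


@dataclass
class SketchDataSource:
    """Hypothetical stand-in for a data source; field names are assumptions."""

    from_expression: str
    submission_date_column: str = "submission_date"

    def build_validation_query(
        self, select_expr: str, start_date: str, end_date: str
    ) -> str:
        # Indent continuation lines of a multi-line FROM expression so the
        # generated SQL stays readable.
        from_expr = self.from_expression.strip().replace("\n", "\n       ")
        col = self.submission_date_column
        return "\n".join(
            [
                f"SELECT {col},",
                f"       {select_expr} AS value",
                f"  FROM {from_expr}",
                f" WHERE {col} BETWEEN '{start_date}' AND '{end_date}'",
                f" GROUP BY {col}",
            ]
        )


@dataclass
class SketchMetric:
    """Hypothetical stand-in for a metric that delegates to its data source."""

    select_expr: str
    data_source: SketchDataSource

    def build_validation_query(self, start_date: str, end_date: str) -> str:
        return self.data_source.build_validation_query(
            self.select_expr, start_date, end_date
        )


if __name__ == "__main__":
    source = SketchDataSource(from_expression="(\n  SELECT *\n  FROM some_table\n)")
    metric = SketchMetric(select_expr="COUNT(*)", data_source=source)
    print(metric.build_validation_query("2024-07-01", "2024-07-07"))
```

Because the query text is built without touching BigQuery, a unit test can assert on the returned string directly.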

danielkberry · 2024-07-22T14:44:22Z

definitions/README.md

+               {metric.select_expr} AS value
+          FROM {from_expression}
+         WHERE {submission_date_column} BETWEEN '{start_date}' AND '{end_date}'
+         GROUP BY {submission_date_column}


Most metric definitions assume that we're grouping by client_id too, so we should add that to the GROUP BY.

Is there a programmatic way to figure out what the other GROUP BY dimensions might be?
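Until there is a programmatic answer, one option is to let the caller pass the extra dimensions explicitly. The sketch below is only an illustration: build_grouped_query and its parameters are hypothetical names, and client_id is used as the default extra dimension purely because it is the one mentioned above.

```python
from typing import Sequence


def build_grouped_query(
    select_expr: str,
    from_expression: str,
    date_column: str,
    start_date: str,
    end_date: str,
    extra_dimensions: Sequence[str] = ("client_id",),
) -> str:
    """Build a validation query grouped by the date column plus extra dimensions."""
    # dict.fromkeys removes duplicates while preserving order, in case the
    # date column is also listed among the extra dimensions.
    group_cols = list(dict.fromkeys([date_column, *extra_dimensions]))
    cols = ", ".join(group_cols)
    return "\n".join(
        [
            f"SELECT {cols},",
            f"       {select_expr} AS value",
            f"  FROM {from_expression}",
            f" WHERE {date_column} BETWEEN '{start_date}' AND '{end_date}'",
            f" GROUP BY {cols}",
        ]
    )


if __name__ == "__main__":
    print(
        build_grouped_query(
            "COUNT(*)", "some_table", "submission_date", "2024-07-01", "2024-07-07"
        )
    )
```

Passing extra_dimensions=() reproduces the date-only grouping from the diff above.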

bochocki · 2024-07-22T19:45:41Z

Changed to WIP so it's clear this isn't meant to be merged right now. I also addressed some of the comments.

I think that having a way to build queries from Metric Hub objects would be a useful addition to mozanalysis and would probably be more valuable than documenting a script like this somewhere.

bochocki requested review from scholtzan, danielkberry, jmsilverman and fbertsch July 18, 2024 22:14

github-actions bot approved these changes Jul 18, 2024

bochocki added 2 commits July 18, 2024 15:15

remove newline

89d0139

add newline to pass CI check

0e3648c

github-actions bot approved these changes Jul 18, 2024

remove newline to pass CI check

439c6f1

jmsilverman suggested changes Jul 19, 2024

danielkberry requested changes Jul 22, 2024

bochocki marked this pull request as draft July 22, 2024 19:39

bochocki and others added 2 commits July 22, 2024 12:39

make suggested changes

2ad9b56

Merge branch 'main' into add-python-metric-validation-script

0aba5af

github-actions bot approved these changes Jul 22, 2024

remove newline to pass CI validation

b60e104
