Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ele 1484 source freshness invocation #1165

Merged
merged 6 commits into from
Sep 19, 2023

Conversation

NoyaArie
Copy link
Contributor

No description provided.

@linear
Copy link

linear bot commented Sep 18, 2023

ELE-1484 Source Freshness Invocation ID

Is your feature request related to a problem? Please describe.
Trying to cross-reference the data on model dbt_source_freshness_results and the respective jobs (on BigQuery audit logs), I've realized that the invocation_id on this table relates to the invocation of the upload_source_freshness macro and that the invocation_id for the source freshness invocation is not stored.
Also, it is a little confusing to have the upload_source_freshness invocation_id concatenated to make the source_freshness_execution_id, since it is not the source freshness execution, but a whole different job.

Describe the solution you'd like

  • Including a new column on the dbt_source_freshness_results model with the source freshness execution invocation_id.
  • Using the source freshness execution invocation_id to create the source_freshness_execution_id field

Describe alternatives you've considered
No alternatives, since the invocation_id of the source freshness execution is not included anywhere

Additional context

  • The requested source freshness execution invocation_id is available on the artifact target/sources.json where all the other data is extracted from, so it shouldn't be a big challenge to include it.
  • There are concerns with creating a clear nomenclature if the currently stored upload_source_freshness invocation_id is to be kept, since two different invocation ids would be stored on this table.
  • Adding a new column would change the model schema and require recreating tables.

Would you be willing to contribute this feature?
Yes, I can contribute this.

@github-actions
Copy link
Contributor

👋 @NoyaArie
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

@NoyaArie
Copy link
Contributor Author

This PR also covers upload-once-source-freshness-results

)
select count(*) as count
from invocations
where invocation_id = {{ "'" ~ invocation_id ~ "'" }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
where invocation_id = {{ "'" ~ invocation_id ~ "'" }}
where invocation_id = {{ elementary.edr_quote(invocation_id) }}

@@ -1,5 +1,5 @@
{% macro upload_source_freshness(results) %}
{% set source_freshness_results_dicts = fromjson(results) %}
{% set source_freshness_results_dict = fromjson(results) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this does take multiple dicts though as target/sources.json:results is a list.
Otherwise it wouldn't work since upload_artifacts_to_table takes a list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming to source_freshness_results :)

where invocation_id = {{ "'" ~ invocation_id ~ "'" }}
{% endset %}

{% set result = elementary.run_query(query) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use .result_value()?

self.upload_results(results)
click.echo("Uploaded source freshness results successfully.")
sources_file_contents = self.get_sources_file_contents()
if sources_file_contents:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could it be falsely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty file, which should not happen, but since in the next line accessing to keys prefer checking it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would just remove it as this is not something that can happen and this flow is a bit unclear.
If you want to keep it, maybe do:

if not sources_file_contents:
    raise raise click.ClickException("Unexpectedly found an empty source.json file.")

@@ -0,0 +1,19 @@
{% macro validate_source_freshness_invocation(invocation_id, days_back=14) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Care to rename this macro to something that indicates that it checks whether the source freshness was uploaded or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming the macro to can_upload_source_freshness, and the internal variables as well

@NoyaArie NoyaArie force-pushed the ele-1484-source-freshness-invocation-id branch from c48b93f to a9447b1 Compare September 19, 2023 08:00
self.upload_results(results)
click.echo("Uploaded source freshness results successfully.")
sources_file_contents = self.get_sources_file_contents()
if sources_file_contents:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would just remove it as this is not something that can happen and this flow is a bit unclear.
If you want to keep it, maybe do:

if not sources_file_contents:
    raise raise click.ClickException("Unexpectedly found an empty source.json file.")

chunk_size = 100
chunk_list = list(range(0, len(results), chunk_size))
upload_with_progress_bar = alive_it(
chunk_list, title="Uploading source freshness results"
)
for chunk in upload_with_progress_bar:
results_segment = results[chunk : chunk + chunk_size]

for result in results_segment:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to pass the metadata to flatten_source_freshness for each record

@NoyaArie NoyaArie merged commit c9f9996 into master Sep 19, 2023
@NoyaArie NoyaArie deleted the ele-1484-source-freshness-invocation-id branch September 19, 2023 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants