Feature/add robust pipeline schema usage #6

Open: wants to merge 4 commits into main
@jorisbr (Collaborator) commented Oct 4, 2024

Description

This PR contains two changes:

  • We currently don't enforce completeness of the API schema. Because the schema of the data available for a given organisation propagates throughout the pipeline, we ran into an issue where building a BigQuery query failed because certain fields were missing. The affected component was the transform step, and none of the subsequent components could handle the missing fields either. One field missing for the organisation I'm setting up was previousMeasurement. A Firestore collection had never before lacked every instance of this field, but that case shouldn't break the pipeline. At least one field was also missing from within the previous/current measurement struct. Although these fields are nullable, they must still exist in the schema for the queries to work.
  • We also ran into an issue where, for unknown reasons, the Firestore export request file wasn't cleaned up properly. To address this, we updated the firestore_export component to always start off by clearing any remaining files from its directory in the storage bucket.
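The schema fix in the first bullet can be sketched as merging the schema observed in the data with a required baseline, adding any missing field (at any nesting level) so queries that reference it still compile. This is a minimal sketch, not the PR's implementation. It assumes the schema is held as a list of BigQuery JSON-schema field dicts (`name`, `type`, `mode`, `fields`). `complete_schema`, `REQUIRED_SCHEMA`, `currentMeasurement`, and the nested field names `value`/`timestamp` are hypothetical; only previousMeasurement is named above.

```python
from copy import deepcopy
from typing import Any

Field = dict[str, Any]

# Hypothetical baseline schema in BigQuery JSON-schema form. Only
# previousMeasurement is named in the PR; currentMeasurement and the nested
# field names are placeholders for illustration.
MEASUREMENT_FIELDS: list[Field] = [
    {"name": "value", "type": "FLOAT", "mode": "NULLABLE"},
    {"name": "timestamp", "type": "TIMESTAMP", "mode": "NULLABLE"},
]
REQUIRED_SCHEMA: list[Field] = [
    {"name": "currentMeasurement", "type": "RECORD", "mode": "NULLABLE",
     "fields": MEASUREMENT_FIELDS},
    {"name": "previousMeasurement", "type": "RECORD", "mode": "NULLABLE",
     "fields": MEASUREMENT_FIELDS},
]


def complete_schema(observed: list[Field], required: list[Field]) -> list[Field]:
    """Return a copy of `observed` with every field from `required` present.

    Fields missing at any nesting level are appended as declared in
    `required` (NULLABLE here), so queries that reference them still compile
    even if no source document contains them. `observed` is not modified.
    """
    result = deepcopy(observed)
    by_name = {f["name"]: f for f in result}
    for req in required:
        existing = by_name.get(req["name"])
        if existing is None:
            # Field never appeared in the data: add it from the baseline.
            added = deepcopy(req)
            result.append(added)
            by_name[added["name"]] = added
        elif req.get("type") == "RECORD" and existing.get("type") == "RECORD":
            # Field exists, but its struct may lack some sub-fields.
            existing["fields"] = complete_schema(
                existing.get("fields", []), req.get("fields", [])
            )
    return result


# Usage: an organisation whose data never contained previousMeasurement and
# whose currentMeasurement struct lacks "timestamp".
observed = [
    {"name": "currentMeasurement", "type": "RECORD", "mode": "NULLABLE",
     "fields": [{"name": "value", "type": "FLOAT", "mode": "NULLABLE"}]},
]
schema = complete_schema(observed, REQUIRED_SCHEMA)
```

The merge only recurses when both sides are RECORD; a type mismatch between an observed field and its baseline is left untouched and would need separate handling.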

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • This has been tested by running a pipeline job for the newly added organisation, whose data is missing fields for both operating systems.

@leonardpunt (Member) left a comment

LGTM!
Only one comment: we're now cleaning the bucket first, which means we can't run two export jobs consecutively. Maybe that's not needed (and maybe we already couldn't?), but it's good to be aware of.
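The concern follows from the cleanup added to the export component: if two export jobs write to the same directory, whichever one cleans up later deletes the other's files. Below is a minimal sketch of a prefix-scoped cleanup, assuming the export files live under a directory prefix in a GCS bucket. `clear_prefix`, `FakeBucket`, `FakeBlob`, and the `firestore-export/run-*` layout are hypothetical. The only real-library assumption is that `google.cloud.storage.Bucket.list_blobs(prefix=...)` yields blobs with `.name` and `.delete()`; the fake mimics that so the example runs without credentials.

```python
from dataclasses import dataclass, field


def clear_prefix(bucket, prefix: str) -> list[str]:
    """Delete every object under `prefix` and return the deleted names.

    `bucket` is duck-typed: a google.cloud.storage.Bucket works, as does the
    hypothetical in-memory FakeBucket below.
    """
    if not prefix:
        raise ValueError("refusing to clear the entire bucket")
    if not prefix.endswith("/"):
        # Treat the prefix as a directory so "run-a" doesn't match "run-ab/...".
        prefix += "/"
    deleted = []
    # Materialise the listing before deleting anything.
    for blob in list(bucket.list_blobs(prefix=prefix)):
        blob.delete()
        deleted.append(blob.name)
    return deleted


@dataclass
class FakeBlob:
    name: str
    store: dict

    def delete(self) -> None:
        del self.store[self.name]


@dataclass
class FakeBucket:
    """Hypothetical in-memory stand-in for a GCS bucket (illustration only)."""

    objects: dict = field(default_factory=dict)

    def list_blobs(self, prefix: str = "") -> list[FakeBlob]:
        return [FakeBlob(name, self.objects)
                for name in sorted(self.objects) if name.startswith(prefix)]


# Hypothetical layout: two runs sharing the parent "firestore-export/" prefix.
bucket = FakeBucket({
    "firestore-export/run-a/request.json": b"{}",
    "firestore-export/run-b/request.json": b"{}",
})

# Per-run cleanup only touches run-b's files.
removed_run_b = clear_prefix(bucket, "firestore-export/run-b")
# Clearing the shared parent prefix removes run-a's files as well.
removed_shared = clear_prefix(bucket, "firestore-export")
```

Scoping the cleanup to a per-run prefix (for example one derived from a pipeline job ID) is one possible way to keep the "start from an empty directory" guarantee while letting runs overlap; it is a suggestion, not something this PR implements.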
