Docs / SOP: How to review a data build #544
Comments
Also maybe worth adding, but I remember also:
We could also write a QC test or script, maybe even a GitHub Action for it, but I don't know if we're at the point where that's worth doing.
@matentzn About this criterion:
I sometimes see files removed or added, but they are like lexical mapping If there were
Yeah, this can happen. Basically, we need to learn as a group that "files removed" is only a warning sign, and judge for ourselves whether the removal was expected or not. Small files that disappear that are not exact are usually no cause for concern.
Overview
Add / update a page to include these docs:
The data build is not reviewed for specific changes in content, but for general patterns of change. It is recommended to spend 10 minutes reviewing each data build.
There are two important reasons to review data builds: (1) looking out for large unexplained changes and (2) increasing your familiarity with the data generated by the pipeline. The latter is as important as the former: as data stewards of the Mondo Ingest pipeline you should understand all data (every single file!) that is generated inside out, and the best way to do that is to review each file many times until it sticks. It is not wrong to use a data release to ask questions like: "what is the purpose of this file?".
Checklist
Checklist item details
4. (Almost) no file should be totally empty.
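A check like this one could be scripted rather than done by eye. The following is only a minimal sketch, not part of the Mondo Ingest pipeline: the function name `find_empty_files` and the idea of pointing it at a local copy of the build output are assumptions made for illustration, and "empty" is taken to mean zero bytes (a file containing only a header row would not be flagged).

```python
from pathlib import Path


def find_empty_files(build_dir: str) -> list[str]:
    """Return the relative paths of all zero-byte files under build_dir.

    Hypothetical helper for illustration; `build_dir` is assumed to be a
    local directory holding the files produced by a data build.
    """
    root = Path(build_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")  # walk the directory tree recursively
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    import sys

    # Usage: python find_empty_files.py path/to/build/output
    for name in find_empty_files(sys.argv[1]):
        print(f"EMPTY: {name}")
```

Because the item says "(almost)" no file should be empty, the output is best read as a list of files to look at, not as a hard failure.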
Nico:
Additional info
Context: Original discussion
Related