Docs / SOP: How to review a data build #544

joeflack4 · 2024-05-25T18:55:58Z

Overview

Add / update a page to include these docs:

The data build is reviewed not for specific changes in content, but for general patterns of change. It is recommended to spend 10 minutes reviewing each data build.

There are two important reasons to review data builds: (1) looking out for large, unexplained changes and (2) increasing your familiarity with the data generated by the pipeline. The latter is as important as the former: as data stewards of the Mondo Ingest pipeline, you should understand all the data it generates (every single file!) inside out, and the best way to do that is to review each file many times until it sticks. It is perfectly fine to use a data release to ask questions like: "what is the purpose of this file?".

Checklist

1. Ensure that no files are added or removed. There are few good reasons for files to be added or removed, and when it happens it should be explained.
2. ORDO, DOID and OMIM matches and migration files should have "reasonable" changes, i.e. be in line with what one could expect as a consequence of a few weeks' worth of curation (example: 1000 added lines is not a good sign, but 70 removed lines is within reason).
3. Metrics and ontology related files should change within reason (numbers like axiom counts changing by around ±250 are nearly always ok, changes between 250 and 1000 are worth a second look, and changes beyond 1000 merit an investigation).
4. (Almost) no file should be totally empty.
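The thresholds in the checklist can be sketched in a few lines of Python. This is only an illustration of the rules of thumb above, not part of the pipeline: the function names and the example file names are hypothetical, and treating exactly 250 and exactly 1000 as belonging to the lower band is an assumption, since the checklist does not say which side the boundaries fall on.

```python
def classify_metric_change(old: int, new: int) -> str:
    """Classify a change in a count (e.g. an axiom count) between two builds.

    Bands follow the checklist: roughly +/-250 is fine, 250 to 1000 deserves
    a second look, and anything beyond 1000 merits an investigation.
    Boundary handling (<=) is an assumption.
    """
    delta = abs(new - old)
    if delta <= 250:
        return "ok"
    if delta <= 1000:
        return "second look"
    return "investigate"


def added_and_removed(old_files, new_files):
    """Return (added, removed) file names between two builds, sorted."""
    old_set, new_set = set(old_files), set(new_files)
    return sorted(new_set - old_set), sorted(old_set - new_set)


# Hypothetical file names, for illustration only.
previous_build = ["a.tsv", "b.tsv", "metrics.json"]
current_build = ["a.tsv", "metrics.json", "c.tsv"]
added, removed = added_and_removed(previous_build, current_build)
```

Any non-empty `added` or `removed` list is something the build author should be able to explain in the PR.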

Checklist item details

4. (Almost) no file should be totally empty.

Nico:

Lexmatch files can be empty (although they should be predictably empty, i.e. the emptiness should be explainable, and I think such files should at least have the column headers in them).
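One way to tell a totally empty file from one that is "predictably empty" with headers is sketched below. It assumes the file is a TSV whose first non-comment line is the column header, and that lines starting with `#` are metadata comments (as in SSSOM-style TSV files); both are assumptions about the file format, and the function name and column names are hypothetical.

```python
def describe_emptiness(text: str) -> str:
    """Describe how empty a TSV file's content is.

    Blank lines and lines starting with '#' (assumed to be metadata
    comments) are ignored. The first remaining line is assumed to be
    the column header.
    """
    rows = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    if not rows:
        return "empty"
    if len(rows) == 1:
        return "header only"
    return "has data"


# Hypothetical content, for illustration only.
header_only = "# comment line\nsubject_id\tobject_id\n"
status = describe_emptiness(header_only)
```

A result of `"header only"` matches the suggestion above, while `"empty"` is the case worth questioning during review.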

Additional info

Context: Original discussion

Approvers: At least 1 approver who is not the PR author is required.


joeflack4 commented May 25, 2024 (edited)

joeflack4 commented Jun 13, 2024

joeflack4 commented Jun 14, 2024

joeflack4 commented Jul 25, 2024 (edited)

matentzn commented Jul 26, 2024
