-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build 2024-May-23 (re-run build) #542
Conversation
the build was run as: sh run.sh make build-mondo-ingest -B
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this build for?
Is this a routine build or is this build linked to another PR or issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I wrote a thread in a recent PR, IDR if I got a response, but I'm generally not sure of what to do when reviewing builds. I mostly just face check slurp and lexmatch, and just skim through it. I spend like 1-2 minutes on that. If anyone knows of a better practice, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Maybe add a documentation page "How to review a data build" to the docs:
- The data build is not reviewed for specific changes in content, but general patterns of changes. It is recommended to spend 10 minutes reviewing each databuild.
- There are two important reasons to review databuilds: (1) looking out for large unexplainable changes and (2) increasing your familiarity with the data generated by the pipeline. The later is as important as the former: as data stewards in the Mondo Ingest pipeline you should understand all data (every single file!) that is generated insight out, and the best way to do that is to review each file many times until it sticks. It is not wrong to use a data release to ask questions like: "what is the purpose of this file?".
- Ensure that no files are added or removed. There are few good reasons for files being added or removed and if they happen they should be explained.
- ORDO, DOID and OMIM matches and migration files should have "reasonable" changes, i.e. be in line with one could expect as a consequence of a few weeks worth of curation (example: 1000 added lines is not a good sign, but 70 removed lines is within reason).
- Metrics and ontology related files should change within reason (numbers like axiom counts changing in the realms of 250 plus minus are nearly always ok, changes between 250 and 1000 are worth a second look, and changes beyond 1000 merit an investigation).
- No file should be totally empty in a data sync.
From your perspective, then, you could have noticed the massive changes in NCIT, and I would have told you that there had not been an NCIT release for over a year, now there was one, and the big change is very much expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this guidance is beyond what I expected! Been really wondering about this since we've been more active with builds. I think it's a good idea to have a page for this. I created the issue:
There are a few docs issues piling up, so we can probably efficiently do them in a batch; like I think there's a doc-athon coming up IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, @joeflack4 maybe make issue with the one comment I made under yours!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failed build check; deleted OMIMPS files
RE: Ensure that no files are added or removed.
I was going to approve, but I noticed that there are 2 deleted files in this diff. They're both OMIMPS. I investigated, thinking that this must be because of the www
issue. I have 2 PRs that address this, one in omim
, and one in mondo-ingest
, and both are basically ready to merge. I just checked the diff in the mondo-ingest
PR, and was relieved to see that the files are not deleted there: https://github.com/monarch-initiative/mondo-ingest/pull/520/files
So in any case, I'm guessing it is fine to merge this, but I don't know enough about what goes on the post-mondo-ingest
side of things to know if it's actually OK to merge, or if we instead need to merge the 2 OMIMPS PRs first and then re-run the build again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Could have been OMIMPS, but it this case it likely is not. My guess is it is related to monarch-initiative/mondo#7563! Do you understand how the two might be related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see; I didn't know about that. In that case, I'm not sure which is the cause of OMIMPS not appearing here. If it is caused by monarch-initiative/mondo#7563, does this (also) mean that the removal of these OMIMPS file is not a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this is a better of judgement. Here is how my brain thinks of it:
- If you are right (and it is related to the prefix), then all omimps related files (in particular the "exact" ones) would be removed.
- If I am right, only the non-exact ones (close, broad, etc) would be removed.
So I would check if the "exact" one is still there!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK, so looks like no problem then!
@twhetzel I think you said that this build is no longer relevant and was replaced by:
So I'm going to close this. Feel free to re-open if I am incorrect about that!
the build was run as: sh run.sh make build-mondo-ingest -B
this was re-run due to the failure in #537
Pre-merge checklist
Documentation
Was the documentation added/updated under
docs/
?QC
Was the full pipeline run before submitting this PR using
sh run.sh make build-mondo-ingest
on this branch (afterdocker pull obolibrary/odkfull:dev
), and no errors occurred?New Packages
Were any new Python packages added?
Were any other non-Python packages added?
PR Review and Conversations Resolved
Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?