(Do not merge) Port of nvscorevariants into GATK, with a basic tool frontend #8004

Open
wants to merge 6 commits into
base: master

droazen
Collaborator

@droazen droazen commented Aug 26, 2022

Minimal GATK port of nvscorevariants from https://github.com/NVIDIA-Genomics-Research/nvscorevariants

The tool runs successfully in both 1D and 2D modes, and a strict integration test passes for the 1D model. However, this PR has a number of outstanding issues that need to be resolved before it can be merged and replace the legacy CNNScoreVariants tool:

  • The conda environment in scripts/nvscorevariants_environment.yml needs to be incorporated into the main GATK conda environment

  • The integration test for the 2D model does not currently pass, despite using a much higher epsilon than the 1D test. Some scores differ significantly from the CNNScoreVariants 2D output, and we need to investigate why.

  • There is currently no tool for training a new model, unlike the legacy CNN tool, which has one.

@samuelklee and @mwalker174, could you please comment on what it would take to incorporate the scripts/nvscorevariants_environment.yml conda environment into the main GATK conda environment, assuming we are free to remove/retire the CNN tool?

@lbergelson and @zamirai, please do a general code review when you get a chance.

@gatk-bot

gatk-bot commented Aug 26, 2022

Github actions tests reported job failures from actions build 2935907552
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | 11 | 2935907552.11 | logs |
| cloud | 8 | 2935907552.10 | logs |
| unit | 11 | 2935907552.13 | logs |
| integration | 11 | 2935907552.12 | logs |
| conda | 8 | 2935907552.3 | logs |
| unit | 8 | 2935907552.1 | logs |
| variantcalling | 8 | 2935907552.2 | logs |
| integration | 8 | 2935907552.0 | logs |

@samuelklee
Contributor

Thanks, @droazen! @asmirnov239 has been looking at PyMC3 updates for gCNV, which will help unlock the conda environment. I understand he has a working branch, but needs to do more testing—perhaps he can comment further?

@zamirai

zamirai commented Aug 29, 2022

Thanks @droazen! What data are you using to test the 2D model? And can we have access to your verification method?

@gatk-bot

gatk-bot commented Sep 6, 2022

Github actions tests reported job failures from actions build 3002176541
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | 11 | 3002176541.11 | logs |
| cloud | 8 | 3002176541.10 | logs |
| unit | 11 | 3002176541.13 | logs |
| integration | 11 | 3002176541.12 | logs |
| unit | 8 | 3002176541.1 | logs |
| integration | 8 | 3002176541.0 | logs |
| variantcalling | 8 | 3002176541.2 | logs |
| conda | 8 | 3002176541.3 | logs |

@gatk-bot

gatk-bot commented Sep 20, 2022

Github actions tests reported job failures from actions build 3092731818
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | 8 | 3092731818.10 | logs |
| cloud | 11 | 3092731818.11 | logs |
| unit | 11 | 3092731818.13 | logs |
| integration | 11 | 3092731818.12 | logs |
| conda | 8 | 3092731818.3 | logs |
| unit | 8 | 3092731818.1 | logs |
| integration | 8 | 3092731818.0 | logs |
| variantcalling | 8 | 3092731818.2 | logs |

@droazen
Collaborator Author

droazen commented Sep 20, 2022

@zamirai I've incorporated your patch from https://github.com/NVIDIA-Genomics-Research/nvscorevariants/commit/937ffafb78b0f3e7df9b1edc3b08d11e3ebee35a into this PR. With this change, the 2D tests now pass, even when I reduce the epsilon to 0.01. Thanks for the fix!

@asmirnov239 is now working on merging the new conda environment into the GATK conda environment and making the necessary updates to existing tools. This will likely require at least another few weeks.
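
For readers unfamiliar with epsilon-based score comparison, the kind of check discussed above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual GATK integration test: the function name `find_score_mismatches`, the `VariantKey` layout, and the example scores are all hypothetical, and the sketch assumes the two outputs have already been parsed into per-variant score mappings.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical key layout for a variant: (contig, position, ref, alt).
VariantKey = Tuple[str, int, str, str]


def find_score_mismatches(
    expected: Dict[VariantKey, float],
    actual: Dict[VariantKey, float],
    epsilon: float,
) -> List[Tuple[VariantKey, float, float]]:
    """Return (key, expected, actual) for each variant whose scores
    differ by more than epsilon in absolute terms."""
    if expected.keys() != actual.keys():
        raise ValueError("the two outputs do not contain the same set of variants")
    return [
        (key, expected[key], actual[key])
        for key in sorted(expected)
        # rel_tol=0.0 makes this a pure absolute-tolerance comparison.
        if not math.isclose(expected[key], actual[key], rel_tol=0.0, abs_tol=epsilon)
    ]


# Made-up scores purely for illustration.
legacy_scores = {("chr1", 100, "A", "G"): 1.00, ("chr1", 200, "C", "T"): -2.5}
ported_scores = {("chr1", 100, "A", "G"): 1.005, ("chr1", 200, "C", "T"): -2.6}

for key, want, got in find_score_mismatches(legacy_scores, ported_scores, epsilon=0.01):
    print(f"{key}: expected {want}, got {got}")
```

Tightening `epsilon` makes the check stricter, which is why a test that still passes at 0.01 is stronger evidence of agreement than one that only passes at a much larger tolerance.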

@gatk-bot

gatk-bot commented Sep 20, 2022

Github actions tests reported job failures from actions build 3092905417
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | 8 | 3092905417.10 | logs |
| cloud | 11 | 3092905417.11 | logs |
| unit | 11 | 3092905417.13 | logs |
| integration | 11 | 3092905417.12 | logs |
| unit | 8 | 3092905417.1 | logs |
| conda | 8 | 3092905417.3 | logs |
| variantcalling | 8 | 3092905417.2 | logs |
| integration | 8 | 3092905417.0 | logs |

@droazen
Collaborator Author

droazen commented Oct 20, 2022

Rebased onto latest master

@gatk-bot

gatk-bot commented Oct 20, 2022

Github actions tests reported job failures from actions build 3291375153
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | 8 | 3291375153.10 | logs |
| cloud | 11 | 3291375153.11 | logs |
| unit | 11 | 3291375153.13 | logs |
| integration | 11 | 3291375153.12 | logs |
| unit | 8 | 3291375153.1 | logs |
| conda | 8 | 3291375153.3 | logs |
| variantcalling | 8 | 3291375153.2 | logs |
| integration | 8 | 3291375153.0 | logs |

@gatk-bot

gatk-bot commented Oct 21, 2022

Github actions tests reported job failures from actions build 3300297321
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | 8 | 3300297321.10 | logs |
| unit | 11 | 3300297321.13 | logs |
| cloud | 11 | 3300297321.11 | logs |
| conda | 8 | 3300297321.3 | logs |
| integration | 11 | 3300297321.12 | logs |
| unit | 8 | 3300297321.1 | logs |
| variantcalling | 8 | 3300297321.2 | logs |
| integration | 8 | 3300297321.0 | logs |

@gatk-bot

gatk-bot commented Oct 21, 2022

Github actions tests reported job failures from actions build 3300316784
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | 8 | 3300316784.10 | logs |
| cloud | 11 | 3300316784.11 | logs |
| unit | 11 | 3300316784.13 | logs |
| integration | 11 | 3300316784.12 | logs |
| conda | 8 | 3300316784.3 | logs |
| unit | 8 | 3300316784.1 | logs |
| variantcalling | 8 | 3300316784.2 | logs |
| integration | 8 | 3300316784.0 | logs |

@gatk-bot

gatk-bot commented Sep 27, 2024

Github actions tests reported job failures from actions build 11076165405
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | 17.0.6+10 | 11076165405.10 | logs |
| unit | 17.0.6+10 | 11076165405.12 | logs |
| integration | 17.0.6+10 | 11076165405.11 | logs |
| unit | 17.0.6+10 | 11076165405.1 | logs |
| conda | 17.0.6+10 | 11076165405.3 | logs |
| variantcalling | 17.0.6+10 | 11076165405.2 | logs |
| integration | 17.0.6+10 | 11076165405.0 | logs |

@gatk-bot

gatk-bot commented Sep 27, 2024

Github actions tests reported job failures from actions build 11077108461
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| unit | 17.0.6+10 | 11077108461.1 | logs |
| conda | 17.0.6+10 | 11077108461.3 | logs |
| variantcalling | 17.0.6+10 | 11077108461.2 | logs |
| integration | 17.0.6+10 | 11077108461.0 | logs |

6 participants