TCGA Setup Part 3: Simplified and Optimized ETL #7
lgtm
```
wget
unzip
java (openjdk-23)
```
Do you know which part of this requires Java?
It's required for running SnpEff. Check out the function annotate_vcf.
openjdk-23 is the version of Java that I used to run this command. Earlier versions may not work, in which case the command throws an error.
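Since an older Java may fail only once SnpEff is invoked, a version check up front could surface the problem earlier. The following is a minimal sketch, not code from this PR: `parse_java_major` and `require_java` are hypothetical helper names, and the minimum version is left as a parameter because the exact cutoff has not been verified.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_292" means Java 8.
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def require_java(min_major: int) -> int:
    """Hypothetical guard: fail early if the installed Java is older than min_major."""
    # `java -version` writes its report to stderr, not stdout.
    # subprocess.run raises FileNotFoundError if java is not on PATH.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    major = parse_java_major(result.stderr)
    if major < min_major:
        raise RuntimeError(
            f"Java {major} found, but Java {min_major} or newer is expected"
        )
    return major
```

Calling `require_java(23)` before annotate_vcf runs would be one way to use it; 23 here is only the version reported to work above, not a confirmed minimum.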
Looks good to me! Nice work with the documentation; it made this pretty easy to follow along. I might look into converting that annotate function to use a Python library, since using Java seems a bit overkill, but that's for a separate PR and I can take that on if there is time.
The set_tcga_data.py script is now functional!
It will download all data and reference files needed for the analysis.
It will also modify and prepare data for downstream analysis.
Just make sure to read the README to understand what is going on.