Proposed Pipeline #7
Comments
sff -> fastq should be completed in |
So it seems like the filter and transform steps will likely produce paired + unpaired output: any time you filter out, or possibly cut, an entire read from one of the paired files, it will orphan the paired read in the other file. |
I guess the important thing about that is that we will have to accept paired + unpaired for every step that takes in reads |
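To make the paired + unpaired point concrete, here is a minimal sketch of what a filter step might do. Everything in it is an assumption for illustration: `split_pairs`, the `(read_id, sequence)` tuples, and the "no N" filter are hypothetical and not part of the pipeline. It assumes both files list mates in the same order.

```python
def split_pairs(forward, reverse, keep):
    """Filter two mate lists and separate survivors into paired and unpaired.

    forward, reverse: lists of (read_id, sequence) in matching order (assumed).
    keep: predicate on a sequence; a hypothetical stand-in for a real filter.
    """
    paired_fwd, paired_rev, unpaired = [], [], []
    for (fid, fseq), (rid, rseq) in zip(forward, reverse):
        f_ok, r_ok = keep(fseq), keep(rseq)
        if f_ok and r_ok:
            # Both mates survive, so the pair stays intact.
            paired_fwd.append((fid, fseq))
            paired_rev.append((rid, rseq))
        elif f_ok:
            # Reverse mate was dropped; the forward read is now an orphan.
            unpaired.append((fid, fseq))
        elif r_ok:
            # Forward mate was dropped; the reverse read is now an orphan.
            unpaired.append((rid, rseq))
    return paired_fwd, paired_rev, unpaired


forward = [("r1/1", "ACGTACGT"), ("r2/1", "ACNNNNNN"), ("r3/1", "GGGGCCCC")]
reverse = [("r1/2", "TTGACCAA"), ("r2/2", "CCATGGAA"), ("r3/2", "NNNNNNNN")]
no_n = lambda seq: "N" not in seq  # hypothetical filter: drop reads containing N
paired_fwd, paired_rev, unpaired = split_pairs(forward, reverse, no_n)
```

Any downstream step that takes reads would then need to accept all three outputs, which is the point made above.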
I welcome ideas about how to test the consensus program (it creates a consensus from a VCF and a reference), especially using hypothesis. There is a presentation on property-based testing. Also worth checking out: "Metamorphic Testing", a link specific to bioinformatics. |
Haha, this is just so much YES |
Seems like the consensus would be fairly straightforward. Strategy to generate fasta for reference. So then the question becomes: what is the correct consensus? |
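One way to sidestep the "what is the correct consensus" question is to check properties that must hold for any input rather than exact outputs. Below is a rough sketch using only the standard library's `random` in place of hypothesis, so it stays self-contained; with hypothesis, the generator would become a strategy. `make_consensus`, `random_case`, and the SNP-only dict are hypothetical stand-ins, not the real consensus program or VCF parsing.

```python
import random


def make_consensus(reference, snps):
    """Apply single-base substitutions to a reference.

    snps: dict of 0-based position -> alt base (a simplification of a VCF).
    """
    bases = list(reference)
    for pos, alt in snps.items():
        bases[pos] = alt
    return "".join(bases)


def random_case(rng, max_len=50):
    """Generate a random reference and a set of SNPs that differ from it."""
    length = rng.randint(1, max_len)
    reference = "".join(rng.choice("ACGT") for _ in range(length))
    positions = rng.sample(range(length), rng.randint(0, length))
    snps = {
        p: rng.choice([b for b in "ACGT" if b != reference[p]])
        for p in positions
    }
    return reference, snps


def check_properties(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        reference, snps = random_case(rng)
        consensus = make_consensus(reference, snps)
        # SNPs never change the length.
        assert len(consensus) == len(reference)
        # No variants means the consensus is the reference.
        assert make_consensus(reference, {}) == reference
        # The consensus differs from the reference exactly at the SNP positions.
        diffs = {i for i, (r, c) in enumerate(zip(reference, consensus)) if r != c}
        assert diffs == set(snps)
    return True
```

Indels and multi-base ALTs would need different properties (the length check no longer holds), so this only covers the simplest case.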
Here is a cached link to the paper above. |
Here's what I'm planning:
|
I feel like there should be a majority (>50%) thing in there somewhere |
Regarding the <10 depth case: Indeed there is an inconsistency. from here
We don't have the quality for each individual base, but we do have the total quality for the ALTS and ref-supporting bases (recorded as QA and QR respectively). (25 comes from minimum quality and 8 comes from majority threshold of 80%)
This seems reasonable. edit: I realized that this method doesn't bias at all towards the reference, which I think we want. |
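For discussion, here is one possible reading of a call based on summed qualities, written as a sketch. Only the minimum quality of 25 and the 80% majority threshold come from this thread. Using the mean quality per observation (QA / AO and QR / RO), returning `N` when neither side qualifies, and handling a single ALT are all my assumptions. `call_base` is a hypothetical name. AO, RO, QA, and QR are the freebayes observation counts and quality sums.

```python
MIN_QUALITY = 25   # minimum quality, from the thread
MAJORITY = 0.8     # 80% majority threshold, from the thread


def call_base(ref, alt, ro, ao, qr, qa):
    """Call REF, ALT, or 'N' from freebayes-style counts and quality sums.

    ro, ao: number of reference / alternate observations.
    qr, qa: summed base quality of reference / alternate observations.
    Assumes a single ALT allele; REF and ALT are treated symmetrically.
    """
    depth = ro + ao
    if depth == 0:
        return "N"
    # Mean quality per observation stands in for per-base quality (assumption).
    mean_ref_q = qr / ro if ro else 0.0
    mean_alt_q = qa / ao if ao else 0.0
    if ao / depth >= MAJORITY and mean_alt_q >= MIN_QUALITY:
        return alt
    if ro / depth >= MAJORITY and mean_ref_q >= MIN_QUALITY:
        return ref
    return "N"
```

Because REF and ALT go through the same test, this version has no bias toward the reference. It also says nothing special about the <10 depth case.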
@mmelendrez, could use your input. Review of the discussion starting here: We are discussing the logic used to call bases from a freebayes VCF file. I am ready to add special logic for the case where the total depth of reads at a point is It's not possible to exactly duplicate You could argue that this approximation would be misleading, because ALT or REF can be multiple bases, so the All of this, in my opinion, suggests that something simple should be done and either |
@InaMBerry may also like to contribute to the discussion starting here |
Let me get my head around this. A convo might work better for me, to make sure I understand what freebayes can and cannot do when implemented in our pipeline, since it cannot emulate what ngs_mapper does right now (taking quality/base into account when calling bases). Let's chat tomorrow. Then I can summarize our thoughts here and @InaMBerry and @necrolyte2 can give input. |
Here's a listing of all the information freebayes provides by default: https://gist.github.com/averagehat/29c4ad1f3a7837fb2f82#file-gistfile1-txt |
Inputs:
Transformation:
Filter:
Transformation (trimming):
-a, -A, -g, -G
--trim-n
-q <five>,<three> or -q <five>
-u <X>
Reduction (mapping/assembly): -- This is where denovo/mapping branch
freebayes on the bam file