Using non-prodigal gene calls #83
Hello,

Yes, at the moment this is a pretty glaring limitation of inStrain. Fixing this has been near the top of my to-do list for a while. Can I ask what format your genes are in? There is experimental support for gene input in the form of GenBank files, but that feature is not very well tested.

Best,
Nothing fancy; it's basically just a FASTA file, as follows:
I'm not sure what information inStrain is extracting from the Prodigal output. Do you need coordinates? Maybe the FASTA file plus GFF annotations would work better than gene calls?
Here is an example prodigal header:
The things inStrain picks up from that are:
If you add all of that to your gene headers, it should work while I try to add a better solution.

Best,

Also, just FYI, the specific function that does this parsing is
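A minimal sketch of that header workaround, assuming the non-Prodigal genes come with GFF3 coordinates (1-based, inclusive) and contig sequences are available. It writes Prodigal-style headers of the form ">ID # start # end # strand # attributes", reverse-complements minus-strand genes, and numbers genes "<contig>_<n>" as Prodigal does. The function names, the "partial=00" flag (i.e. every gene assumed complete), and reusing the gene name in the ID attribute are hypothetical simplifications; exactly which fields inStrain parses is an assumption here.

```python
# Hypothetical helper (not part of inStrain): GFF3 CDS rows + contig sequences
# -> Prodigal-style nucleotide FASTA records.
# Assumptions: GFF3 coordinates are 1-based inclusive (same as Prodigal),
# minus-strand genes are written in coding orientation, all genes complete.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gff_to_prodigal_fasta(gff_lines, contigs):
    """Yield one FASTA record (header line + sequence line) per CDS row.

    gff_lines: iterable of GFF3 text lines.
    contigs: dict mapping contig name -> nucleotide sequence.
    """
    rows = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only protein-coding features
        rows.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    # Number genes along each contig in positional order, like Prodigal does.
    rows.sort(key=lambda r: (r[0], r[1]))
    counters = {}
    for contig, start, end, strand in rows:
        counters[contig] = counters.get(contig, 0) + 1
        gene_id = f"{contig}_{counters[contig]}"
        # 1-based inclusive -> Python 0-based half-open slice.
        seq = contigs[contig][start - 1:end]
        direction = -1 if strand == "-" else 1
        if direction == -1:
            seq = reverse_complement(seq)
        header = f">{gene_id} # {start} # {end} # {direction} # ID={gene_id};partial=00"
        yield f"{header}\n{seq}\n"
```

Writing the records with "".join(gff_to_prodigal_fasta(lines, contigs)) gives a file that can be compared side by side with a real Prodigal .fna to spot formatting differences before passing it to inStrain.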
Okay, I can whip that up easily enough!
Unfortunately I get the same coverage KeyError described in #56. I'll see if I can figure out why.
Let me know if this ends up remaining a problem. My guess is that the coordinates of the genes are slightly different from how Prodigal reports them. InStrain translates each gene into amino acid space based on its coordinates, so if they're off by one and the gene ends up with a number of nucleotides that isn't divisible by 3, that will throw an error. If this continues to be an issue, at the very least I can update inStrain to write the exception logs to the log file instead of STDOUT, so we can see what the error is.

-Matt
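A quick way to test the off-by-one hypothesis before running inStrain is to check every gene's length implied by its coordinates. This is a hypothetical sanity check, not part of inStrain, assuming Prodigal-style coordinates (1-based, inclusive on both ends).

```python
# Hypothetical sanity check (not part of inStrain) for gene coordinates that
# are meant to be Prodigal-style: 1-based and inclusive on both ends.


def check_gene_coordinates(gene_id, start, end, seq=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if start < 1 or end < start:
        problems.append(f"{gene_id}: invalid coordinates {start}-{end}")
        return problems
    length = end - start + 1  # inclusive on both ends
    if length % 3 != 0:
        # Often a sign of 0-based or half-open coordinates (e.g. BED-style or
        # Python slice bounds) written as if they were 1-based inclusive.
        problems.append(f"{gene_id}: length {length} is not a multiple of 3")
    if seq is not None and len(seq) != length:
        problems.append(
            f"{gene_id}: coordinates imply {length} nt but sequence has {len(seq)} nt"
        )
    return problems
```

For example, a gene spanning bases 4 to 12 passes, while the same gene written with a 0-based start (3 to 12) implies 10 nt and gets flagged.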
That gives me an idea: maybe I'm outputting the coordinates incorrectly. I'll check that and get back to you!
Hi,
But inStrain profile returns the error:
I am not completely sure what to make of it. Should I replace /locus_tag with /gene?

Thank you

Amendment:
Hi @nschan - Thanks for reaching out, and sorry for this mess. The current .gbk parsing is really weird and only works for specific, oddly formatted .gbk files. I have rewritten a good part of this code in a development version of inStrain that I'm working on, but it's not quite ready yet. I really hope to have this working and pushed to the development branch of inStrain soon; until then, Prodigal-formatted genes are really the only supported option.

Apologies again,
Hi Matt,
Thanks for getting back to me. Using prodigal for inStrain works very well.
Currently I am adding in the annotations later on during analysis, which
works fine for me. In my limited experience, .gbk files tend to be rather
poorly standardized, and parsing them reliably can be a bit of a pain.
Maybe more of a feature suggestion, but do you by any chance have plans to
expand the SNV region information "intergenic" to mark SNVs located in
non-CDS annotations (rRNA, tRNA, etc.)?
Kind regards
Niklas
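Until something like this exists in inStrain, one possible post-processing step is to label each SNV position by whichever annotation it overlaps. A hypothetical sketch, not an inStrain feature, assuming the features (e.g. CDS, rRNA, tRNA rows from a GFF3 file) and the SNV positions are on one scaffold and use the same coordinate convention; convert one side first if they differ.

```python
# Hypothetical post-processing (not an inStrain feature): label an SNV position
# by the annotation types it falls in. Features are (start, end, type) tuples
# on one scaffold, inclusive on both ends, same convention as the positions.


def classify_position(pos, features):
    """Return the sorted feature types overlapping pos, or ["intergenic"]."""
    hits = {ftype for start, end, ftype in features if start <= pos <= end}
    return sorted(hits) if hits else ["intergenic"]


features = [(1, 900, "CDS"), (1000, 2500, "rRNA"), (2600, 2675, "tRNA")]
```

The linear scan keeps the sketch short; for genome-scale annotation sets, sorting features by start and using an interval tree (or bisect on the sorted starts) would avoid scanning every feature for every SNV.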
Hi @nschan - I will add this to my "To Do" list, but it likely won't be ready for a while. Right now, a big reason I do the intragenic gene calling is to call non-synonymous / synonymous mutations, which isn't possible for non-CDS annotations (they aren't translated into codons), but I do see the value in this.

Best,
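To illustrate why a coding sequence is required for that call, here is a minimal sketch (not inStrain's implementation) that places an SNV in its codon and compares the amino acids under the standard genetic code. It assumes the gene sequence is uppercase, in coding orientation, and a multiple of 3 long, and that the alternate base has already been complemented for minus-strand genes. An rRNA or tRNA has no reading frame, so there is no codon to look up.

```python
# Illustrative sketch (not inStrain's code): classify a single-base change in a
# coding sequence as synonymous ("S") or non-synonymous ("N").

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def mutation_type(gene_seq: str, offset: int, alt_base: str) -> str:
    """Return "S" or "N" for replacing gene_seq[offset] with alt_base.

    gene_seq: coding sequence in coding orientation (uppercase).
    offset: 0-based position of the SNV within gene_seq.
    alt_base: alternate base, already complemented for minus-strand genes.
    """
    frame = offset % 3  # position of the SNV inside its codon
    codon_start = offset - frame
    ref_codon = gene_seq[codon_start:codon_start + 3]
    alt_codon = ref_codon[:frame] + alt_base + ref_codon[frame + 1:]
    return "S" if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon] else "N"
```

In "ATGAAATAG", changing the third base of the AAA codon to G gives AAG (still lysine, so "S"), while changing its first base to G gives GAA (glutamate, so "N").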
Hi,
I have a set of already called genes (not from Prodigal) that I would like to use as input to inStrain. Is there an easy way to do this? Based on a previous issue (#56 specifically), it appears that inStrain depends specifically on Prodigal input.