Use indexed BLAST DB if already exists #124
base: master
Currently, when given the --proteins argument, prokka always creates an indexed BLAST database using the makeblastdb command, regardless of whether one already exists. This modification searches the directory containing the user-supplied protein database to determine whether an indexed database with the same prefix as the supplied FASTA file already exists. If it does, prokka uses that instead of creating a new one.
@cnthornton Thanks for this request - I have some comments:

I assume you want to provide a large database to Prokka but not have to format it each time? If so, this is related to Issue #90.

The code you provide is fine, except I am not sure that parsing for a .XXX suffix is correct. What happens if the original file was "refseq.proteins"? It will look for "refseq.pin" rather than "refseq.proteins".

I think what we want to do is allow the --proteins option to be a BLAST database name. If the --proteins file doesn't exist at all, just check for a .pal or .pin and go. No need to extract any .XXX suffix.

The other big problem is that later on the files get deleted!

Does this make sense?
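The check suggested above can be sketched roughly as follows. This is an illustrative Python sketch, not Prokka's actual code (Prokka is written in Perl); the function name `resolve_proteins_arg` and its return values are hypothetical. It assumes the index files are named by appending `.pal` or `.pin` to the value given to --proteins, with no suffix stripping.

```python
import os


def resolve_proteins_arg(path):
    """Classify a --proteins value (hypothetical helper, for illustration only).

    Returns "fasta" if the path is an existing file that still needs
    makeblastdb, or "blastdb" if the path itself does not exist but
    path + ".pal" or path + ".pin" does.
    """
    # An existing file is treated as a FASTA file to be formatted.
    if os.path.isfile(path):
        return "fasta"

    # Otherwise treat the value as a BLAST database name and look for
    # an alias file (.pal) or a protein index file (.pin).
    if any(os.path.isfile(path + ext) for ext in (".pal", ".pin")):
        return "blastdb"

    raise FileNotFoundError(
        f"--proteins '{path}' is neither a file nor a BLAST database name"
    )
```

With this approach a value such as "refseq.proteins" is never truncated: the sketch looks for "refseq.proteins.pal" or "refseq.proteins.pin" only when "refseq.proteins" itself is missing.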
Your assumption is correct. The protein database that I have been giving prokka is large, although not excessively so. And while makeblastdb does not contribute significantly to the run time, it does add up when run frequently enough.

You are right though - there really is no need to strip away the portion presumed to be the file extension. I will fix this, and add an associated message, when I get back from my upcoming sampling expedition next Thursday.

As for your other point, I don't think that section of code will be a problem. The files being deleted there should not exist anyway, unless the indexed databases being provided are somehow in the output directory, which I think should never be the case. However, it is sloppy coding on my end. I should have modified that section to check whether one or more of the actual database files exist in the output directory and, if they do, delete them.
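The cleanup check described above could look something like the following. This is a hedged Python sketch rather than the change actually made to Prokka; the function name `remove_temp_blast_index` and the `prefix` parameter are hypothetical, and it assumes the temporary protein index consists of `.phr`, `.pin` and `.psq` files written into the output directory.

```python
import os

# File extensions makeblastdb writes for a protein database.
BLAST_INDEX_EXTS = (".phr", ".pin", ".psq")


def remove_temp_blast_index(outdir, prefix):
    """Delete index files named prefix + ext inside outdir, if present.

    Only files that actually exist in the output directory are removed,
    so a pre-built database stored elsewhere is never touched.
    Returns the list of paths that were deleted.
    """
    removed = []
    for ext in BLAST_INDEX_EXTS:
        candidate = os.path.join(outdir, prefix + ext)
        if os.path.isfile(candidate):
            os.remove(candidate)
            removed.append(candidate)
    return removed
```

Because the paths are always built from the output directory, a user-supplied database living in another directory is left alone even if it shares the same prefix.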