Issue 99 #102
can use cached seqs from a JSON and/or FASTA file, and can use a combination of cached and db seqs
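The commit above lets sequences come from a JSON cache, a FASTA cache, or both, combined with sequences already in the local database. The sketch below shows one way that combination could work. It is not the project's code: the function names (`parse_fasta`, `load_cache`, `combine_sources`), the JSON layout (`{accession: sequence}`), the FASTA header convention (accession = first token after `>`), and the precedence order (FASTA over JSON, cache over db) are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {accession: sequence}.

    Assumes the accession is the first whitespace-delimited token of each
    header line.
    """
    seqs: Dict[str, str] = {}
    accession: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                seqs[accession] = "".join(chunks)
            accession = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if accession is not None:
        seqs[accession] = "".join(chunks)
    return seqs


def load_cache(json_path: Optional[Path] = None,
               fasta_path: Optional[Path] = None) -> Dict[str, str]:
    """Load cached sequences from a JSON file and/or a FASTA file.

    FASTA entries overwrite JSON entries for the same accession (an
    arbitrary choice for this sketch).
    """
    cached: Dict[str, str] = {}
    if json_path is not None:
        with open(json_path) as handle:
            cached.update(json.load(handle))
    if fasta_path is not None:
        with open(fasta_path) as handle:
            cached.update(parse_fasta(handle))
    return cached


def combine_sources(accessions: Iterable[str],
                    cached: Dict[str, str],
                    db_seqs: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Take each sequence from the cache first, then from the local db.

    Returns the sequences found and the accessions still to be downloaded.
    """
    found: Dict[str, str] = {}
    missing: List[str] = []
    for acc in accessions:
        if acc in cached:
            found[acc] = cached[acc]
        elif acc in db_seqs:
            found[acc] = db_seqs[acc]
        else:
            missing.append(acc)
    return found, missing
```

Only the accessions returned in `missing` would then need to be queried from NCBI.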
The initial problem stated in issue #99 is fixed. However, GenBank accessions that are no longer listed in NCBI are still retried as many times as defined by |
Unit test coverage could be increased to >= 60% |
change skip_download to a bool, not a path
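With skip_download now a boolean, a command-line interface would typically expose it as an on/off flag rather than an option taking a path. The sketch below uses the standard library's `argparse` with `action="store_true"`; the flag spelling `--skip_download` and the help text are assumptions, not the project's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing skip_download as a boolean flag."""
    parser = argparse.ArgumentParser(description="skip_download sketch")
    # store_true: the value is False unless the flag appears on the command line
    parser.add_argument(
        "--skip_download",
        action="store_true",
        default=False,
        help="do not query NCBI; use only cached and/or db sequences",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--skip_download"])
    print(args.skip_download)
```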
catch RuntimeError, IncompleteRead, NotXMLError, TypeError and AttributeError. Improve detection of invalid IDs. Working on not retrying connections for invalid IDs. Invalid IDs are kept separate from IDs whose query to NCBI was interrupted by a connection failure
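The commit above separates invalid IDs (which should not be retried) from IDs whose query failed because of a connection problem (which should be retried a limited number of times). A minimal sketch of that split follows. The per-accession `fetch` callable and the `InvalidIDError` exception are hypothetical stand-ins for however the real code detects an invalid ID; which exceptions count as transient is also an assumption. Biopython's `NotXMLError` is left out so the sketch has no third-party dependency.

```python
from http.client import IncompleteRead
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed to be transient connection problems that are worth retrying
TRANSIENT_ERRORS = (IncompleteRead, ConnectionError, RuntimeError)


class InvalidIDError(Exception):
    """Hypothetical error raised when NCBI reports an accession as invalid."""


def fetch_all(
    accessions: Iterable[str],
    fetch: Callable[[str], str],
    retries: int = 3,
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Return (sequences, invalid IDs, IDs that kept failing to connect)."""
    seqs: Dict[str, str] = {}
    invalid: List[str] = []
    failed_connection: List[str] = []
    for acc in accessions:
        for _ in range(retries):
            try:
                seqs[acc] = fetch(acc)
                break
            except InvalidIDError:
                invalid.append(acc)  # invalid IDs are never retried
                break
            except TRANSIENT_ERRORS:
                continue  # connection problem: try again
        else:
            # Loop finished without a break: every attempt hit a transient error
            failed_connection.append(acc)
    return seqs, invalid, failed_connection
```

Keeping the two failure lists apart means an accession that NCBI no longer lists costs a single request instead of the full retry budget.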
move the functions that perform the calls to NCBI.Entrez to the NCBI module
…n batches. Process each separately
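The commit message above is truncated, so the following only assumes that accessions are split into fixed-size batches and each batch is processed on its own. The helper name `get_batches` and the batch size are illustrative, not taken from the project.

```python
from typing import Iterator, List, Sequence


def get_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    for batch in get_batches(["A1", "B2", "C3", "D4", "E5"], 2):
        # each batch would be sent to NCBI and parsed separately
        print(batch)
```

Processing batches separately limits the damage of a failed request to one batch rather than the whole query.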
use downloaded seq and overwrite cached seq
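The rule in the commit above (a freshly downloaded sequence replaces the cached one for the same accession) amounts to a dictionary merge in which downloaded entries take precedence. A minimal sketch, assuming both sources are `{accession: sequence}` dicts; `merge_sequences` is a hypothetical name.

```python
from typing import Dict


def merge_sequences(cached: Dict[str, str], downloaded: Dict[str, str]) -> Dict[str, str]:
    """Combine cached and downloaded sequences; downloaded ones win."""
    merged = dict(cached)      # copy so the cache dict itself is not mutated
    merged.update(downloaded)  # downloaded sequences overwrite cached ones
    return merged
```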
add missing commas, remove unneeded brackets
Codecov Report
@@            Coverage Diff             @@
##           master     #102      +/-   ##
==========================================
- Coverage   58.62%   56.57%   -2.06%
==========================================
  Files          60       61       +1
  Lines        5337     5541     +204
==========================================
+ Hits         3129     3135       +6
- Misses       2208     2406     +198
do not remove a batch from the dict of failed batches multiple times; remove it only once, after parsing of the batch has finished
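The fix above removes a batch from the dict of failed batches exactly once, after all of its records have been parsed. The sketch below illustrates that pattern under stated assumptions: `failed_batches` maps a hypothetical batch ID to its accessions, `fetch_batch` is a hypothetical callable returning `{accession: sequence}`, and only `ConnectionError` is treated as retryable. Iterating over a snapshot (`list(...)`) lets the loop delete entries from the dict safely.

```python
from typing import Callable, Dict, List


def retry_failed_batches(
    failed_batches: Dict[int, List[str]],
    fetch_batch: Callable[[List[str]], Dict[str, str]],
    max_rounds: int = 3,
) -> Dict[str, str]:
    """Retry failed batches; drop each from the dict once it is fully parsed."""
    seqs: Dict[str, str] = {}
    for _ in range(max_rounds):
        if not failed_batches:
            break
        # Snapshot the items so entries can be deleted inside the loop
        for batch_id, accessions in list(failed_batches.items()):
            try:
                records = fetch_batch(accessions)
            except ConnectionError:
                continue  # keep the batch for the next round
            for acc, seq in records.items():
                seqs[acc] = seq
            # Remove the batch exactly once, after all its records are parsed
            del failed_batches[batch_id]
    return seqs
```

Any batch still in `failed_batches` after the last round is one that never succeeded and can be reported to the user.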
Changes implemented
Changing operation has been successful:
Downloaded protein sequences are cached to a FASTA file. Updated the information on caching in the docs.
Future development notes[1]
[2] Add the protein description to a new column called |
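Caching downloaded protein sequences to a FASTA file, as described above, can be done with a few lines of standard-library code. The sketch below is an illustration, not the project's implementation: the function names, the 60-character line width, and the header format (`>` followed by the accession only) are assumptions. Biopython's `Bio.SeqIO.write(records, handle, "fasta")` is an alternative if the sequences are already held as `SeqRecord` objects.

```python
import io
from pathlib import Path
from typing import Dict, TextIO


def write_fasta(seqs: Dict[str, str], handle: TextIO, width: int = 60) -> None:
    """Write {accession: sequence} to handle in FASTA format."""
    for accession, seq in seqs.items():
        handle.write(f">{accession}\n")
        # Wrap the sequence over lines of at most `width` characters
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


def cache_to_fasta(seqs: Dict[str, str], path: Path) -> None:
    """Overwrite the FASTA cache at path with the given sequences."""
    with open(path, "w") as handle:
        write_fasta(seqs, handle)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_fasta({"A1": "MKV"}, buffer)
    print(buffer.getvalue(), end="")
```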
Fix Issue #99
Improve handling of errors encountered when retrieving data from NCBI