Change Log

Version 2.0b4 (ongoing)

Changed

Updated pre-built default database to 2023-01-02 (after NCBI RefSeq 215).

Version 2.0b3 (11/25/2021)

Added

Predicted HGT list now includes potential donors. Users can optionally specify a taxonomic rank at which they will be reported.
A quick-start guide added to the homepage.
Let the database construction task resume from an interrupted task and use already downloaded files, while detecting and skipping corrupt files.
Added an option --above to sample genomes at all ranks from the designated one to phylum during database construction.
Added an option --typemater to include NCBI-defined type materials in genome sampling during database construction.
Added an option --manual to export URLs of sampled genomes during database construction, and let the user download them manually.

Changed

Updated pre-built default database to 2021-11-21 (after NCBI RefSeq 209).
Repository transferred from DittmarLab to qiyunlab.
Updated recommended dependency versions, however the program should continue to be compatible with previous versions.
Minor tweaks with no visible impact on program behavior.

Fixed

Fixed an issue with the NCBI FTP server connection during database construction. NCBI now recommends rsync over ftp. Therefore the protocol has been updated accordingly.
Fixed compatibility with latest scikit-learn (1.0.1).
Fixed compatibility with latest DIAMOND (2.0.13).

Version 2.0b2 (5/3/2020)

Note: The command-line interface and the format and content of input, output and database files are identical to last version (2.0b1). You may safely combine analysis results of this and last versions without the need for re-analysis.

Changed

Increased Python unit testing coverage.
Re-styled Python code to adopt recent features (3.6+).
Updated installation tutorial, script and dependency versions.
Minor tweaks of code.

Added

Instructions for using the GTDB database.

Fixed

Removed deprecated parameter "iid" for grid search since scikit-learn 0.22.
Fixed a bug in custom genome exclusion during database building.
Fixed a bug which breaks the program when the temporary directory (tmpdir) is not specified.
Allowed matplotlib to generate figures when it cannot open X window (e.g., in a remote server).

Version 2.0b1 (complete rework) (10/23/2019)

Changed

Completely re-written the program using Python 3, following modern software engineering practices.
Changed license to BSD 3-clause.
Combined search results into a single file to improve disk I/O performance.
New format of taxonomy database which is identical to NCBI taxdump.
Relaxed from dependency on a sequence-to-taxon mapping file to improve disk and memory performance.
Re-designed module organization. Now only two modules: "search" and "analyze" are required for a typical analysis.

Added

Standardized installation method using pip / Conda.
Standardized command-line user interface.
Complete documentation in GitHub markdown format.
Automated decision of search strategy.
- Support for extra search arguments.
Support for pre-computed search results.
Automated self-alignment workflow.
- Built-in algorithm for calculating self-alignment score.
Automated inference of input genome taxonomy.
Automated design of grouping scenario.
Automated optimization of kernel bandwidth.
- Built-in algorithm for progressive kernel bandwidth calculation.
- Grid search with cross validation for bandwidth optimization.
Automated refinement of prediction result using silhouette scores in 2D space.
Database download workflow can resume from an interrupted run.
New plotting features.

Fixed

NCBI FTP server timeout issue.
Compatible with latest DIAMOND (0.9.26).
Compatible with latest NCBI BLASTp and eFetch servers.

Removed

Dependency on R
Modules: reporter, treer, orthologer.
HTML user interface.
Step-by-step wizard.
RAPSearch2 support.
Dip test for unimodality.
Histogram-based clustering.

Version 0.2.2 (8/23/2017)

Added

Provided a pre-built standard database.
Added an installation script installer.sh.
Adopted new NCBI standard, i.e., using accession instead of GI as sequence identifier.
Modified the mechanism of databaser.py. Now it downloads both DNA and protein sequences.
Replaced 'http' with 'https' in NCBI server URLs.
Adopted the command-line interface of DIAMOND 0.8+.

Fixed

Reformatted all Perl and Python scripts.
Reworded multiple screen prompts.
Multiple trivial bug fixes.

Incremental update (1/28/2017)

Fixed

Fixed a bug in databaser.py which incorrectly deals the case with subsampling off.

Incremental update (11/6/2016)

Added

Added Python 3 support for databaser.py

Fixed

NCBI has changed the FTP site file system structure (see notification). So databaser.py was modified accordingly.

Version 0.2.1 (2/2/2016)

Added

Optimized the process of self-alignment by DIAMOND and RAPSearch2. Significantly faster. Results don't change.
Added a smart decision-making process. If the user does not specify a proper grouping scenario, the program will try to generate one.
Changed the way of creating graphics.

Fixed

Disabled FTP passive mode in databaser.py.

Version 0.2.0 (major upgrade) (10/25/2015)

Changed

Major modification of the entire pipeline.

Structural change:

Renamed blaster and summarizer as searcher and reporter, respectively.
Removed filterer, taxonomer and validator.
Renamed input / output files and directories.
Re-wrote searcher.
Re-wrote the main program.
Re-wrote the GUI.

Added

New script databaser.py to create a customized, taxonomically balanced BLAST database, which significantly improves the efficiency and accuracy of HGT discovery, comparing to the original nr.
Added native support for non-BLAST search tools, including RAPSearch2 and DIAMOND.
Allowed one protein to correspond to multiple TaxIDs. This further balances the taxonomic distribution of reference proteins.

Fixed

Multiple bug fixes.

Incremental update (9/1/2015)

Added

Re-wrote the GUI.

Fixed

Several bug fixes.

Version 0.1.9 (6/1/2015)

Added

Accepting pre-computed BLAST results (by BLAST, Blat, Usearch, etc).
Sending multiple query sequences per search, thus accelerating the batch BLAST task.
Looking up taxonomy from a GI-to-TaxID dictionary, thus accelerating the process.
Plotting reference gene set along with prediction results.
Percentage coverage cutoff (for BLAST search).

Fixed

Updated the BLAST server URL.
Improved efficiency of batch BLAST and result processing.
Allowed taxonomy of input genomes defined in config.txt.
Multiple bug fixes in Perl code and in GUI.

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Change Log

Version 2.0b4 (ongoing)

Changed

Version 2.0b3 (11/25/2021)

Added

Changed

Fixed

Version 2.0b2 (5/3/2020)

Changed

Added

Fixed

Version 2.0b1 (complete rework) (10/23/2019)

Changed

Added

Fixed

Removed

Version 0.2.2 (8/23/2017)

Added

Fixed

Incremental update (1/28/2017)

Fixed

Incremental update (11/6/2016)

Added

Fixed

Version 0.2.1 (2/2/2016)

Added

Fixed

Version 0.2.0 (major upgrade) (10/25/2015)

Changed

Added

Fixed

Incremental update (9/1/2015)

Added

Fixed

Version 0.1.9 (6/1/2015)

Added

Fixed