Releases: zellerlab/GECCO

0.6.0

28 Feb 18:44

Changed

  • Updated internal model with a cleaned-up version of the MIBiG-2.0
    Pfam-33.1/Tigrfam-15.0 embedding.
  • Updated internal InterPro catalog.

Fixed

  • Features not being grouped together in gecco cv and gecco train
    when provided with a feature table where rows were not sorted by
    protein IDs.
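The release notes do not say how the grouping is implemented. In Python, this class of bug often comes from `itertools.groupby`, which only merges *consecutive* rows that share a key. The sketch below uses hypothetical `(protein_id, domain)` rows rather than GECCO's feature table. It shows the failure and two fixes that do not depend on row order:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical feature rows: (protein_id, domain), deliberately not sorted.
rows = [
    ("prot_1", "domain_A"),
    ("prot_2", "domain_B"),
    ("prot_1", "domain_C"),
]

# Pitfall: groupby only merges adjacent rows with equal keys,
# so prot_1 ends up split across two separate groups.
naive = [(key, [d for _, d in grp]) for key, grp in groupby(rows, key=itemgetter(0))]

# Fix 1: sort by protein ID first so equal keys become adjacent.
sorted_rows = sorted(rows, key=itemgetter(0))
fixed = {key: [d for _, d in grp] for key, grp in groupby(sorted_rows, key=itemgetter(0))}

# Fix 2: group with a dict, which ignores row order entirely.
by_protein = defaultdict(list)
for protein_id, domain in rows:
    by_protein[protein_id].append(domain)
```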

0.5.5

28 Feb 15:32

Fixed

  • gecco cv bug causing only the last fold to be written.

0.5.4

28 Feb 15:20

Changed

  • Replaced verboselogs, coloredlogs and better-exceptions with rich.

Removed

  • tqdm training dependency.

Added

  • gecco annotate command to produce a feature table from a genomic file.
  • gecco embed command to embed BGCs into non-BGC regions using feature tables.
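Here is a toy sketch of what embedding a BGC into a non-BGC region can look like at the feature-table level. The row layout, the 0/1 label and the `embed_bgc` helper are hypothetical and not taken from GECCO. The sketch splices the BGC rows in at a random protein boundary, so no protein's domains get split:

```python
import random
from typing import List, Tuple

Row = Tuple[str, str]  # hypothetical feature-table row: (protein_id, domain)
LabelledRow = Tuple[str, str, int]  # same row plus a 0/1 "is BGC" label


def embed_bgc(background: List[Row], bgc: List[Row], rng: random.Random) -> List[LabelledRow]:
    """Insert BGC rows into a non-BGC region between two whole proteins."""
    # Unique protein IDs in first-seen order, so the cut never falls inside a protein.
    proteins = list(dict.fromkeys(pid for pid, _ in background))
    # randint is inclusive at both ends: 0 puts the BGC first, len(proteins) puts it last.
    cut = rng.randint(0, len(proteins))
    before = set(proteins[:cut])
    head = [(pid, dom, 0) for pid, dom in background if pid in before]
    tail = [(pid, dom, 0) for pid, dom in background if pid not in before]
    middle = [(pid, dom, 1) for pid, dom in bgc]
    return head + middle + tail
```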

0.5.3

21 Feb 14:33

Fixed

  • Coordinates of genes in output GenBank files.
  • Potential issue with the number of CPUs in PyHMMER.run.
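The note does not describe the CPU-count issue itself. One well-known pitfall in this area is that `os.cpu_count()` returns `None` when the count cannot be determined, so code that passes that value straight through can break. A hypothetical `resolve_cpus` helper that guards against this might look like:

```python
import os
from typing import Optional


def resolve_cpus(cpus: Optional[int] = None) -> int:
    """Turn a user-supplied CPU count into a usable, positive thread count."""
    if cpus is None or cpus <= 0:
        # Treat None/0/negative as "auto-detect". os.cpu_count() may itself
        # return None, so fall back to a single thread in that case.
        return os.cpu_count() or 1
    return cpus
```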

Changed

  • Bump required pyrodigal version to v0.4.2 to fix a buffer overflow.

0.5.2

29 Jan 20:47

Added

  • Support for downloading HMM files directly from GitHub release assets.
  • Validation of filtered HMMs with MD5 checksum.
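Below is a minimal sketch of both steps. It assumes GitHub's standard download URL layout for release assets and a file-level MD5 check. The `release_asset_url`, `md5sum` and `verify` helpers are hypothetical, not GECCO's code:

```python
import hashlib
from pathlib import Path


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    # GitHub serves release assets under /<owner>/<repo>/releases/download/<tag>/<asset>.
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


def md5sum(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 digest of a file, read in chunks so large HMM files stay out of memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_md5: str) -> None:
    """Raise if the file on disk does not match the expected checksum."""
    actual = md5sum(path)
    if actual != expected_md5:
        raise ValueError(f"checksum mismatch for {path}: {actual} != {expected_md5}")
```

For example, `release_asset_url("zellerlab", "GECCO", "v0.5.2", "Pfam.h3m.gz")` builds `https://github.com/zellerlab/GECCO/releases/download/v0.5.2/Pfam.h3m.gz`. The asset name here is illustrative only.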

Fixed

  • Invalid coordinates of protein domains in GenBank output files.
  • gecco.interpro module not being added to wheel distribution.
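Domain coordinates are easy to get wrong for two reasons. HMM hits are reported as amino-acid positions on the protein, while GenBank features need nucleotide positions on the contig. On top of that, the two strands run in opposite directions. The sketch below assumes 0-based, half-open coordinates throughout (the convention Biopython's `FeatureLocation` uses) and uses hypothetical helper names. It is not GECCO's implementation:

```python
from typing import Tuple


def domain_to_genome(
    gene_start: int, gene_end: int, strand: int, dom_start: int, dom_end: int
) -> Tuple[int, int]:
    """Map a domain's residue span [dom_start, dom_end) onto genome coordinates.

    Gene and returned coordinates are 0-based, half-open; strand is +1 or -1.
    """
    if strand == 1:
        # Residue i is encoded by codon [gene_start + 3i, gene_start + 3i + 3).
        return gene_start + 3 * dom_start, gene_start + 3 * dom_end
    # On the reverse strand the protein is read from gene_end towards gene_start.
    return gene_end - 3 * dom_end, gene_end - 3 * dom_start


def to_genbank(start: int, end: int) -> str:
    """Render a 0-based half-open span as a 1-based inclusive GenBank range."""
    return f"{start + 1}..{end}"
```

Take a gene spanning [100, 400) and a domain covering residues [10, 20). The domain maps to [130, 160) on the forward strand and to [340, 370) on the reverse strand, and `to_genbank(130, 160)` renders the first span as `131..160`.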

Changed

  • Bump required pyhmmer version to v0.2.1.

0.5.1

15 Jan 15:12

Fixed

  • --hmm flag being ignored in gecco run command.
  • PyHMMER using HMM names instead of accessions, causing issues with Pfam HMMs.
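Pfam HMMs carry both a short name and a versioned accession. The choice of key matters whenever domains are looked up in a catalog indexed by accession. Here is a hypothetical helper that prefers the accession and falls back to the name. Dropping the version suffix is an assumption of this sketch, not something the release notes state:

```python
from typing import Optional


def domain_identifier(name: str, accession: Optional[str]) -> str:
    """Pick the identifier used to look up a domain hit.

    Prefers the HMM accession (e.g. "PF00001.21") with its version suffix
    removed, and falls back to the HMM name when there is no accession.
    """
    if accession:
        return accession.split(".", 1)[0]
    return name
```

With illustrative values, `domain_identifier("7tm_1", "PF00001.21")` returns `"PF00001"`, while an HMM without an accession keeps its name.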

0.5.0

11 Jan 15:59

Added

  • Explicit support for Python 3.9.

Changed

  • pyhmmer is used to annotate protein sequences instead of the HMMER3 hmmsearch binary.
  • HMM files are stored in binary format to speed up parsing and reduce storage size.
  • tqdm is now a training-only dependency.
  • gecco cv now requires training dependencies.

0.4.5

11 Jan 15:59

Added

  • Additional fold column in the cross-validation table output.

Changed

  • Use the sequence ID instead of the protein ID to extract the cluster type in gecco cv.
  • Install HMM data in pre-pressed format to make hmmsearch runs faster on short sequences.
  • gecco.orf was rewritten to extract genes from input sequences in parallel.
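Here is a minimal sketch of per-sequence parallelism, with a toy forward-strand ORF finder standing in for a real gene caller. `find_orfs` and `find_all_orfs` are hypothetical and do not reflect GECCO's actual `gecco.orf` code:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Toy ORF finder: ATG ... stop on the forward strand, 0-based half-open spans."""
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def find_all_orfs(records: Dict[str, str], threads: int = 4) -> Dict[str, List[Tuple[int, int]]]:
    """Run the finder on every sequence concurrently, one task per sequence."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # executor.map yields results in input order, so zipping with keys is safe.
        results = pool.map(find_orfs, records.values())
        return dict(zip(records.keys(), results))
```

Threads only pay off when the per-sequence work releases the GIL, as compiled extensions can. For a pure-Python finder like this toy, `ProcessPoolExecutor` is the drop-in alternative.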

0.4.4

11 Jan 15:59

Added

  • gecco cv loto command to run LOTO cross-validation using BGC types
    for stratification.
  • header keyword argument to FeatureTable.dump and ClusterTable.dump
    to write the table without the column header, so that it can be
    appended to an existing table.
  • __getitem__ implementation for FeatureTable and ClusterTable
    that returns a single row or a sub-table from a table.
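Reading LOTO as leave-one-type-out, each fold holds out every cluster of one BGC type and trains on the rest. The sketch below works on a hypothetical mapping from cluster ID to a single type string. Hybrid clusters with several types would need an extra rule, which this sketch does not attempt:

```python
from typing import Dict, Iterator, List, Tuple


def loto_splits(types: Dict[str, str]) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_type, train_ids, test_ids) for each distinct BGC type."""
    for held_out in sorted(set(types.values())):
        # Test on every cluster of the held-out type, train on everything else.
        test = [cid for cid, t in types.items() if t == held_out]
        train = [cid for cid, t in types.items() if t != held_out]
        yield held_out, train, test
```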

Fixed

  • gecco cv command now writes results iteratively instead of holding
    the tables for every fold in memory.
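The sketch below writes each fold as soon as it is computed and emits the column header only for the first fold. It uses plain `csv` rather than GECCO's table classes. `write_fold` and `write_all_folds` are hypothetical names, and the leading fold column mirrors the one mentioned in the 0.4.5 notes:

```python
import csv
from typing import Iterable, List, Sequence, TextIO


def write_fold(fh: TextIO, fold: int, rows: Iterable[Sequence[str]], columns: List[str], header: bool) -> None:
    """Append one fold's rows to an open TSV handle, prefixed with a fold column."""
    writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
    if header:
        writer.writerow(["fold", *columns])
    for row in rows:
        writer.writerow([fold, *row])


def write_all_folds(path: str, folds: Iterable[Iterable[Sequence[str]]], columns: List[str]) -> None:
    """Consume folds one at a time and write each before computing the next."""
    # Open the output once: reopening in "w" mode per fold would overwrite earlier folds.
    with open(path, "w", newline="") as fh:
        for i, rows in enumerate(folds, start=1):
            write_fold(fh, i, rows, columns, header=(i == 1))
            fh.flush()  # push each fold to disk before the next one is computed
```

Passing `folds` as a generator means only one fold's results are held in memory at a time.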

Changed

  • Bumped pandas training dependency to v1.0.

0.4.3

11 Jan 15:59

Fixed

  • GenBank files being written with invalid /cds feature type.

Changed

  • Blocked installation of Biopython v1.78 or newer, since it removes the
    Bio.Alphabet module that the current code depends on.