Releases: AnantharamanLab/protein_set_transformer

v1.3.0

23 Oct 16:21

Major changes for 1.3

  • Stabilized support for multi-scaffold genome detection and optional features in the graph-formatted data file.
  • References to internal data in the GenomeDataset object are now prefixed with the biological feature level. For example, the protein embeddings were previously stored in .data and are now in .protein_data. The batch objects are unchanged.
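
Code written against the older attribute name may need a small compatibility shim during migration. The following is a minimal sketch, not part of the package: `get_protein_embeddings` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for a dataset, used only so the example runs without the real `GenomeDataset` class.

```python
from types import SimpleNamespace


def get_protein_embeddings(dataset):
    """Return the protein embeddings under either naming scheme.

    Hypothetical helper: prefers the 1.3 name (.protein_data) and
    falls back to the pre-1.3 name (.data).
    """
    if hasattr(dataset, "protein_data"):
        return dataset.protein_data
    return dataset.data


# Stand-in objects for illustration only; these are NOT GenomeDataset instances.
old_style = SimpleNamespace(data=[[0.1, 0.2]])
new_style = SimpleNamespace(protein_data=[[0.1, 0.2]])

# Both naming schemes resolve to the same embeddings.
assert get_protein_embeddings(old_style) == get_protein_embeddings(new_style)
```

Once all calling code targets 1.3 or later, the fallback branch can be dropped in favor of accessing .protein_data directly.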

Major changes for 1.2

  • The prediction output file now has fields other than data, because genome fragmentation (for large genomes) and multi-scaffold genomes are supported. Depending on the dataset, there will be up to 3 fields (fragment, scaffold, and genome), which hold the protein-based embeddings of contiguous genomic segments and of collections of scaffolds in a genome.
  • Genomic scaffolds in the GenomeDataset can be artificially fragmented. This serves two purposes:
    1. Scaffolds encoding more proteins than a pretrained PST expects can be used for fine-tuning and inference.
    2. The memory burden is reduced if a smaller fragment size is chosen.
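
To illustrate the idea of fragmentation, here is a minimal sketch that splits the proteins of one scaffold into contiguous fragments of at most a chosen size. It assumes simple fixed-size chunking with a shorter final fragment; the package's actual fragmentation logic may differ, and `fragment_scaffold` and its parameters are hypothetical names.

```python
def fragment_scaffold(num_proteins: int, fragment_size: int) -> list[range]:
    """Split protein indices 0..num_proteins-1 into contiguous fragments.

    Hypothetical helper: every fragment holds at most `fragment_size`
    proteins, and the last fragment may be shorter.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [
        range(start, min(start + fragment_size, num_proteins))
        for start in range(0, num_proteins, fragment_size)
    ]


# A scaffold encoding 10 proteins, split into fragments of at most 4 proteins.
fragments = fragment_scaffold(num_proteins=10, fragment_size=4)
print([list(f) for f in fragments])
# prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Under this scheme, no fragment exceeds the chosen size, so a scaffold with more proteins than a pretrained model expects is presented to it as several smaller inputs.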

v1.1.0

09 Oct 20:37

First stable release, with abstract base classes for models that enable fine-tuning pretrained models with new objectives.

  • This release integrates the esm_embed module into the pst command line suite.
  • All data associated with this repository and the manuscript can be downloaded using the pst download command.