Releases: AnantharamanLab/protein_set_transformer
v1.3.0
Major changes for 1.3
- Stabilized support for multi-scaffold genome detection and optional features in the graph-formatted data file.
- References to internal data in the `GenomeDataset` object are now prefixed with the biological feature level. For example, the protein embeddings were previously stored in `.data` but are now in `.protein_data`. There are no changes to the batch objects, however.
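Code written against the older attribute name may need a small shim while migrating. The following is a minimal sketch, not part of the package: `get_protein_embeddings` is a hypothetical helper, and the stand-in objects only mimic the two attribute names mentioned above (`data` and `protein_data`) rather than constructing a real `GenomeDataset`.

```python
from types import SimpleNamespace


def get_protein_embeddings(dataset):
    """Return protein embeddings from a dataset-like object.

    Prefers the 1.3-style `protein_data` attribute and falls back to the
    pre-1.3 `data` attribute. Hypothetical helper for illustration only.
    """
    if hasattr(dataset, "protein_data"):
        return dataset.protein_data
    if hasattr(dataset, "data"):
        return dataset.data
    raise AttributeError("object has neither 'protein_data' nor 'data'")


# Stand-in objects (assumptions), not real GenomeDataset instances.
old_style = SimpleNamespace(data=[[0.1, 0.2]])
new_style = SimpleNamespace(protein_data=[[0.1, 0.2]])

old_embeddings = get_protein_embeddings(old_style)
new_embeddings = get_protein_embeddings(new_style)
```

Checking for the new name first means the helper keeps working if an object happens to expose both attributes.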
Major changes for 1.2
- The prediction output file now has fields other than `data`, since genome fragmentation (for large genomes) and multi-scaffold genomes are supported. Depending on the dataset, there will be up to 3 fields (fragment, scaffold, and genome) that represent the protein-based embeddings of contiguous genomic segments and collections of scaffolds in a genome.
- Genomic scaffolds in the `GenomeDataset` can be artificially fragmented. This has several purposes:
  - Scaffolds encoding more proteins than a pretrained PST expects can be used for fine-tuning and inference.
  - The memory burden is reduced if a smaller fragment size is chosen.
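To make the idea of artificial fragmentation concrete, here is a minimal sketch that splits a scaffold's proteins into contiguous chunks of at most a fixed size. It is an illustration under stated assumptions, not the package's implementation: `fragment_scaffold` and its parameters are hypothetical names, and the release notes do not say how PST chooses fragment boundaries.

```python
def fragment_scaffold(num_proteins: int, max_size: int) -> list[range]:
    """Split protein indices 0..num_proteins-1 into contiguous fragments.

    Every fragment holds at most `max_size` proteins; the last fragment may
    be shorter. Hypothetical helper, not the PST implementation.
    """
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [
        range(start, min(start + max_size, num_proteins))
        for start in range(0, num_proteins, max_size)
    ]


# A scaffold encoding 10 proteins, fragmented into chunks of at most 4.
fragments = fragment_scaffold(num_proteins=10, max_size=4)
fragment_sizes = [len(fragment) for fragment in fragments]  # [4, 4, 2]
```

With this scheme, a scaffold that exceeds a model's expected protein count becomes several fragments that each fit, and a smaller `max_size` yields smaller fragments.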
v1.1.0
First stable release with model abstract base classes to enable fine-tuning pretrained models with new objectives.
- This release integrates the esm_embed module into the `pst` command line suite.
- All data associated with this repository and the manuscript can be downloaded using the `pst download` command.