CHANGELOG.md


0.2.6 (UNRELEASED)

Datasets

  • Correct stage-based conditions mentioned in notebook tutorials #92
  • Add stage-based conditions to setup in ProteinDataModule #72
  • Improves support for datamodules with multiple test sets and generalises this to support GO and FOLD. Also adds multiple sequence-identity-based splits for GO. #72
  • Add redownload checks for already downloaded datasets and harmonise pdb download interface #86
  • Remove remaining errors from PDB dataset change
  • Add option to create pdb datasets with sequence-based splits #88 as well as time-based splits #89

Models

  • Adds missing pos attribute to GearNet required_batch_attributes (fixes #73) #74
  • Fixes PDB download failure due to missing protein data #77
  • Add support for handling training/validation OOMs gracefully #81
  • Add support for handling backward OOMs gracefully #83
  • Update GCPNet paper link #85
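As an illustration of what handling OOMs gracefully (#81, #83) can look like, here is a minimal sketch, not the project's actual implementation. It assumes an out-of-memory failure surfaces as a `RuntimeError` whose message contains "out of memory", as PyTorch's CUDA OOM errors do. The names `is_oom_error`, `run_steps_skipping_ooms` and `on_oom` are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Optional


def is_oom_error(exc: BaseException) -> bool:
    """Heuristic (assumption): OOMs are RuntimeErrors mentioning 'out of memory'."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def run_steps_skipping_ooms(
    step_fn: Callable[[Any], Any],
    batches: Iterable[Any],
    on_oom: Optional[Callable[[], None]] = None,
) -> List[Any]:
    """Run step_fn on each batch, skipping batches that fail with an OOM.

    Any other exception is re-raised unchanged. on_oom is a hypothetical
    hook for cleanup, e.g. zeroing gradients or emptying a CUDA cache.
    """
    results = []
    for batch in batches:
        try:
            results.append(step_fn(batch))
        except RuntimeError as exc:
            if not is_oom_error(exc):
                raise  # not an OOM: do not hide genuine bugs
            if on_oom is not None:
                on_oom()
    return results
```

Skipping the offending batch rather than aborting keeps a long run alive when only a few unusually large inputs exceed memory. Whether the project skips, retries or logs such batches is not stated in this changelog.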

Framework

  • Adds InverseSquareRoot LR scheduler #71
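For readers unfamiliar with inverse-square-root schedules, the sketch below shows the commonly used form: linear warmup followed by decay proportional to 1/sqrt(step). It is an illustration under assumptions, not the scheduler added in #71. The function name, the `warmup_steps` parameter and the exact warmup shape are hypothetical, and the project's version may differ.

```python
import math


def inverse_sqrt_factor(step: int, warmup_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given step.

    Assumed form: linear warmup from 0 to 1 over warmup_steps, then
    decay as sqrt(warmup_steps / step).
    """
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    if step < warmup_steps:
        return step / warmup_steps
    return math.sqrt(warmup_steps / step)
```

A function of this shape can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, for example `lambda s: inverse_sqrt_factor(s, 1000)`.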

Command

  • Adds --force-cuda-version to workshop install #78

Features

  • Fix sequence_edges behaviour when argument b is a Data object #80

Misc

  • Update ICLR paper link and citation #82
  • Add an optional group for installing plotting and analysis specific libraries to lighten the install of the core framework #90

0.2.5 (28/12/2023)

Datasets

  • Adds two antibody-specific datasets using the IGFold corpora for paired OAS and Jaffe 2022 #53
  • Set in_memory=True as default for most (small) datasets for improved performance #53
  • Fix num_classes for GO datamodules #53
  • Set in_memory=True as default for most (downstream) datasets for improved performance #53
  • Fixes GO labelling #53

Features

  • Improves positional encoding performance by adding a seq_pos attribute on Data/Protein objects in the base dataset getter. #53
  • Ensure correct batched computation of orientation features. #58

Models

  • Implement ESM embedding encoder (#33, #41)
  • Adds CDConv implementation #53
  • Adds tuned hparams for models #53

Framework

  • Refactors beartype/jaxtyping to use latest recommended syntax #53
  • Adds explainability module for performing attribution on a trained model #53
  • Change default finetuning features in config: ca_base -> ca_seq #53
  • Add optional hparam entry point to finetuning config #53
  • Fixes GPU memory accumulation for some metrics #53
  • Updates zenodo URL for processed datasets to reflect upstream API change #53
  • Adds multi-hot label encoding transform #53
  • Fixes auto PyG install for torch>2.1.0 #53
  • Adds proteinworkshop.model_io containing utils for loading trained models #53
  • Add script for plotting UMAP embeddings of any dataset given a pre-trained encoder model
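The multi-hot label encoding transform listed above is useful for multi-label tasks such as GO term prediction, where one protein can carry several labels. The sketch below shows the general idea in plain Python. It is not the project's transform, and the name `multi_hot` and its signature are hypothetical.

```python
from typing import Iterable, List


def multi_hot(label_indices: Iterable[int], num_classes: int) -> List[int]:
    """Encode a set of class indices as a 0/1 vector of length num_classes."""
    encoded = [0] * num_classes
    for idx in label_indices:
        if not 0 <= idx < num_classes:
            raise ValueError(f"label index {idx} outside [0, {num_classes})")
        encoded[idx] = 1  # repeated indices are harmless
    return encoded
```

For example, `multi_hot([0, 2], 4)` gives `[1, 0, 1, 0]`.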

0.2.4 (10/09/2023)

  • Fixes error in Metal3D processed download link (#28)
  • Fixes typo in wandb run name setting (#30)
  • Fixes paths for models and datasets when testing instantiation of each module (#32)
  • Improvements to TFN, MACE and EGNN models and layers, including DiffDock-style intermediate edge feature creation (TFN), dropout, gaussian RBF, mean global pooling (#38)

0.2.3 (31/08/2023)

  • Minor patch; adds missing overwrite attribute to CATHDataModule, FoldClassificationDataModule and GeneOntologyDataModule. (#25)

0.2.2 (30/08/2023)

  • Fixes raw data download triggered by absence of PDB when using pre-processed datasets (#24)
  • Fixes bug where batches created from in_memory=True data were not correctly formatted (#24)
  • Consistently exposes the overwrite argument for datamodules to users (#24)
  • Fixes bug where downloading FoldComp datasets into directories with the same name as the dataset throws an error (#24)
  • Increments graphein dependency to 1.7.3 (#24)

0.2.1 (29/08/2023)

  • Fixes incorrect lookup of DATA_PATH env var (#19)

0.2.0 (28/08/2023)

  • First public release