The code

One-time scripts

  • download.py: downloads the human GEO data sets and their corresponding papers/abstracts.
  • manual_inspect.py and manual_inspect_helper.py: a command-line tool for manually inspecting GEO data sets. For each data set, it displays the title, abstract, description, and summary statistics, followed by information about the different sample annotations (tasks) and the correlation matrix between tasks. It then takes user input to decide which tasks to accept or reject and how to remap them to binary tasks.
  • prepare_train.py: keeps only data sets with certain gene expression attributes and a sufficient sample size, then splits the data into training and testing groups.
  • sample_distributions.py: scrapes the ArrayExpress, GEO, and dbGaP pages for data set sample sizes.
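The binary remapping and task-correlation steps described for manual_inspect.py can be sketched roughly as below. This is a minimal, hypothetical illustration rather than the repository's code: the names `remap_to_binary`, `pearson`, and `task_correlation_matrix`, the use of Pearson correlation, and the rule that unmapped annotation values are dropped (`None`) are all assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical label encoding: 1 = positive, 0 = negative, None = sample dropped from the task.
Label = Optional[int]


def remap_to_binary(values: Sequence[str], positive: Set[str], negative: Set[str]) -> List[Label]:
    """Map raw annotation values to binary labels; unmapped values become None."""
    labels: List[Label] = []
    for value in values:
        if value in positive:
            labels.append(1)
        elif value in negative:
            labels.append(0)
        else:
            labels.append(None)
    return labels


def pearson(x: Sequence[Label], y: Sequence[Label]) -> float:
    """Pearson correlation over samples labeled in both tasks; NaN if undefined."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant task has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def task_correlation_matrix(tasks: Dict[str, List[Label]]) -> Dict[str, Dict[str, float]]:
    """Correlation between every pair of tasks, keyed by task name."""
    names = list(tasks)
    return {a: {b: pearson(tasks[a], tasks[b]) for b in names} for a in names}


if __name__ == "__main__":
    disease = remap_to_binary(["tumor", "normal", "tumor", "unknown"], {"tumor"}, {"normal"})
    sex = remap_to_binary(["male", "female", "male", "female"], {"male"}, {"female"})
    print(disease)  # [1, 0, 1, None]
    print(task_correlation_matrix({"disease": disease, "sex": sex}))
```

Correlations are computed only over samples labeled in both tasks, so a sample dropped from one task does not distort its correlation with the others.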

Implementations

  • l1000.py: holds the L1000 genome data.
  • geo_dataset.py: implements the Dataset class, which holds and prepares training, validation, and testing matrices.
  • geo_models.py: implements several model architectures: autoencoder, constraint autoencoder, variational autoencoder, multi-task model, and multi-task model with attention.
  • geo_train.py: implements the training loop.
  • geo_test.py: implements the testing loop.
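A class that holds and prepares training, validation, and testing matrices might look roughly like the sketch below. It is hypothetical and not taken from geo_dataset.py: the class name `SplitDataset`, the default split fractions, the seeded shuffle, and the choice to z-score each feature using training-set statistics only are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Split = Tuple[List[List[float]], List[int]]  # (feature matrix, labels)


class SplitDataset:
    """Hypothetical holder for train/validation/test matrices (not geo_dataset.Dataset)."""

    def __init__(self, X: Sequence[Sequence[float]], y: Sequence[int],
                 val_frac: float = 0.1, test_frac: float = 0.2, seed: int = 0):
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of samples")
        indices = list(range(len(X)))
        random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
        n_test = int(len(indices) * test_frac)
        n_val = int(len(indices) * val_frac)
        self.test = self._take(X, y, indices[:n_test])
        self.val = self._take(X, y, indices[n_test:n_test + n_val])
        self.train = self._take(X, y, indices[n_test + n_val:])
        self._standardize()

    @staticmethod
    def _take(X: Sequence[Sequence[float]], y: Sequence[int], idx: List[int]) -> Split:
        # Copy rows so standardizing in place never modifies the caller's data.
        return [list(map(float, X[i])) for i in idx], [y[i] for i in idx]

    def _standardize(self) -> None:
        X_train = self.train[0]
        if not X_train:
            return
        n, d = len(X_train), len(X_train[0])
        means = [sum(row[j] for row in X_train) / n for j in range(d)]
        # Population std per feature; constant features fall back to 1.0.
        stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X_train) / n) or 1.0
                for j in range(d)]
        # Apply the training statistics to every split.
        for X_split, _ in (self.train, self.val, self.test):
            for row in X_split:
                for j in range(d):
                    row[j] = (row[j] - means[j]) / stds[j]


if __name__ == "__main__":
    X = [[float(i), 2.0 * i] for i in range(10)]
    y = [i % 2 for i in range(10)]
    ds = SplitDataset(X, y)
    print(len(ds.train[0]), len(ds.val[0]), len(ds.test[0]))  # 7 1 2
```

Computing the normalization statistics on the training split alone keeps information from the validation and test samples out of training.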

Experiments