download.py: downloads the human GEO data sets and their corresponding papers/abstracts.
manual_inspect.py and manual_inspect_helper.py: manual inspection CLI tool for GEO data sets. For each data set, it displays the title, abstract, description, and summary statistics, followed by information on the sample annotations/tasks and the correlation matrix between tasks. It then prompts the user to accept or reject each task and to specify how tasks are remapped to binary tasks.
prepare_train.py: keeps only data sets with certain gene expression attributes and a sufficient sample size, then splits the data into training and testing groups.
sample_distributions.py: scrapes the ArrayExpress, GEO, and dbGaP pages for data set sample sizes.
geo_dataset.py: implements the Dataset class, which holds and prepares training, validation, and testing matrices.
geo_models.py: implements different architectures: autoencoder, constraint autoencoder, variational autoencoder, multi-task model, and multi-task model with attention.
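
As a rough illustration of the filter-and-split step described for prepare_train.py, the sketch below drops data sets that fall under a minimum sample size and splits the samples of each remaining data set into training and testing groups. It is not the repository's code: the function name `filter_and_split`, the dict-of-lists input layout, the default thresholds, and the choice to split samples within each data set (rather than splitting whole data sets) are all assumptions made for the example, and the gene expression attribute filter is omitted.

```python
import random


def filter_and_split(datasets, min_samples=20, test_fraction=0.2, seed=0):
    """Hypothetical helper, not taken from prepare_train.py.

    datasets: dict mapping a data set ID to a list of sample IDs
              (an assumed layout for this sketch).
    Returns a dict mapping each kept data set ID to its train/test sample IDs.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {}
    for name, samples in datasets.items():
        # Drop data sets that do not have enough samples.
        if len(samples) < min_samples:
            continue
        shuffled = list(samples)  # copy so the caller's list is not mutated
        rng.shuffle(shuffled)
        # Reserve at least one sample for testing.
        n_test = max(1, round(len(shuffled) * test_fraction))
        splits[name] = {"train": shuffled[n_test:], "test": shuffled[:n_test]}
    return splits


if __name__ == "__main__":
    # Toy input with made-up IDs: one data set large enough to keep, one too small.
    toy = {
        "toy_a": [f"s{i}" for i in range(50)],
        "toy_b": ["s0", "s1", "s2"],
    }
    result = filter_and_split(toy, min_samples=20, test_fraction=0.2)
    for name, parts in result.items():
        print(name, len(parts["train"]), len(parts["test"]))
```

With the toy input, only `toy_a` passes the size filter, and its 50 samples are divided 40/10 between training and testing. Seeding the random generator keeps the split stable across runs, which matters if models are compared on the same held-out samples.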