GitHub

A deep generative framework with embedded vector arithmetic and classifier for sample generation, label transfer, and clustering of transcriptome data

This repository contains the official Keras implementation of:

A deep generative framework with embedded vector arithmetic and classifier for sample generation, label transfer, and clustering of transcriptome data

Requirements

Python 3.6
conda 4.4.10
keras 2.2.4
tensorflow 1.11.0

1. Model training

Input

Input1: num_samples * num_genes (--inputdata)
Input2: num_samples * num_covarites (--input_covariates)
Input3: num_samples * num_cellytpe (--inputcelltype)
        supervised: one-hot vector for each sample
        semi-supervised: the num_celltype is set to be larger than the cell types in the reference dataset
                         one-hot vector for sample in the reference dataset
                         zero vector for sample in the query dataset
        unsupervised: the num_celltype is a hyperparameter,  and it is the number of cluster categories set by the user
                      zero vector for each sample

About this article

#Augments:
#'--inputdata', type=str, default='data/training_data/Fig3_scgen_count7000r.npz', help='address for input data')
#'--input_covariates', type=str, default='data/training_data/Fig3_scgen_study_condition.npy', help='address for covariate (e.g. batch)')
#'--inputcelltype', type=str, default='data/training_data/Fig3_scgen_cell_type.npy', help='address for celltype label')
#'--randoms', type=int, default=30, help='random number to split dataset')
#'--permute_input', type=str, default='T', help='whether permute the input')

#'--dim_latent', type=int, default=50, help='dimension of latent space')
#'--dim_intermediate', type=int, default=200, help='dimension of intermediate layer')
#'--activation', type=str, default='relu', help='activation function: relu or tanh')
#'--arithmetic', type=str, default='minus', help='arithmetic: minus or plus')

#'--batch_size', type=int, default=500, help='training parameters_batch_size')
#'--epochs', type=int, default=50, help='training parameters_epochs')

#'--training', type=str, default='T', help='training model(T) or loading model(F) ')
#'--weights', type=str, default='data/weights_and_results/Fig3_demo.weight', help='trained weights')

#'--mode', type=str, default='supervised', help='mode: supervised, semi_supervised, unsupervised, user_defined')

#'--reconstruction_loss', type=int, default=5, help='The reconstruction loss for VAE')
#'--kl_cross_loss', type=int, default=1, help='')
#'--prior_distribution_loss', type=int, default=1, help='The assumption that prior distribution is uniform distribution')
#'--label_cross_loss', type=int, default=50, help='Loss for integrating label information into the model')


For supervised mode
python inClust.py --inputdata=data/training_data/Fig3_PBMC_count7000r.npz --input_covariates=data/training_data/Fig3_PBMC_study_condition.npy --inputcelltype=data/training_data/Fig3_PBMC_cell_type.npy --mode=supervised

For semi_supervised mode
python inClust.py --inputdata=data/training_data/Fig5_heart_count.npz --input_covariates=data/training_data/Fig5_heart_batch.npy --inputcelltype=data/training_data/Fig5_heart_label_semi.npy --mode=semi_supervised --permute_input=F

For unsupervised mode
python inClust.py --inputdata=data/training_data/Fig7_10X.npy --input_covariates=data/training_data/Fig7_10X_neighbor.npy --inputcelltype=data/training_data/Fig7_10X_label.npy --mode=unsupervised  --activation=tanh --arithmetic=plus --independent_embed=F --training=F --weights=data/weights_and_results/Fig7_10X_demo.weight

Further Explore

Testing your own dataset
python inClust.py --inputdata=your_data --input_covariates=your_inputcelltype --inputcelltype=your_inputcelltype --mode=your_mode

training weight

results/training.weight

2. Analysis

Demo -- About this article

The following codes could generate analysis data in the main text.

For supervised mode
python inClust.py --inputdata=data/training_data/Fig3_PBMC_count7000r.npz --input_covariates=data/training_data/Fig3_PBMC_study_condition.npy --inputcelltype=data/training_data/Fig3_PBMC_cell_type.npy --mode=supervised --training=F --weights=data/weights_and_results/Fig3_PBMC_demo.weight

For semi_supervised mode
python inClust.py --inputdata=data/training_data/Fig5_heart_count.npz --input_covariates=data/training_data/Fig5_heart_batch.npy --inputcelltype=data/training_data/Fig5_heart_label_semi.npy --mode=semi_supervised --training=F --weights=data/weights_and_results/Fig5_heart_demo.weight

For unsupervised mode

python inClust.py --inputdata=data/training_data/Fig7_10X.npy --input_covariates=data/training_data/Fig7_10X_neighbor.npy --inputcelltype=data/training_data/Fig7_10X_label.npy --mode=unsupervised  --arithmetic=plus --independent_embed=F --training=F --weights=data/weights_and_results/Fig7_10X_demo.weight

Output

The output is in the results folder, including

two latent space representation

mean_vector.npy
Low_dimnesion_vector.npy
batch_vector.npy

predictd label for each sample

predict_labels.csv

3. reproduction figures

demo_reproducing_figures.ipynb

VAE and its variant implementation https://github.com/bojone/vae

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.ipynb_checkpoints		.ipynb_checkpoints
data		data
results		results
README.md		README.md
demo_reproducing_figures.ipynb		demo_reproducing_figures.ipynb
inClust.py		inClust.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A deep generative framework with embedded vector arithmetic and classifier for sample generation, label transfer, and clustering of transcriptome data

About

Releases 1

Packages

Languages

wanglf19/inClust

Folders and files

Latest commit

History

Repository files navigation

A deep generative framework with embedded vector arithmetic and classifier for sample generation, label transfer, and clustering of transcriptome data

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages