Using TCR sequences for sequence embedding
Single-cell sequencing is now an integral tool in immunology and oncology, allowing researchers to couple RNA quantification with other modalities, such as immune cell receptor profiling, at the level of an individual cell. Towards this end, we developed the scRepertoire R package to assist in the integration of immune receptor and gene expression sequencing. However, the use of clonal indices in more complex analyses is still lacking, specifically the use of clonality in the embedding of single cells. To address this, we developed Trex, an R package that uses deep learning to vectorize TCR sequences, either by sequence order or by translating the sequence into amino acid properties.
If you are looking for the (very cool) TCR-epitope prediction algorithm TCRex, check out their website here.
Trex has been tested on R versions >= 4.0 and on OS X and Windows platforms. It is specifically designed to work with single-cell objects that have had TCRs added using scRepertoire. Please consult the DESCRIPTION file for more details on required R packages.
keras is required to use the autoencoder function, which includes setting up the TensorFlow environment in R:
##Install keras
install.packages("keras")
##Setting up Tensor Flow
library(reticulate)
conda_create("r-reticulate") ##If first time using reticulate
use_condaenv(condaenv = "r-reticulate", required = TRUE)
library(keras)
install_keras()
An alternative to the approach above (especially if you want to avoid conda) is to use reticulate to generate a virtualenv with virtualenv_create() and subsequently install the above Python packages with virtualenv_install().
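The virtualenv route can be sketched as a small helper. This is only a sketch under stated assumptions: the function name setup_trex_virtualenv, the environment name "r-trex", and the package list c("tensorflow", "keras") are hypothetical choices, not something the Trex documentation prescribes. The reticulate calls used (virtualenv_exists(), virtualenv_create(), virtualenv_install(), use_virtualenv()) are part of the reticulate package.

```r
# Sketch: set up a virtualenv-based Python backend instead of conda.
# "r-trex" is a hypothetical environment name, and the package list is an
# assumption about which Python packages the keras setup needs.
setup_trex_virtualenv <- function(envname = "r-trex",
                                  packages = c("tensorflow", "keras")) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    stop("Please install the reticulate package first.")
  }
  # Create the environment only if it does not exist yet
  if (!reticulate::virtualenv_exists(envname)) {
    reticulate::virtualenv_create(envname)
  }
  # Install the Python packages into that environment
  reticulate::virtualenv_install(envname, packages = packages)
  # Bind the current R session to this environment
  reticulate::use_virtualenv(envname, required = TRUE)
  invisible(envname)
}
```

Calling setup_trex_virtualenv() once in a fresh R session, before library(keras), should be enough; it needs a working Python installation and network access.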
Each of the models available in Trex follows a similar architecture with depth and width of input layers, epochs, batch size, and early stopping calls. The major difference is the size of the input layer, which depends on the method chosen with encoder.input.
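To make the link between encoding and input size concrete, here is an illustrative one-hot encoding ("OHE") of a CDR3 amino acid sequence in base R. It is not Trex's internal code: the helper one_hot_cdr3, the padding length of 20 positions, and the example sequence are assumptions made for illustration. With one-hot encoding every position contributes 20 values, whereas property-based encodings contribute only a few numeric factors per residue, so the input layer is sized differently.

```r
# Illustrative one-hot encoding of a CDR3 sequence (base R only).
# NOT Trex's internal implementation; max_length = 20 is a hypothetical
# padding length chosen for this example.
amino_acids <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                 "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

one_hot_cdr3 <- function(cdr3, max_length = 20) {
  residues <- strsplit(cdr3, "")[[1]]
  # Truncate sequences longer than max_length
  residues <- residues[seq_len(min(length(residues), max_length))]
  # Rows = positions (zero-padded up to max_length), columns = amino acids
  mat <- matrix(0, nrow = max_length, ncol = length(amino_acids),
                dimnames = list(NULL, amino_acids))
  idx <- match(residues, amino_acids)
  keep <- !is.na(idx)  # skip non-standard residues
  mat[cbind(which(keep), idx[keep])] <- 1
  # Flatten position by position into a single input vector
  as.vector(t(mat))
}

encoded <- one_hot_cdr3("CASSLGQGYEQYF")
length(encoded)  # 400 = 20 positions x 20 amino acids
sum(encoded)     # 13, one entry per residue
```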
To run Trex, open R and install Trex from github:
devtools::install_github("ncborcherding/Trex")
Trex should run in popular R-based single-cell workflows, including Seurat and Bioconductor/SingleCellExperiment formats.
Check out this vignette for a quick start tutorial.
The Trex algorithm allows users to select TCR-based metrics to return autoencoded values to be used in dimensional reduction. If single-cell objects are not filtered for T cells with TCR, maTrex() will still return values; however, TREX_1 will then reflect the disparity between TCR-containing and TCR-non-containing cells.
library(Trex)
my_trex <- maTrex(singleObject)
You can run Trex within your Seurat or SingleCellExperiment workflow. Importantly, runTrex() will automatically filter out single cells that do not contain TCR information in the metadata of the single-cell object.
seuratObj_Tonly <- runTrex(seuratObj, #The single-cell object
chains = "TRB", #Use of "TRA" or "TRB"
method = "encoder", #Use "encoder" for CNNs or "geometric" for a geometric-based transformation
encoder.model = "VAE", #"VAE" (variational autoencoder) or "AE" (autoencoder)
encoder.input = "AF", #Inputs into encoder - "AF", "KF", "both", "OHE"
reduction.name = "Trex") #Name designation for slot in single-cell object
seuratObj_Tonly <- runTrex(seuratObj, reduction.name = "Trex")
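Because runTrex() drops cells without TCR information, it can help to check beforehand how many cells carry the chain of interest. The sketch below uses a mock metadata table in base R. It assumes the scRepertoire-style CTaa column with the layout "TRA_TRB" and the string "NA" for a missing chain; the helper get_chain and the example sequences are hypothetical, and the filter Trex applies internally may differ.

```r
# Mock metadata in the style scRepertoire adds to a single-cell object.
# The CTaa column name and its "TRA_TRB" layout are assumptions here.
meta <- data.frame(
  CTaa = c("CAVRDSNYQLIW_CASSLGQGYEQYF", NA,
           "NA_CASSPDRGNTEAFF", "CAASGGSYIPTF_NA"),
  row.names = c("cell1", "cell2", "cell3", "cell4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pull the CDR3 of one chain out of CTaa
get_chain <- function(ctaa, chain = c("TRA", "TRB")) {
  chain <- match.arg(chain)
  pos <- if (chain == "TRA") 1 else 2
  out <- vapply(strsplit(ctaa, "_"), function(x) x[pos], character(1))
  # Treat the literal string "NA" as a missing chain
  out[out %in% "NA"] <- NA_character_
  out
}

trb <- get_chain(meta$CTaa, "TRB")
cells_with_trb <- rownames(meta)[!is.na(trb)]
cells_with_trb  # "cell1" "cell3"
```

On a real Seurat object, the same check could be run on seuratObj[[]] (the metadata data frame) before calling runTrex().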
From here, you can generate a tSNE/UMAP using the Trex values, similar to the PCA values based on variable gene expression.
seuratObj <- RunTSNE(seuratObj, reduction = "Trex", reduction.key = "Trex_")
seuratObj <- RunUMAP(seuratObj, reduction = "Trex", reduction.key = "Trex_")
If using the Seurat package, the Trex embedding and the gene expression PCA can be used to find the Weighted Nearest Neighbors (WNN). Before applying the WNN approach, best practice is to remove the TCR-related genes from the list of variable genes and rerun the PCA.
seuratObj <- quietTCRgenes(seuratObj)
seuratObj <- RunPCA(seuratObj)
seuratObj <- FindMultiModalNeighbors(seuratObj,
reduction.list = list("pca", "Trex"),
dims.list = list(1:30, 1:20),
modality.weight.name = "RNA.weight")
seuratObj <- RunUMAP(seuratObj,
nn.name = "weighted.nn",
reduction.name = "wnn.umap",
reduction.key = "wnnUMAP_")
If you run into any issues or bugs please submit a GitHub issue with details of the issue.
- If possible please include a reproducible example. Alternatively, an example with the internal trex_example would be extremely helpful.
Any requests for new features or enhancements can also be submitted as GitHub issues.
Pull Requests are welcome for bug fixes, new features, or enhancements.
If using the Trex package, please cite our manuscript. This is also a good place to find more information about the models.