diff --git a/README.Rmd b/README.Rmd
index f6158c2..f144ac2 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -84,16 +84,32 @@ load(rse_file, verbose = TRUE)
 ```
 
 
-In this next step we subset for the transcripts associated with degradation. These were determined by Joshua M. Stolz et al, 2022. We have provided three models to choose from. Here the names `"cell_component"`, `"top1500"`, and `"standard"` refer to models that were determined to be effective in removing degradation effects. The `"standard"` model involves taking the union of the top 1000 transcripts associated with degradation from the interaction model and the main effect model. The `"top1500"` model is the same as the `"standard"` model except the union of the top 1500 genes associated with degradation is selected. The most effective of our models, `"cell_component"`, involved deconvolution of the degradation matrix to determine the proportion of cell types within our studied tissue. These proportions were then added to our `model.matrix()` and the union of the top 1000 transcripts in the interaction model, the main effect model, and the cell proportions model were used to generate this model of qSVs. In this example we will choose `"cell_component"` when using the `getDegTx()` and `select_transcripts()` functions.
+In this next step, we subset to the transcripts associated with degradation.
+`qsvaR` provides significant transcripts determined in four different linear
+models of transcript expression against degradation time, brain region, and
+potentially cell-type proportions:
 
-```{r VennDiagram,fig.cap="The above venn diagram shows the overlap between transcripts in each of the previously mentioned models.", echo = FALSE}
-knitr::include_graphics("./man/figures/transcripts_venn_diagramm.png")
-```
+1. `exp ~ DegradationTime + Region`
+2. `exp ~ DegradationTime * Region`
+3. `exp ~ DegradationTime + Region + CellTypeProp`
+4. `exp ~ DegradationTime * Region + CellTypeProp`
+
+`select_transcripts()` returns degradation-associated transcripts and supports
+two parameters. First, `top_n` controls how many significant transcripts to
+extract from each model. When `cell_component = TRUE`, all four models are used;
+otherwise, just the first two are used. The union of significant transcripts
+from all used models is returned.
+
+As an example, we'll subset our `RangedSummarizedExperiment` to the union of
+the top 1000 significant transcripts derived from each of the four models.
 
 ```{r select_transcripts}
-## Next we get the degraded transcripts for qSVA from the "cell_component"
-## model
-DegTx <- getDegTx(rse_tx, type = "cell_component")
+# Subset 'rse_tx' to the top 1000 significant transcripts from the four
+# degradation models
+DegTx <- getDegTx(
+    rse_tx,
+    sig_transcripts = select_transcripts(top_n = 1000, cell_component = TRUE)
+)
 
 ## Now we can compute the Principal Components (PCs) of the degraded
 ## transcripts
@@ -123,7 +139,12 @@ This can be done in one step with our wrapper function `qSVA` which just combind
 
 ```{r "wrapper function"}
 ## Example use of the wrapper function qSVA()
-qsvs_wrapper <- qSVA(rse_tx = rse_tx, type = "cell_component", mod = mod, assayname = "tpm")
+qsvs_wrapper <- qSVA(
+    rse_tx = rse_tx,
+    sig_transcripts = select_transcripts(top_n = 1000, cell_component = TRUE),
+    mod = mod,
+    assayname = "tpm"
+)
 dim(qsvs_wrapper)
 ```
 
diff --git a/README.md b/README.md
index 7dc8cd3..0c30243 100644
--- a/README.md
+++ b/README.md
@@ -113,6 +113,7 @@ rse_file <- BiocFileCache::bfcrpath(
     "https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata",
     x = bfc
 )
+#> adding rname 'https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata'
 
 ## Now that we have the data in our computer, we can load it.
 load(rse_file, verbose = TRUE)
@@ -120,38 +121,34 @@ load(rse_file, verbose = TRUE)
 #>   rse_tx
 ```
 
-In this next step we subset for the transcripts associated with
-degradation. These were determined by Joshua M. Stolz et al, 2022. We
-have provided three models to choose from. Here the names
-`"cell_component"`, `"top1500"`, and `"standard"` refer to models that
-were determined to be effective in removing degradation effects. The
-`"standard"` model involves taking the union of the top 1000 transcripts
-associated with degradation from the interaction model and the main
-effect model. The `"top1500"` model is the same as the `"standard"`
-model except the union of the top 1500 genes associated with degradation
-is selected. The most effective of our models, `"cell_component"`,
-involved deconvolution of the degradation matrix to determine the
-proportion of cell types within our studied tissue. These proportions
-were then added to our `model.matrix()` and the union of the top 1000
-transcripts in the interaction model, the main effect model, and the
-cell proportions model were used to generate this model of qSVs. In this
-example we will choose `"cell_component"` when using the `getDegTx()`
-and `select_transcripts()` functions.
+In this next step, we subset to the transcripts associated with
+degradation. `qsvaR` provides significant transcripts determined in four
+different linear models of transcript expression against degradation
+time, brain region, and potentially cell-type proportions:
-