diff --git a/articles/Intro_qsvaR.html b/articles/Intro_qsvaR.html index 9be159e..f3eb38d 100644 --- a/articles/Intro_qsvaR.html +++ b/articles/Intro_qsvaR.html @@ -99,7 +99,7 @@
vignettes/Intro_qsvaR.Rmd
Intro_qsvaR.Rmd
Date the vignette was generated.
-#> [1] "2024-11-26 20:48:38 UTC"
+#> [1] "2024-11-27 17:02:29 UTC"
Wallclock time spent generating the vignette.
-#> Time difference of 54.017 secs
+#> Time difference of 54.052 secs
R
session information.
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
@@ -632,7 +632,7 @@ Reproducibility#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz UTC
-#> date 2024-11-26
+#> date 2024-11-27
#> pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
@@ -711,7 +711,7 @@ Reproducibility#> plyr 1.8.9 2023-10-02 [1] RSPM (R 4.3.0)
#> png 0.1-8 2022-11-29 [1] RSPM (R 4.3.0)
#> purrr 1.0.2 2023-08-10 [2] RSPM (R 4.3.0)
-#> qsvaR * 1.9.1 2024-11-26 [1] Bioconductor
+#> qsvaR * 1.9.1 2024-11-27 [1] Bioconductor
#> R6 2.5.1 2021-08-19 [2] RSPM (R 4.3.0)
#> ragg 1.3.0 2024-03-13 [2] RSPM (R 4.3.0)
#> Rcpp 1.0.12 2024-01-09 [2] RSPM (R 4.3.0)
diff --git a/index.html b/index.html
index 882f351..792b157 100644
--- a/index.html
+++ b/index.html
@@ -120,21 +120,29 @@ Example
"https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata",
x = bfc
)
+#> adding rname 'https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata'
## Now that we have the data in our computer, we can load it.
load(rse_file, verbose = TRUE)
#> Loading objects:
#> rse_tx
-In this next step we subset for the transcripts associated with degradation. These were determined by Joshua M. Stolz et al, 2022. We have provided three models to choose from. Here the names "cell_component"
, "top1500"
, and "standard"
refer to models that were determined to be effective in removing degradation effects. The "standard"
model involves taking the union of the top 1000 transcripts associated with degradation from the interaction model and the main effect model. The "top1500"
model is the same as the "standard"
model except the union of the top 1500 genes associated with degradation is selected. The most effective of our models, "cell_component"
, involved deconvolution of the degradation matrix to determine the proportion of cell types within our studied tissue. These proportions were then added to our model.matrix()
and the union of the top 1000 transcripts in the interaction model, the main effect model, and the cell proportions model were used to generate this model of qSVs. In this example we will choose "cell_component"
when using the getDegTx()
and select_transcripts()
functions.
In this next step, we subset to the transcripts associated with degradation. qsvaR
provides significant transcripts determined in four different linear models of transcript expression against degradation time, brain region, and potentially cell-type proportions:
exp ~ DegradationTime + Region
exp ~ DegradationTime * Region
exp ~ DegradationTime + Region + CellTypeProp
exp ~ DegradationTime * Region + CellTypeProp
select_transcripts()
returns degradation-associated transcripts and supports two parameters. First, top_n
controls how many significant transcripts to extract from each model. When cell_component = TRUE
, all four models are used; otherwise, just the first two are used. The union of significant transcripts from all used models is returned.
As an example, we’ll subset our RangedSummarizedExperiment
to the union of the top 1000 significant transcripts derived from each of the four models.
-## Next we get the degraded transcripts for qSVA from the "cell_component"
-## model
-DegTx <- getDegTx(rse_tx, type = "cell_component")
+# Subset 'rse_tx' to the top 1000 significant transcripts from the four
+# degradation models
+DegTx <- getDegTx(
+ rse_tx,
+ sig_transcripts = select_transcripts(top_n = 1000, cell_component = TRUE)
+)
+#> Using 2496 degradation-associated transcripts.
## Now we can compute the Principal Components (PCs) of the degraded
## transcripts
@@ -147,19 +155,25 @@ Example
)
k <- k_qsvs(DegTx, mod, "tpm")
print(k)
-#> [1] 34
Now that we have our PCs and the number we need we can generate our qSVs.
+#> [1] 900 20This can be done in one step with our wrapper function qSVA
which just combinds all the previous mentioned functions.
## Example use of the wrapper function qSVA()
-qsvs_wrapper <- qSVA(rse_tx = rse_tx, type = "cell_component", mod = mod, assayname = "tpm")
+qsvs_wrapper <- qSVA(
+ rse_tx = rse_tx,
+ sig_transcripts = select_transcripts(top_n = 1000, cell_component = TRUE),
+ mod = mod,
+ assayname = "tpm"
+)
+#> Using 2496 degradation-associated transcripts.
dim(qsvs_wrapper)
-#> [1] 900 34
Finally, you can compare the resulting t-statistics from your differential expression model against the degradation time t-statistics adjusting for the six different brain regions. This type of plot is called DEqual
plot and was shown in the initial qSVA framework paper (Jaffe et al, PNAS, 2017). We are really looking for two patterns exemplified here in Figure 1 (cartoon shown earlier). A direct positive correlation with degradation shown in Figure 1 on the right tells us that there is signal in the data associated with qSVs. An example of nonconfounded data or data that has been modeled can be seen in Figure 1 on the right with its lack of relationship between the x and y variables.