---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# DropletQC
<!-- badges: start -->
<!-- badges: end -->
This is a simple R package to calculate, for every requested cell barcode in a provided scRNA-seq BAM file, the nuclear fraction score:
nuclear fraction = intronic reads / (intronic reads + exonic reads)
The score captures the proportion of reads from intronic regions. These RNA fragments originate from unspliced (nuclear) pre-mRNA, hence the name "nuclear fraction". This score can be used to help identify:
1. "Empty" droplets containing ambient RNA: low nuclear fraction score and low UMI count
2. Droplets containing damaged cells: high nuclear fraction score and low UMI count
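The formula above can be illustrated with a minimal base R sketch. The barcodes and read counts here are made-up values for illustration only; they are not output from DropletQC.

```r
# Toy per-barcode read counts (hypothetical numbers, for illustration only)
counts <- data.frame(
  barcode  = c("cell_A", "cell_B", "cell_C"),
  intronic = c(300, 20, 450),
  exonic   = c(700, 980, 50)
)

# nuclear fraction = intronic reads / (intronic reads + exonic reads)
counts$nuclear_fraction <- counts$intronic / (counts$intronic + counts$exonic)

print(counts)
```

A barcode dominated by exonic reads (`cell_B`) gets a score near 0, while one dominated by intronic reads (`cell_C`) gets a score near 1.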
## Installation
You can install DropletQC with:
``` r
# install.packages("devtools")
devtools::install_github("powellgenomicslab/DropletQC", build_vignettes = TRUE)
```
## Calculating the nuclear fraction
Two functions can be used to calculate the nuclear fraction: `nuclear_fraction_tags` and `nuclear_fraction_annotation`.
If your BAM file contains region tags which identify aligned reads as intronic or exonic, such as those produced by 10x Genomics' Cell Ranger software, then the simplest and fastest way to calculate the nuclear fraction is to point `nuclear_fraction_tags` to the Cell Ranger `outs` directory:
```{r example1}
library(DropletQC)
nf1 <- nuclear_fraction_tags(
  outs = system.file("extdata", "outs", package = "DropletQC"),
  tiles = 1, cores = 1, verbose = FALSE)
head(nf1)
```
Alternatively, you can point `nuclear_fraction_annotation` to a gene annotation file, a BAM file and a barcodes file:
```{r example2}
nf2 <- nuclear_fraction_annotation(
  annotation_path = system.file("extdata/outs/chr1.gff3", package = "DropletQC"),
  bam = system.file("extdata/outs/possorted_genome_bam.bam", package = "DropletQC"),
  barcodes = system.file("extdata/outs/filtered_feature_bc_matrix/barcodes.tsv.gz", package = "DropletQC"),
  tiles = 1, cores = 1, verbose = FALSE)
head(nf2)
```
This method is more flexible, as it makes no assumptions about how your BAM file was produced, but it will take longer.
Take care that the provided barcodes match the barcode structure in the BAM file.
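One way to sanity-check the barcode structure before running the calculation is sketched below in base R. Both vectors are hypothetical stand-ins: `supplied` for the barcodes you intend to pass in and `bam_cb` for barcode tag values read from the BAM file by some other means. The `-1` suffix is an assumption based on the convention used in Cell Ranger output; adjust the pattern if your pipeline differs.

```r
# Hypothetical barcodes as supplied by the user (no suffix)
supplied <- c("AAACCCAAGAAACACT", "AAACCCAAGAAACCAT")

# Hypothetical barcode tag values as they might appear in the BAM file
bam_cb <- c("AAACCCAAGAAACACT-1", "AAACCCAAGAAACCAT-1", "AAACCCAAGAAACCCA-1")

# Fraction of supplied barcodes found in the BAM barcodes
frac_before <- mean(supplied %in% bam_cb)

# If every BAM barcode carries a numeric suffix and no supplied barcode does,
# append the assumed "-1" suffix so the two structures agree
suffix_pattern <- "-[0-9]+$"
if (all(grepl(suffix_pattern, bam_cb)) && !any(grepl(suffix_pattern, supplied))) {
  supplied <- paste0(supplied, "-1")
}

frac_after <- mean(supplied %in% bam_cb)
c(before = frac_before, after = frac_after)
```

A match fraction near zero usually points to a structural mismatch such as a missing suffix, rather than to genuinely absent cells.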
## Identifying empty droplets and damaged cells
Once the nuclear fraction score has been calculated, the `identify_empty_drops` and `identify_damaged_cells` functions can be used to assist in identifying each of these populations. Empty droplets and damaged cells are flagged, not removed.
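The idea of flagging rather than removing can be illustrated with a small base R sketch. This is not how `identify_empty_drops` or `identify_damaged_cells` work internally; the fixed thresholds (`nf_low`, `nf_high`, `umi_low`), the column names and the data are hypothetical values chosen only to mirror the two patterns described earlier (low nuclear fraction with low UMI count, and high nuclear fraction with low UMI count).

```r
# Toy QC table (made-up values, for illustration only)
qc <- data.frame(
  nuclear_fraction = c(0.02, 0.35, 0.90, 0.40),
  umi_count        = c(150, 8000, 300, 12000)
)

# Hypothetical hand-picked cut-offs; a stand-in for the package's own methods
nf_low  <- 0.05
nf_high <- 0.70
umi_low <- 1000

# Add a status column instead of dropping rows, so nothing is removed
qc$status <- "cell"
qc$status[qc$nuclear_fraction < nf_low  & qc$umi_count < umi_low] <- "empty_droplet"
qc$status[qc$nuclear_fraction > nf_high & qc$umi_count < umi_low] <- "damaged_cell"

print(qc)
```

Keeping every row and recording the call in a separate column leaves the decision to filter with you, and makes it easy to inspect the flagged barcodes first.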
## More information
For a detailed discussion see our manuscript:
[DropletQC: improved identification of empty droplets and damaged cells in single-cell RNA-seq data](https://doi.org/10.1101/2021.08.02.454717)
For more information about the functions included in the package, including tips on how to assess the nuclear fraction score using real-world examples, see the [package vignette](https://powellgenomicslab.github.io/DropletQC/articles/DropletQC.html).