Cell Ranger alignment stats #104

Open
iwillham opened this issue Apr 12, 2020 · 3 comments

Comments

@iwillham

Hi there,
I have some 10X data that I'd like to try scone on. I'm trying to find alignment QC metrics in the Cell Ranger output files. Does Cell Ranger output the alignment QC metrics you reported in Table S2 of the paper (e.g., unmapped_reads, umi_corrected)?

Thanks,
ian

@drisso
Contributor

drisso commented Apr 27, 2020

Hi @iwilliams91 ,

I only have a vague recollection of what we did, but if I remember correctly, we had to extract the metrics from a non-obvious location in the Cell Ranger output.

@mbcole performed the analysis and might remember more?

@asmariyaz23

@iwillham were you able to find the answer to your question? I am stuck on the same problem.

@coltonrobbins73

coltonrobbins73 commented Jan 25, 2021

@iwillham and @asmariyaz23 Not sure if you are still looking for a solution here, but I've made a little progress with this. So far I've been able to find 1) unmapped_reads and 2) num_reads.

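You can get per-barcode counts of mapped reads from your .bam file using samtools. (Note: I'm using Unix shell commands here. I think you can find equivalent commands for Mac or PC.)

# Count mapped reads per cell barcode (CB tag).
# -F 4 skips unmapped reads; without it, unmapped reads that carry a CB tag are counted too.
samtools view -F 4 possorted_genome_bam.bam | awk '
# Match the corrected barcode tag; the "-1" suffix is not captured.
match($0,/CB:Z:[ACGT]*/) {
    a[substr($0,RSTART+5,RLENGTH-5)]++
}
END {
    for(i in a)
        print i,a[i]
}' > /mapped_reads_per_barcode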

Output of the first 10 lines:
GAAACTCTCGCAAACT | 14
ACATACGTCTCATTCA | 7
GATCGCGAGAACAATC | 4
CACACTCAGAAGGTGA | 18
TGCACCTAGTCCGGTC | 22889
GGACATTAGGATGTAT | 9
GACCAATCACATTCGA | 1
GAACCTATCAGAAATG | 6
AGCTCTCGTACACCGC | 13
CACAGTAAGCGCCTCA | 1043

You can then subset this barcode count list to the verified (cell-associated) barcodes from Cell Ranger.
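
The subsetting step could be sketched like this. It is only an illustration: `subset_counts` is a hypothetical helper name, and it assumes the verified barcodes sit in a plain-text file with one barcode per line (for example a decompressed `barcodes.tsv` from the filtered matrix folder, whose exact location depends on your Cell Ranger version). Cell Ranger barcodes usually carry a `-1` style suffix that the count table above does not have, so the sketch strips it before matching.

```shell
# subset_counts COUNTS BARCODES   (hypothetical helper, not part of Cell Ranger or scone)
# COUNTS:   "barcode count" rows, as written by the awk command above
# BARCODES: one verified barcode per line, with or without a "-1" style suffix
subset_counts() {
  awk '
    # First file (BARCODES): strip the numeric suffix and remember the barcode.
    NR == FNR { sub(/-[0-9]+$/, "", $1); keep[$1] = 1; next }
    # Second file (COUNTS): print only rows whose barcode was remembered.
    $1 in keep
  ' "$2" "$1"
}
```

Usage would look something like `subset_counts mapped_reads_per_barcode barcodes.tsv > mapped_reads_verified`, where both file names are placeholders. Rows come out in the order of the count table.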

For unmapped reads, replace the first command line with: 'samtools view -f 4 possorted_genome_bam.bam' (-f 4 keeps only reads flagged as unmapped), and write the result to a separate output file.
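
Since the mapped and unmapped counts differ only in the samtools filter, the awk part could be wrapped in a small function and reused. This is a sketch under assumptions: `count_by_barcode` is a hypothetical name, it reads SAM text on standard input, and the output file names in the usage comments are placeholders.

```shell
# count_by_barcode: read SAM records on stdin, print "barcode count" per CB tag.
# Hypothetical helper; records without a CB tag are ignored.
count_by_barcode() {
  awk '
    # CB:Z: holds the corrected cell barcode; skip the 5-character tag prefix.
    match($0, /CB:Z:[ACGT]+/) { n[substr($0, RSTART + 5, RLENGTH - 5)]++ }
    END { for (bc in n) print bc, n[bc] }
  '
}

# Usage (assumes samtools is installed and the BAM is in the current directory):
#   samtools view -F 4 possorted_genome_bam.bam | count_by_barcode > mapped_counts
#   samtools view -f 4 possorted_genome_bam.bam | count_by_barcode > unmapped_counts
```

The `-F 4` and `-f 4` filters are complementary (exclude versus require the unmapped flag), so each BAM record lands in exactly one of the two tables.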

num_reads would then just be the two tables trimmed to the same barcodes, put in the same order, and summed per barcode.
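
A possible way to do that summing without ordering the tables by hand is sketched below. `sum_counts` is a hypothetical helper, and the sketch assumes the first table holds mapped reads only (for example, produced with `samtools view -F 4`). If the first table was built from every record in the BAM, the unmapped reads would be counted twice.

```shell
# sum_counts MAPPED UNMAPPED   (hypothetical helper)
# Both inputs are "barcode count" tables; the output is one row per barcode
# with the two counts added together, sorted by barcode.
sum_counts() {
  awk '
    { total[$1] += $2 }                          # accumulate across both tables
    END { for (bc in total) print bc, total[bc] }
  ' "$1" "$2" | LC_ALL=C sort
}
```

A barcode that appears in only one table keeps its single count, so trimming both tables to the verified barcodes first keeps the result limited to real cells.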

@mbcole Does that sound right?
