Check the mapping quality of spike-in sequences #1

Open
liu-genomics opened this issue Oct 27, 2018 · 0 comments

possible problem

The alignment contains many very short alignments, in the range of 4-6 bp. I suspect this is because the alignment command only removes alignments that hit more than twice in the genome. When aligning to the whole genome, that filter effectively prevents very short sequences from aligning. But with a very short sequence as the reference, there is a good chance that a 4-mer or 5-mer aligns at only one position in the reference.

So the cleanup step should eliminate those short alignments, and also those that don't start in the core region (96 bp to 102 bp on the reference sequence) of the spike-in sequence.
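One quick way to see how common the short alignments are is to tally the CIGAR strings in the alignment. The sketch below is a minimal helper, not part of the original pipeline: the function name `cigar_histogram` is made up for illustration, and it assumes SAM text on stdin with the CIGAR string in column 6.

```shell
# Tally CIGAR strings (SAM column 6) from SAM text on stdin.
# Header lines (starting with "@") are skipped.
# Output: count<TAB>CIGAR, most frequent first.
cigar_histogram() {
  awk -F '\t' '!/^@/ { count[$6]++ }
               END   { for (c in count) print count[c] "\t" c }' \
    | sort -k1,1nr -k2,2
}
```

It could be used as `samtools view in.bam | cigar_histogram`, where `in.bam` is a placeholder for the deduplicated BAM file. Entries such as `4M` to `6M` near the top of the list would confirm the problem described above.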

reference coordinates

I used the reverse complement of the sequence for mapping, so the true signal would be at position 92 on the reverse strand. Lulu asked for 96 bp to 102 bp on the plus strand (wait, is Lulu talking about the earlier version of the spike-in? The new spike-in is only 180 bp long, and the modification site is at position 89 using the plus strand as reference).
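The conversion between the two strands can be checked with a one-line calculation. This is a sketch that assumes 1-based coordinates and that the mapping reference is the full reverse complement of the 180 bp spike-in; the function name `rc_position` is hypothetical.

```shell
# Convert a 1-based position on the plus strand into the matching
# 1-based position on the reverse complement.
# $1 = sequence length, $2 = position on the plus strand
rc_position() {
  echo $(( $1 - $2 + 1 ))
}

rc_position 180 89   # prints 92
```

Under those assumptions, position 89 on the plus strand corresponds to position 92 on the reverse complement, which agrees with the value above.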

OK. For the reference used in the PGC experiments, we should look at positions 89-95 mapping to the reverse strand. There are very few extremely short alignments in this range. But to be more accurate, we should also count only alignments whose CIGAR string is "67M". The original read length should be 75 bp.

samtools view -F 16 -m 67 \
../CHe-LHu-pl1-Lu9_S9_R1_001.171022_spike_in_reverse_complement.umi_encoded_adaptor_removed_no_mismatch.sorted.dedup.bam.sorted \
"171022_spikein_RC:89-95" > test4.sam
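Two details of the command above may matter for the count. `samtools view -m 67` keeps alignments with at least 67 query-consuming CIGAR bases, so CIGAR strings other than exactly `67M` can pass. A region query also returns every read that overlaps 89-95, not only reads that start there. The sketch below adds a stricter filter on top; the function name `filter_exact_67M` is hypothetical, and requiring the start position to fall inside the window is an assumption based on the cleanup rule described earlier.

```shell
# Keep only SAM records (read from stdin) whose CIGAR is exactly "67M"
# and whose leftmost mapped position (column 4) lies within [start, end].
# $1 = window start, $2 = window end (1-based, inclusive)
filter_exact_67M() {
  awk -F '\t' -v start="$1" -v end="$2" \
    '!/^@/ && $6 == "67M" && ($4 + 0) >= (start + 0) && ($4 + 0) <= (end + 0)'
}
```

For example, the output of the `samtools view` command could be piped through `filter_exact_67M 89 95` before being written to the SAM file.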

The related emergency code is stored in 181028_MS_code (manuscript-saving code).

# Priority: confirm whether the figure showing quantification accuracy still holds up.
