Hi! Thank you for the very nice talk on JBrowse2 during BGA24. During the talk it was mentioned that there is a way to get a reduced BAM file for a particular region of interest (retaining only informative reads) to visualize a chromosomal inversion. This approach would save us storage space, as we would like to avoid adding whole-genome BAM files to JBrowse. How could such a BAM file be generated? Thanks :-)
Replies: 1 comment
Thanks for the kind words and for attending the BGA talk! I'm glad you followed up.

Here are some random ideas.

Filtering long read BAM files for informative reads

One idea to filter for informative reads is to get all reads that have a signature of being split. For long reads, one such signature is the "SA:" tag. The SA tag indicates "split" or "supplementary" alignments. Note that filtering by flag 2048 is not sufficient to get all of these reads, because it will miss the "primary" alignment of the split.

My short guide on this: https://cmdcolin.github.io/posts/2022-02-06-sv-sam#what-is-the-sa-tag (that specific section discusses SA a bit, but the whole article has some more of my thoughts on the matter).

Filtering short read BAM files for informative reads

For short reads, you can keep only the reads that do NOT have the "read mapped in proper pair" flag (0x2) set. That gives you all the "badly paired" reads, which are often too far apart or in the wrong orientation. See https://broadinstitute.github.io/picard/explain-flags.html for decoding flags.

Random other ideas:

Creating "coverage" files for the above filtered BAM files

By creating a simple BigWig file of coverage from the filtered BAMs, you can quickly zoom in on the interesting regions, and then load the actual alignments underneath these regions for more info.

Creating a "contact" matrix for read pairing

Maybe even better than the above, you can create a "contact matrix" from the read pairing. Sort of similar to the previous discussion about the LD matrix, this can also be done to visualize pairing for large SVs like inversions.

This is one place I saw a figure like this, in a paper on butterflies; part B of the figure shows it. This kind of contact matrix for SVs has also been used in programs like Cue for deep learning of SVs.

Random other things:

Plotting population genetic statistics like Fst, linkage disequilibrium, and recombination rates

I know that plotting population genetic statistics over the genome, in circumstances where such population genetic approaches are relevant, can give some insight into weird structural variants like inversions. Hopi Hoekstra's work comes to mind: https://www.biorxiv.org/content/10.1101/2022.05.25.493470v1.full.pdf

Conclusion

This is actually a great topic and I'd be happy to hear more if you find any results. Indeed, I often look at papers for visual inspiration and bring those visualization types to life in the JBrowse UI! Part of the challenge is that it sometimes involves a few extra analysis steps or workflows, which sometimes have to be done outside of JBrowse itself and are not as common, but hopefully as more people do it, it can be workflow-ized and made available to everyone!
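
To make the two read filters from the ideas above concrete, here is a minimal sketch in plain Python that works on SAM text records. It is only an illustration under some assumptions: the function names `is_informative` and `filter_sam` are made up for this example, and combining both criteria in one pass is my choice here, not something JBrowse does for you. It keeps a record if it carries an `SA:Z:` tag (so the primary alignment of a split read is kept too, not just the flag 2048 pieces) or if it is paired but lacks the "proper pair" flag.

```python
import sys

PAIRED = 0x1       # SAM flag bit: read is part of a pair
PROPER_PAIR = 0x2  # SAM flag bit: "read mapped in proper pair"


def is_informative(sam_line: str) -> bool:
    """Return True if a SAM alignment record looks split or badly paired."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    # Optional tags start after the 11 mandatory SAM columns.
    # Every piece of a split read, including the primary one, has an SA tag.
    has_sa_tag = any(tag.startswith("SA:Z:") for tag in fields[11:])
    # Paired reads without the proper-pair bit: too far apart, wrong orientation, etc.
    badly_paired = bool(flag & PAIRED) and not (flag & PROPER_PAIR)
    return has_sa_tag or badly_paired


def filter_sam(lines):
    """Yield header lines unchanged, plus only the informative records."""
    for line in lines:
        if line.startswith("@") or is_informative(line):
            yield line


def main() -> None:
    # Read SAM text on stdin, write the reduced SAM text to stdout.
    sys.stdout.writelines(filter_sam(sys.stdin))
```

One way to use it, assuming you save it as a script (say `filter_informative.py`, a hypothetical name) with a call to `main()` at the bottom, and that your BAM is indexed so the region query works: `samtools view -h in.bam chr1:1000000-2000000 | python filter_informative.py | samtools view -b -o informative.bam -`. The file names and the region are placeholders.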