Mismatch between TElocal counts and BAM file reads #54

ptranvan · 2024-12-09T14:49:30Z

Hi,

I used TElocal to generate counts, and I’m particularly interested in one specific feature. For this feature, I obtained a count of 24. However, when I went back to the BAM file to extract the reads that map to this feature using bedtools intersect, I found only 12 reads (8 uniques) that come from:

6 paired-end reads (representing 4 unique pairs).

How is it possible for the count to be 24 in this case?

Thanks for your help

The text was updated successfully, but these errors were encountered:

olivertam · 2024-12-09T17:01:24Z

Hi,

Thank you for your interest in the software.
Would you be able to the send an excerpt of the BAM file for that particular genomic region (and the corresponding TE annotation)?
For example, if your TE is located at chr1:1-1000:

$ samtools view [BAM] chr1:1-1000 > TE_overlapping_reads.sam

Thanks.

ptranvan · 2024-12-09T17:11:00Z

Thanks @olivertam for your prompt answer, here is the file
TE_overlapping_reads.sam.zip

olivertam · 2024-12-09T17:16:18Z

Hi,

Thank you for that file.
Could you also provide the TElocal command line, as well as the BAM headers?

$ samtools view -H [BAM] > BAM_header.txt

Thanks.

talithaforcier · 2024-12-09T17:16:25Z

Hi! Which gene and TE annotation files were you using with this?
Thanks!

ptranvan · 2024-12-09T17:29:21Z

BAM_header.txt
TE_overlapping_reads.sam.zip

The TE_overlapping_reads.sam and BAM_header.txt are from the initial mapping, then I sorted by read name:

samtools view -bS bam1.bam | samtools sort -n - > bam2.bam

Then TElocal

TElocal -b bam2.bam --GTF Homo_sapiens-GCA_009914755.4-2022_07-genes_chr.gtf --TE T2T-CHM13v2_rmsk_TE.gtf.locInd --stranded reverse --mode multi

Gene annotation is from

https://rapid.ensembl.org/Homo_sapiens_GCA_009914755.4/Info/Index
with custom modifcation adding "chr" before scaffold number

and TE annotation:

https://labshare.cshl.edu/shares/mhammelllab/www-data/TEtranscripts/TE_GTF/T2T-CHM13v2_rmsk_TE.gtf.gz

The feature that has 24 counts is ALR/Alpha_dup1372

olivertam · 2024-12-09T17:35:30Z

Thank you for providing the information.

We'll take a closer look.

Thanks.

olivertam · 2024-12-09T18:00:38Z

Hi,

Sorry for bothering you again, but could you rerun the samtools command with the following change:

$ samtools view -P [BAM] chrY:10562600-10883787 > TE_overlapping_reads.sam

Thanks.

ptranvan · 2024-12-09T18:14:25Z

No problem, it looks a bit more logical now indeed.

TE_overlapping_reads2.sam.zip

should I still run with with --sortByPos ??

olivertam · 2024-12-09T18:24:54Z

Hi,

I don't think you need to rerun with --sortByPos.

Based on the new SAM, it does appear that there are 25 distinct read pairs that contributes to the region (23 pairs that align uniquely, and 2 pairs with 2 alignments). The way that TElocal works on paired end reads is that it checks both ends of the fragment to determine if it overlaps strongly with the annotation. I suspect that out of all these, all but one of them have at least one end overlapping significantly with the ALR (the last one might have less than 50% overlap on one of the ends), and since there are no other gene or TE annotations that the fragment could be annotated to, the ALR annotation would be given to those.

I think that might give rise to the count of 24 for ALR/Alpha_dup1372.

The previous samtools command would ignore read pairs where one or more portion lie outside the region of interest. I am surprised that these reads did not come out from bedtools intersect, though that might be due to some parameters in that program.

Sorry for the inconvenience. Let us know if you have further questions.

Thanks.

ptranvan · 2024-12-17T10:15:06Z

Dear @olivertam

I come back to you with an other similar issue. This is my command:

samtools view -P x.bam chr3:95726522-95727149 > TE_overlapping_reads2.sam

It seems to have one paired end mapped.

However in my TElocal analysis, I don't see any count for that element (L1PA15_dup4168:L1PA15:L1:LINE)

TE_overlapping_reads2.txt

olivertam · 2024-12-17T13:21:24Z

Hi,

This is a case where you would need to see where the other 5 alignments for the read is located (as indicated by the NH:i:5 tag in the alignment output. It is possible (highly likely) that the other alignment position for this read are also overlapping TE instances (e.g. another L1PA15), and based on the EM, this read was redistributed to another copy.

Please let me know if that does not address your question.

ptranvan · 2024-12-17T13:28:07Z

Oh, I see. How do you decide which copy gets the count assignment?
I thought the counts were split between each mapped copy.

olivertam · 2024-12-17T13:33:41Z

The counts are initially split between each mapped copy, but then through the EM loop, the distribution of the counts are recalculated. This is done iteratively until the overall difference between subsequent EM iterations falls below a threshold (i.e. converged). To answer your question, the copy that got this particular read assignment is most likely the copy with the highest count where one of the 5 alignment overlapped.

For more details on the methodology of the EM, section 3.3 of the paper might be helpful.

Thanks.

olivertam added the invalid This doesn't seem right label Dec 9, 2024

olivertam added question and removed invalid This doesn't seem right labels Dec 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mismatch between TElocal counts and BAM file reads #54

Mismatch between TElocal counts and BAM file reads #54

ptranvan commented Dec 9, 2024 •

edited

Loading

olivertam commented Dec 9, 2024

ptranvan commented Dec 9, 2024

olivertam commented Dec 9, 2024

talithaforcier commented Dec 9, 2024

ptranvan commented Dec 9, 2024 •

edited by olivertam

Loading

olivertam commented Dec 9, 2024

olivertam commented Dec 9, 2024

ptranvan commented Dec 9, 2024

olivertam commented Dec 9, 2024 •

edited

Loading

ptranvan commented Dec 17, 2024

olivertam commented Dec 17, 2024

ptranvan commented Dec 17, 2024

olivertam commented Dec 17, 2024 •

edited

Loading

Mismatch between TElocal counts and BAM file reads #54

Mismatch between TElocal counts and BAM file reads #54

Comments

ptranvan commented Dec 9, 2024 • edited Loading

olivertam commented Dec 9, 2024

ptranvan commented Dec 9, 2024

olivertam commented Dec 9, 2024

talithaforcier commented Dec 9, 2024

ptranvan commented Dec 9, 2024 • edited by olivertam Loading

olivertam commented Dec 9, 2024

olivertam commented Dec 9, 2024

ptranvan commented Dec 9, 2024

olivertam commented Dec 9, 2024 • edited Loading

ptranvan commented Dec 17, 2024

olivertam commented Dec 17, 2024

ptranvan commented Dec 17, 2024

olivertam commented Dec 17, 2024 • edited Loading

ptranvan commented Dec 9, 2024 •

edited

Loading

ptranvan commented Dec 9, 2024 •

edited by olivertam

Loading

olivertam commented Dec 9, 2024 •

edited

Loading

olivertam commented Dec 17, 2024 •

edited

Loading