mdcao added a commit to mdcao/graphmap that referenced this issue on May 10, 2018.
This bug is in VN:v0.5.2 compiled on Nov 15 2017 at 11:37:33.

The MD:Z tag in the SAM output can be a very helpful supplement to the CIGAR string because it records which reference bases were mismatched or deleted, whereas a CIGAR string built from M operations only gives the lengths and positions of aligned blocks and indels. Inserted bases are not in MD:Z, but they can be read from SEQ using the CIGAR. Because of this extra information, one can re-create full read-to-reference alignments from the SAM file alone, without re-reading sequences from the reference genome. However, I have noticed some irregularities in the MD:Z field produced by graphmap. Below I detail how to reproduce the error I'm seeing.
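To illustrate why SEQ, CIGAR and MD:Z together are enough to rebuild the aligned reference bases, here is a minimal Python sketch. It is not graphmap code: the function names are hypothetical, and it deliberately rejects spliced (N) alignments and ignores H/P operations to stay short.

```python
import re

def parse_cigar(cigar):
    """Split a CIGAR string such as '4M1D114M' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def reference_from_md(seq, cigar, md):
    """Rebuild the aligned reference bases from SEQ, CIGAR and MD:Z alone."""
    # Lay the read out along the reference: one read base per M/=/X position,
    # None for each deleted reference base; inserted and soft-clipped bases are skipped.
    aligned = []
    i = 0
    for length, op in parse_cigar(cigar):
        if op in "M=X":
            aligned.extend(seq[i:i + length])
            i += length
        elif op in "IS":
            i += length
        elif op == "D":
            aligned.extend([None] * length)
        elif op == "N":
            raise ValueError("spliced (N) alignments are not handled in this sketch")
        # H and P consume neither read nor reference bases.

    # Walk the MD:Z tokens: digits = run of matching bases (copy the read base),
    # ^XYZ = deleted reference bases, a single letter = mismatched reference base.
    ref = []
    pos = 0
    for num, deleted, mismatch in re.findall(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md):
        if num:
            run = aligned[pos:pos + int(num)]
            if None in run:
                raise ValueError("MD:Z match run overlaps a CIGAR deletion")
            ref.extend(run)
            pos += int(num)
        elif deleted:
            ref.append(deleted)
            pos += len(deleted)
        else:
            ref.append(mismatch)
            pos += 1
    if pos != len(aligned):
        raise ValueError("MD:Z and CIGAR disagree on the reference span")
    return "".join(ref)
```

For example, `reference_from_md("ACGTCATT", "4M1D4M", "4^G4")` returns `"ACGTGCATT"`: the deleted G comes entirely from MD:Z, which is why a wrong base in the `^` token silently corrupts any alignment rebuilt this way.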
I created a reference file of two unrelated, random sequences.
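A reference file like this can be generated with a few lines of Python. This is a sketch under stated assumptions: the sequence length (200), the seed and the output file name are arbitrary choices, not the values used for the original reproduction.

```python
import random

def random_fasta(names, length, seed=None):
    """Return FASTA text with one uniformly random A/C/G/T sequence per record name."""
    rng = random.Random(seed)  # seeded generator so the reference set is reproducible
    records = []
    for name in names:
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        records.append(f">{name}\n{seq}\n")
    return "".join(records)

# Two unrelated random references; the length and seed are arbitrary choices.
fasta = random_fasta(["ref0", "ref1"], length=200, seed=1)
# To use it as aligner input, write it out, e.g. to a hypothetical refs.fa:
# with open("refs.fa", "w") as fh:
#     fh.write(fasta)
```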
I then created a single simulated read, myRead, from ref1, with a single deletion of the 5th base (G). I used the following command to align myRead to the reference file containing both ref0 and ref1. Here is the SAM output:
Of importance, notice that the CIGAR string is correct and the MD:Z field has the correct base counts (both indicate 4 matches, 1 deletion, 114 matches). However, the base marked as deleted in the MD:Z field is T, when it should be G. Coincidentally, T happens to be the 5th base of ref0, which made me suspect that MD:Z is being constructed from the wrong reference. Indeed, when I change the 5th base of ref0 to A, the MD:Z string changes to MD:Z:4^A114 to match ref0, even though the read is mapped to ref1. Also, if I remove ref0 from my library entirely, the correct MD:Z string is produced. The composition or presence of ref0 in my library should have no bearing on the MD:Z field for my read, since it mapped to ref1. So it appears that graphmap uses the wrong reference sequence to determine the deleted bases reported in the MD:Z field.
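The symptom can be modelled with a small Python sketch that builds MD:Z from an alignment. Everything here is an assumption for illustration: the two 14-base references are made up (only their 5th bases, T in ref0 and G in ref1, mirror the report), and the `deletion_ref` parameter is a hypothetical knob that reproduces the suspected bug by copying deleted bases from a different sequence than the one the read mapped to. It is not graphmap's implementation.

```python
def md_from_alignment(read, cigar_ops, ref, pos=0, deletion_ref=None):
    """Build an MD:Z value for an alignment that starts at 0-based offset pos in ref.

    cigar_ops is a list of (length, op) pairs restricted to M, I, D and S.
    deletion_ref is a hypothetical knob that models the suspected bug: deleted
    bases are copied from a different sequence than the one the read mapped to.
    """
    deletion_ref = ref if deletion_ref is None else deletion_ref
    md = []
    run = 0        # length of the current run of matching bases
    r, q = pos, 0  # reference and read cursors
    for length, op in cigar_ops:
        if op == "M":
            for _ in range(length):
                if read[q] == ref[r]:
                    run += 1
                else:
                    md.append(f"{run}{ref[r]}")  # mismatch: emit the reference base
                    run = 0
                r += 1
                q += 1
        elif op in "IS":
            q += length  # inserted/clipped read bases never appear in MD:Z
        elif op == "D":
            md.append(f"{run}^{deletion_ref[r:r + length]}")  # deleted reference bases
            run = 0
            r += length
        else:
            raise ValueError(f"CIGAR op {op!r} is not handled in this sketch")
    md.append(str(run))
    return "".join(md)

# Made-up toy references: the 5th base of ref0 is T, the 5th base of ref1 is G.
refs = {"ref0": "CATGTACCGTTAGC", "ref1": "TTACGCAGGATCCA"}
# Simulated read: ref1 with its 5th base (1-based) deleted, so the CIGAR is 4M1D9M.
read = refs["ref1"][:4] + refs["ref1"][5:]
ops = [(4, "M"), (1, "D"), (9, "M")]

correct = md_from_alignment(read, ops, refs["ref1"])
suspected_bug = md_from_alignment(read, ops, refs["ref1"], deletion_ref=refs["ref0"])
print(correct, suspected_bug)  # 4^G9 4^T9
```

Applied to the 118-base read in this report, the same logic would be expected to give MD:Z:4^G114 against ref1, while sourcing the deleted base from ref0 would give MD:Z:4^T114: correct counts with the wrong deleted base, which matches the observed output.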