Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pwalign results doesn't match with the Emboss needle output #9

Open
Rohit-Satyam opened this issue Oct 14, 2024 · 0 comments
Open

pwalign results doesn't match with the Emboss needle output #9

Rohit-Satyam opened this issue Oct 14, 2024 · 0 comments

Comments

@Rohit-Satyam
Copy link

Rohit-Satyam commented Oct 14, 2024

@jwokaty @hpages

I realized that pwalign::pairwiseAlignment doesn't have functionality to use the EDNAFULL matrix that Embos Needleman-Wunsch utility needle uses. Even when provided with custom EDNAFULL matrix, I am unable to reproduce the results. A guy opened a similar issue 22 months ago on Bioconductor forum here but it wasn't answered so I am raising a similar issue here.

Besides, I realize the Emboss calculates the Percent identity very differently as 100 * sum(No. of matches or mismatches) / (sum(nchar(Primer1),nchar(Primer2)) - sum(No. of matches or mismatches)).

# Aligned_sequences: 2
# 1: Primer1
# 2: Primer2
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 16
# Identity:       4/16 (25.0%)
# Similarity:     4/16 (25.0%)
# Gaps:          12/16 (75.0%)
# Score: 20.0
# 
#
#=======================================

Primer1             1 TTACGAATA-------      9
                                    ||||       
Primer2            1 -----AATAAATGAAA     11

As can be seen the scoring is different.

temp <- pairwiseAlignment(Primer1, Primer2,gapOpening=10,gapExtension=0.5,substitutionMatrix = ednafull_matrix)
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: TTACGAATA-------
subject: -----AATAAATGAAA
score: -6 

In some cases even the alignment is also different as can be seen below:

> pairwiseAlignment(DNAStringSet("CGAAAAAATA"), DNAStringSet("CGTATCG"),gapOpening=10,gapExtension=0.5,substitutionMatrix = as.matrix(ednafull_matrix))
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: CGAAAAAATA---
subject: CG------TATCG
score: -4.5 

While emboss gives me the following

image

# Length: 15
# Identity:       2/15 (13.3%)
# Similarity:     2/15 (13.3%)
# Gaps:          13/15 (86.7%)
# Score: 10.0

This is the EDNAFULL matrix that I am using

ednafull_matrix <- matrix(c(
  # A   T   G   C   S   W   R   Y   K   M   B   V   H   D   N
  5, -4, -4, -4, -4,  1,  1, -4, -4,  1, -4, -1, -1, -1, -2,  # A
  -4,  5, -4, -4, -4,  1, -4,  1,  1, -4, -1, -4, -1, -1, -2,  # T
  -4, -4,  5, -4,  1, -4,  1, -4,  1, -4, -1, -1, -4, -1, -2,  # G
  -4, -4, -4,  5,  1, -4, -4,  1, -4,  1, -1, -1, -1, -4, -2,  # C
  -4, -4,  1,  1, -1, -4, -2, -2, -2, -2, -1, -1, -3, -3, -1,  # S
  1,  1, -4, -4, -4, -1, -2, -2, -2, -2, -3, -3, -1, -1, -1,  # W
  1, -4,  1, -4, -2, -2, -1, -4, -2, -2, -3, -1, -3, -1, -1,  # R
  -4,  1, -4,  1, -2, -2, -4, -1, -2, -2, -1, -3, -1, -3, -1,  # Y
  -4,  1,  1, -4, -2, -2, -2, -2, -1, -4, -1, -3, -3, -1, -1,  # K
  1, -4, -4,  1, -2, -2, -2, -2, -4, -1, -3, -1, -1, -3, -1,  # M
  -4, -1, -1, -1, -1, -3, -3, -1, -1, -3, -1, -2, -2, -2, -1,  # B
  -1, -4, -1, -1, -1, -3, -1, -3, -3, -1, -2, -1, -2, -2, -1,  # V
  -1, -1, -4, -1, -3, -1, -3, -1, -3, -1, -2, -2, -1, -2, -1,  # H
  -1, -1, -1, -4, -3, -1, -1, -3, -1, -3, -2, -2, -2, -1, -1,  # D
  -2, -2, -2, -2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1   # N
), nrow = 15, byrow = TRUE)

# Set row and column names for the EDNAFULL matrix
rownames(ednafull_matrix) <- colnames(ednafull_matrix) <- c("A", "T", "G", "C", "S", "W", "R", "Y", "K", "M", "B", "V", "H", "D", "N")

I am using pwalign 1.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant