
alignedSubject/Pattern takes a long time to run #5

Open
LTLA opened this issue Jun 8, 2018 · 0 comments

LTLA commented Jun 8, 2018

It seems that alignedSubject and alignedPattern take an unexpectedly long time to run:

library(Biostrings)

# Align 1000 copies of the same pattern against a single subject.
system.time(aln <- pairwiseAlignment(
    pattern = DNAStringSet(rep("AACGAGGGCCACCTAGGAAGAAT", 1000)),
    subject = DNAString("AAACGATCAGCTACGAACACT")))
##   user  system elapsed 
##  0.208   0.008   0.219 

# Extract the aligned (gapped) pattern and subject strings.
system.time(X <- alignedPattern(aln))
##   user  system elapsed 
## 16.622   0.008  16.783 
system.time(Y <- alignedSubject(aln))
##   user  system elapsed 
## 15.862   0.008  16.011 

Each call takes roughly 75 times longer than the alignment itself (about 16 s versus 0.22 s elapsed), even though I would have expected the alignment to be the most computationally intensive part of the process! This is a shame, as we have been using the full alignment strings for large-scale processing of Nanopore data. I assume the slowness arises because the trailing "-" gap characters are appended to each aligned sequence in an lapply() loop in get_aligned_pattern, rather than in C.
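The hypothesis above can be illustrated with base R alone. This is a minimal sketch, not Biostrings' actual get_aligned_pattern code: pad_loop() and pad_vectorized() are hypothetical helpers that right-pad already-gapped strings with "-" to a common width, first with one R function call per string (as an lapply() loop would do) and then with nchar(), strrep() and paste0() applied to the whole vector at once.

```r
# Hypothetical helpers illustrating the hypothesis; neither is the actual
# Biostrings get_aligned_pattern() implementation.

# Per-element padding: one R-level closure call per string, as in an lapply() loop.
pad_loop <- function(x, width) {
  unlist(lapply(x, function(s) paste0(s, strrep("-", width - nchar(s)))))
}

# Vectorized padding: nchar(), strrep() and paste0() each process the
# whole character vector in a single call (their loops run in C inside base R).
pad_vectorized <- function(x, width) {
  stopifnot(all(nchar(x) <= width))  # strrep() rejects negative repeat counts
  paste0(x, strrep("-", width - nchar(x)))
}

aligned <- c("AAC-GAT", "AACGA", "A-CG")
pad_vectorized(aligned, 8L)
# [1] "AAC-GAT-" "AACGA---" "A-CG----"

# Rough timing comparison on many strings (results will vary by machine).
many <- rep(aligned, 1e5)
system.time(a <- pad_loop(many, 8L))
system.time(b <- pad_vectorized(many, 8L))
stopifnot(identical(a, b))
```

If the per-element loop is indeed the bottleneck, a vectorized (or C-level) padding step should remove most of that overhead; profiling alignedPattern() with Rprof() would confirm whether that is where the time actually goes.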

R version 3.5.0 Patched (2018-04-30 r74679)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /home/cri.camres.org/lun01/Software/R/R-3-5-branch/lib/libRblas.so
LAPACK: /home/cri.camres.org/lun01/Software/R/R-3-5-branch/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Biostrings_2.49.0   XVector_0.21.1      IRanges_2.15.13    
[4] S4Vectors_0.19.11   BiocGenerics_0.27.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.27.0 compiler_3.5.0 
hpages transferred this issue from Bioconductor/Biostrings on Mar 23, 2024.