Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitocondrial chromosome naming, seqlevelsStyle() #8

Open
Roleren opened this issue Dec 5, 2019 · 4 comments
Open

Mitocondrial chromosome naming, seqlevelsStyle() #8

Roleren opened this issue Dec 5, 2019 · 4 comments

Comments

@Roleren
Copy link

Roleren commented Dec 5, 2019

I thanks for a nice function, I use seqlevelsStyle a lot.

One thing has always bothered me, is for the mitocondrial chromosome.

Two objects with same naming convention as these:

> seqlevelsStyle(bamFile)
[1] "UCSC"
> seqlevelsStyle(cds)
[1] "UCSC"

Can show seqlevels as this:

seqlevels(bamFile)
[1] "MT" 
seqlevels(cds)
[1] "chrM"

So even though they use the same seqlevelsStyle, the objects does not name the mitochondrial chromosome the same.
Is there a reason for this ? :)

@hpages
Copy link
Contributor

hpages commented Dec 5, 2019

I can't reproduce this :(

library(GenomicRanges)
seqlevelsStyle(GRanges("chrM:1-5"))
# [1] "UCSC"
seqlevelsStyle(GRanges("MT:1-5"))
# [1] "NCBI"    "Ensembl"

sessionInfo()

R version 3.6.0 Patched (2019-05-02 r76454)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS:   /home/hpages/R/R-3.6.r76454/lib/libRblas.so
LAPACK: /home/hpages/R/R-3.6.r76454/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.39.1 GenomeInfoDb_1.22.0  IRanges_2.21.1      
[4] S4Vectors_0.24.0     BiocGenerics_0.32.0 

loaded via a namespace (and not attached):
[1] zlibbioc_1.32.0        compiler_3.6.0         XVector_0.26.0        
[4] GenomeInfoDbData_1.2.2 RCurl_1.95-4.12        bitops_1.0-6          

@Roleren
Copy link
Author

Roleren commented Dec 6, 2019

Basic reproduction is this:

# Normal case works
NCBI <- GRanges(seqnames = c("MT", paste0(1:2)), 1)
seqlevelsStyle(NCBI)
# "NCBI"    "Ensembl"
seqlevelsStyle(NCBI) <- "UCSC"
seqlevels(NCBI)
# [1] "chrM" "chr1" "chr2"

# This does not ( This is the case for my bam file)
mixed <- GRanges(seqnames = c("MT", paste0("chr", 1:2)), 1)
seqlevelsStyle(mixed)
# "UCSC"
seqlevelsStyle(mixed) <- "UCSC"
seqlevels(mixed)
# [1] "MT"    "chr1"  "chr2"  (<- It should swap MT for chrM here I think when you say "UCSC")

And this one gives a warning

# One of each (this works)
mixed <- GRanges(seqnames = c("MT", paste0("chr", 1)), 1)
seqlevelsStyle(mixed)
"NCBI"    "UCSC"    "Ensembl"
seqlevelsStyle(mixed) <- "UCSC"
# Warning message:
# In .replace_seqlevels_style(x_seqlevels, value) :
#   found more than one best sequence renaming map compatible with seqname style "UCSC" for # this object, using the first one
seqlevels(mixed)
# [1] "chrM" "chr1"
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ORFik_1.7.6                 reshape2_1.4.3              rtracklayer_1.46.0         
 [4] GenomicAlignments_1.22.1    Rsamtools_2.2.1             Biostrings_2.54.0          
 [7] XVector_0.26.0              SummarizedExperiment_1.16.0 DelayedArray_0.12.0        
[10] BiocParallel_1.20.0         matrixStats_0.55.0          GenomicFeatures_1.38.0     
[13] AnnotationDbi_1.48.0        Biobase_2.46.0              GenomicRanges_1.38.0       
[16] GenomeInfoDb_1.22.0         IRanges_2.20.1              S4Vectors_0.24.0           
[19] BiocGenerics_0.32.0         ggplot2_3.2.1               data.table_1.12.6          

loaded via a namespace (and not attached):
 [1] bitops_1.0-6           bit64_0.9-7            RColorBrewer_1.1-2     progress_1.2.2        
 [5] httr_1.4.1             tools_3.6.0            backports_1.1.5        R6_2.4.1              
 [9] rpart_4.1-15           Hmisc_4.3-0            DBI_1.0.0              lazyeval_0.2.2        
[13] colorspace_1.4-1       nnet_7.3-12            withr_2.1.2            tidyselect_0.2.5      
[17] gridExtra_2.3          prettyunits_1.0.2      GGally_1.4.0           DESeq2_1.26.0         
[21] bit_1.1-14             curl_4.2               compiler_3.6.0         htmlTable_1.13.2      
[25] labeling_0.3           scales_1.1.0           checkmate_1.9.4        genefilter_1.68.0     
[29] askpass_1.1            rappdirs_0.3.1         stringr_1.4.0          digest_0.6.23         
[33] foreign_0.8-72         base64enc_0.1-3        pkgconfig_2.0.3        htmltools_0.4.0       
[37] dbplyr_1.4.2           htmlwidgets_1.5.1      rlang_0.4.2            rstudioapi_0.10       
[41] RSQLite_2.1.2          farver_2.0.1           acepack_1.4.1          dplyr_0.8.3           
[45] RCurl_1.95-4.12        magrittr_1.5           GenomeInfoDbData_1.2.2 Formula_1.2-3         
[49] Matrix_1.2-18          Rcpp_1.0.3             munsell_0.5.0          lifecycle_0.1.0       
[53] stringi_1.4.3          zlibbioc_1.32.0        plyr_1.8.4             BiocFileCache_1.10.2  
[57] grid_3.6.0             blob_1.2.0             crayon_1.3.4           lattice_0.20-38       
[61] splines_3.6.0          annotate_1.64.0        hms_0.5.2              locfit_1.5-9.1        
[65] zeallot_0.1.0          knitr_1.26             pillar_1.4.2           geneplotter_1.64.0    
[69] biomaRt_2.42.0         XML_3.98-1.20          glue_1.3.1             latticeExtra_0.6-28   
[73] BiocManager_1.30.10    vctrs_0.2.0            gtable_0.3.0           openssl_1.4.1         
[77] purrr_0.3.3            reshape_0.8.8          assertthat_0.2.1       xfun_0.11             
[81] xtable_1.8-4           survival_3.1-7         tibble_2.1.3           memoise_1.1.0         
[85] cluster_2.1.0      

@Roleren
Copy link
Author

Roleren commented Jan 29, 2020

So to sum up, this is the case:

seqlevelsStyle support shifting of style when the cases are clear.

When there are a mix of styles with only 2 chromosomes, it switches correctly but gives you a warning.

When there are a mix of styles with > 2 chromosomes, it switches incorrectly. This is the case I would like, and the most logical is that it behaves as the case for 2 chromosomes, giving a warning, but switching correctly.

That is:
if : "MT", "chr1" gives -> "chrM" "chr1"
then: "MT", "chr1", "chr2" should give -> "chrM" "chr1", "chr2"

Which does not happen now.

@Roleren
Copy link
Author

Roleren commented Mar 26, 2020

I think this is quite a serious issue, I get it all the time warnings like this:
Warning messages:

1: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrY, chrX
 - in 'y': X, Y

This is after they should have had equal seqlevels naming.

So I think a lot of people are losing the sex chromosomes + sometimes mitochondrial, if they use different seqnames and trust that seqlevelsstyle will fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants