Misleading show() method for XStringSet objects #25

Open
hpages opened this issue Jul 2, 2019 · 4 comments
@hpages

hpages commented Jul 2, 2019

This is a follow-up to https://support.bioconductor.org/p/122340/#122400

The show() method for XStringSet objects currently suggests the existence of a seq() getter for these objects:

library(Biostrings)
library(drosophila2probe)
dna <- DNAStringSet(drosophila2probe)
dna
#   A DNAStringSet instance of length 265400
#          width seq
#      [1]    25 CCTGAATCCTGGCAATGTCATCATC
#      [2]    25 ATCCTGGCAATGTCATCATCAATGG
#      [3]    25 ATCAGTTGTCAACGGCTAATACGCG
#      [4]    25 ATCAATGGCGATTGCCGCGTCTGCA
#      [5]    25 CCGCGTCTGCAATGTGAGGGCCTAA
#      ...   ... ...
# [265396]    25 TACTACTTGAGCCACAACCATCTGA
# [265397]    25 AGGGACTAAAGAGGCCCCATGCTCT
# [265398]    25 CATGCTCTGTCTGGTGTCAGCGCTA
# [265399]    25 GTCAGCGCTACATGGTCCAGGACAA
# [265400]    25 CCAGGACAAGTATGGACTTCCCCAC

but there is no such getter.

Same issue with the show() method for XString objects:

dna[[1]]
#  25-letter "DNAString" instance
# seq: CCTGAATCCTGGCAATGTCATCATC
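
As a hedged aside, the values printed under the seq header can still be retrieved with accessors that do exist in Biostrings. The sketch below builds a small DNAStringSet by hand from the first two probes shown above (instead of loading drosophila2probe), so it only assumes that Biostrings is installed:

```r
library(Biostrings)

# Small stand-in for the drosophila2probe data used in the report
dna <- DNAStringSet(c("CCTGAATCCTGGCAATGTCATCATC",
                      "ATCCTGGCAATGTCATCATCAATGG"))

# What show() prints under "width"
widths <- width(dna)

# What show() prints under "seq": as.character() returns a plain
# character vector with one element per sequence
seqs <- as.character(dna)

# Same idea for a single XString element
first <- as.character(dna[[1]])
```

Here width() and as.character() are the real accessors; the object and variable names are only illustrative.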

Also, it would be good to make these show() methods more consistent with the other show() methods in S4Vectors/IRanges/GenomicRanges:

library(IRanges)
IRanges(1:3, 10, names=LETTERS[1:3], score=runif(3))
# IRanges object with 3 ranges and 1 metadata column:
#         start       end     width |             score
#     <integer> <integer> <integer> |         <numeric>
#   A         1        10        10 | 0.267148569226265
#   B         2        10         9 | 0.106218574102968
#   C         3        10         8 | 0.649568639695644

In particular, the names of a DNAStringSet object should be displayed on the left. Its metadata columns should also be displayed (right now they are not):

dna2 <- dna[1:3]
names(dna2) <- LETTERS[1:3]
mcols(dna2)$score <- runif(3)
dna2
#   A DNAStringSet instance of length 3
#     width seq                                               names               
# [1]    25 CCTGAATCCTGGCAATGTCATCATC                         A
# [2]    25 ATCCTGGCAATGTCATCATCAATGG                         B
# [3]    25 ATCAGTTGTCAACGGCTAATACGCG                         C
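
To make the requested layout concrete, here is a rough base R sketch of what "names on the left, metadata columns on the right of a | separator" could look like. The helper format_like_ranges() is hypothetical and is not how Biostrings or IRanges implement show(); it works on a plain named character vector and a data.frame and assumes the names are unique:

```r
# Hypothetical helper (not part of Biostrings): arrange sequences the way
# IRanges/GRanges objects are displayed, with names as row labels on the
# left and metadata columns after a "|" separator column.
format_like_ranges <- function(seqs, mcols = NULL) {
  stopifnot(is.character(seqs))
  tab <- data.frame(width = unname(nchar(seqs)),
                    seq = unname(seqs),
                    stringsAsFactors = FALSE)
  if (!is.null(mcols)) {
    stopifnot(is.data.frame(mcols), nrow(mcols) == length(seqs))
    tab[["|"]] <- rep("|", nrow(tab))   # visual separator column
    tab <- cbind(tab, mcols)
  }
  # Assumes unique, non-missing names (data.frame row names require that)
  if (!is.null(names(seqs))) rownames(tab) <- names(seqs)
  tab
}

seqs <- c(A = "CCTGAATCCTGGCAATGTCATCATC",
          B = "ATCCTGGCAATGTCATCATCAATGG",
          C = "ATCAGTTGTCAACGGCTAATACGCG")
tab <- format_like_ranges(seqs, data.frame(score = c(0.27, 0.11, 0.65)))
print(tab)
```

The score values are made up for the example; the point is only the column order, not the exact spacing a real show() method would produce.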

hpages self-assigned this Jul 2, 2019

@mtmorgan

mtmorgan commented Jul 3, 2019

Somewhat related is the initial value returned by mcols():

> mcols(DNAStringSet())
NULL
> mcols(GRanges())
DataFrame with 0 rows and 0 columns

@hpages

hpages commented Jul 3, 2019

This has little to do with the show() method. It comes from the fact that mcols() is allowed to be NULL for some Vector derivatives (Hits, Rle, IRanges, DNAStringSet, etc.), whereas for other Vector derivatives (GRanges, GRangesList, SummarizedExperiment, etc.) mcols() is forced to be a DataFrame. If we think this inconsistency should be addressed, we should discuss it in a separate issue.
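
In the meantime, code that needs a DataFrame regardless of the class can normalize the result itself. The following is only a sketch: mcols_or_empty() is a hypothetical helper, and it assumes an S4Vectors version that exports make_zero_col_DFrame():

```r
library(S4Vectors)

# Hypothetical helper: always return a DataFrame with one row per element,
# even for Vector derivatives whose mcols() may be NULL.
mcols_or_empty <- function(x) {
  mc <- mcols(x)
  if (is.null(mc)) {
    # Assumption: make_zero_col_DFrame() is available in the installed
    # S4Vectors; it builds a DataFrame with the given number of rows
    # and no columns.
    mc <- make_zero_col_DFrame(length(x))
  }
  mc
}

x <- Rle(c(1, 1, 2))
mc <- mcols_or_empty(x)
```

This does not change what mcols() returns for these classes; it only hides the difference from downstream code.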

@FelixErnst

There are also some other inconsistencies in how element names are shown. Long names seem to be treated differently, probably for historical reasons related to the positioning of the names (left vs. right).

library(Biostrings)
library(GenomicRanges)
seq <- RNAStringSet(c("UAUCUGGUUGAUCCUGCCAGUAGUCAUAUGCUUGUCUCAAAGAUUAAGCCAUGCAUGUCUAAGUAUAAGCAAUUUAUACAGUGAAACUGCGAAUGGCUCA",
                      "CCGAGAGGUCUUGGUAAUCUUGUGAAACUCCGUCGUGCUGGGGAUAGAGCAUUGUAAUUAUUGCUCUUCAACGAGGAAUUCCUAGUAAGCGCAAGUCAUCA"))
names(seq) <- c("TheFirstVeryLongNameAndItIsGettingEvenLongerByTheLetter",
                "TheSecondVeryLongNameAndItIsGettingEvenLongerByTheLetter")
gr <- GRanges(c("chr1:5-10:+","chr1:6-10:+"))
names(gr) <- names(seq)
seq
#>   A RNAStringSet instance of length 2
#>     width seq                                          names               
#> [1]   100 UAUCUGGUUGAUCCUGCCAGU...GUGAAACUGCGAAUGGCUCA TheFirstVeryLongN...
#> [2]   101 CCGAGAGGUCUUGGUAAUCUU...CUAGUAAGCGCAAGUCAUCA TheSecondVeryLong...
gr
#> GRanges object with 2 ranges and 0 metadata columns:
#>                                                            seqnames
#>                                                               <Rle>
#>    TheFirstVeryLongNameAndItIsGettingEvenLongerByTheLetter     chr1
#>   TheSecondVeryLongNameAndItIsGettingEvenLongerByTheLetter     chr1
#>                                                               ranges
#>                                                            <IRanges>
#>    TheFirstVeryLongNameAndItIsGettingEvenLongerByTheLetter      5-10
#>   TheSecondVeryLongNameAndItIsGettingEvenLongerByTheLetter      6-10
#>                                                            strand
#>                                                             <Rle>
#>    TheFirstVeryLongNameAndItIsGettingEvenLongerByTheLetter      +
#>   TheSecondVeryLongNameAndItIsGettingEvenLongerByTheLetter      +
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

@hpages

hpages commented Jul 3, 2019

Right, long names are truncated. But maybe that's a good thing and we should keep that when we move them to the left. I don't know.
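
For illustration, the truncation visible in the RNAStringSet output above can be mimicked in base R as follows. This is a sketch, not the code Biostrings uses: truncate_names() is a hypothetical helper, and the 20-character limit is only inferred from the width of the names column in that output:

```r
# Hypothetical helper: shorten names longer than max_width and mark the
# cut with a trailing "...", so the result is exactly max_width characters.
truncate_names <- function(x, max_width = 20L) {
  stopifnot(is.character(x), max_width > 3L)
  too_long <- nchar(x) > max_width
  x[too_long] <- paste0(substr(x[too_long], 1L, max_width - 3L), "...")
  x
}

nms <- c("TheFirstVeryLongNameAndItIsGettingEvenLongerByTheLetter",
         "TheSecondVeryLongNameAndItIsGettingEvenLongerByTheLetter",
         "short")
short_nms <- truncate_names(nms)
```

Keeping something like this when the names move to the left would bound the width of the name column, unlike the GRanges display above where the full names push each column onto its own block.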

Yeah, these things predate GRanges. The show() methods for XStringSet, XStringViews, and XString objects are actually my first show() methods ever. I implemented them more than 13 years ago when I took over the refactoring and maintenance of Biostrings. At that time we didn't have any of the IRanges, GenomicRanges, or S4Vectors packages yet.
