Enable 'makeTxDbFromEnsembl()' to fill up genome version #8

Merged

Conversation

rcastelo
Contributor

@rcastelo rcastelo commented Nov 15, 2024

Motivated by this issue opened in the gDNAx package, and related to this other issue about adding genome version information to TxDb objects, this PR enables makeTxDbFromEnsembl() to fill in the genome version from the information available in the Ensembl database. Currently, this does not happen:

library(txdbmaker) ## current release 1.2.0 version

txdb <- makeTxDbFromEnsembl(organism="Mus musculus", circ_seqs="MT")
seqinfo(txdb)
Seqinfo object with 61 sequences (1 circular) from an unspecified genome:
  seqnames   seqlengths isCircular genome
  1           195154279      FALSE   <NA>
  2           181755017      FALSE   <NA>
  3           159745316      FALSE   <NA>
  4           156860686      FALSE   <NA>
  5           151758149      FALSE   <NA>
  ...               ...        ...    ...
  JH584300.1     182347      FALSE   <NA>
  JH584301.1     259875      FALSE   <NA>
  JH584302.1     155838      FALSE   <NA>
  JH584303.1     158099      FALSE   <NA>
  JH584304.1     114452      FALSE   <NA>

and AFAICS there is no workaround, because makeTxDbFromEnsembl() does not take a metadata argument the way makeTxDbFromGFF() does. This PR allows makeTxDbFromEnsembl() to fill in the genome version:

library(txdbmaker) ## PR version

txdb <- makeTxDbFromEnsembl(organism="Mus musculus", circ_seqs="MT")
seqinfo(txdb)
Seqinfo object with 61 sequences (1 circular) from GRCm39 genome:
  seqnames   seqlengths isCircular genome
  1           195154279      FALSE GRCm39
  2           181755017      FALSE GRCm39
  3           159745316      FALSE GRCm39
  4           156860686      FALSE GRCm39
  5           151758149      FALSE GRCm39
  ...               ...        ...    ...
  JH584300.1     182347      FALSE GRCm39
  JH584301.1     259875      FALSE GRCm39
  JH584302.1     155838      FALSE GRCm39
  JH584303.1     158099      FALSE GRCm39
  JH584304.1     114452      FALSE GRCm39

The implementation copies the coord_system_version column from the seq_region table in the Ensembl database into a column called genome in the internal chrominfo data.frame object. This data.frame is passed to makeTxDb(), which uses that column to add the genome information as metadata when creating the TxDb object.
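
As a rough illustration of the idea described above (not the actual code in this PR), the following base R sketch builds a chrominfo-like data.frame with a genome column. The seq_region data.frame is a hypothetical stand-in for rows fetched from Ensembl: only coord_system_version is named in the description, the name and length columns are assumptions, and the values are copied from the output shown earlier.

```r
## Hypothetical stand-in for rows fetched from the Ensembl database.
## Only 'coord_system_version' is named in the PR description; the
## 'name' and 'length' column names are assumptions for illustration.
seq_region <- data.frame(
    name = c("1", "2", "JH584304.1"),
    length = c(195154279L, 181755017L, 114452L),
    coord_system_version = "GRCm39",
    stringsAsFactors = FALSE
)

## Same value as the 'circ_seqs' argument used in the examples above.
circ_seqs <- "MT"

## Assemble the chrominfo data.frame, carrying the assembly version
## over into a 'genome' column alongside the usual chromosome columns.
chrominfo <- data.frame(
    chrom = seq_region$name,
    length = seq_region$length,
    is_circular = seq_region$name %in% circ_seqs,
    genome = seq_region$coord_system_version,
    stringsAsFactors = FALSE
)

print(chrominfo)
```

In this sketch the genome value is simply repeated for each sequence, which matches the per-sequence genome column reported by seqinfo() in the PR version output.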

@vjcitn

vjcitn commented Nov 21, 2024

@hpages ??

@hpages
Contributor

hpages commented Nov 21, 2024

Thanks for the PR @rcastelo. I got back from travelling yesterday and didn't have time to take a look at it earlier. This is a nice feature to add to makeTxDbFromEnsembl() and the execution is very clean. Thanks again!

@hpages hpages merged commit d6cdef5 into Bioconductor:devel Nov 21, 2024
@rcastelo rcastelo deleted the add-genome-version-in-makeTxDbFromEnsembl branch November 21, 2024 18:06