Providing access to protein annotations and general note #43
Comments
Thanks! Also thanks for making the package and huge thanks for making this great resource available in such a reasonable format!
Yup, we've actually already got some of this working right now (using some stuff in #31):

import genomic_features as gf

ensdb = gf.ensembl.annotation("Hsapiens", "108")
ensdb.genes(
    cols=["gene_id", "gene_name", "tx_id", "uniprot_id"],
    filter=(gf.filters.CanonicalFilter() & gf.filters.GeneBioTypeFilter("protein_coding")),
).head()

I have to confess that I'm not terribly familiar with common uses or access patterns for this protein information, so any tips would be appreciated!
Honestly, so far it has been self-explanatory and easy to figure out. I would be interested in knowing if you have any plans for schema updates, or anything like that we should be aware of.

One "feature" request

There was one thing that came up during our testing that I'd like to request. In #16, we saw that some versions of Ensembl were missing from AnnotationHub and were instead bundled with the Bioconductor annotation packages. Would it be possible/easy to upload these versions to AnnotationHub?
As long as Ensembl IDs are used (ENSG..., ENST..., ENSP...) it is all pretty simple. There is (AFAIK) one Ensembl protein ID (ENSP...) assigned to one transcript ID (ENST...), so that is a straightforward 1:1 mapping. With Uniprot it is tricky, because there is no 1:1 mapping between Ensembl proteins and Uniprot. One Ensembl protein can be annotated to none, one or multiple Uniprot IDs... so, if possible, try to do the joins based on Ensembl IDs.
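To make the cardinality point concrete, here is a minimal pure-Python sketch. It does not use the EnsDb schema or the genomic_features API; the dictionaries, the function names and every identifier in them are made-up placeholders. It only illustrates why a join on Ensembl IDs keeps one row per transcript, while adding Uniprot IDs can drop to "no annotation" or duplicate rows.

```python
# Toy data only: all identifiers are made-up placeholders,
# not real Ensembl or Uniprot IDs.
tx_to_protein = {"ENST_A": "ENSP_A", "ENST_B": "ENSP_B", "ENST_C": "ENSP_C"}
protein_to_uniprot = {
    "ENSP_A": ["UP_1"],          # exactly one Uniprot ID
    "ENSP_B": ["UP_2", "UP_3"],  # several Uniprot IDs
    "ENSP_C": [],                # no Uniprot ID at all
}


def join_on_ensembl(tx_ids):
    """Transcript -> protein join: 1:1, so one row per transcript."""
    return [(tx, tx_to_protein[tx]) for tx in tx_ids]


def expand_to_uniprot(rows):
    """Left join onto Uniprot IDs: rows may be duplicated or carry None."""
    out = []
    for tx, protein in rows:
        uniprot_ids = protein_to_uniprot.get(protein, [])
        if not uniprot_ids:
            out.append((tx, protein, None))
        for uniprot_id in uniprot_ids:
            out.append((tx, protein, uniprot_id))
    return out


ensembl_rows = join_on_ensembl(sorted(tx_to_protein))
uniprot_rows = expand_to_uniprot(ensembl_rows)
print(len(ensembl_rows), len(uniprot_rows))
```

In this toy case the row count changes once Uniprot IDs are attached, so any later join keyed on the Uniprot column would no longer be one row per transcript.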
No schema changes planned. In the past I also tried to keep the main schema the same and just add e.g. new columns to individual tables.

Regarding missing Ensembl versions - it would be possible to create
Description of feature
Dear developers! Great work you're doing!
Just to introduce myself: I'm the developer of the ensembldb package and am maintaining and adding new EnsDb databases to Bioconductor's AnnotationHub for each new Ensembl release.

As a general note: the EnsDbs also provide protein annotations (amino acid sequences, (functional) protein domains and some mapping to Uniprot identifiers). These are not general GenomicFeatures features but are specific to the ensembldb package, which also allows mapping directly between positions within the protein, the transcript and the genome (and vice versa).

Also, please don't hesitate to ask if something about the EnsDb database layouts is unclear or if you have feature requests.
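For readers unfamiliar with the protein-to-transcript-to-genome mapping mentioned above, the underlying coordinate arithmetic can be sketched as follows. This is not how ensembldb implements it, and the function names and exon coordinates are made up for illustration. It assumes 1-based inclusive coordinates, a forward-strand transcript and a list of coding exon segments in transcript order.

```python
def protein_to_cds(residue):
    """Map a 1-based residue to its 1-based inclusive CDS nucleotide range."""
    if residue < 1:
        raise ValueError("residue positions are 1-based")
    start = 3 * (residue - 1) + 1
    return start, start + 2


def cds_to_genome(cds_pos, cds_exons):
    """Map a 1-based CDS position to a genomic coordinate.

    cds_exons: (genome_start, genome_end) pairs, 1-based inclusive,
    forward strand, in transcript order (an assumption of this sketch).
    """
    remaining = cds_pos
    for g_start, g_end in cds_exons:
        length = g_end - g_start + 1
        if remaining <= length:
            return g_start + remaining - 1
        remaining -= length
    raise ValueError("position lies beyond the coding sequence")


# Made-up coding segments: 4 nt in the first exon, 10 nt in the second.
toy_exons = [(101, 104), (201, 210)]

codon_start, codon_end = protein_to_cds(2)
genome_positions = [
    cds_to_genome(pos, toy_exons) for pos in range(codon_start, codon_end + 1)
]
print(genome_positions)
```

The second residue was chosen because its codon straddles the gap between the two toy exons, which is the case that makes a simple offset calculation insufficient. A reverse-strand transcript would need the exon order and the arithmetic mirrored.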