Virus protein accessions without Ensembl mapping #19

Open
jeet-vora opened this issue May 19, 2022 · 4 comments
Labels
CFDE Waiting on CFDE helpdesc

Comments

@jeet-vora
Contributor

Hi Jessica and Arthur,
In GlyGen we have protein and glycan data for medically important virus species such as SARS-CoV and HCV. We are planning to submit the data for these species; however, the proteins do not have Ensembl mappings, as viruses do not have chromosomes.

Do you have any suggestions on how we can tackle this in order to submit the data? Thanks
We are also adding mouse and rat data, so if any issue arises we will bring it to your attention. Maybe in a few days we can get together on a call to discuss these issues, including the ones reported by Rene.

@ReneRanzinger
Member

Arthur Brady commented:

You could submit UniProt accessions, if they exist, and describe the proteins just as proteins, not genes. We do not in principle support draft genomic data because of its basic instability, although if there’s some entity or organization governing COVID gene nomenclature, using data from such a source might be a possibility. Ensembl only provides IDs for genes for selected model organisms (although there are a large number of them). For genes from organisms not represented in Ensembl, we can import IDs from other spaces (as we have done for GlyTouCan IDs not present in PubChem), but we would still need some sort of ID-issuing authority to have created stable identifiers for the genetic objects in question. Until/unless that’s done, we won’t be able to integrate draft (or anonymous) data alongside stable identifiers, for obvious reasons.

@nsuvarnaiari

Hi @jeet-vora and @ReneRanzinger

I think this is still an open issue. As Arthur mentioned in his comment, if you know of a reliable, stable, authoritative source for viral genes (COVID and HCV), we can import those IDs into our controlled vocabulary so that you can start using them in your next submission (next year). Let us know.

Thanks,
Suvvi
@jonathancrabtree
@mgiglio99

@jeet-vora
Contributor Author

Hi @nsuvarnaiari

There is currently no gene nomenclature resource for viruses. We use UniProt for virus genes, as it is currently the best curated resource for virus proteins and genes. Another option is to use NCBI Gene. In GlyGen, the protein- and gene-related information comes from UniProt, and most entries can be mapped to an Ensembl Gene ID and/or an NCBI GeneID.

@ReneRanzinger @jonathancrabtree @mgiglio99

@nsuvarnaiari

Hi @jeet-vora

NCBI Gene sounds like a good option if that works for you. Will you be able to provide us with a list of NCBI Gene IDs for your set of viral genes?

Thanks,
Suvvi
@jonathancrabtree @mgiglio99
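
For anyone preparing such a list, the fallback discussed in this thread (use an Ensembl gene ID where one exists, otherwise an NCBI GeneID, otherwise submit the UniProt accession on its own and describe the entry as a protein) might be sketched roughly as below. The record layout, the function name, the namespace labels, and all identifiers are hypothetical placeholders for illustration; they are not GlyGen's or CFDE's actual schema, and the preference order is an assumption drawn from the comments above.

```python
from typing import NamedTuple, Optional, Tuple


class ProteinRecord(NamedTuple):
    """Hypothetical record layout; not GlyGen's actual schema."""
    uniprot_ac: str
    ensembl_gene_id: Optional[str] = None
    ncbi_gene_id: Optional[str] = None


def choose_gene_reference(record: ProteinRecord) -> Tuple[str, str]:
    """Return (namespace, identifier) for one record.

    Assumed preference order: Ensembl gene ID, then NCBI GeneID,
    then the UniProt accession alone (protein only, no gene).
    """
    if record.ensembl_gene_id:
        return ("ensembl", record.ensembl_gene_id)
    if record.ncbi_gene_id:
        return ("ncbi_gene", record.ncbi_gene_id)
    return ("uniprot_protein_only", record.uniprot_ac)


# Placeholder identifiers for illustration only; none are real accessions.
records = [
    ProteinRecord("HUMAN_AC_1", ensembl_gene_id="ENSG_PLACEHOLDER_1"),
    ProteinRecord("VIRUS_AC_1", ncbi_gene_id="GENEID_PLACEHOLDER_1"),
    ProteinRecord("VIRUS_AC_2"),
]

# Print one tab-separated row per record: accession, namespace, identifier.
for rec in records:
    namespace, identifier = choose_gene_reference(rec)
    print(f"{rec.uniprot_ac}\t{namespace}\t{identifier}")
```

Rows that fall through to the last branch are the ones that would still need a stable gene identifier from an ID-issuing authority before they could be submitted as genes.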

@ReneRanzinger added the CFDE Waiting on CFDE helpdesc label on Apr 18, 2023

3 participants