HELP-217 Missing Ensembl IDs #7
Comments
Arthur Brady
All set! Instructions:
- Go to https://osf.io/bq6k9/ and download the latest prepare_C2M2_submission.py.
- Go to the "external_CV_reference_files" subfolder of that directory and get the latest protein.tsv.gz, compound.tsv.gz and substance.tsv.gz.
- Rerun the prep script using your desired protein IDs.

Note: of the list of 1,270 error-throwing UniProtKB accessions that you tried to submit in April, all but 15 will now work. The remaining 15 are legitimate errors, in one of three categories:
- Obsolete/deleted ID (e.g. Q6ZRZ4: https://www.uniprot.org/uniprot/Q6ZRZ4).
- Accession is for the wrong DB (e.g. Q6ZW33 is a PRO ID, not a UniProtKB accession; cf. https://www.uniprot.org/uniprot/O94851).
- Accession is a secondary accession (e.g. P35544); please use the primary accession instead (e.g. P62861 is the primary accession for P35544: cf. https://www.uniprot.org/uniprot/P62861).

(Note in this last case that the secondary accessions will still be automatically loaded, as C2M2 synonyms, when the primary accession is processed by the prep script, so users will be able to search "P35544" and come up with records relating to "P62861". Only primary accessions should be used for direct data entry when you're creating C2M2 datapackages for submission.)

Final note: the script will now list all offending IDs for all term types (proteins, genes, etc.) and will not exit until it has printed all the IDs that caused problems, which will hopefully simplify error handling in the future.

Rene Ranzinger
I don't think I saw any Ensembl ID errors in the recent error dump I sent you, so we should be good on this front. But the protein ID problem remains. If you want to make a separate ticket for it to avoid confusion, that is fine with me. I just posted all protein-related ID errors into this thread, regardless of whether they concerned a protein ID or a gene ID.
Best,
Hi @ReneRanzinger, the notes I have from Arthur indicate that this case was closed (in June, I'm assuming), and your notes above seem to confirm this (with the exception of the protein ID problem, which I'm guessing was filed as the separate case HELP-357, aka issue #1 in this repository). Can this case be closed? If not, which Ensembl IDs are causing submission problems?
This is related to glygener/glygen.cfde.generator#21 and glygener/glygen.cfde.generator#17.
The prep script runs fine with a protein that has the Ensembl ID "ENSG00000214826". However, the submission tool then fails on it. This ID is also not present in the Ensembl TSV in the reference CV folder. There are probably other such IDs, but the submission stopped after the first error.
Best,
René
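To find every affected ID up front rather than one failure at a time, the candidate IDs can be checked against the reference TSV before submitting. The following is only a minimal sketch, not part of the prep script or the submission tool: the function names are hypothetical, and it assumes the reference file is a gzipped TSV with a header row and the ID in the first column.

```python
import csv
import gzip


def load_reference_ids(tsv_gz_path, id_column=0):
    """Return the set of IDs found in one column of a gzipped TSV file.

    Assumption: the file has a header row and the ID sits in `id_column`.
    """
    ids = set()
    with gzip.open(tsv_gz_path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the assumed header row
        for row in reader:
            if len(row) > id_column and row[id_column]:
                ids.add(row[id_column])
    return ids


def find_missing_ids(candidate_ids, reference_ids):
    """Return every candidate ID absent from the reference set.

    All missing IDs are collected (in input order, without duplicates)
    instead of stopping at the first one.
    """
    seen = set()
    missing = []
    for identifier in candidate_ids:
        if identifier not in reference_ids and identifier not in seen:
            missing.append(identifier)
            seen.add(identifier)
    return missing
```

Calling `find_missing_ids(my_ids, load_reference_ids(path))`, where `path` points at the local copy of the Ensembl TSV from the reference CV folder, would list all IDs such as "ENSG00000214826" that the reference file does not contain.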