HELP-217 Missing Ensembl IDs #7

jeremywalter · 2022-11-21T22:46:31Z

When I run the prep script, it works fine with a protein that has the Ensembl ID "ENSG00000214826". However, the submission tool fails on it. This ID is also not present in the Ensembl TSV in the reference CV folder. There are probably other IDs as well, but the submission stopped after the first error.

Best,
René

jeremywalter · 2022-11-21T22:47:17Z

Arthur Brady
June 15, 2022 at 9:50 PM

All set! Instructions:

1. Go to https://osf.io/bq6k9/ and download the latest prepare_C2M2_submission.py.
2. Go to the "external_CV_reference_files" subfolder of that directory and get the latest protein.tsv.gz, compound.tsv.gz and substance.tsv.gz.
3. Rerun the prep script using your desired protein IDs.

Note: of the list of 1,270 error-throwing UniProtKB accessions that you tried to submit in April, all but 15 will now work. The remaining 15 are legitimate errors, in one of three categories:

- Obsolete/deleted ID (e.g. Q6ZRZ4: https://www.uniprot.org/uniprot/Q6ZRZ4)
- Accession is for the wrong DB (e.g. Q6ZW33 is a PRO ID, not a UniProtKB accession; cf. https://www.uniprot.org/uniprot/O94851)
- Accession is a secondary accession (e.g. P35544); please use the primary accession instead (e.g. P62861 is the primary accession for P35544: cf. https://www.uniprot.org/uniprot/P62861). (Note in this case that the secondary accessions will still be automatically loaded, as C2M2 synonyms, when the primary accession is processed by the prep script – so users will be able to search "P35544" and come up with records relating to "P62861" – but only primary accessions should be used for direct data entry when you're creating C2M2 datapackages for submission.)
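If an input list may contain secondary accessions, one way to swap them for primary accessions before data entry is sketched below. It assumes you already have a secondary-to-primary mapping: the one-entry dict only reuses the P35544/P62861 example above, building a full mapping from UniProt data is outside this sketch, and `normalize_accessions` is a hypothetical helper, not part of the prep script.

```python
from typing import Dict, List, Set, Tuple


def normalize_accessions(
    accessions: List[str], secondary_to_primary: Dict[str, str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Replace secondary UniProtKB accessions with their primary accession.

    Returns the normalized list (input order kept, duplicates collapsed) and
    the (secondary, primary) substitutions made, so they can be logged.
    IDs not in the mapping pass through unchanged.
    """
    normalized: List[str] = []
    seen: Set[str] = set()
    substitutions: List[Tuple[str, str]] = []
    for acc in accessions:
        primary = secondary_to_primary.get(acc, acc)
        if primary != acc:
            substitutions.append((acc, primary))
        if primary not in seen:
            seen.add(primary)
            normalized.append(primary)
    return normalized, substitutions


# Hypothetical one-entry mapping taken from the example in the list above.
mapping = {"P35544": "P62861"}
ids, subs = normalize_accessions(["P35544", "P62861"], mapping)
print(ids)   # ['P62861']
print(subs)  # [('P35544', 'P62861')]
```

Obsolete IDs and IDs from the wrong database are not caught by this step; they still need to be checked against the reference files.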

Final note: the script will now list all offending IDs for all term types (proteins, genes, etc.) and will not exit until it's printed all the IDs that caused problems, which will hopefully simplify error handling in the future.
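The collect-then-report behaviour described above can be approximated roughly like this, for instance to check an ID list against one of the gzipped reference TSVs yourself before submitting. This is a sketch, not the actual prepare_C2M2_submission.py logic: the function names are hypothetical, and the assumption that each reference file has a header row with an `id` column must be checked against the real file.

```python
import csv
import gzip
import sys
from typing import Iterable, List, Set


def load_reference_ids(lines: Iterable[str], id_column: str = "id") -> Set[str]:
    """Collect the values of one column from TSV lines that start with a header row.

    The default column name "id" is an assumption, not the documented layout
    of the C2M2 reference files.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[id_column] for row in reader}


def find_unknown_ids(ids: Iterable[str], reference: Set[str]) -> List[str]:
    """Return every ID missing from the reference, in input order, each once.

    Unlike a check that fails on the first miss, this scans the whole list.
    """
    missing: List[str] = []
    seen: Set[str] = set()
    for term_id in ids:
        if term_id not in reference and term_id not in seen:
            seen.add(term_id)
            missing.append(term_id)
    return missing


def check_ids(ids: Iterable[str], reference_path: str, id_column: str = "id") -> int:
    """Print all unknown IDs to stderr and return a shell-style exit code."""
    # gzip.open in text mode yields decoded lines, which csv.DictReader accepts.
    with gzip.open(reference_path, "rt", newline="") as handle:
        reference = load_reference_ids(handle, id_column)
    missing = find_unknown_ids(ids, reference)
    for term_id in missing:
        print(f"not in {reference_path}: {term_id}", file=sys.stderr)
    return 1 if missing else 0
```

For example, `sys.exit(check_ids(my_ids, "protein.tsv.gz"))` (with `my_ids` being your own list) would print every unrecognized ID before exiting, rather than stopping at the first one.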

Arthur Brady
June 15, 2022 at 9:11 PM
Confirmed, I have located and fixed the problem and am updating the prep script and our reference files on OSF now. Stay tuned for detailed instructions on retrying submission as soon as I complete my data transfers.

Arthur Brady
June 15, 2022 at 7:25 PM
I believe I have found the problem with the missing protein IDs and am testing a fix. Assuming all goes well, I will push relevant updates to the reference TSVs and the submission prep script by tomorrow and notify you here; stand by.

Arthur Brady
June 2, 2022 at 5:41 PM
Cool. I’ll report on the protein issue here when I finish my tests – there’s another ticket mentioning it as well, but I already closed that one and I’m risking straining everyone’s necks with all the ticket consolidation ping pong.

Rene Ranzinger
June 2, 2022 at 5:32 PM
Hi,

I don't think I saw any Ensembl ID errors in the recent error dump I sent you, so we should be good on this front. But the protein ID problem remains. If you want to make a separate ticket for this to avoid confusion, that is fine with me. I just posted all protein-related ID errors into this thread, regardless of whether it was a protein ID or a gene ID.

Best,
René

jonathancrabtree · 2022-11-22T20:34:12Z

Hi @ReneRanzinger, the notes I have from Arthur indicate that this case was closed (in June, I'm assuming), and your notes above seem to confirm this (with the exception of the protein ID problem, which I'm guessing was filed as the separate case HELP-357, aka issue #1 in this repository). Can this case be closed and, if not, which Ensembl IDs are causing submission problems?

ReneRanzinger · 2022-11-22T20:57:43Z

This is related to glygener/glygen.cfde.generator#21 and glygener/glygen.cfde.generator#17.

jeremywalter assigned jonathancrabtree and ReneRanzinger Nov 21, 2022

jeremywalter changed the title ~~HELP-217~~ HELP-217 Missing Ensemble IDs Nov 21, 2022

jonathancrabtree changed the title ~~HELP-217 Missing Ensemble IDs~~ HELP-217 Missing Ensembl IDs Nov 22, 2022

ReneRanzinger closed this as completed Nov 22, 2022
