We created an API for SuppKG in #55 and biothings/biothings_explorer#706. We previously noted that SuppKG created UMLS-like identifiers (which have the format "DCXXXXXXX" instead of "CXXXXXXX"). At the time, we decided to treat them as if they were UMLS IDs, but that is now producing some confusing results (e.g., NCATSTranslator/Feedback#836), so it's time to adjust this behavior.
Vlado helped map these fake UMLS "DC" IDs to more common identifiers; the results are in supp_kg_chem_nodes.txt. To summarize those results, there were 56636 IDs for SuppKG nodes, 53707 of which start with "C" -- we assume these are valid UMLS IDs. Of the remaining 2928 whose IDs start with "DC", Vlado mapped 841 to CHEBI, CID, UNII, MESH, etc. In our parser script, let's replace the "DC" IDs with these mapped IDs in our API. For the remaining 2087 nodes for which Vlado could not find mappings, let's delete the records that use those IDs from our API.
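As a rough illustration of the replace-or-delete behavior proposed above, here is a minimal sketch. It is not the actual parser code: the record shape (a flat dict with `subject` and `object` ID strings), the function names, and the `dc_to_curie` lookup are all assumptions made for the example, and how that lookup gets built from supp_kg_chem_nodes.txt is left out because the file layout isn't described here. It also assumes one preferred ID has already been picked for each "DC" ID that maps to multiple identifiers.

```python
from typing import Dict, Iterable, Iterator


def is_fake_umls(node_id: str) -> bool:
    """Return True for SuppKG-minted IDs, which start with "DC"."""
    return node_id.startswith("DC")


def remap_records(
    records: Iterable[Dict[str, str]],
    dc_to_curie: Dict[str, str],
) -> Iterator[Dict[str, str]]:
    """Replace mapped "DC" IDs and drop records that use unmapped ones.

    `records` and `dc_to_curie` are hypothetical structures for this sketch:
    each record has "subject" and "object" keys, and `dc_to_curie` maps a
    "DC" ID to the single identifier chosen to replace it.
    """
    for record in records:
        updated = dict(record)  # copy so the input record is not mutated
        keep = True
        for key in ("subject", "object"):
            node_id = record[key]
            if not is_fake_umls(node_id):
                continue  # IDs starting with "C" are assumed valid UMLS
            mapped = dc_to_curie.get(node_id)
            if mapped is None:
                keep = False  # no mapping found: delete the whole record
                break
            updated[key] = mapped
        if keep:
            yield updated
```

With this shape, a record is kept only if every "DC" ID it references has a mapping, so a record with one mapped and one unmapped "DC" node is still dropped.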
An analysis of the namespaces used for the 841 (262 are mapped to multiple identifiers):
Can we map the 6 CHEMBL.TARGET entities to a different ID namespace? Or remove them? It's an odd identifier for a chemical and NodeNorm doesn't really support that ID namespace (example automated test issue).
I also wonder about adjusting some ID-prefixes to the Translator format: