Duplicated results #953

Open
khanspers opened this issue Sep 21, 2024 · 2 comments
Labels
chemical conflation normalization relates to or otherwise handled by node norm or name resolver - separate from autocomplete

Comments

@khanspers

I'm seeing something similar to what was reported here with some results being reported twice, with different identifiers:

[Screenshot: Screen Shot 2024-09-20 at 5 07 30 PM]

The query is "What genes' activity may be decreased by Imatinib Mesylate": https://ui.test.transltr.io/results?l=Imatinib%20Mesylate&i=CHEBI:31690&t=4&r=0&q=c1469eda-0209-4d12-98ad-952498685545

@sstemann sstemann added the normalization relates to or otherwise handled by node norm or name resolver - separate from autocomplete label Sep 24, 2024
@sstemann

@gaurav could you please take a look?

@gaurav

gaurav commented Sep 26, 2024

Thanks for poking me about this one, Sarah!

C-KIT Gene

UMLS:C0920288 "C-KIT Gene" does look like it should be combined with NCBIGene:3815 "KIT". There are two ways of connecting these two concepts via UMLS:

We don't currently ingest NCI or LOINC, so that's probably why these are missing. I've opened an issue to look into whether we should include NCIT mappings for genes (TranslatorSRI/Babel#350). We already have a ticket to ingest LOINC mappings (TranslatorSRI/Babel#295), but even if we were to do that, I'm not sure we would include gene mappings from there. I don't think there's a quick fix here apart from ingesting those mappings.

Note that there is one UMLS identifier already associated with that gene in NodeNorm Guppy, which is UMLS:C1416655 "KIT gene". So we could also think of this as an error on UMLS' part for not combining UMLS:C1416655 and UMLS:C0920288 into a single concept. I've sent a message to their helpdesk to see if they agree.

FWIW both UMLS IDs are coming back from SemMedDB, but I don't think there's any clever way of catching the duplication there.
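
To illustrate why the two rows show up separately, here is a minimal sketch of merging results by normalized identifier. It is not the actual NodeNorm or UI code: the `NORMALIZED` table, `normalize`, and `merge_results` are hypothetical names, and the table only encodes what is stated above (UMLS:C1416655 is already associated with NCBIGene:3815, UMLS:C0920288 is not).

```python
# Hypothetical lookup table: identifier -> preferred identifier.
# It only encodes the associations described in the comment above.
NORMALIZED = {
    "NCBIGene:3815": "NCBIGene:3815",
    "UMLS:C1416655": "NCBIGene:3815",
}


def normalize(curie):
    """Return the preferred identifier, or the input unchanged if it is unknown."""
    return NORMALIZED.get(curie, curie)


def merge_results(curies):
    """Group result identifiers by their normalized identifier."""
    merged = {}
    for curie in curies:
        merged.setdefault(normalize(curie), []).append(curie)
    return merged


results = merge_results(["NCBIGene:3815", "UMLS:C1416655", "UMLS:C0920288"])
for preferred, members in results.items():
    print(preferred, members)
```

Because UMLS:C0920288 has no mapping in the table, it stays in its own group, which is the kind of duplicate row reported in this issue.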

ABL1 gene/protein

Similar situation: the only thing UMLS:C1439337 "Tyrosine-Protein Kinase ABL1, human" is connected to is NCIT:C17390 "Tyrosine-Protein Kinase ABL1", which is connected to OMIM:189980 and Swiss-Prot P00519, both of which resolve to NCBIGene:25 on NodeNorm Guppy. So including NCIT mappings would fix this as well.
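
The chain of mappings described above can be sketched with a small union-find, to show how adding the NCIT links would pull the UMLS concept into the same group as the gene. This is only an illustration under stated assumptions, not how Babel actually builds its cliques: `build_cliques` is a hypothetical helper, and the `UniProtKB:` prefix for P00519 is assumed.

```python
def build_cliques(identifiers, mappings):
    """Group identifiers into cliques by following pairwise mappings (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ident in identifiers:
        find(ident)  # register identifiers that have no mappings at all
    for a, b in mappings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    cliques = {}
    for ident in list(parent):
        cliques.setdefault(find(ident), set()).add(ident)
    return list(cliques.values())


# Identifiers from the comment; the UniProtKB prefix is an assumption.
ids = ["NCBIGene:25", "OMIM:189980", "UniProtKB:P00519", "UMLS:C1439337"]
known = [("OMIM:189980", "NCBIGene:25"), ("UniProtKB:P00519", "NCBIGene:25")]
ncit = [
    ("UMLS:C1439337", "NCIT:C17390"),
    ("NCIT:C17390", "OMIM:189980"),
    ("NCIT:C17390", "UniProtKB:P00519"),
]

before = build_cliques(ids, known)        # UMLS concept is left on its own
after = build_cliques(ids, known + ncit)  # NCIT links join everything
print(len(before), len(after))
```

Without the NCIT pairs the UMLS identifier forms its own clique; with them, all five identifiers end up together.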

Overall resolution

Depends on adding NCIT mappings for genes to NodeNorm (TranslatorSRI/Babel#350), which I'll try to get into the NodeNorm November release.
