Trouble linking aeolus compounds #25

Open
cbizon opened this issue Jun 27, 2018 · 3 comments

cbizon commented Jun 27, 2018

If I do a simple query q=siltuximab, I get 5 results, with these identifiers and keys:

57894-421 ['_id', '_score', 'ndc']
57894-420 ['_id', '_score', 'ndc']
CHEMBL1743070 ['_id', '_score', 'chembl', 'drugcentral']
DB09036 ['_id', '_score', 'drugbank']
T4H8FMA7IM ['_id', '_score', 'aeolus', 'unii']
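For anyone wanting to reproduce a listing like the one above, here is a minimal sketch. It assumes the query endpoint quoted later in this thread returns JSON with a top-level `hits` list; `summarize_hits` and `fetch_hits` are hypothetical helper names, and the sample documents are placeholders shaped like two of the hits above, not real API output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

QUERY_URL = "http://mychem.info/v1/query"  # endpoint as quoted in this thread


def summarize_hits(hits):
    """Return (_id, sorted top-level keys) for each hit."""
    return [(hit["_id"], sorted(hit)) for hit in hits]


def fetch_hits(term):
    """Fetch hits for a free-text query (needs network access).

    Assumes the response body is JSON with a top-level "hits" list.
    """
    with urlopen(f"{QUERY_URL}?{urlencode({'q': term})}") as resp:
        return json.load(resp)["hits"]


# Offline placeholders shaped like two of the hits listed above.
sample_hits = [
    {"_id": "DB09036", "_score": 1.0, "drugbank": {}},
    {"_id": "T4H8FMA7IM", "_score": 1.0, "aeolus": {}, "unii": {}},
]

for doc_id, keys in summarize_hits(sample_hits):
    print(doc_id, keys)
```

Calling `summarize_hits(fetch_hits("siltuximab"))` would run the same summary against the live service.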

The way I actually want to query this data is by asking for compounds that have a particular aeolus outcome. So if I query for a particular outcome and it matches siltuximab, I will get back only the aeolus and unii information. I won't get chembl or drugcentral, which makes it hard to give this compound an identifier that I can integrate other data with.

I don't know if this is a general feature or if I just found one, but it seemed in testing that I often didn't get either a chembl or chebi node when querying by aeolus.

@kevinxin90
Contributor

@newgene @andrewsu This is indeed a consequence of how we merge MyChem.info docs. By default, we merge docs based on the InChIKey. However, 'siltuximab' is a peptide with no InChIKey available. The 5 results returned by a query like http://mychem.info/v1/query?q=siltuximab all refer to the same drug, but they are shown as 5 separate docs in MyChem.info. A potential solution is to group docs by drug name when the InChIKey is not available.
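The grouping idea could be sketched roughly as follows. This is not how MyChem.info actually merges docs; the flat `inchikey` and `name` fields and the `group_docs` helper are hypothetical, chosen only to illustrate falling back to a normalized drug name when no InChIKey is present.

```python
from collections import defaultdict


def merge_key(doc):
    """Prefer the InChIKey; otherwise fall back to a normalized drug name."""
    inchikey = doc.get("inchikey")
    if inchikey:
        return ("inchikey", inchikey)
    return ("name", doc["name"].strip().lower())


def group_docs(docs):
    """Group source docs that share the same merge key."""
    groups = defaultdict(list)
    for doc in docs:
        groups[merge_key(doc)].append(doc)
    return dict(groups)


# Placeholder docs: two name-only records and one with a made-up key.
docs = [
    {"source": "drugbank", "name": "Siltuximab"},
    {"source": "aeolus", "name": "siltuximab "},
    {"source": "chembl", "name": "example", "inchikey": "PLACEHOLDER-KEY"},
]
groups = group_docs(docs)
print({key: len(members) for key, members in groups.items()})
```

Matching on names is fragile (synonyms, brand names, spelling variants), so a real implementation would likely need more than lowercasing and trimming.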

@newgene
Member

newgene commented Jun 27, 2018

That's true. We are working on an ID-mapping utility function to merge these docs into one. Essentially, when the InChIKey is not available, we will use a priority list to define the primary key (the "_id" field), e.g. the drugbank id would be preferred, then chebi, then chembl, etc. As long as we keep this priority order consistent for all data sources in MyChem.info, different sources can still be merged even when the InChIKey is not available.
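A minimal sketch of such a priority list, under stated assumptions: the order after the InChIKey follows the example in this comment (drugbank, then chebi, then chembl), while the flat `ids` mapping and the `choose_primary_id` name are hypothetical and not the actual utility being written.

```python
# Priority order taken from the comment above; the real list may be longer.
ID_PRIORITY = ("inchikey", "drugbank", "chebi", "chembl")


def choose_primary_id(ids, priority=ID_PRIORITY):
    """Return the first available identifier in priority order, else None.

    `ids` is a hypothetical flat mapping of source name -> identifier.
    """
    for source in priority:
        value = ids.get(source)
        if value:
            return value
    return None


# Identifiers from the siltuximab hits in this issue.
print(choose_primary_id({"chembl": "CHEMBL1743070", "drugbank": "DB09036"}))
```

Because every source applies the same order, two records that both carry a drugbank id would end up under the same `_id` regardless of which source was loaded first.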

MyChem.info is undergoing a major refactoring, and these issues are on our list to be fixed.

@greg-k-taylor
Collaborator

@cbizon It took me a while to understand what you are asking for. Does this query solve your problem?

http://mychem.info/v1/query?q=aeolus.outcomes.name:Hostility
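If it helps, a URL like this can be built for any outcome name with a small helper. This is only a sketch: `build_outcome_query_url` is a hypothetical name, and it assumes single-word outcome names, since multi-word names would probably need quoting in the query string.

```python
from urllib.parse import urlencode

QUERY_URL = "http://mychem.info/v1/query"  # endpoint from the URL above


def build_outcome_query_url(outcome):
    """Build a fielded query on aeolus.outcomes.name for one outcome."""
    query = f"aeolus.outcomes.name:{outcome}"
    # Keep ':' unescaped so the URL matches the form shown above.
    return f"{QUERY_URL}?{urlencode({'q': query}, safe=':')}"


print(build_outcome_query_url("Hostility"))
```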
