tune the use of AEOLUS indications from mychem.info #727
Comments
Thanks for posting this Andrew - a closer look at AEOLUS has been on my list for a while. From a quick review of their Nature Scientific Data paper, and looking at example records of AEOLUS data in mychem - I concluded that the 'indications' AEOLUS reports are based on FAERS self-reporting data, and reflect what the patient said they took the drug for when reporting the adverse events they experienced. @andrewsu do you agree with this assessment? If true, I would agree that AEOLUS is not the best source of 'treats' statements - given the existence of the other, more reliable sources you mention for this type of knowledge. That said, it could be an interesting source of potential novel off-label usages of drugs - in cases where we see many patients self-reporting taking a drug for a particular non-indicated disease - so it may be worth keeping in Translator. The key will be to clearly advertise the dubious nature of these claims, to ensure end users and reasoning/scoring tools are appropriately cautious when using this information. As you suggest, knowledge level/agent type tags will play a big role here - as may other 'at-a-glance' EPC properties we have proposed such as 'evidence type'. I think these types of statements would fall into the Finally, note that we have previously documented the AEOLUS use case as an example of how knowledge level and other EPC / AAG properties would work together to represent this information under the refactored approach to modeling |
super @mbrush, I think we are on the same page. And yes, we will definitely follow whatever is specified in the EPC modeling document you linked. Perhaps a suggestion on that... The And now that we are out of code freeze, I do think we should implement a (hopefully) quick-to-implement stop-gap measure on CI/TEST. @colleenXu can you adjust the aeolus query to include a filter like this? https://mychem.info/v1/query?q=ranibizumab&fields=aeolus.indications&jmespath=aeolus.indications|[?count>`20`] |
@andrewsu to confirm, you'd like the limit to be > 20? |
yes, absent evidence to more confidently set that threshold, I think 20 will considerably improve the precision while not substantially degrading recall... |
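For readers unfamiliar with JMESPath, the effect of the count filter proposed above can be sketched in plain Python. This is only an illustration: filter_indications is a hypothetical helper (not part of BTE or the BioThings SDK), and the sample names and counts are invented; only the count field and the > 20 threshold come from this thread.

```python
def filter_indications(indications, min_count=20):
    """Keep only indications whose count is strictly greater than min_count."""
    return [ind for ind in indications if ind.get("count", 0) > min_count]


# Invented sample data shaped like an aeolus.indications list.
sample = [
    {"name": "indication A", "count": 250},
    {"name": "indication B", "count": 20},   # exactly 20 is NOT kept (strict >)
    {"name": "indication C", "count": 3},
]

kept = filter_indications(sample)
print([ind["name"] for ind in kept])
```

Note that the comparison is strict, so an indication reported exactly 20 times is dropped.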
I'm having trouble figuring out the reverse operation "aeolus MEDDRA disease ID -(treated_by)-> chem". This matters because it's what BTE actually uses in creative-mode "treats", since creative-mode's starting ID is the disease. @newgene Here are the details. Can you help? (But I'm not sure if we can solve this. This is similar to a prior discussion on list_filter. Back then, we decided that it wasn't really viable: one could do list_filter + JQ OR batch-query starting IDs, but not both) This is the intended behavior
I want to take a query like this, and only keep the hits (the aeolus field?) when the nested object in aeolus.indication meets the criteria: (1) meddra_code is one of the 3 listed (but it can be up to 1000 IDs in a batch), and (2) the count > 20.
For example, this hit for
What I tried, and how I know it isn't doing what I intend
First, I tried setting jmespath to So the query would be:
But the example unii:F0P408N6V4 is still in the hits, even though its nested object that matched click to see the unii:F0P408N6V4 hit
Trying the following didn't work either:
|
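To make the intended behavior above concrete, here is a minimal client-side sketch in Python. It is not the MyChem or BTE implementation: filter_hits is a hypothetical helper, the hit IDs and counts are invented, and the field names (aeolus.indications, meddra_code, count) are assumed from the queries discussed in this thread. The two MedDRA codes are the ones mentioned elsewhere in this issue.

```python
def filter_hits(hits, meddra_codes, min_count=20):
    """Keep hits with at least one indication that matches BOTH criteria:
    meddra_code is in meddra_codes, and count > min_count."""
    wanted = {str(code) for code in meddra_codes}
    kept = []
    for hit in hits:
        indications = hit.get("aeolus", {}).get("indications", [])
        if isinstance(indications, dict):  # tolerate a single object instead of a list
            indications = [indications]
        matching = [
            ind for ind in indications
            if str(ind.get("meddra_code")) in wanted and ind.get("count", 0) > min_count
        ]
        if matching:  # hits with no qualifying indication are dropped entirely
            new_hit = dict(hit)
            new_hit["aeolus"] = {**hit["aeolus"], "indications": matching}
            kept.append(new_hit)
    return kept


# Invented example hits (IDs and counts are placeholders).
hits = [
    {"_id": "chem-1", "aeolus": {"indications": [
        {"meddra_code": "10043882", "count": 150},
        {"meddra_code": "10043607", "count": 2},
    ]}},
    {"_id": "chem-2", "aeolus": {"indications": [
        {"meddra_code": "10043882", "count": 5},
    ]}},
]

result = filter_hits(hits, ["10043882", "10043607"])
print([h["_id"] for h in result])
```

The point of the sketch is that both conditions must hold on the same nested object, and a hit whose matching indications all fall at or below the threshold should disappear from the results rather than come back with an empty list.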
issue with adding this constraint to the reverse operation, see https://github.com/biothings/biothings_explorer/issues/727#issuecomment-1776677828
Updates: I've implemented However, the reverse operation may be more important (as I said in the previous post). And while I'm making some progress (see below), I'm still not able to implement the count constraint for the reverse operation. Query for testing: Escitalopram
Based on Andrew's first post on this issue
Got 110 results before, should now get 29. The low-count hits like Tinnitus (meddra code 10043882) should no longer be in the result set. Query for testing: Ranibizumab
Based on Andrew's post above
Got 120 results before, should now get 41. The low-count hits like thrombosis (meddra code 10043607) should no longer be in the result set. I still need your help, but I think I've made some progress:
click to see what I have
Setting jmespath to The MyChem query is:
Then the response looks like this for hits that fulfill the criteria:
And like this for elements that don't fit the criteria (including the same F0P408N6V4 chemical I had in the last post):
Notes for myself on generating queries like this with x-bte/BTE
|
@colleenXu |
@newgene I tried adding this two ways: using a "no-scopes" query and post_filter. Neither seemed to work: the responses were basically the same as before. "no-scopes" query and response
Response still has the hits that don't meet the criteria:
post-filter
Added post_filter parameter, set to
Response still has the hits that don't meet the criteria:
|
@colleenXu you have additional filter criteria in |
Okay....but I still can't figure out: if the hit's aeolus.indications is empty, how to remove the aeolus.unii field or remove the hit... (ref: this earlier post) |
(CC @newgene) This is the info from our conversation:
We tried setting the click for info
So the jmespath parameter is: And we set the request body to something very similar:
so the full query was:
And the responses have the same issue:
|
The MyChem-query-level limit ( Adding the new parameter
Thanks to @newgene @DylanWelzel for the BioThings SDK/MyChem update So the current situation in Dev/CI:
|
I know we've been discussing the aeolus edge-attribute format (flattening arrays into ints) in the edge-attribute constraint issue (part 1 here, and decision here). But I think it'd make sense to add it to this issue and track its deployment here. What do you think? |
And a note - because the hard-coded limit of > 20 is for individual records, BTE won't return an edge for the following theoretical edge case:
I asked Andrew, and he said that this is fine for now. |
Addressed by this commit directly to main: biothings/bte_trapi_query_graph_handler@b0fc94d I've confirmed that the flattening/summation works as intended :) Example based on the example in Part 1 here Example query
Send to MyChem thru BTE:
Previously, we'd get edges from the aeolus operations that look like this:
After the commit, these edges look like this: the edge-attribute values are ints and sums if there were values from multiple records.
|
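For illustration only, the flattening/summing behavior described above can be sketched in Python. The real change lives in the bte_trapi_query_graph_handler commit referenced above; flatten_count is a hypothetical name, and the input shapes (a bare number, or a list with one value per contributing record) are assumptions.

```python
def flatten_count(value):
    """Collapse an edge-attribute value into a single int.

    A list is summed (one entry per contributing record);
    a bare number is simply converted to int.
    """
    if isinstance(value, (list, tuple)):
        return sum(int(v) for v in value)
    return int(value)


print(flatten_count([25]))      # single record
print(flatten_count([25, 40]))  # values from multiple records are summed
```

Because the > 20 limit is applied to individual records before this step, the sum only ever combines records that each passed the threshold on their own.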
The flattening/summing code was deployed today to Prod as part of the Octopus release. I tested and it's live. Summary of what was done in this issue:
Noting one edge case (pasted from above comment):
|
AEOLUS is a standardized version of the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) data. According to https://www.fda.gov/drugs/surveillance/questions-and-answers-fdas-adverse-event-reporting-system-faers:
So essentially it's a community-contributed database that has lots of good stuff, but it also has lots of junk. For example, here is a record for Escitalopram, a medication used to manage and treat major depressive and generalized anxiety disorders: https://mychem.info/v1/chem/WSEQXVZVJXJVFP-FQEVSTJZSA-N?fields=aeolus. Among the listed "indications" are
These generally look good, but lower down, we see this:
These are probably extreme off-label uses at best, and data errors at worst.
Given that we have indications from multiple other sources through mychem.info (like ChEMBL and DrugCentral), we could probably remove these edges from the SmartAPI annotations without much loss in content to BTE. Alternatively, we could figure out an appropriate threshold on the count field (using a similar strategy to what we did in NCATSTranslator/Feedback#100). Eventually, this should also be assigned a relatively weak knowledge_level (#715) so our scoring can account for it appropriately...