tune the use of AEOLUS indications from mychem.info #727

andrewsu · 2023-09-15T17:11:45Z

AEOLUS is a standardized version of the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) data. According to https://www.fda.gov/drugs/surveillance/questions-and-answers-fdas-adverse-event-reporting-system-faers:

The FDA Adverse Event Reporting System (FAERS) is a database that contains adverse event reports, medication error reports and product quality complaints resulting in adverse events that were submitted to FDA.

So essentially it's a community-contributed database that has lots of good stuff, but it also has lots of junk. For example, here is a record for Escitalopram, a medication used to manage and treat major depressive and generalized anxiety disorders: https://mychem.info/v1/chem/WSEQXVZVJXJVFP-FQEVSTJZSA-N?fields=aeolus. Among the listed "indications" are

  "indications": [
    {
      "count": 765,
      "id": "36918942",
      "meddra_code": "10012378",
      "name": "Depression"
    },
    {
      "count": 219,
      "id": "36918858",
      "meddra_code": "10002855",
      "name": "Anxiety"
    },
    {
      "count": 106,
      "id": "42890454",
      "meddra_code": "10070592",
      "name": "Product used for unknown indication"
    },
    {
      "count": 71,
      "id": "36918945",
      "meddra_code": "10057840",
      "name": "Major depression"
    },
    {
      "count": 33,
      "id": "36918855",
      "meddra_code": "10018075",
      "name": "Generalised anxiety disorder"
    },
    ...
  ]

These generally look good, but lower down, we see this:

    {
      "count": 1,
      "id": "35205038",
      "meddra_code": "10013968",
      "name": "Dyspnoea"
    },
    {
      "count": 1,
      "id": "35306119",
      "meddra_code": "10036476",
      "name": "Prader-Willi syndrome"
    },
    {
      "count": 1,
      "id": "35406391",
      "meddra_code": "10043882",
      "name": "Tinnitus"
    },
    {
      "count": 1,
      "id": "35707962",
      "meddra_code": "10069049",
      "name": "Gastrointestinal viral infection"
    },
    {
      "count": 1,
      "id": "35708108",
      "meddra_code": "10021518",
      "name": "Impaired gastric emptying"
    }

These are probably extreme off-label uses at best, and data errors at worst.

Given that we have indications from multiple other sources through mychem.info (like ChEMBL and DrugCentral), we could probably remove these edges from the SmartAPI annotations without much loss in content to BTE. Alternatively, we could figure out an appropriate threshold on the count field (using a similar strategy to what we did in NCATSTranslator/Feedback#100). Eventually, this should also be assigned a relatively weak knowledge_level (#715) so our scoring can account for it appropriately...
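As a rough illustration of the count-threshold idea, here is a minimal Python sketch applied to a few of the Escitalopram entries quoted above. The function name `filter_indications` and the cutoff passed to it are placeholders for illustration only, not a recommended value or an existing BTE/MyChem feature.

```python
def filter_indications(indications, min_count):
    """Keep only indication entries whose report count exceeds min_count."""
    return [ind for ind in indications if ind["count"] > min_count]


# A few of the Escitalopram entries quoted above (abbreviated).
indications = [
    {"count": 765, "meddra_code": "10012378", "name": "Depression"},
    {"count": 219, "meddra_code": "10002855", "name": "Anxiety"},
    {"count": 1, "meddra_code": "10043882", "name": "Tinnitus"},
    {"count": 1, "meddra_code": "10036476", "name": "Prader-Willi syndrome"},
]

# The cutoff here is a placeholder; choosing it well is the open question.
kept = filter_indications(indications, min_count=20)
print([ind["name"] for ind in kept])
```

With any cutoff of 1 or more, the single-report entries drop out while the high-count depression and anxiety entries remain.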

mbrush · 2023-10-23T22:02:23Z

Thanks for posting this Andrew - a closer look at AEOLUS has been on my list for a while.

From a quick review of their Nature Scientific Data paper, and looking at example records of AEOLUS data in mychem - I concluded that the 'indications' AEOLUS reports are based on FAERS self-reporting data, and reflect what the patient reporting the adverse event said they took the drug for, when reporting the adverse events they experienced. @andrewsu do you agree with this assessment?

If true, I would agree that AEOLUS is not the best source of 'treats' statements - given the existence of other more reliable sources you mention for this type of knowledge.

That said, it could be an interesting source of potential novel off-label usages of drugs - in cases where we see many patients self-reporting taking a drug for a particular non-indicated disease - so it may be worth keeping in Translator.

The key will be to clearly advertise the dubious nature of these claims, to ensure end users and reasoning/scoring tools are appropriately cautious when using this information. As you suggest, knowledge level/agent type tags will play a big role here - as may other 'at-a-glance' EPC properties we have proposed such as 'evidence type'. I think these types of statements would fall into the observation knowledge level bucket.

Finally, note that we have previously documented the AEOLUS use case as an example of how knowledge level and other EPC / AAG properties would work together to represent this information under the refactored approach to modeling treats relationships. It is worth looking at the proposal in the screenshot below (and the source document here) to see how we might ultimately choose to handle a source like AEOLUS.

andrewsu · 2023-10-23T22:57:57Z

super @mbrush, I think we are on the same page. And yes, we will definitely follow whatever is specified in the EPC modeling document you linked. Perhaps a suggestion on that... The Ranibizumab - treats - AMD example is helpful (1955 reports in AEOLUS), but just so people don't get tempted to over-trust AEOLUS, it might be useful to also add a poor AEOLUS "prediction" to that doc as well. Many examples to choose from in https://mychem.info/v1/query?q=ranibizumab&fields=aeolus.indications: Ranibizumab - treats - Thrombosis (9 reports) or Ranibizumab - treats - Type 2 diabetes mellitus (1 report) and Ranibizumab - treats - Phlebotomy (1 report)...

And now that we are out of code freeze, I do think we should implement a (hopefully) quick-to-implement stop-gap measure on CI/TEST. @colleenXu can you adjust the aeolus query to include a filter like this? https://mychem.info/v1/query?q=ranibizumab&fields=aeolus.indications&jmespath=aeolus.indications|[?count>`20`]

colleenXu · 2023-10-24T06:05:01Z

@andrewsu to confirm, you'd like the limit to be > 20?

andrewsu · 2023-10-24T06:12:18Z

yes, absent evidence to more confidently set that threshold, I think 20 will considerably improve the precision while not substantially degrading recall...

colleenXu · 2023-10-24T07:34:38Z

@andrewsu

I'm having trouble figuring out the reverse-operation "aeolus MEDDRA disease ID -(treated_by)-> chem". This matters because it's what BTE actually uses in creative-mode "treats", since creative-mode's starting ID is the disease.

@newgene Here are the details. Can you help?

(But I'm not sure if we can solve this. This is similar to a prior discussion on list_filter. Then, we decided that it wasn't really viable: one could do list_filter + JQ OR batch-query starting IDs, but not both)

This is the intended behavior

I want to take a query like this, and only keep the hits (the aeolus field?) when the nested object in aeolus.indications meets the criteria: (1) meddra_code is one of the 3 listed (but it can be up to 1000 IDs in a batch), and (2) the count > 20.

curl --location 'https://mychem.info/v1/query?size=1000&fields=aeolus.indications%2Caeolus.unii' \
--header 'Content-Type: application/json' \
--data '{
  "q": ["10018304", "10058990", "10038867"],
  "scopes": "aeolus.indications.meddra_code"
}'

For example, this hit for 10018304 (chemical is unii:F0P408N6V4) doesn't meet the criteria because the specific nested object with 10018304 has a count less than 20. So I'd like to remove this hit completely from the response (or at least the entire aeolus field for this hit).

    {
        "query": "10018304",
        "_id": "F0P408N6V4",
        "_score": 7.2257814,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [
                {
                    "count": 19893,
                    "id": "43053715",
                    "meddra_code": "10035226",
                    "name": "Plasma cell myeloma"
                },
...
                {
                    "count": 1,
                    "id": "35606985",
                    "meddra_code": "10018304",
                    "name": "Glaucoma"
                },
...
            ],
            "unii": "F0P408N6V4"
        }
    },

What I tried, and how I know it isn't doing what I intend

First, I tried setting jmespath to aeolus.indications|[?count>`20`]

So the query would be:

curl --location --globoff 'https://mychem.info/v1/query?size=1000&fields=aeolus.indications%2Caeolus.unii&jmespath=aeolus.indications%7C[%3Fcount%3E%6020%60]' \
--header 'Content-Type: application/json' \
--data '{
  "q": ["10018304", "10058990", "10038867"],
  "scopes": "aeolus.indications.meddra_code"
}'

But the example unii:F0P408N6V4 is still in the hits, even though its nested object that matched 10018304 is missing (it was filtered out because its count was less than 20).

click to see the unii:F0P408N6V4 hit

    {
        "query": "10018304",
        "_id": "F0P408N6V4",
        "_score": 7.2257814,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [
                {
                    "count": 19893,
                    "id": "43053715",
                    "meddra_code": "10035226",
                    "name": "Plasma cell myeloma"
                },
                {
                    "count": 2306,
                    "id": "35104397",
                    "meddra_code": "10028533",
                    "name": "Myelodysplastic syndrome"
                },
                {
                    "count": 1123,
                    "id": "35104667",
                    "meddra_code": "10028228",
                    "name": "Multiple myeloma"
                },
                {
                    "count": 425,
                    "id": "35104364",
                    "meddra_code": "10008958",
                    "name": "Chronic lymphocytic leukaemia"
                },
                {
                    "count": 364,
                    "id": "35104461",
                    "meddra_code": "10025310",
                    "name": "Lymphoma"
                },
                {
                    "count": 348,
                    "id": "42890454",
                    "meddra_code": "10070592",
                    "name": "Product used for unknown indication"
                },
                {
                    "count": 201,
                    "id": "35104394",
                    "meddra_code": "10068532",
                    "name": "5q minus syndrome"
                },
                {
                    "count": 201,
                    "id": "35104532",
                    "meddra_code": "10061275",
                    "name": "Mantle cell lymphoma"
                },
                {
                    "count": 196,
                    "id": "35104351",
                    "meddra_code": "10000880",
                    "name": "Acute myeloid leukaemia"
                },
                {
                    "count": 186,
                    "id": "36009859",
                    "meddra_code": "10002022",
                    "name": "Amyloidosis"
                },
                {
                    "count": 146,
                    "id": "35104490",
                    "meddra_code": "10012818",
                    "name": "Diffuse large B-cell lymphoma"
                },
                {
                    "count": 142,
                    "id": "35104252",
                    "meddra_code": "10028537",
                    "name": "Myelofibrosis"
                },
                {
                    "count": 138,
                    "id": "35104643",
                    "meddra_code": "10029547",
                    "name": "Non-Hodgkin's lymphoma"
                },
                {
                    "count": 130,
                    "id": "35104465",
                    "meddra_code": "10003899",
                    "name": "B-cell lymphoma"
                },
                {
                    "count": 86,
                    "id": "35124300",
                    "meddra_code": "10068361",
                    "name": "MDS"
                },
                {
                    "count": 58,
                    "id": "35125677",
                    "meddra_code": "10028233",
                    "name": "Multiple myeloma without mention of remission"
                },
                {
                    "count": 56,
                    "id": "43053717",
                    "meddra_code": "10073133",
                    "name": "Plasma cell myeloma recurrent"
                },
                {
                    "count": 47,
                    "id": "35104405",
                    "meddra_code": "10020206",
                    "name": "Hodgkin's disease"
                },
                {
                    "count": 45,
                    "id": "37522153",
                    "meddra_code": "10057097",
                    "name": "Drug use for unknown indication"
                },
                {
                    "count": 38,
                    "id": "43053713",
                    "meddra_code": "10035222",
                    "name": "Plasma cell leukaemia"
                },
                {
                    "count": 34,
                    "id": "35125678",
                    "meddra_code": "10028566",
                    "name": "Myeloma"
                },
                {
                    "count": 33,
                    "id": "35104669",
                    "meddra_code": "10035484",
                    "name": "Plasmacytoma"
                },
                {
                    "count": 29,
                    "id": "35124041",
                    "meddra_code": "10009310",
                    "name": "CLL"
                },
                {
                    "count": 27,
                    "id": "36617702",
                    "meddra_code": "10060862",
                    "name": "Prostate cancer"
                },
                {
                    "count": 27,
                    "id": "42888924",
                    "meddra_code": "10060880",
                    "name": "Monoclonal gammopathy"
                },
                {
                    "count": 26,
                    "id": "35104567",
                    "meddra_code": "10047801",
                    "name": "Waldenstrom's macroglobulinaemia"
                },
                {
                    "count": 25,
                    "id": "35104382",
                    "meddra_code": "10025270",
                    "name": "Lymphocytic leukaemia"
                },
                {
                    "count": 23,
                    "id": "35123953",
                    "meddra_code": "10000886",
                    "name": "Acute myeloid leukemia"
                }
            ],
            "unii": "F0P408N6V4"
        }
    },

Trying the following didn't work either:

aeolus|[?indications.count>`20`] : then all the hits had aeolus: null which is incorrect since I know some hits met the criteria (like unii:1O6WQ6T7G3 for 10018304)
.|[?aeolus.indications.count>`20`] : then it seemed like the jmespath statement did nothing (no nested objects filtered out)

colleenXu · 2023-10-27T07:29:20Z

Updates:

@andrewsu

I've implemented jmespath: aeolus.indications|[?count>`20`] for the aeolus-treats operation (chemical X -(treats)-> disease).

However, the reverse operation may be more important (as I said in the previous post). And while I'm making some progress (see below), I'm still not able to implement the count constraint for the reverse operation.

Query for testing: Escitalopram

Based on Andrew's first post on this issue

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["UNII:4O4S742ANY"],
                    "categories":["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:treats"]
                }
            }
        }
    }
}

Got 110 results before, should now get 29. The low-count hits like Tinnitus (meddra code 10043882) should no longer be in the result set.

Query for testing: Ranibizumab

Based on Andrew's post above

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["UNII:ZL1R02VT79"],
                    "categories":["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:treats"]
                }
            }
        }
    }
}

Got 120 results before, should now get 41. The low-count hits like thrombosis (meddra code 10043607) should no longer be in the result set.
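The two test queries above differ only in the UNII, so they can be generated from one template. The following is a small Python sketch; the helper name `build_treats_query` is made up for illustration and is not part of BTE.

```python
import json


def build_treats_query(unii, predicate="biolink:treats"):
    """Build the one-hop TRAPI query graph used in the test queries above."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {
                        "ids": [f"UNII:{unii}"],
                        "categories": ["biolink:SmallMolecule"],
                    },
                    "n1": {"categories": ["biolink:Disease"]},
                },
                "edges": {
                    "e1": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": [predicate],
                    }
                },
            }
        }
    }


# Escitalopram and Ranibizumab, using the UNIIs from the queries above.
for unii in ("4O4S742ANY", "ZL1R02VT79"):
    print(json.dumps(build_treats_query(unii), indent=4))
```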

@newgene

I still need your help, but I think I've made some progress:

- I've found a way to keep only the elements in the aeolus.indications array that have both (1) a meddra_code value that is one of the ones I asked for, and (2) a count > 20.
- But I can't figure out how to remove the aeolus.unii field when no element meets the criteria above (or remove the hit; either will work for BTE's purposes). This is the main thing left to figure out. Can you help?
  - I saw the Slack convo for this feature, but it doesn't seem to be live on MyChem? And I'm not clear if it's related and will help...

click to see what I have

Setting jmespath to aeolus.indications|[?(count>`20`) && (meddra_code=='10018304' ||meddra_code=='10038867')] (using biothings/biothings.api@31898fa as reference)

The MyChem query is:

curl --location --globoff 'https://mychem.info/v1/query?size=1000&fields=aeolus.indications%2Caeolus.unii&jmespath=aeolus.indications%7C[%3F(count%3E%6020%60)%20%26%26%20(meddra_code%3D%3D%2710018304%27%20%7C%7Cmeddra_code%3D%3D%2710038867%27)]' \
--header 'Content-Type: application/json' \
--data '{
  "q": ["10018304", "10038867"],
  "scopes": "aeolus.indications.meddra_code"
}'

Then the response looks like this for hits that fulfill the criteria:

    {
        "query": "10018304",
        "_id": "WSNODXPBBALQOF-VEJSHDCNSA-N",
        "_score": 7.2257814,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [
                {
                    "count": 157,
                    "id": "35606985",
                    "meddra_code": "10018304",
                    "name": "Glaucoma"
                }
            ],
            "unii": "1O6WQ6T7G3"
        }
    },

    {
        "query": "10038867",
        "_id": "1RXS4UE564",
        "_score": 8.809106,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [
                {
                    "count": 26,
                    "id": "35607414",
                    "meddra_code": "10038867",
                    "name": "Retinal haemorrhage"
                }
            ],
            "unii": "1RXS4UE564"
        }
    },

And like this for elements that don't fit the criteria (including the same F0P408N6V4 chemical I had in the last post):

    {
        "query": "10018304",
        "_id": "F0P408N6V4",
        "_score": 7.2257814,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [],
            "unii": "F0P408N6V4"
        }
    },

    {
        "query": "10038867",
        "_id": "2S9ZZM9Q9V",
        "_score": 9.657343,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [],
            "unii": "2S9ZZM9Q9V"
        }
    },

Notes for myself on generating queries like this with x-bte/BTE

I think doing this as non-batch is easier:
- To add the input IDs: {{ queryInputs }} can be used in parameters (think external apis like biolink/monarch)
- May involve some wrap, playing around with quotation marks and escaping \ to get the single-quotes
I'm less sure about being able to generate the batch-queries properly...even though batch-queries are theoretically possible (my example uses 2 meddra_code values)
- how many unique values can this BioThings feature handle?
- can I figure out how to get the multiple IDs formatted correctly? (wrap to generate a string, setting the delimiter to ||...)
- batch-size-limit: caused by the url-character limit
  - and this'll be set for the whole-api, unless we implement something for individual operations (which may be a bit complicated by the deployment situation?)
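For the batch case in the notes above, one possible way to build the jmespath expression from a list of IDs and URL-encode it is sketched below in Python. The helper name `build_jmespath` is hypothetical, and the set of characters left unencoded simply mirrors the hand-written URLs in this thread; it is not a statement about what MyChem requires.

```python
from urllib.parse import quote, unquote


def build_jmespath(meddra_codes, min_count=20):
    """Join the requested MedDRA codes with || inside one filter expression."""
    code_clause = " || ".join(f"meddra_code=='{code}'" for code in meddra_codes)
    return f"aeolus.indications|[?(count>`{min_count}`) && ({code_clause})]"


expr = build_jmespath(["10018304", "10038867"])

# Leave brackets and parentheses unencoded, as in the curl commands above.
encoded = quote(expr, safe="[]()")

print(expr)
print(encoded)
# The encoded length grows with every ID, which is where the URL-length
# concern for batch sizes comes from.
print(len(encoded))
```

Each additional ID adds a fixed-size clause to the expression, so the encoded length gives a quick way to estimate how many IDs fit under a given URL limit.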

newgene · 2023-10-27T23:06:44Z

@colleenXu jmespath does not add or remove hits; it only transforms hits given some criteria. If you want to modify the hits, you should modify your query. In your case above, you can include aeolus.indications.count:>20 in your query, then all hits should contain at least one count>20 item under the indications array. This should serve the purpose if I understand correctly.

colleenXu · 2023-10-30T04:50:24Z

@newgene I tried adding this two ways: using a "no-scopes" query and post_filter. Neither seemed to work: the responses were basically the same as before.

The responses are basically the same as above

"no-scopes" query and response

curl --location --globoff 'https://mychem.info/v1/query?size=1000&fields=aeolus.indications%2Caeolus.unii&jmespath=aeolus.indications%7C[%3F(count%3E%6020%60)%20%26%26%20(meddra_code%3D%3D%2710018304%27%20%7C%7Cmeddra_code%3D%3D%2710038867%27)]' \
--header 'Content-Type: application/json' \
--data '{
  "q": [
          "aeolus.indications.meddra_code:10018304 AND aeolus.indications.count:>20", 
          "aeolus.indications.meddra_code:10038867 AND aeolus.indications.count:>20"
        ],
  "scopes": []
}'

Response still has the hits that don't meet the criteria:

    {
        "query": "aeolus.indications.meddra_code:10018304 AND aeolus.indications.count:>20",
        "_id": "F0P408N6V4",
        "_score": 8.225781,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [],
            "unii": "F0P408N6V4"
        }
    },

    {
        "query": "aeolus.indications.meddra_code:10018304 AND aeolus.indications.count:>20",
        "_id": "2S9ZZM9Q9V",
        "_score": 7.137364,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [],
            "unii": "2S9ZZM9Q9V"
        }
    },

post-filter

Added post_filter parameter, set to aeolus.indications.count:>20

curl --location --globoff 'https://mychem.info/v1/query?size=1000&fields=aeolus.indications%2Caeolus.unii&post_filter=aeolus.indications.count%3A%3E20&jmespath=aeolus.indications%7C[%3F(count%3E%6020%60)%20%26%26%20(meddra_code%3D%3D%2710018304%27%20%7C%7Cmeddra_code%3D%3D%2710038867%27)]' \
--header 'Content-Type: application/json' \
--data '{
  "q": ["10018304", "10038867"],
  "scopes": "aeolus.indications.meddra_code"
}'

Response still has the hits that don't meet the criteria:

    {
        "query": "10018304",
        "_id": "F0P408N6V4",
        "_score": 7.2257814,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [],
            "unii": "F0P408N6V4"
        }
    },

    {
        "query": "10018304",
        "_id": "2S9ZZM9Q9V",
        "_score": 6.137364,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [],
            "unii": "2S9ZZM9Q9V"
        }
    },

newgene · 2023-10-31T03:44:55Z

@colleenXu you have additional filter criteria in jmespath (jmespath=aeolus.indications|[?(count>`20`) && (meddra_code=='10018304' ||meddra_code=='10038867')]), so if indications comes back empty, it's due to these criteria, not the count:>20 condition, which you have already filtered on.

colleenXu · 2023-10-31T17:49:06Z

@newgene

Okay....but I still can't figure out: if the hit's aeolus.indications is empty, how to remove the aeolus.unii field or remove the hit...

(ref: this earlier post)

colleenXu · 2023-11-01T17:54:12Z

(CC @newgene)

This is the info from our conversation:

- The hits are tied to the q part of the query, so modifying that may be useful.
- But the logic in q works differently from the logic in jmespath:
  - We want logic like jmespath: a single aeolus.indications element should fulfill both criteria: (1) meddra_code value is one of the ones I asked for, and (2) the count > 20.
  - But when using q, some hits are problematic: they don't have any aeolus.indications element that meets both criteria at once (some elements have the meddra_code and others have the count > 20).
- I'm unsure whether post_filter / filter would be helpful here. I know filter isn't live yet (upcoming biothings sdk update) and I don't know if post_filter is live...
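The difference between the two kinds of logic can be sketched in Python using two entries from the F0P408N6V4 record shown earlier. This assumes, as described in the list above, that the q terms are satisfied if any element matches each term separately; the function names are invented for illustration.

```python
def document_level_match(indications, code, min_count=20):
    """q-style logic: each condition may be met by a different element."""
    has_code = any(ind["meddra_code"] == code for ind in indications)
    has_count = any(ind["count"] > min_count for ind in indications)
    return has_code and has_count


def element_level_match(indications, code, min_count=20):
    """jmespath-style logic: one element must meet both conditions."""
    return any(
        ind["meddra_code"] == code and ind["count"] > min_count
        for ind in indications
    )


# Two entries from the F0P408N6V4 record (abbreviated).
indications = [
    {"count": 19893, "meddra_code": "10035226", "name": "Plasma cell myeloma"},
    {"count": 1, "meddra_code": "10018304", "name": "Glaucoma"},
]

print(document_level_match(indications, "10018304"))  # True: hit is returned
print(element_level_match(indications, "10018304"))   # False: hit is unwanted
```

This is why the record keeps showing up as a hit: Glaucoma supplies the code and Plasma cell myeloma supplies the count.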

We tried setting the q field to be identical to the jmespath info, but it seemed to result in the same behavior as the previous tries.

click for info

So the jmespath parameter is: aeolus.indications|[?(count>`20`) && (meddra_code==`10018304`||meddra_code==`10038867`)]

And we set the request body to something very similar:

{
  "q": [
          "aeolus.indications.count:>20 AND (aeolus.indications.meddra_code:10018304 OR aeolus.indications.meddra_code:10038867)"
        ],
  "scopes": []
}

so the full query was:

curl --location --globoff 'https://mychem.info/v1/query?size=1000&fields=aeolus.indications%2Caeolus.unii&jmespath=aeolus.indications%7C[%3F(count%3E%6020%60)%20%26%26%20(meddra_code%3D%3D%6010018304%60%7C%7Cmeddra_code%3D%3D%6010038867%60)]%20' \
--header 'Content-Type: application/json' \
--data '{
  "q": [
          "aeolus.indications.count:>20 AND (aeolus.indications.meddra_code:10018304 OR aeolus.indications.meddra_code:10038867)"
        ],
  "scopes": []
}'

And the responses have the same issue:

    {
        "query": "aeolus.indications.count:>20 AND (aeolus.indications.meddra_code:10018304 OR aeolus.indications.meddra_code:10038867)",
        "_id": "F0P408N6V4",
        "_score": 8.225781,
        "aeolus": {
            "_license": "http://bit.ly/2DIxWwF",
            "indications": [],
            "unii": "F0P408N6V4"
        }
    },

colleenXu · 2024-04-09T21:10:50Z

The MyChem-query-level limit (aeolus.indications.count > 20) is now implemented in the reverse direction too in Dev/CI!

Adding the new parameter jmespath_exclude_empty: true removed the hits that didn't match both criteria (count > 20 AND meddra field's value matches the input ID) - so BTE can parse the API response without issues. Commits:

this commit + merge fix in biolink-4-update (used by dev/CI in override for biolink-4/treats-refactor work)
this commit in special-reverses branch (development stuff)

Thanks to @newgene @DylanWelzel for the BioThings SDK/MyChem update
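To make the effect of excluding empty results concrete, here is a Python emulation of the observable behavior: apply the per-element filter, then drop hits whose indications list ends up empty. This is only a sketch of the outcome described above, not the BioThings SDK implementation, and the helper names `transform_hit` and `exclude_empty` are hypothetical.

```python
def transform_hit(hit, codes, min_count=20):
    """Emulate the jmespath transform: keep elements meeting both criteria."""
    kept = [
        ind
        for ind in hit["aeolus"]["indications"]
        if ind["count"] > min_count and ind["meddra_code"] in codes
    ]
    return {**hit, "aeolus": {**hit["aeolus"], "indications": kept}}


def exclude_empty(hits):
    """Drop hits whose transformed indications list is empty."""
    return [hit for hit in hits if hit["aeolus"]["indications"]]


# Abbreviated records based on the responses shown earlier in this issue.
hits = [
    {
        "_id": "WSNODXPBBALQOF-VEJSHDCNSA-N",
        "aeolus": {
            "indications": [
                {"count": 157, "meddra_code": "10018304", "name": "Glaucoma"}
            ],
            "unii": "1O6WQ6T7G3",
        },
    },
    {
        "_id": "F0P408N6V4",
        "aeolus": {
            "indications": [
                {"count": 19893, "meddra_code": "10035226", "name": "Plasma cell myeloma"},
                {"count": 1, "meddra_code": "10018304", "name": "Glaucoma"},
            ],
            "unii": "F0P408N6V4",
        },
    },
]

codes = {"10018304", "10038867"}
result = exclude_empty([transform_hit(hit, codes) for hit in hits])
print([hit["_id"] for hit in result])
```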

So the current situation in Dev/CI:

BTE now retrieves aeolus.indications.count for aeolusTreats/aeolusTreats-rev operations (ref: commit). The x-bte annotation maps this field to a TRAPI edge-attribute biolink:evidence_count. The value of this edge-attribute is currently always an array of ints (noted in issue 1 of this comment)
BTE has a "hard-coded"/MyChem-query-level limit for those operations: aeolus.indications.count > 20.

colleenXu · 2024-05-02T03:57:19Z

@tokebe @andrewsu

I know we've been discussing the aeolus edge-attribute format (flattening arrays into ints) in the edge-attribute constraint issue (part 1 here, and decision here). But I think it'd make sense to add it to this issue and track its deployment here.

What do you think?

colleenXu · 2024-05-02T03:59:28Z

And a note - because the hard-coded limit of > 20 is for individual records, BTE won't return an edge for the following theoretical edge case:

individual record counts are <20
but BTE/NodeNorm would have merged records together and after the flattening/summation, the edge's count would have been > 20

I asked Andrew, and he said that this is fine for now.
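A tiny Python example of this edge case, using made-up counts (12 and 15 are hypothetical, not taken from any AEOLUS record):

```python
THRESHOLD = 20

# Hypothetical counts from two records that would be merged into one edge.
record_counts = [12, 15]

# The limit is applied per record, before any merging happens.
passing = [count for count in record_counts if count > THRESHOLD]

print(passing)             # no record survives the per-record limit
print(sum(record_counts))  # yet the merged count would have exceeded it
```

Because neither record passes on its own, there is nothing left to merge, even though the summed count would have been above the limit.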

colleenXu · 2024-05-02T20:15:41Z

Addressed by this commit directly to main: biothings/bte_trapi_query_graph_handler@b0fc94d

I've confirmed that the flattening/summation works as intended :)

Example based on the example in Part 1 here

Example query

Send to MyChem thru BTE: http://localhost:3000/v1/smartapi/8f08d1446e0bb9c2b323713ce83e2bd3/query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["UNII:01K63SUP8D"],
                    "categories":["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:applied_to_treat"]
                }
            }
        }
    }
}

Previously, we'd get edges from the aeolus operations that look like this:

                "dd9daae5b03bcad0698ff6669090f36b": {
                    "predicate": "biolink:applied_to_treat",
                    "subject": "PUBCHEM.COMPOUND:3386",
                    "object": "MEDDRA:10070592",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:evidence_count",
                            "value": [
                                875
                            ]
                        }
                    ],


                "1feea171db6394cfd9bcb20deae0ad9a": {
                    "predicate": "biolink:applied_to_treat",
                    "subject": "PUBCHEM.COMPOUND:3386",
                    "object": "MONDO:0002050",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:evidence_count",
                            "value": [
                                733,
                                42
                            ]
                        }
                    ],

After the commit, these edges look like this: the edge-attribute values are ints and sums if there were values from multiple records.

                "dd9daae5b03bcad0698ff6669090f36b": {
                    "predicate": "biolink:applied_to_treat",
                    "subject": "PUBCHEM.COMPOUND:3386",
                    "object": "MEDDRA:10070592",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:evidence_count",
                            "value": 875
                        },


                "1feea171db6394cfd9bcb20deae0ad9a": {
                    "predicate": "biolink:applied_to_treat",
                    "subject": "PUBCHEM.COMPOUND:3386",
                    "object": "MONDO:0002050",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:evidence_count",
                            "value": 775
                        },
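The before/after values above follow a simple rule, sketched here in Python. This only mirrors the behavior shown in the example edges; it is not the code from the commit, and the function name `flatten_evidence_count` is an assumption.

```python
def flatten_evidence_count(value):
    """Turn an array of counts into a single int by summing it."""
    if isinstance(value, list):
        return sum(value)
    return value


# Values taken from the two example edges above.
print(flatten_evidence_count([875]))
print(flatten_evidence_count([733, 42]))
```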

colleenXu · 2024-06-14T01:51:55Z

The flattening/summing code was deployed today to Prod as part of the Octopus release. I tested and it's live.

Summary of what was done in this issue:

aeolusTreats/aeolusTreats-rev operations (ref):
- now include aeolus.indications.count field, mapped to biolink:evidence_count
- only return documents/records with aeolus.indications.count > 20 (x-bte uses jmespath)
BTE updated to flatten the biolink:evidence_count value into an int (sum if multiple values). ref: described in part 1, decision, and implement/test comment directly above this one

Noting one edge case (pasted from above comment):

And a note - because the hard-coded limit of > 20 is for individual records, BTE won't return an edge for the following theoretical edge case:

individual record counts are <20

but BTE/NodeNorm would have merged records together and after the flattening/summation, the edge's count would have been > 20

I asked Andrew, and he said that this is fine for now.

andrewsu mentioned this issue Sep 15, 2023

AEOLUS as source of incorrect 'treats' assertions NCATSTranslator/Feedback#546

Closed

colleenXu added the data source label Sep 28, 2023

colleenXu added the x-bte label Oct 21, 2023

colleenXu mentioned this issue Oct 27, 2023

summary: x-bte-refactoring related issues #750

Open

colleenXu referenced this issue in NCATS-Tangerine/translator-api-registry Oct 27, 2023

mychem: add count constraint to aeolus-treats

ac076b8

issue with adding this constraint to the reverse operation, see https://github.com/biothings/biothings_explorer/issues/727\#issuecomment-1776677828

colleenXu mentioned this issue Dec 22, 2023

Data source: repoDB biothings/pending.api#77

Closed

colleenXu mentioned this issue Mar 18, 2024

implement edge attribute constraints #795

Open

colleenXu mentioned this issue Apr 2, 2024

jmespath: removing higher-level objects based on lower-level matches biothings/biothings.api#325

Open

colleenXu added the On CI label (Related changes are deployed to CI server) Apr 9, 2024

colleenXu added the On CI -> Test label and removed the On CI label (Related changes are deployed to CI server) May 3, 2024

tokebe added the On Test label (Related changes are deployed to Test server) and removed the On CI -> Test label May 9, 2024

colleenXu mentioned this issue May 22, 2024

for entity-based record structures (BioThings APIs), "reverse" operations cannot retrieve the same information as "forward" operations #316

Open

colleenXu closed this as completed Jun 14, 2024
