Question results in "DriverImportance" = 1 everywhere #22

Open
halirutan opened this issue Jan 4, 2025 · 1 comment
Labels
question Further information is requested

Comments

@halirutan
Collaborator

The following question returns a dataset where all "DriverImportance" values seem to be 1:

Find the 10 most frequent driver chemicals above a driver importance of 0.6

The CypherQAChain generates the following query:

```cypher
MATCH (s:Substance)-[r:IS_DRIVER]->(l:Site)
WHERE r.driver_importance > 0.6
RETURN s.name AS ChemicalName,                   // plot title
       r.driver_importance AS DriverImportance,  // [0,1], continuous point color from blue to red, with midpoint at 0.5
       l.name AS SiteName,                       // point hover
       l.water_body AS WaterBody,                // point hover
       l.river_basin AS RiverBasin,              // point hover
       l.lat AS Lat, l.lon AS Lon                // x,y coordinates
ORDER BY r.driver_importance DESC
LIMIT 10
```
@halirutan halirutan added this to the LangGraph transition milestone Jan 4, 2025
@halirutan halirutan added the question Further information is requested label Jan 4, 2025
@mai00fti
Contributor

mai00fti commented Jan 5, 2025

This is a correct result for this question.
A driver_importance of 1 indicates that there is a single substance explaining the total risk for a certain species at the site.
The term for these substances is "single risk driver".

Substances with driver_importance below 1 but above 0.4 (empirical value) are those that contribute only partially to the total risk observed at a site for a given species.
The term for those is "mixture risk drivers".

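There are 88 substances with driver_importance=1 distributed across 1208 sites, resulting in 5475 records that match this query before the LIMIT is applied. Because the results are sorted by driver_importance in descending order, the first 10 records all have the value 1.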
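
The counts above could be reproduced with a query along the following lines. This is only a sketch: it assumes the same labels and properties as the query in the issue (`Substance`, `Site`, `IS_DRIVER`, `driver_importance`) and that single risk drivers are stored with a value of exactly 1. The column aliases are made up for illustration.

```cypher
// Sketch: count single risk drivers (driver_importance = 1).
// Assumes the value is stored as exactly 1; the aliases are illustrative.
MATCH (s:Substance)-[r:IS_DRIVER]->(l:Site)
WHERE r.driver_importance = 1
RETURN count(DISTINCT s) AS Substances,  // distinct single risk driver substances
       count(DISTINCT l) AS Sites,       // distinct sites with at least one such record
       count(r) AS Records               // total matching IS_DRIVER relationships
```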

To avoid confusion, we could use alternative questions instead.
Suggestions:

  1. Which substances are most frequently observed as single risk drivers?
  2. Which substances are most frequently observed as mixture risk drivers?

I will rethink this example and provide appropriate questions and Cypher queries as soon as possible.
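
As a possible starting point, the two suggested questions might translate to Cypher roughly as shown here. This is a sketch rather than the final queries: it reuses the schema from the query in the issue, takes the empirical 0.4 threshold from the comment above, and assumes that "most frequently" means the number of `IS_DRIVER` records per substance. The aliases `Records` and `Sites` are hypothetical.

```cypher
// 1. Substances most frequently observed as single risk drivers.
MATCH (s:Substance)-[r:IS_DRIVER]->(l:Site)
WHERE r.driver_importance = 1
RETURN s.name AS ChemicalName,
       count(r) AS Records,          // number of single-driver records
       count(DISTINCT l) AS Sites    // number of distinct sites affected
ORDER BY Records DESC
LIMIT 10;

// 2. Substances most frequently observed as mixture risk drivers.
//    0.4 is the empirical lower bound mentioned in the comment above.
MATCH (s:Substance)-[r:IS_DRIVER]->(l:Site)
WHERE r.driver_importance > 0.4 AND r.driver_importance < 1
RETURN s.name AS ChemicalName,
       count(r) AS Records,
       count(DISTINCT l) AS Sites
ORDER BY Records DESC
LIMIT 10;
```

Unlike the original query, these aggregate per substance, so the result ranks substances by how often they occur instead of listing individual records sorted by driver_importance. Ordering by `Sites` instead of `Records` would be a reasonable alternative if frequency is meant per site.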
