[Use Case]: Geocoding, then linkage with a spatial indicator dataset based on temporal linkage criteria #311

Open
2 tasks
jphuong opened this issue Jan 26, 2024 · 4 comments
Assignees
Labels
Use Case A development-driving use case


jphuong commented Jan 26, 2024

Description

I have spatial indicator datasets (e.g., the Environmental Justice Index and the Rural-Urban Commuting Area Codes) that I want to use to represent social-environmental factors (multiple columns). The ideal output is perhaps multiple mapping tables, but the main item should be a longitudinal table where each record is a person, a time frame, and a place with its social-environmental factors.

The issue is that I have temporal information on where people lived, but they may have moved, so I need their locations and location history mapped in a way that can be spatially joined with the spatial indicator dataset (and other spatial datasets). The linkages need to account for how long each person lived in each place; location histories prior to 2010 will be excluded from the linkage unless the person lived in the same place continuously. This likely requires geocoding each place in the timeline, then establishing linkages to the dataset subject to a time constraint.
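To make the time constraint concrete, here is a minimal Python sketch of one possible reading of the rule. It assumes that "prior to 2010" means before 2010-01-01, and that a residence period is kept when it is still ongoing or ended on or after that date (so a stay that began earlier but continued in the same place is retained). The `Residence` record, the `link_history` function, and the `indicators` lookup keyed by `location_id` are hypothetical names for illustration, not part of any existing schema or package.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: "prior to 2010" is read as "before 2010-01-01".
CUTOFF = date(2010, 1, 1)


@dataclass(frozen=True)
class Residence:
    """One row of a person's location history (hypothetical structure)."""
    person_id: int
    location_id: int
    start_date: date
    end_date: Optional[date]  # None means the person still lives there


def eligible_for_linkage(r: Residence) -> bool:
    """Drop periods that ended before the cutoff; keep ongoing or later ones."""
    return r.end_date is None or r.end_date >= CUTOFF


def link_history(history, indicators):
    """Build person x time frame x place rows with indicator columns attached.

    `indicators` is a hypothetical dict: location_id -> dict of indicator values.
    Locations with no indicator record are skipped.
    """
    rows = []
    for r in history:
        if not eligible_for_linkage(r):
            continue
        factors = indicators.get(r.location_id)
        if factors is None:
            continue
        rows.append({
            "person_id": r.person_id,
            "location_id": r.location_id,
            "start_date": r.start_date,
            "end_date": r.end_date,
            **factors,
        })
    return rows
```

In practice the geocoded location would first be mapped to the indicator's geography (for example a census tract) before this lookup; that step is left out of the sketch.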

Infrastructure

This could technically be documented within the exposure_occurrence tables, using the location history, geocoding, and dataset metadata attribute tables. How to execute the temporal linkage may need some structure. How to assess data quality from the space and time linkages might also need some consideration (e.g., count of temporal changes in a person's location history, number of hub node locations, over- and under-representation).
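As a rough illustration of two of these data-quality checks, the Python sketch below counts location changes per person and flags candidate hub locations. It assumes the history is available as `(person_id, location_id, start_date)` tuples with ISO-formatted date strings, and it reads "hub node location" as a location shared by at least `min_people` distinct persons. Both the input shape and that definition are assumptions for illustration.

```python
from collections import defaultdict


def moves_per_person(history):
    """Count changes of location per person, ordered by start date.

    `history` is an iterable of (person_id, location_id, start_date) tuples;
    start_date is assumed to be an ISO string so that sorting is chronological.
    """
    by_person = defaultdict(list)
    for person_id, location_id, start_date in history:
        by_person[person_id].append((start_date, location_id))

    moves = {}
    for person_id, stays in by_person.items():
        stays.sort()
        locations = [loc for _, loc in stays]
        # A move is any consecutive pair of stays at different locations.
        moves[person_id] = sum(1 for a, b in zip(locations, locations[1:]) if a != b)
    return moves


def hub_locations(history, min_people=2):
    """Return locations shared by at least `min_people` distinct persons."""
    people_at = defaultdict(set)
    for person_id, location_id, _ in history:
        people_at[location_id].add(person_id)
    return {loc: len(people) for loc, people in people_at.items() if len(people) >= min_people}
```

Unusually high move counts or very large hubs (for example a clinic or shelter address used for many people) could then be reviewed before linkage.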

Timeline

Within the next 6 months, by August 2024

Credit

Very open to discussing this. We can be active design and testing partners. We could provide sampled data of places of service to start the geocoding. Generating a report is something we can collaborate on as well. We could also help generate a vignette about spatial-temporal linkage assumptions, design with a priori research questions, or limitations in interpreting and using the dataset variables.

Support

Table architecture, deliverable output structure, and metadata are definitely things that my team would need help with.

Datasets of Interest

Environmental Justice Index, Rural-Urban Commuting Area, Area Deprivation Index

Depends On

No response

Tasks

@jphuong jphuong added the Use Case A development-driving use case label Jan 26, 2024
@kzollove kzollove moved this to 📃 Proposed in GIS Project Management Feb 9, 2024

kzollove commented Mar 15, 2024

Last week: address location/location_history datasets
Pulled HIFLD data, randomized addresses, and randomly assigned users. ETL from this table as CSV into location/location_history

  • Sampled 24K of 94K addresses, (ideally) coming from rural, urban, and tribal locations
  • Should "junk" addresses (e.g., PO boxes) be injected into the test dataset?

Address conventions have changed in the US since the early 1980s, and a geocoder might not be able to process older forms ("First street" vs "1 St")
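One way to reduce this kind of mismatch is to normalize street names before geocoding. The Python sketch below is only an illustration: the word lists are tiny and hypothetical, and a real pipeline would more likely rely on a dedicated address-parsing library. It lowercases the address, maps ordinal words and street suffixes to abbreviated forms, and turns a bare number directly before a street suffix into an ordinal.

```python
import re

# Small illustrative lookup tables (assumptions, far from complete).
ORDINAL_WORDS = {"first": "1st", "second": "2nd", "third": "3rd",
                 "fourth": "4th", "fifth": "5th"}
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd"}
SUFFIX_ABBREVS = set(SUFFIXES.values())


def ordinal(n: int) -> str:
    """Format an integer as an English ordinal, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        return f"{n}th"
    return f"{n}{ {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th') }"


def normalize_address(raw: str) -> str:
    """Return a lowercased address with ordinals and suffixes in one form."""
    tokens = re.sub(r"[.,]", " ", raw.lower()).split()
    normalized = []
    for tok in tokens:
        tok = ORDINAL_WORDS.get(tok, tok)   # "first"  -> "1st"
        tok = SUFFIXES.get(tok, tok)        # "street" -> "st"
        normalized.append(tok)
    # A bare number right before a suffix is treated as a street name ("1 st").
    # Position 0 is skipped because it is usually the house number.
    for i in range(1, len(normalized) - 1):
        if normalized[i].isdigit() and normalized[i + 1] in SUFFIX_ABBREVS:
            normalized[i] = ordinal(int(normalized[i]))
    return " ".join(normalized)
```

With this sketch, "123 First Street" and "123 1 St" normalize to the same string, which makes it easier to see whether a geocoder failure comes from the address form or from the location itself.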

  • Kristen and Jimmy: is the vocabulary necessary for EJI available in OMOP? If not, what do we need?
  • Select small number of data points from EJI and try to work through them using Polina's SDOH-OMOP Vocabulary
  • Set of heuristics for choosing geocoder and understanding tradeoffs - does this exist? (Daniel Goldberg 2008 geocoding best practices manual)

@kzollove

Geocoding test dataset

  • parsing ETL into location and location_history
  • Some progress; looking into another address-parsing Python library
  • Is the parsing accurate? Needs unit tests
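A sketch of what such unit tests might look like, using Python's standard `unittest` module. The `parse_address` function here is a hypothetical stand-in written only so the tests have something to run against; it is not the parsing library under evaluation, and the expected fields are assumptions.

```python
import re
import unittest


def parse_address(raw: str) -> dict:
    """Hypothetical stand-in parser: classify and split a raw address string."""
    raw = raw.strip()
    if re.match(r"^p\.?\s*o\.?\s*box\b", raw, flags=re.IGNORECASE):
        return {"type": "po_box"}
    match = re.match(r"^(\d+)\s+(.+)$", raw)
    if match is None:
        return {"type": "unparsed"}
    return {"type": "street", "number": match.group(1), "street": match.group(2)}


class ParseAddressTests(unittest.TestCase):
    def test_street_address_is_split(self):
        parsed = parse_address("123 Main St")
        self.assertEqual(parsed["type"], "street")
        self.assertEqual(parsed["number"], "123")
        self.assertEqual(parsed["street"], "Main St")

    def test_po_box_is_flagged(self):
        self.assertEqual(parse_address("P.O. Box 12")["type"], "po_box")
        self.assertEqual(parse_address("PO Box 12")["type"], "po_box")

    def test_junk_is_not_parsed_as_street(self):
        self.assertEqual(parse_address("unknown")["type"], "unparsed")
```

Saved as a module, the tests can be run with `python -m unittest`. Swapping the stand-in for the real parser would show whether "junk" inputs such as PO boxes are handled as intended.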

Vocabulary from EJI

  • Kristen and Jimmy discussed if vocabulary is all represented in OMOP
  • Some of the EJI vocabulary may already be in the OMOP vocabularies
  • will triage vocabularies first and then consult with Polina

Heuristics of choosing geocoder

  • will need to look more deeply
  • Trying to answer the question: when do you choose which type of geocoder
  • interpolated vs parcel based methods - parcel-based may perform better in rural areas
  • Meta-interpretation: how well is this geocoder performing in this area? Do we have a ground truth to test this performance against?
  • When you don't have ground truth, how do you compare and select from an ensemble of geocoders?
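For the no-ground-truth case, one simple heuristic (an assumption offered for discussion, not an established recommendation from this thread) is to treat agreement between geocoders as a proxy for quality: pick the result closest to all the others (the medoid) and report the largest pairwise distance as a disagreement measure. The Python sketch below uses the haversine formula; the `consensus` function and its input shape are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def consensus(candidates):
    """Pick the medoid result and measure disagreement among geocoders.

    `candidates` maps a geocoder name to the (lat, lon) it returned for one
    address. Returns (medoid_name, max_pairwise_distance_km).
    """
    names = sorted(candidates)
    total_distance = {
        n: sum(haversine_km(candidates[n], candidates[m]) for m in names)
        for n in names
    }
    medoid = min(names, key=lambda n: total_distance[n])
    spread = max(haversine_km(candidates[a], candidates[b]) for a in names for b in names)
    return medoid, spread
```

A large spread flags addresses where the geocoders disagree and manual review or a parcel-based method may be worth the cost. Agreement alone does not prove accuracy, since geocoders sharing the same reference data can be wrong together.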

How do we ensure all geocoders meet cybersecurity vulnerability requirements?

  • How do we upgrade and maintain this software?

@kzollove
Copy link
Collaborator

kzollove commented May 3, 2024

Uma and Jim met with Polina a few weeks back, but still need to follow up with the EJI variables that they need added to the OMOP Vocabularies
