[QUESTION] How To Geocode on US State Level? #321

Open
TheCedarPrince opened this issue Feb 22, 2024 · 2 comments

Comments

@TheCedarPrince

Is it possible to geocode at the US state level rather than to specific coordinates? I often find that I do not have coordinate- or address-level data, but rather state-level data at best, and sometimes ZIP-level data if I am lucky.

I was looking at AddressCoder, but it seems to only support coordinate-level data. Would it be possible to write a method that uses the geographic coordinates of a state boundary as the latitude and longitude instead, for example by indexing on the respective shapefile for a state?

@kzollove
Collaborator

It would be possible but we don't support it with a dedicated method.

For reference, this is the table that stores geocoded information and is used in downstream processes like creating the CDM extension table exposure_occurrence.

While we recommend using our Gaia function for geocoding, this table can be created and populated by any means.

In your case, you could put any coordinates to represent a state or zip (centroid seems like the ideal choice) and then those patients or care sites that are geocoded could now be used in transformations at that level of granularity.
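To illustrate the centroid idea, here is a minimal sketch in plain Python that computes the area-weighted centroid of a single boundary ring. It is not part of Gaia; the function name `polygon_centroid` and the `COLORADO_APPROX` rectangle are assumptions made up for illustration, and the rectangle is only a rough stand-in for a real state boundary read from a shapefile. It also treats longitude/latitude as planar coordinates, which is a simplification.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon ring (planar approximation)."""
    # Close the ring if the first and last vertices differ.
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon: zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical, approximate rectangle standing in for a state boundary.
COLORADO_APPROX: List[Point] = [
    (-109.05, 37.0),
    (-102.05, 37.0),
    (-102.05, 41.0),
    (-109.05, 41.0),
]

lon, lat = polygon_centroid(COLORADO_APPROX)
print(f"centroid: lat={lat:.2f}, lon={lon:.2f}")
```

For real boundaries (multipolygons, holes, projection effects) a GIS library would be the safer choice; the point here is only that one representative coordinate per state is enough to populate the geocoded table.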

What we don't have is any way to maintain provenance for those geocoded addresses (populated via function, address-level vs zip-level), which could be nice.

We don't have plans to introduce a function to handle geocoding state or zip level information, though we have heard this brought up before and would accept a PR.

I will make a point to update geocoding documentation to better explain the geom_omop_location table and possibly include some examples to illustrate this.

@TheCedarPrince
Author

> In your case, you could put any coordinates to represent a state or zip (centroid seems like the ideal choice) and then those patients or care sites that are geocoded could now be used in transformations at that level of granularity.

That's what I was thinking! But I didn't see how to then filter by state-level data using gaia. Would I just have to look through the variables.csv?

> What we don't have is any way to maintain provenance for those geocoded addresses (populated via function, address-level vs zip-level), which could be nice.

Like a label to say "this data's lat/long were created artificially" versus "geocoded with degauss"?

> We don't have plans to introduce a function to handle geocoding state or zip level information, though we have heard this brought up before and would accept a PR.

What would a PR like this look like? I'd only be interested in state level at the moment, so in my mind it would be a function like stateCoder, which would:

  1. Grab the location table for a cohort
  2. Look at the state table
  3. Allow a user parameter for exact location handling method
    1. Centroid
    2. Specific location (this could handle the ZIP question too potentially)
  4. Encode location using specified method
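The steps above could be sketched roughly as follows, in Python purely for illustration. Everything here is hypothetical: `state_coder`, `GeocodedLocation`, the dict-shaped location rows, and the `provenance` label are assumptions, not existing Gaia functions or columns. The only thing taken from OMOP is that location rows carry a `location_id` and a `state`. The coordinates in the usage lines are approximate placeholders.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Mapping, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


@dataclass
class GeocodedLocation:
    location_id: int
    latitude: float
    longitude: float
    provenance: str  # hypothetical label, e.g. "state_centroid"


def state_coder(
    locations: Iterable[Mapping[str, Any]],
    state_centroids: Mapping[str, Coord],
    method: str = "centroid",
    specific_points: Optional[Mapping[str, Coord]] = None,
) -> List[GeocodedLocation]:
    """Assign one coordinate per location based on its state (sketch only)."""
    # Step 3: pick the lookup table according to the user-chosen method.
    if method == "centroid":
        lookup = state_centroids
    elif method == "specific":
        if specific_points is None:
            raise ValueError("method='specific' requires specific_points")
        lookup = specific_points
    else:
        raise ValueError(f"unknown method: {method!r}")

    geocoded: List[GeocodedLocation] = []
    for row in locations:  # Steps 1-2: walk the location rows, read the state
        state = str(row.get("state") or "").strip().upper()
        if state not in lookup:
            continue  # no state-level coordinate known; leave ungeocoded
        lat, lon = lookup[state]
        # Step 4: encode the location and record how it was derived.
        geocoded.append(
            GeocodedLocation(int(row["location_id"]), lat, lon, f"state_{method}")
        )
    return geocoded


# Usage with made-up rows and an approximate placeholder centroid.
rows = [{"location_id": 1, "state": "co"}, {"location_id": 2, "state": None}]
centroids = {"CO": (39.0, -105.55)}
for loc in state_coder(rows, centroids):
    print(loc)
```

Carrying a provenance label alongside each coordinate is one possible way to distinguish artificially assigned state-level points from address-level geocoding results; a ZIP-level variant could reuse the same shape with a different lookup table.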

What do you think?

Thanks!
