Identify input requirements for initial geocoders for address push - UW, PostGIS, Degauss, Nominatim #330

Open
rtmill opened this issue Mar 8, 2024 · 3 comments
Labels
exploratory Changes that require some research first.

Comments

@rtmill
Collaborator

rtmill commented Mar 8, 2024

No description provided.

@rtmill added the exploratory and Use Case labels and removed the Use Case label on Mar 8, 2024
@tibbben
Collaborator

tibbben commented Mar 8, 2024

I have some detailed notes on my effort to stand this up in a containerized local environment ... Kubernetes. Please reach out: [email protected].

@kzollove
Collaborator

kzollove commented May 3, 2024

- Identify input requirements for geocoder testing
	○ Jim's analyst is running these tests using a modified DeGAUSS; all testing is done in a Jupyter notebook
	○ Benchmarking project: where do these geocoders perform well, where are they off, and where do they fail entirely?
- Test dataset for geocoding:
	○ 104K public service locations, of which ~30K are rural (~3K rural-tribal and 26K rural non-tribal)
	○ Next step: parse the address strings for "junk" addresses… these will be kept in as a negative control.
	○ How do we obtain address standardization software?
		§ Address standardization: formatting, plus a test of whether this is a real address
		§ Tim: no single address standardizer will work for all four of those geocoders
			□ The PostGIS geocoder has a standardizer built in
			□ Wrote one for Nominatim
			□ TODO (where do we start this "grid"/catalog of standardizers?): Andrew suggests a catalog of standardizers to suit everyone's needs; at least start collecting them and understanding their performance, without fully describing them on our end.
		§ This is a pre-geocoder workflow step… addresses are standardized and then fed into the geocoder.
		§ The other three geocoders do not have a built-in standardizer
		§ ArcGIS StreetMap was recommended… not OSS, not free, requires a license.
	○ Cybersecurity vulnerability assessment of the geocoders:
		§ Each is deployed on Docker, which is fine
		§ However, the Docker dependencies are more than 5 years old and have some vulnerabilities
		§ Will reach out to Cole Brokamp (DeGAUSS). Who should we reach out to for Nominatim?
		§ Jimmy thinks they will maintain the Docker image locally and then pass it over to Palantir
		§ Have decided on the upgraded Linux version for the image; will then pass it back to Palantir
		§ CLAD project goal is to get the best geocode across any location, using multiple geocoders… the more geocoders, the harder to maintain
		§ Tim suggests: if you're ready to pay for ArcGIS, go there. If not, use OSM.
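
To make the benchmarking question in the notes above concrete (where a geocoder performs well, where it is off, where it fails), here is a minimal sketch of how one result could be scored against a reference point. Everything in it is an assumption for illustration: the function names, the 100 m "good" threshold, and the convention that a "junk" negative-control address has no reference coordinate. It is not the analyst's actual notebook code.

```python
import math
from typing import Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_m(a: Coord, b: Coord) -> float:
    """Great-circle distance in meters between two points on a spherical Earth."""
    r = 6_371_000.0  # mean Earth radius in meters
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def classify(result: Optional[Coord], truth: Optional[Coord], good_m: float = 100.0) -> str:
    """Label one geocoding result.

    truth is None for a negative-control ("junk") address, where the
    desired behavior is that the geocoder returns no match.
    good_m is a hypothetical threshold, not a project decision.
    """
    if truth is None:
        return "correct_reject" if result is None else "false_match"
    if result is None:
        return "fail"
    return "good" if haversine_m(result, truth) <= good_m else "off"
```

Tallying these labels per geocoder and per stratum (rural, rural-tribal, and so on) would give one view of where each geocoder holds up.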

Tim's approach is to do whatever is possible with OSM and then fall back on ArcGIS.
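
A rough sketch of that "try OSM first, then fall back" idea, assuming each geocoder is wrapped as a plain function that takes an address string and returns a `(lat, lon)` pair or `None`. The wrapper names, the stub lookups, and the sample addresses are all hypothetical placeholders; no real Nominatim or ArcGIS API call is shown here.

```python
from typing import Callable, Optional, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)
Geocoder = Callable[[str], Optional[Coord]]


def geocode_with_fallback(
    address: str, geocoders: Sequence[Tuple[str, Geocoder]]
) -> Optional[Tuple[str, Coord]]:
    """Try each geocoder in order; return (geocoder_name, coord) for the first hit."""
    for name, geocode in geocoders:
        try:
            coord = geocode(address)
        except Exception:
            # Treat a crashing geocoder as "no match" and move on to the next one.
            coord = None
        if coord is not None:
            return name, coord
    return None


# Hypothetical stand-ins for real geocoder wrappers.
def osm_stub(address: str) -> Optional[Coord]:
    return {"100 Main St, Springfield": (40.0, -90.0)}.get(address)


def arcgis_stub(address: str) -> Optional[Coord]:
    return {"RR 2 Box 15, Springfield": (40.1, -90.2)}.get(address)


chain = [("osm", osm_stub), ("arcgis", arcgis_stub)]
```

Returning the name of the geocoder that answered keeps track of which tool produced each coordinate, which matters once more than one geocoder is in play.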

@jaygee-on-github
Collaborator

@kzollove, @jphuong, @AEW0330: it looks like, from Jim's presentation and Kyle's write-up, that longitude and latitude will have a provenance. Where are you thinking we might capture this provenance? It could cover both a workflow and its execution environment(s). Given Andrew's suggestion that we might use RO-Crate (which describes provenance and workflows on top of schema.org and JSON-LD), where might these knowledge graphs go? And will we be able to author them with the same authoring tool we are exploring at #324?
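
To make the question more concrete, here is one possible shape for a per-coordinate provenance record, sketched as a plain Python dict using schema.org terms (`CreateAction`, `SoftwareApplication`, `GeoCoordinates`). This is only an assumption about what might be captured. It is not a validated RO-Crate, and the function name, field choices, and the placeholder version string are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def provenance_record(
    address: str,
    lat: float,
    lon: float,
    geocoder: str,
    version: str,
    standardizer: Optional[str] = None,
    when: Optional[datetime] = None,
) -> dict:
    """Build a JSON-LD-style record of how one coordinate pair was produced."""
    when = when or datetime.now(timezone.utc)
    # Tools are listed in the order they ran: optional standardizer, then geocoder.
    instruments = [
        {"@type": "SoftwareApplication", "name": geocoder, "softwareVersion": version}
    ]
    if standardizer is not None:
        instruments.insert(0, {"@type": "SoftwareApplication", "name": standardizer})
    return {
        "@context": "https://schema.org",
        "@type": "CreateAction",
        "endTime": when.isoformat(),
        "instrument": instruments,
        "object": {"@type": "PostalAddress", "name": address},
        "result": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon},
    }


record = provenance_record(
    "100 Main St, Springfield", 40.0, -90.0,
    geocoder="nominatim", version="x.y",  # placeholder version, not a real release
    when=datetime(2024, 5, 3, tzinfo=timezone.utc),
)
print(json.dumps(record, indent=2))
```

Whether records like this live next to each coordinate, or are rolled up into a single crate per workflow run, is exactly the open question above.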

4 participants