OSM-POI: Include brand property #25
Comments
@IritaSee, we already parse the This is the file where we process the Parquet files after running the The idea would be to find the best match of a brand or operator in the name-suggestion-index. Those are the necessary steps:
More details about how Spark UDFs are used are in the PR discussion: #69 (comment)
After some initial tests with brand and operator name matching, it turns out that including the matching directly in the OSM-POI pipeline would increase the runtime significantly. Therefore, we have decided to store the consolidated list of brand and operator names in a separate Postgres table, which transformation blocks can use later (e.g., on a filtered set of POIs, which drastically reduces the runtime). Since canvas development currently has a higher priority for the core team, this issue is up for grabs again.
OSM objects may include a brand or operator tag, which can be used to derive the brand of a POI.
The problem is that the values of those tags can be spelled differently across entities (e.g., "McDonalds", "Mc Donald's", or "McDonald's").
The name-suggestion-index repo tries to unify the spelling across OSM: https://github.com/osmlab/name-suggestion-index/.
Alternatively, we could find a clean list of worldwide brand names and use string distance measures to connect a POI to a brand.
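
To make the string-distance idea concrete, here is a minimal sketch using only the Python standard library. It is not the pipeline's implementation: the `BRANDS` list, the helper names `normalize` and `match_brand`, and the `0.85` cutoff are all assumptions for illustration, and the similarity score is `difflib`'s ratio rather than any specific distance measure chosen for this project.

```python
import difflib
import re
import unicodedata

# Hypothetical brand list; in practice this could be derived from a source
# such as the name-suggestion-index.
BRANDS = ["McDonald's", "Burger King", "Starbucks"]


def normalize(name: str) -> str:
    """Lowercase, strip accents, and drop everything except letters and digits."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def match_brand(tag_value, brands=BRANDS, cutoff=0.85):
    """Return the canonical brand most similar to tag_value, or None if no
    candidate reaches the (assumed) similarity cutoff."""
    # Map normalized spellings back to their canonical form.
    lookup = {normalize(brand): brand for brand in brands}
    hits = difflib.get_close_matches(
        normalize(tag_value), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None
```

With this sketch, `match_brand("Mc Donald's")` and `match_brand("McDonalds")` both resolve to the same canonical entry, while a value with no close candidate returns `None`. Normalizing before comparing handles the spacing and punctuation variants cheaply, so the fuzzy comparison is only needed for genuine misspellings.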