use parent field values from same layer when deduplicating #1554

Draft
wants to merge 2 commits into
base: master

missinglink
Member

In some cases the parent hierarchy contains tokens which are relevant for deduplication.

For example, consider a geonames record on layer=locality with name.default="Land Berlin" and parent.locality=["Berlin"].

In this case we can use the parent field provided by the PIP service to assist in the deduplication process.

@missinglink
Member Author

Coming back to this now after thinking it over...

I think it's good to merge: the idea that parent names at the same layer as the feature can be used for deduplication is logical and unlikely to cause errors.

The issue description was missing the concrete example that is included in the tests:

Given the feature geonames:region:2950157 ("Land Berlin", en: "State of Berlin") and another feature at the same layer, whosonfirst:region:85682499 ("Berlin"), in the results, we can use the parent.region = ["Berlin"] property of geonames:region:2950157 to establish that it is equivalent to whosonfirst:region:85682499.
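To make the name-based idea concrete, here is a minimal sketch. It is not the code from this pull request: the helper names (`namesWithSameLayerParents`, `isNameDuplicate`) are hypothetical, the record shape (`layer`, `name`, `parent`) is assumed from the example above, and the comparison is a plain case-insensitive exact match rather than whatever normalization the real deduplication applies.

```javascript
// Hypothetical helper: collect a record's own names plus the parent names
// stored for the record's own layer (e.g. parent.region for a region).
function namesWithSameLayerParents(record) {
  const names = new Set();
  // Assumption: name values may be a string or an array of strings.
  for (const value of Object.values(record.name || {})) {
    for (const n of [].concat(value)) {
      names.add(String(n).toLowerCase());
    }
  }
  const parentNames = (record.parent || {})[record.layer] || [];
  for (const n of [].concat(parentNames)) {
    names.add(String(n).toLowerCase());
  }
  return names;
}

// Hypothetical check: two records on the same layer are treated as
// duplicates when their expanded name sets share at least one name.
function isNameDuplicate(a, b) {
  if (a.layer !== b.layer) {
    return false;
  }
  const namesA = namesWithSameLayerParents(a);
  for (const n of namesWithSameLayerParents(b)) {
    if (namesA.has(n)) {
      return true;
    }
  }
  return false;
}

// The two features from the example above (field layout is assumed).
const geonamesBerlin = {
  source: 'geonames',
  layer: 'region',
  source_id: '2950157',
  name: { default: 'Land Berlin', en: 'State of Berlin' },
  parent: { region: ['Berlin'], region_id: ['85682499'] }
};
const wofBerlin = {
  source: 'whosonfirst',
  layer: 'region',
  source_id: '85682499',
  name: { default: 'Berlin' }
};

console.log(isNameDuplicate(geonamesBerlin, wofBerlin));
```

Without the `parent.region` value, the geonames names "Land Berlin" and "State of Berlin" would not match "Berlin" under this exact-match comparison, which is the gap the parent field is meant to close.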

The parent.region_id = [85682499] property of geonames:region:2950157 further indicates that the two are duplicates, so we could consider using only the IDs. I'm open to both methods.
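The ID-only alternative could look roughly like the sketch below. Again, this is an illustration under assumptions, not the pull request's implementation: `isParentIdDuplicate` is a hypothetical name, the record shape is assumed from the example, and the sketch assumes that parent IDs refer to whosonfirst records, as they do in the Berlin case.

```javascript
// Hypothetical check: record `a` duplicates record `b` when both are on the
// same layer and one lists the other's ID in its parent.<layer>_id field.
function isParentIdDuplicate(a, b) {
  if (a.layer !== b.layer) {
    return false;
  }
  const idField = a.layer + '_id';
  const refersTo = (x, y) => {
    // Assumption: parent IDs point at whosonfirst records only.
    if (y.source !== 'whosonfirst') {
      return false;
    }
    const parentIds = [].concat((x.parent || {})[idField] || []);
    // Compare as strings so 85682499 and '85682499' are treated alike.
    return parentIds.map(String).includes(String(y.source_id));
  };
  return refersTo(a, b) || refersTo(b, a);
}

// The two features from the example above (field layout is assumed).
const geonamesRegion = {
  source: 'geonames',
  layer: 'region',
  source_id: '2950157',
  name: { default: 'Land Berlin', en: 'State of Berlin' },
  parent: { region: ['Berlin'], region_id: [85682499] }
};
const wofRegion = {
  source: 'whosonfirst',
  layer: 'region',
  source_id: '85682499',
  name: { default: 'Berlin' }
};

console.log(isParentIdDuplicate(geonamesRegion, wofRegion));
```

Compared with matching on names, an ID match does not depend on spelling or language variants, but it only helps when the other candidate is the record the parent ID points to.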

@missinglink missinglink force-pushed the diff-using-parent-fields branch from 8984c03 to c5021b4 Compare November 11, 2021 13:42