Duplicate entries #10

Open
ibuda opened this issue Jul 11, 2019 · 4 comments

Comments

@ibuda

ibuda commented Jul 11, 2019

I spotted 912 duplicated entries, for example:
12318 | 27.86 | 46.50 | albesti | VASLUI | VS | 1171.0 | Nord-Est
27.55 | 46.70 | albesti | VASLUI | VS | 239.0 | Nord-Est
22.51 | 47.32 | almasu mic | BIHOR | BH | 552.0 | Nord-Vest
22.14 | 47.17 | almasu mic | BIHOR | BH | 209.0 | Nord-Vest

Most probably they were inserted from different data sources at different points in time.
I think it would be good to drop all duplicates, keeping only the entry with the maximum population value in each group.
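
A minimal sketch of that cleanup, assuming two rows count as duplicates when they share the same name and county. The field names (`lng`, `lat`, `name`, `county`, `auto`, `population`, `region`) and the helper `drop_duplicates_keep_max_population` are hypothetical, guessed from the sample rows above rather than taken from the project's schema. Rows with the same name but different coordinates may be distinct localities, so the key may need adjusting.

```python
from typing import NamedTuple


class Place(NamedTuple):
    # Hypothetical field names, inferred from the sample rows in this issue.
    lng: float
    lat: float
    name: str
    county: str
    auto: str
    population: float
    region: str


def drop_duplicates_keep_max_population(places):
    """Keep one row per (name, county): the one with the largest population."""
    best = {}
    for place in places:
        key = (place.name, place.county)
        current = best.get(key)
        # First row seen for this key, or a row with a larger population.
        if current is None or place.population > current.population:
            best[key] = place
    # dicts preserve insertion order, so output follows first appearance.
    return list(best.values())


rows = [
    Place(27.86, 46.50, "albesti", "VASLUI", "VS", 1171.0, "Nord-Est"),
    Place(27.55, 46.70, "albesti", "VASLUI", "VS", 239.0, "Nord-Est"),
    Place(22.51, 47.32, "almasu mic", "BIHOR", "BH", 552.0, "Nord-Vest"),
    Place(22.14, 47.17, "almasu mic", "BIHOR", "BH", 209.0, "Nord-Vest"),
]

deduped = drop_duplicates_keep_max_population(rows)
for place in deduped:
    print(place.name, place.county, place.population)
```

On the four sample rows this keeps the 1171.0 row for albesti and the 552.0 row for almasu mic.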

I'm open to discussion if there is anything I misunderstood.

@necenzurat
Member

necenzurat commented Jul 30, 2019

I think it would be great to update the source with the official one: http://data.gov.ro/dataset/siruta

@sergiubologa

sergiubologa commented Jul 31, 2019

http://data.gov.ro/dataset/siruta

@necenzurat Your link to official data is broken, although the text displayed is the correct link.

@zhgabor

zhgabor commented Sep 7, 2019

It would be nice to have some sort of automation that pulls from here:
(http://colectaredate.insse.ro/senin/classifications.htm?selectedClassification=&action=&classificationName=SIRUTA)

I found the latest list, from 2018.

@necenzurat
Member

@zhgabor yea, MDB and DBF.
