Organization name and identifier consistency across dataset #28

Open
duncandewhurst opened this issue Feb 26, 2020 · 4 comments
Labels
dataset checks (relating to dataset-level checks), new check

Comments

@duncandewhurst

In the Bandung data there are examples where the same identifier is used for different organization names and also examples where different identifiers are used for the same organization name, e.g.

| id | name | name_en | contracting_processes |
| --- | --- | --- | --- |
| 1.03.01.01 | Dinas Bina Marga dan Pengairan | Highways Agency and Irrigation | 3342 |
| 1.03.01.01 | Dinas Pekerjaan Umum | public Works Service | 450 |

| name | name_en | id | contracting_processes |
| --- | --- | --- | --- |
| Dinas Perhubungan | Department of Transportation | 1.07.01.01 | 607 |
| Dinas Perhubungan | Department of Transportation | 2.09.01.01 | 542 |

It would be good to have checks for these issues in Pelican.
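
As a rough illustration of what such checks could look like, here is a minimal sketch that groups (id, name) pairs in both directions and reports any key with more than one distinct value. The function name `find_inconsistencies` and the flat list of pairs are assumptions made for the example; this is not Pelican's implementation.

```python
from collections import defaultdict


def find_inconsistencies(pairs):
    """Group (id, name) pairs both ways and keep keys with conflicting values.

    `pairs` is a hypothetical flat iterable of (id, name) tuples, e.g. one
    per party per contracting process.
    """
    names_by_id = defaultdict(set)
    ids_by_name = defaultdict(set)
    for party_id, name in pairs:
        names_by_id[party_id].add(name)
        ids_by_name[name].add(party_id)

    # Same id, different names.
    same_id = {k: sorted(v) for k, v in names_by_id.items() if len(v) > 1}
    # Same name, different ids.
    same_name = {k: sorted(v) for k, v in ids_by_name.items() if len(v) > 1}
    return same_id, same_name


# The Bandung rows quoted in this comment.
pairs = [
    ("1.03.01.01", "Dinas Bina Marga dan Pengairan"),
    ("1.03.01.01", "Dinas Pekerjaan Umum"),
    ("1.07.01.01", "Dinas Perhubungan"),
    ("2.09.01.01", "Dinas Perhubungan"),
]
same_id, same_name = find_inconsistencies(pairs)
```

With the rows above, `same_id` flags `1.03.01.01` and `same_name` flags `Dinas Perhubungan`.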

@jpmckinney
Member

jpmckinney commented Feb 26, 2020

In the case of same ID, my understanding is:

  1. The compile step will have merged the parties objects with the same id into one, thus preserving only one name
  2. If there is an organization reference using the other name, the relevant name consistency check will then fail

So, I think the first case is already checked.

In the case of same name in the parties array, that is a new check.
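
The same-ID reasoning above can be illustrated with a small sketch. `merge_parties` is a deliberately simplified stand-in for the compile step (later party objects overwrite earlier ones with the same id); it is an assumption for illustration, not the actual merge routine. `reference_name_mismatches` is likewise a hypothetical helper, not the existing name consistency check.

```python
def merge_parties(releases):
    """Simplified stand-in for the compile step: merge parties by id,
    letting later values overwrite earlier ones."""
    merged = {}
    for release in releases:
        for party in release.get("parties", []):
            merged.setdefault(party["id"], {}).update(party)
    return merged


def reference_name_mismatches(compiled_parties, references):
    """Return organization references whose name differs from the merged party."""
    return [
        ref
        for ref in references
        if ref["id"] in compiled_parties
        and compiled_parties[ref["id"]].get("name") != ref["name"]
    ]


releases = [
    {"parties": [{"id": "1.03.01.01", "name": "Dinas Bina Marga dan Pengairan"}]},
    {"parties": [{"id": "1.03.01.01", "name": "Dinas Pekerjaan Umum"}]},
]
parties = merge_parties(releases)

# An organization reference that still uses the first name.
buyer = {"id": "1.03.01.01", "name": "Dinas Bina Marga dan Pengairan"}
mismatches = reference_name_mismatches(parties, [buyer])
```

Only one name survives the merge, so the reference that uses the other name is reported as a mismatch.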

@duncandewhurst
Author

Sorry, I should have been clearer: I was proposing a dataset-level check rather than a compiled release-level check.

Bandung doesn't use .identifier, but it does put what look like 'real' identifiers in .id. So, for the general case, it would be good to check for these issues across both .identifier/.name and .id/.name, although the latter may produce false positives if the publisher is just using sequential ids per release.
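
A minimal sketch of a dataset-level check over both pairings, assuming compiled releases are available as Python dicts with a `parties` array. The helper names, the sample ocids and supplier names, and especially the `looks_sequential` heuristic (short, purely numeric ids) are all hypothetical; how to recognise sequential ids reliably is left open here.

```python
from collections import defaultdict


def identifier_key(party):
    """Key on .identifier (scheme and id), or None if it is absent."""
    identifier = party.get("identifier") or {}
    if identifier.get("id") is None:
        return None
    return (identifier.get("scheme"), identifier["id"])


def id_key(party):
    """Key on the party's .id."""
    return party.get("id")


def looks_sequential(value):
    # Hypothetical heuristic: short, purely numeric ids are probably local
    # per-release counters rather than 'real' identifiers.
    return isinstance(value, str) and value.isdigit() and len(value) <= 3


def names_per_key(compiled_releases, key_func, skip=lambda key: False):
    """Collect names per key across the dataset; keep keys with more than one name."""
    names = defaultdict(set)
    for release in compiled_releases:
        for party in release.get("parties", []):
            key = key_func(party)
            if key is None or skip(key) or "name" not in party:
                continue
            names[key].add(party["name"])
    return {k: sorted(v) for k, v in names.items() if len(v) > 1}


# Made-up sample data: "1" is reused as a local id, "1.03.01.01" is not.
compiled_releases = [
    {"ocid": "ocds-a", "parties": [
        {"id": "1", "name": "Supplier A"},
        {"id": "1.03.01.01", "name": "Dinas Bina Marga dan Pengairan"},
    ]},
    {"ocid": "ocds-b", "parties": [
        {"id": "1", "name": "Supplier B"},
        {"id": "1.03.01.01", "name": "Dinas Pekerjaan Umum"},
    ]},
]

unfiltered = names_per_key(compiled_releases, id_key)
filtered = names_per_key(compiled_releases, id_key, skip=looks_sequential)
by_identifier = names_per_key(compiled_releases, identifier_key)
```

Without the filter, the reused local id `1` shows up as a false positive; with it, only `1.03.01.01` is reported. Since the sample parties have no `.identifier`, `by_identifier` is empty.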

@jpmckinney
Member

Aha, yes, consistent party name-ID pairs across contracting processes is a good check.

jpmckinney changed the title from "New checks: organization name and identifier consistency" to "Organization name and identifier consistency across dataset" on Mar 4, 2021
@duncandewhurst
Author

> In the Bandung data there are examples where the same identifier is used for different organization names

This issue is also present in the UK FTS data, e.g.

| identifier/id | names |
| --- | --- |
| na | NHS England,NHS England & NHS Improvement ,NHS Kent & Medway CCG,NHS Derby and Derbyshire Clinical Commissioning Group,NHS Coventry & Rugby CCG,NHS England - Specialised Commissioning,Midlands Partnership Foundation Trust,NHS Coventry & Rugby Clinical Commissioning Group,NHS Kent and Medway Clinical Commissioning Group ,The National Health Service Commissioning Board (which uses the operational name“NHSEngland”),Nottingham CityCare Partnership,NHS Nottingham & Nottinghamshire CCG,NHS Derby and Derbyshire CCG,NHS Kent and Medway CCG ,The NHS Commissioning Board operating as NHS England,NHS Lincolnshire Clinical Commissioning Group,NHS Warwickshire North Clinical Commissioning Group,NHS England and NHS Improvement,NHS Lincolnshire CCG,Norfolk and Waveney Clinical Commissioning Group,NHS England & NHS Improvement,NHS Nottingham & Nottinghamshire Clinical Commissioning Group,NHS England - Midlands |
| UK | Hyde Housing Association (The Hyde Group),Mace,Ministry of Defence, Land Equipment, Dismounted Close Combat (DCC),Department for Business Energy and Industrial Strategy,Devon County Council |
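
Some of the names listed for `na` differ only by trailing whitespace (e.g. "NHS England & NHS Improvement " and "NHS England & NHS Improvement"). Whether a check should normalize names before comparing them is not settled in this thread; the sketch below only shows what whitespace normalization would do, with `normalize_name` and `distinct_names` as hypothetical helpers.

```python
def normalize_name(name):
    """Collapse runs of whitespace and strip leading/trailing whitespace."""
    return " ".join(name.split())


def distinct_names(names):
    """Distinct names after whitespace normalization, sorted for stable output."""
    return sorted({normalize_name(name) for name in names})


# Three of the names reported for the identifier "na".
names = [
    "NHS England & NHS Improvement ",
    "NHS England & NHS Improvement",
    "NHS England and NHS Improvement",
]
result = distinct_names(names)
```

The first two names collapse into one, while "&" versus "and" remains a genuine difference that would still be reported.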

jpmckinney transferred this issue from open-contracting-archive/pelican on Sep 14, 2021
jpmckinney added the dataset checks and new check labels on Sep 14, 2021