Organisation identifiers (for discussion) #41
Comments
Wouldn't it be great to think in URLs here? E.g, Oxfam could then be |
@marians In the IATI ORG ID standard we've designed it to work for legacy systems where URLs are not valid values of a database field, and to avoid being tied to any particular domain for resolving identifiers, but so that the pattern of identifiers can be very easily converted into URLs and resolved by any number of services. See for example: http://opencirce.org/org/ So - URL compatible string-based IDs have been a guiding principle.
@practicalparticipation Thanks for the comment & info! |
@markbrough Easy questions first:

*(3) How should the identifiers be identified as being created by Public Bodies - just a prefix like PB-?*

MISC-PB-{ID}

At the moment the registry of namespaces for IATI is just a spreadsheet - but there's the goal of making this a shared list and getting some better services for managing it in future, including services that can help resolve namespace prefixes into URLs for getting more information on any named entity.

Now the trickier ones:

*(1) Does this sound sensible? Is it a good idea? Is there a better alternative?*

Obviously the best case is whenever official lists of bodies actually exist - but we know this is often not the case. But presumably the mapping element of this would mean that if an official list did become available it could be mapped to these 'incubator ids' - and if the service provided a 'canonical ID' API that when called with an ID would check if a better one had come along, or if the ID requested had been merged with another - would return a canonical ID, we would get to a far better place in terms of users being able to find when they are dealing with data about the same organisation.

I doubt we'll get many original govt publishers of data using these IDs, but the potential for them to harmonise how re-users of the data represent the information they have is interesting.

The risk of false positives and bad matches getting into data and leading to wrong conclusions 'downstream' is fairly big with this - so thinking about provenance or 'certainty' information that an API might return could be important.

*(2) Will the fuzzy matching be accurate enough to be useful? Is it likely to assign organisations an incorrect code?*

I think this is going to be a challenge and a risk. When we get down to names of schools, health services etc. then real problems of name clashes are likely to occur. At the level of departments the problem is less likely.

Thinking about the other data that might be used in fuzzy matching, like 'city of head office', 'website address' etc. that could help firm up matches might be useful.
For IATI this issue is fast heading away from being a problem towards becoming a road crash. So my current twopence is: We've spent the last year or so searching for a methodology that has both pragmatic logic and political traction. There's nothing substantial out there and a depressing lack of interest from a range of bodies whom one would think would need this as badly as IATI does. We've always said that it is not IATI's business to curate such a methodology: it needs a wider home. But we're reaching a point where we've got to do something. The way we are going to solve this problem is by thrashing as many ideas around as possible - so this is an excellent thread. Good stuff @markbrough ! I'm not convinced that a system based on the machine interpretation of spelling, however sophisticated the algorithm is, is going to be efficient enough.
Here's another imperfect idea... https://docs.google.com/spreadsheet/ccc?key=0AnWngmdQt3stdGNDVDB5SlZrWVNkd0w4a1FWX0xTY2c#gid=0

I've scraped the CIA Heads of States list and built a (tidied) list of names of current departments and added a code which is a mixture of the name and a counter (which allows new names to be added manually in at least some kind of logical order). In IATI the Rwandan Ministry of Finance and Economic Planning would become something like MISC-PB-RW-FI18.

The problem with this coding is that the code is language specific. Not a good idea for a global list. With this approach the list is centrally curated and manual intervention would be required to create a new code. Is this a good or bad thing? While names of government departments may be maintained with relative ease, government agencies are a whole different ball game.
The Sunlight Foundation proposes using a UUID (and possibly scoping the UUID to a country) and then using a reconciliation/ID resolution service to avoid duplicates: https://github.com/opencivicdata/opencivicdata/wiki/Entity-ID-Resolution-Service |
Note connection here with #23 and discussion around keys ... |
Considering that government structure changes quite frequently in most countries, I think this project should have some instructions or guidelines on how to handle the merge, split, and transformation of public bodies. We could take as an example the policy paper from OpenCorporates on How OpenCorporates should handle company number problems. There should be some identifiable parallels between how they deal with company data and how we deal with public organizations data.
Hey @augusto-herrmann thanks for bringing this thread back to life, I had forgotten about it :) A couple of years ago @practicalparticipation was commissioned by IATI to write a discussion paper on this which is worth taking a look at. It explores a number of different approaches. There is a bunch of discussion on that paper here. My own view now is that we should be using (existing) government Charts of Accounts as the primary source for these codelists (rather than the approach I had set out above). I know that this approach would be imperfect, but my argument is that it is at least a solid start to dealing with this problem. I haven't really seen anything to dissuade me of this argument over the last couple of years. |
An update on this issue: we now have codes for 50 countries, based on country budgets or charts of accounts, extracted and published here: The source repository is here: According to the methodology detailed on the site, the organisation identifier for |
This is an idea that I've been thinking about for a while. I discussed it with @rgrp a couple of weeks ago and wanted to share it with the list to see what everyone thinks.
The short version: could public bodies be used to generate usable organisation identifiers?
Background
The IATI Standard is an XML based format for sharing detailed information about aid projects. Fundamentally, the model shows resource flows from one organisation to another, with various classifications in between and many financial transactions as part of each project. So like this:
For the private sector and NGOs, the methodology for uniquely identifying organisations is:
Jurisdiction-National registration body-Number
e.g. for Oxfam GB, registered at the Charity Commission, with reg number 202918:
GB-CHC-202918
For governments, the following methodology is used:
Jurisdiction-OECD/DAC Agency code
e.g. for the UK's Department for International Development:
GB-1
For multilaterals, we use the following methodology:
OECD/DAC Channel code
e.g. for the World Bank's International Development Association (IDA):
44002
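The three patterns above can be sketched as small helper functions. This is only an illustration of the string patterns described in this issue; the function names are made up here and are not part of the IATI Standard or any existing library.

```python
# Illustrative helpers for the three identifier patterns described above.
# Function names are hypothetical; only the string patterns come from the issue.

def ngo_identifier(jurisdiction: str, registry: str, number: str) -> str:
    """Private sector / NGO: jurisdiction, national registration body, number."""
    return f"{jurisdiction}-{registry}-{number}"


def government_identifier(jurisdiction: str, agency_code: str) -> str:
    """Government: jurisdiction plus OECD/DAC agency code."""
    return f"{jurisdiction}-{agency_code}"


def multilateral_identifier(channel_code: str) -> str:
    """Multilateral: the OECD/DAC channel code is used on its own."""
    return channel_code


if __name__ == "__main__":
    print(ngo_identifier("GB", "CHC", "202918"))   # Oxfam GB
    print(government_identifier("GB", "1"))        # DFID
    print(multilateral_identifier("44002"))        # IDA
```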
Problems
Agency codes
Miscellaneous
Channel codes
a) World Health Organisation - core voluntary contributions account
b) World Health Organisation - assessed contributions
--> but there isn't one for just "World Health Organisation", for example if you're contracting them to deliver a project.
Many organisations publishing IATI data will therefore struggle to provide unique organisation identifiers for many of the public sector / international organisations that they are working with.
Rationale
BW-1 or BW-21, the Botswana Ministry of Finance just needs a code).
Proposal
Fuzzy reconciliation / text matching of organisations, with an API that assigns an existing identifier where available, and creates a new one where it's not available
MINISTRY OF FINANCE
BW (for Botswana)
en
2013-07-05
the API responds with one of the following (possibly using HTTP status codes?):
a) Organisation found => use code BW-1
b) Organisation not found => created code BW-21
it also stores the data about the last recorded transaction, so that other people know that that organisation may have existed on that date.
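To make the request/response flow concrete, here is a minimal in-memory sketch of such a service. Everything in it is an assumption for the sake of discussion: the `Registry` class, the 0.9 similarity threshold, the use of Python's `difflib.SequenceMatcher` as the fuzzy matcher, the sequential `{country}-{n}` numbering, and the choice of 200 (found) versus 201 (created) as status codes. A real service would need a much more careful matching strategy.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Organisation:
    code: str
    name: str        # normalised name
    country: str
    language: str
    last_seen: str   # ISO date of the most recent recorded transaction


class Registry:
    """Hypothetical in-memory stand-in for the proposed reconciliation API."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # assumed cut-off, would need tuning
        self.organisations: list[Organisation] = []

    def _next_code(self, country: str) -> str:
        # Assumes every code in this registry has the form "{country}-{n}".
        used = [int(o.code.split("-")[1])
                for o in self.organisations if o.country == country]
        return f"{country}-{max(used, default=0) + 1}"

    def reconcile(self, name: str, country: str, language: str, date: str):
        """Return (status, code): 200 if matched, 201 if a code was created."""
        normalised = " ".join(name.upper().split())
        best, best_score = None, 0.0
        for org in self.organisations:
            if org.country != country:
                continue
            score = SequenceMatcher(None, normalised, org.name).ratio()
            if score > best_score:
                best, best_score = org, score
        if best is not None and best_score >= self.threshold:
            # ISO dates compare correctly as strings.
            best.last_seen = max(best.last_seen, date)
            return 200, best.code
        org = Organisation(self._next_code(country), normalised,
                           country, language, date)
        self.organisations.append(org)
        return 201, org.code


if __name__ == "__main__":
    registry = Registry()
    print(registry.reconcile("MINISTRY OF FINANCE", "BW", "en", "2013-07-05"))
    print(registry.reconcile("Ministry of  Finance", "BW", "en", "2013-08-01"))
```

Returning the similarity score alongside the code would be one way to expose the 'certainty' information raised in the comments.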
Another source could be Charts of Accounts, existing lists (like those that exist on PB already), budget documents, and structured spending data, e.g. from OpenSpending.
Dealing with duplicates
This will probably lead to some duplicates being created. There could be some manual reconciliation for this. Organisations could have a primary identifier and several secondary identifiers that were used by duplicate organisations.
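One possible way to record the outcome of manual reconciliation is a simple alias index that maps each secondary identifier to its primary. The `AliasIndex` class and its method names are hypothetical, and the example codes are only illustrative.

```python
class AliasIndex:
    """Hypothetical store mapping secondary identifiers to a primary one."""

    def __init__(self):
        self._primary_of: dict[str, str] = {}

    def primary(self, code: str) -> str:
        # Follow alias links until we reach a code that is not itself an alias.
        while code in self._primary_of:
            code = self._primary_of[code]
        return code

    def mark_duplicate(self, duplicate: str, primary: str) -> None:
        """Record that `duplicate` refers to the same organisation as `primary`."""
        target = self.primary(primary)
        source = self.primary(duplicate)
        if source != target:  # avoid creating a cycle
            self._primary_of[source] = target


if __name__ == "__main__":
    index = AliasIndex()
    index.mark_duplicate("BW-21", "BW-1")
    print(index.primary("BW-21"))
```

Old data that uses a secondary identifier stays valid; consumers just resolve it to the primary before comparing organisations.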
Dealing with changing organisations
Organisations can be created / deleted / merged in the real world. This should probably lead to:
a) created - a new identifier gets created;
b) merged - a new identifier gets created for the new organisation; and (manually) the old organisations are linked / related to the new organisation;
c) deleted - the identifier continues to exist, because old (and possibly future) data will still refer to it. However, it should be (manually) marked as no longer existing, pointing to a successor organisation if one exists (with some flag to explain whether it's a wholly .
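The three cases above could be modelled roughly as follows. This is a sketch under assumptions: the `Record` fields, the `merge` and `delete` helpers, and the example codes and names are all invented for illustration, and the flag describing the kind of succession is left out because its meaning is not specified here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    code: str
    name: str
    active: bool = True
    successors: list[str] = field(default_factory=list)


def merge(records: dict, old_codes: list, new_code: str, new_name: str) -> None:
    """Case b): create a new identifier and link the old organisations to it."""
    records[new_code] = Record(new_code, new_name)
    for code in old_codes:
        records[code].active = False
        records[code].successors.append(new_code)


def delete(records: dict, code: str, successor: Optional[str] = None) -> None:
    """Case c): keep the identifier, mark it inactive, point to any successor."""
    records[code].active = False
    if successor is not None:
        records[code].successors.append(successor)


if __name__ == "__main__":
    # Made-up codes and names, purely for illustration.
    records = {
        "XX-1": Record("XX-1", "MINISTRY OF TRADE"),
        "XX-2": Record("XX-2", "MINISTRY OF INDUSTRY"),
    }
    merge(records, ["XX-1", "XX-2"], "XX-3", "MINISTRY OF TRADE AND INDUSTRY")
    print(records["XX-1"])
```

Nothing is ever removed from `records`, so historical data referring to an old code can still be looked up.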
Questions
PB-?
OECD-DAC codelists:
IATI Standard: