New term - recordedByID #102
Comments
dwciri:recordedBy could be used for a single HTTP URI. But it wouldn't work for a list like this, since there is supposed to be a separate dwciri:recordedBy property for each person. The same goes for dwciri:identifiedBy as listed in Issue #101. See http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#2.5_Terms_in_the_dwciri:_namespace . The RDF guide was specifically designed to facilitate RDF, but I don't think it was really discussed whether a dwciri: term could or should be used in something like a DwC archive that's full of strings to be parsed out. |
To avoid confusion - and since this has been open for nearly 5 years - I will close this issue.
Work is underway to create an AgentAction extension for Darwin Core archives to accommodate more expressive roles. |
Re-opening based on discussions about the valid uses of ID terms without using dwciri: versions, for example, here. |
"ID" terms do not have |
On behalf of CETAF ISTC (a group of informatics experts representing the CETAF and DiSSCo community in Europe): the group supports and recommends the implementation of this proposal, with the remark that they would like it implemented such that the definitions of identifiedByID and recordedByID support multiple references with a defined order. |
Could someone please confirm if there is a mapping to ABCD? |
ABCD 2 has no IDs for agents. |
Thanks @nielsklazenga Updated. |
Suggest changing the definition to align with what is presently shown to users of the IPT in demo mode:
|
This comment also applies here. |
I strongly support the implementation of recordedByID and identifiedByID using ORCIDs. |
We endorse this proposal on behalf of @SiBColombia, adopting the last proposed changes of @dshorthouse |
Done. |
Could someone provide links to GBIF occurrence examples where I was guessing those would mostly be Wikidata identifiers (example) but I just discovered there are also ORCID institutional IDs. Related, but not the same: which are the proper DwC concepts to reflect the source/destination institutions that exchange specimens:
I never used it but I am guessing the Darwin Core Resource Relationship extension might be the answer. Thanks a lot for any hints. |
@abubelinha I'm not aware of any examples of institutional IDs used in either recordedByID or identifiedByID. There are ~17k unique values for these in the downloads from GBIF that Bionomia processes (most recent example: https://doi.org/10.15468/dl.tb97qj). There are heaps of non-identifier values here (e.g. integers, dates) and many malformed URIs, but the majority are ORCIDs and wikidata entity URIs. I suppose some of the wikidata variety could be organizations, but Bionomia does in fact resolve these and limits to "instance of human". I believe ORCID recommends use of identifiers like ROR to declare one's affiliation(s), but does not itself generate ORCID-like IDs for organizations. While I recognize that an institution ID can be used in recordedByID, I'm a little less clear on the reasons for doing the same in identifiedByID despite its definition. |
Thanks a lot @dshorthouse
I have occasionally found specimens received in exchange, where identification labels are not signed by a named person but by a group or lab. Some examples with images show up in this GBIF search. In only one of those occurrences does the "related records" tab reveal that those SPIP-identified specimens were collected by another institution which later distributed samples worldwide. The problem with ROR is that not many GBIF providers have one (I mean identifiers for small museums or herbaria themselves, not for the universities they belong to). When searching for herbarium, only 3 are returned by ROR (all of them in Australia). But I am willing to provide IDs for groups like the aforementioned "SPIP", or -more commonly- collaborative multi-institutional research projects which sometimes appear in provider data as a |
There are two related fields that do not exactly cover what you are asking and a third that is a solution to what you are asking. The related fields are ownerInstitutionCode and otherCatalogNumbers. The ownerInstitutionCode term only works to designate a different source if that source is also the owner, so it is not a complete solution. The otherCatalogNumbers term can contain information about all the other institutions that have a catalog (accession in Botany) number for the same organism, but it is not just the institution, and it doesn't help if you don't have the source's catalogNumber. The third term that will definitely work is dynamicProperties, in which you can put a key:value pair to capture the source institution with something like
This doesn't lend itself to the same ease of searchability as a Darwin Core term, because the key itself isn't a standard (the community would have to be very careful about using the same key to mean "received from"), plus the data may be in a JSON string with lots of key:value pairs.
The otherCatalogNumbers term could work for this as well, with the same limitations and caveats about the destination as about the source described above. The dynamicProperties term could work for this too, and thereby distinguish a source institution from destination institutions with something like
where the '|' is used to separate values in a list. |
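As a rough sketch of the dynamicProperties approach described above: the key names `receivedFrom` and `sentTo` and the institution codes `MA`, `K`, and `P` are hypothetical placeholders chosen for illustration, not standardised Darwin Core keys or values taken from this thread. The sketch only assumes that dynamicProperties holds a JSON string and that ' | ' separates list values.

```python
import json


def build_dynamic_properties(received_from, sent_to):
    """Serialise exchange information as a JSON string for dynamicProperties.

    The keys "receivedFrom" and "sentTo" are hypothetical; the community
    would need to agree on them for the values to be comparable.
    """
    props = {}
    if received_from:
        props["receivedFrom"] = received_from
    if sent_to:
        # Multiple destinations are joined with ' | ', as suggested above.
        props["sentTo"] = " | ".join(sent_to)
    return json.dumps(props)


def read_sent_to(dynamic_properties):
    """Recover the list of destination institutions from the JSON string."""
    props = json.loads(dynamic_properties)
    return [v.strip() for v in props.get("sentTo", "").split("|") if v.strip()]


# Placeholder institution codes, for illustration only.
value = build_dynamic_properties("MA", ["K", "P"])
destinations = read_sent_to(value)
```

Because the keys are not part of any standard, a consumer would still have to know them in advance, which is the searchability limitation noted above.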
@abubelinha We did a breakdown of the usage of dwc:recordedByID and dwc:identifiedByID on GBIF last year. You can find a preprint on some of this work here. We didn't spot any institutional IDs used in these two fields on GBIF at the time, but it's possible some ORCIDs or Wikidata IDs (the most commonly used) are not for people. There is a lot of strange data in the I know @dshorthouse that you subset ORCID (based on keywords) for Bionomia, so maybe the ORCIDs submitted to GBIF but not found in your subset (if any) may shed some more light? |
I've run a couple of queries and other than possible Wikidata IDs (as Mat mentions) nothing stands out: 10 random recordedById values by host, excluding invalid URLs, first column is gbifid:
And the same for identifiedById:
The SQL for my own reference:
|
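A breakdown by host like the one described above could be approximated in Python as follows. This is not the SQL that was run against GBIF data; it is a minimal sketch that assumes the recordedByID or identifiedByID values are already available as plain strings, and the sample values are illustrative only.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(value):
    """Return the lower-cased host of an http(s) URL, or None if not a URL."""
    try:
        parts = urlsplit(value.strip())
        if parts.scheme in ("http", "https") and parts.hostname:
            return parts.hostname.lower()
    except ValueError:
        # urlsplit rejects some malformed inputs; treat them as non-URLs.
        pass
    return None


def count_hosts(values):
    """Count identifier hosts across pipe-separated ID values."""
    counts = Counter()
    for value in values:
        for part in value.split("|"):
            if part.strip():
                counts[host_of(part) or "(not a URL)"] += 1
    return counts


# Illustrative sample values, not taken from a GBIF download.
sample = [
    "https://orcid.org/0000-0002-1825-0097 | http://www.wikidata.org/entity/Q42",
    "12345",
    "1998-05-01",
]
counts = count_hosts(sample)
```

Grouping by host makes it easy to spot values such as integers and dates that are not identifiers at all.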
Thanks @MattBlissett. Interesting that there's apparent appetite for inclusion of commercial, for-profit entities like LinkedIn, Google Scholar, Xing, and ResearchGate as though they were identity providers. |
Thanks to all of you for looking into this and providing such interesting answers. I am surprised by the answer to my other question (thanks @tucotuco). I tried a GBIF facet search for ownerInstitutionCode, but it must be a non-searchable concept since nothing comes out. So I can't see examples of how it is currently being used:
Anyway, it is not suitable for reflecting the exchange of duplicate specimens (where all concerned institutions own one of them). Having the option of providing and searching this institutional info would greatly improve our chances of linking those specimens (if I can search GBIF and download a table of foreign institutions' specimens which cite my own institution, it will be much easier for me to join them against our own datasets using e.g. scientificName, fuzzy collector & date ... and then when republishing our datasets I could provide lots of otherCatalogNumbers constructed from that previous GBIF download). Should I create an issue to ask if there is room for this in DwC? Thanks a lot to you all again |
I would say that, before creating issues for new terms, have a look at GBIF clustering (example) to see if that already satisfies the use cases you are thinking of. GBIF is able to suggest specimens that are likely to be from the same Organism in the same collecting Event by using the matching tricks you mentioned. What it wouldn't cover are references to specimens elsewhere whose data have not been shared via GBIF. |
Thanks @tucotuco , I am already a big fan of GBIF clustering (I had linked an example of "related records" tab usage above too). But I am also very interested in people helping to establish those cluster relationships too. And that's the most important use case for me: I want to be aware of those changes in our collection's duplicates, and I want to give other curators the chance to be aware of our taxonomic revisions. |
@abubelinha OK, great. In case you are not aware, the means to recommend changes and additions to Darwin Core is explained in the Darwin Core Guidelines for contributing. |
New Term
Submitter: Tim Robertson
Justification: There is no way to identify individuals by e.g. ORCIDs
Proponents: GBIF (already in production), CETAF, DiSSCo
Definition: A list (concatenated and separated) of the globally unique identifiers for the person, people, groups, or organizations responsible for recording the original Occurrence.
Comment: Recommended best practice is to provide a single identifier that disambiguates the details of the identifying agent. If a list is used, it is recommended to separate the values in the list with space vertical bar space ( | ). The order of the identifiers on any list for this term can not be guaranteed to convey any semantics.
Examples:
https://orcid.org/0000-0002-1825-0097 (for an individual); https://orcid.org/0000-0002-1825-0097 | https://orcid.org/0000-0002-1825-0098 (for a list of people).
Refines: None
Replaces: None
ABCD 2.06: not in ABCD
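
As an illustration of how a consumer might read a recordedByID value under the proposed definition, here is a minimal Python sketch. It assumes values are separated by a vertical bar, as recommended in the Comment, and that ORCID iDs are given as https://orcid.org/ URIs. The function names are hypothetical, and the check uses ORCID's published ISO 7064 MOD 11-2 checksum; nothing here is part of Darwin Core itself.

```python
import re

# Matches an ORCID iD expressed as an https URI; the last character may be X.
ORCID_URI = re.compile(r"https://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])")


def split_ids(value):
    """Split a recordedByID value on the vertical bar and trim whitespace."""
    return [part.strip() for part in value.split("|") if part.strip()]


def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid_uri(uri):
    """True if the URI has the ORCID shape and a correct check character."""
    match = ORCID_URI.fullmatch(uri)
    if not match:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


ids = split_ids(
    "https://orcid.org/0000-0002-1825-0097 | https://orcid.org/0000-0002-1825-0098"
)
validity = [is_valid_orcid_uri(i) for i in ids]
```

With the two example values from the proposal, the first identifier passes the checksum and the second does not, which suggests the second is purely illustrative. Since the term also allows Wikidata and other identifiers, a check like this would only apply to values that look like ORCID URIs.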