advice for sharing ORCIDs via dwc - question submitted via dwchour form 7/31/2019 12:47:41 #144

iDigBioBot · 2019-07-31T16:47:45Z

A user submitted this information via the Darwin Core Hour webform:
Timestamp: 7/31/2019 12:47:41
Please provide a topic of interest: How to put an ORCID in dwc for dwc:recordedBy
Are you capable of and interested in participating: Yes
Who else would you recommend to participate in the presentation: David Shorthouse, Rod Page, John Wieczorek, Quentin Groom, Steve Baskauf, Stan Blum
What resources can you point to: See https://docs.google.com/spreadsheets/d/1E9SZCb8Yvjf4xLlSDW6JHxV971eNV1CDcni8OaAGOFI/edit#gid=0 and this tweet https://twitter.com/FrostMuseum/status/1153732132591853570
Your name: Debbie Paul
Your email: [email protected]
Your GitHub username: @debpaul

debpaul · 2019-07-31T16:57:43Z

Hi @tdwg/dwc-qa @stanblum @dshorthouse Rod Page, @baskaufs, please see this great move toward a better standard of practice for sharing and documenting vouchering expectations and guidelines with researchers, in this tweet from entomologist Andy Deans at the Frost Museum (Penn State): https://twitter.com/FrostMuseum/status/1153732132591853570 In the guidelines, Andy shares a link to a sample data collection sheet he recommends. Note this string in the spreadsheet: "orchidID":"https://orcid.org/0000-0002-2119-4663" My question is about the o-r-c-h-i-d part: is "orchidID" standard?

kcopas · 2019-07-31T18:23:17Z

Not standard. The issuing organization (facing a clear branding/identity challenge) is clear that these identifiers are intended to be referred to as ‘ORCID IDs’.

Sent with GitHawk

debpaul · 2019-07-31T18:52:09Z

So what recommendation do we make to Andy Deans for his protocol form? He's encouraging use of an ORCID, yay. Where does it go in DwC? Does this require dwciri @baskaufs?

dshorthouse · 2019-07-31T19:22:03Z

There's the ORCID branding issue - they prefer it be called ORCID ID when referring to the identifier and not the organization - but there's also how to best express the content of data cells for our own uses.

A key:value pair as indicated in Andy's sample spreadsheet in dwc:recordedBy or dwc:identifiedBy would be buried because we don't expect it (many as an array?) to be present. It's unlikely anyone or any machine would take action on these unless a consumer were to write a custom regex.
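
To illustrate the kind of custom regex a consumer would need, here is a minimal sketch that pulls ORCID IDs out of a free-text cell. The function name `extract_orcids` and the sample cell are assumptions made for illustration; no existing consumer is known to do this.

```python
import re

# An ORCID ID has 16 characters in four hyphenated groups; the last may be "X".
ORCID_PATTERN = re.compile(r"https?://orcid\.org/([0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X])")


def extract_orcids(cell):
    """Return every ORCID ID found in a free-text cell, in order of appearance."""
    return ORCID_PATTERN.findall(cell)


# Hypothetical cell content modelled on the key:value string quoted in this thread.
cell = '"orchidID":"https://orcid.org/0000-0002-2119-4663"'
print(extract_orcids(cell))
```

A cell holding only literal names, such as "Andy Deans; Daniel H. Janzen", yields an empty list, which is why embedded identifiers go unnoticed unless someone writes this sort of pattern.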

We do have dwciri:recordedBy and dwciri:identifiedBy, a namespace for non-literal objects in which we CAN put content like https://orcid.org/0000-0002-2119-4663. That allows us to have many collectors per specimen record and subsequently permits something like a JSON-LD representation, eg https://bloodhound-tracker.net/occurrence/477976412.json. AFAIK, no one is doing anything with these dwciri non-literal equivalents, including GBIF.
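
As a rough sketch of what that could look like, the snippet below builds a small JSON-LD document with the literal dwc:recordedBy string next to dwciri:recordedBy IRIs. The occurrence IRI and the function name `occurrence_jsonld` are hypothetical, and the output shape is an assumption; it is not claimed to match the Bloodhound JSON linked above.

```python
import json


def occurrence_jsonld(occurrence_iri, literal_names, orcid_iris):
    """Build a JSON-LD dict with literal and IRI-valued recordedBy side by side."""
    return {
        "@context": {
            "dwc": "http://rs.tdwg.org/dwc/terms/",
            "dwciri": "http://rs.tdwg.org/dwc/iri/",
        },
        "@id": occurrence_iri,
        # Literal string, as is customary in dwc:recordedBy.
        "dwc:recordedBy": literal_names,
        # One node reference per agent, so many collectors are possible.
        "dwciri:recordedBy": [{"@id": iri} for iri in orcid_iris],
    }


doc = occurrence_jsonld(
    "https://example.org/occurrence/1",  # hypothetical occurrence IRI
    "Andy Deans; Daniel H. Janzen",
    ["https://orcid.org/0000-0002-2119-4663"],  # the ORCID ID quoted in this thread
)
print(json.dumps(doc, indent=2))
```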

But...we're slipping into 1:many territory here. Andy desires a single spreadsheet view so as to simplify the task for users and processors of his spreadsheet. I recommend that he stick with the literal strings as is always done in recordedBy, eg "Andy Deans; Daniel H. Janzen". If he desires more to help push along our need for formal, machine-readable recognition thru ORCID IDs (or other), a spreadsheet is not the best place to capture this.

As it happens, I have finally been (quietly) plodding on a DwC-A extension for AgentActions to be used in an IPT. Anne Thessen @diatomsRcool would like to see this task completed as a product of the RDA/TDWG Interest Group. This is nowhere near ready for prime-time. You'll at least see where we're going with: https://github.com/tdwg/attribution/tree/master/dwc. I'm hoping this will be done in time for a demo at the biodiversity_next pre-conference workshop on Authority Management of People Names, WT65 https://biodiversitynext.org/pre-conference/

debpaul · 2019-07-31T21:30:09Z

Thanks @dshorthouse. So, in other words, there's no good place right now inside DwC for Andy to put the ORCID. What do we suggest to him now? I see we need at least:

- [ ] capture the literal strings (eg "Andy Deans; Daniel H. Janzen")
- [ ] suggestions for Andy for storing the ORCID for each person in his own database, so he can share it when the AgentActions extension is ready
- [ ] what to call the field in the meanwhile (not orchidID) in the spreadsheet Andy shares with students/researchers, etc.

(And note that with this "email" reply, I learned that Markdown is not supported via email. Hence, the "would-be" check boxes - haha.)

MattBlissett · 2019-08-01T07:27:22Z

It doesn't answer your questions, but iNaturalist and GBIF have some discussion on this here: gbif/occurrence#89, and a plan for an interim GBIF-namespace term.

Today, iNaturalist's exported research-grade observations contain 93 unique ORCIDs across around 74,000 observations.

qgroom · 2019-08-01T10:40:18Z

One of the tasks for the pre-conference workshop on people's names is to scope out the need for a TDWG task group on this subject. The question being, do we need to make changes to the existing standards to better support name information? It seems from this conversation that the answer is yes.

debpaul · 2019-08-01T13:35:45Z

Thanks @MattBlissett @timrobertson100. I'm wondering if we need an "interim" protocol/method for TDWG standards. I get why GBIF came up with an interim solution using a GBIF-namespace term (we did similar at iDigBio). But it seems that TDWG could have a protocol for doing just this when needed. (Something to discuss.)

debpaul · 2019-08-01T13:45:09Z

To @adeans, note the above conversation relevant to your ORCID ID capture in your spreadsheet. Also see gbif/occurrence#89 and gbif/portal16#342 for some insights on ORCID ID data capture and potential use.

dshorthouse · 2019-08-01T13:49:28Z

@MattBlissett I'm surprised that iNaturalist and GBIF have taken this route. While I think incorporation of ORCID IDs in occurrence data is exactly what we want, a branded, DwC look-alike is best avoided (why wasn't this term called "ORCID ID"?!). It may have set a precedent here for Andy Deans's approach and it confuses the DwC standard.

We can perhaps get away with it with iNaturalist => GBIF because the former makes use of OAuth2 & the response from ORCID transparently provides that ORCID ID on the user's behalf. iNaturalist users need not ever know their ORCID ID - it's a horrible string to type. Plus, I'm assuming that there will only ever be one agent in iNaturalist's recordedByOrcid. But, what about that which makes iNaturalist observations "research grade"? - wouldn't it be great if those ORCID IDs for people who confirmed the identification of others' observations also flow to GBIF? Would you then make a 1:many identifiedByOrcid for the 3+ confirmed determinations? This may get messy in a real hurry.

In Andy's case here, users will need to copy/paste their ORCID IDs & so mistakes will be made. Andy will also have to deal with 1:many in recordedBy and identifiedBy. And, he'll also then have to deal with collector numbers when botanists want to play too. What then about others who want credit for other ways specimens have been handled or prepared?

I realize the above is messy and there isn't a solution now. But, I think we best do this with care. A DwC extension appears to be the best way to do this even though the 1:many issue is not pleasing for a spreadsheet tdwg/dwc#101
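
To make the 1:many point concrete, here is a minimal sketch that explodes one flat spreadsheet row into one row per agent, the shape an extension file would need. The column names (`coreid`, `agentName`, `agentIdentifier`, `action`, `order`) and the function `explode_agents` are hypothetical; they are not taken from the draft AgentActions extension.

```python
def explode_agents(occurrence_id, names, orcids="", action="recorded", sep="|"):
    """Turn one flat row into one row per agent (the 1:many shape)."""
    if not names.strip():
        return []
    name_list = [n.strip() for n in names.split(sep)]
    # Empty slots are kept so ORCID IDs stay aligned with names by position.
    orcid_list = [o.strip() for o in orcids.split(sep)]
    rows = []
    for index, name in enumerate(name_list):
        rows.append({
            "coreid": occurrence_id,  # links back to the occurrence record
            "agentName": name,
            "agentIdentifier": orcid_list[index] if index < len(orcid_list) else "",
            "action": action,
            "order": index + 1,  # position in the original string
        })
    return rows


# Hypothetical row: two collectors, an ORCID ID known only for the first.
rows = explode_agents(
    "occ-1",
    "Andrew R. Deans | D. H. Janzen",
    "https://orcid.org/0000-0002-2119-4663 | ",
)
for row in rows:
    print(row)
```

The single spreadsheet row becomes two extension rows, which is convenient for machines but is exactly the view that is awkward to maintain by hand.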

debpaul · 2019-08-01T13:55:34Z

Great insights @dshorthouse, thanks for elaborating on the (current) workflow and potential issues. see also my related comments gbif/occurrence#89 (comment)

adeans · 2019-08-01T14:07:00Z

Cool, cool. Watching this and other spaces for recommendations. I think for now I will use names: "Andrew R. Deans | D. H. Janzen", etc.

debpaul · 2019-08-01T14:27:04Z

To @adeans, don't stop collecting ORCID IDs though! Just create a new column for now. And if possible, store these in your collection mgmt software. Each person (Agent) in your CMS could have an ORCID ID. That way, when applications can make use of them, you'll have them to share. Maybe your column name for now is ORCID_ID (no spaces), and you can put multiple ORCID IDs in there too (separated by | as well). (Yes, copy/paste errors might happen as @dshorthouse points out, but a better problem to have than no ORCID_ID at all). The other challenges (how to share them on export, for example) can be addressed serially. You can't share ORCID IDs if you aren't collecting them. See tdwg/dwc#101 for even more of the challenges surrounding gathering and using people IDs like ORCID ID (or other similar).
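
Many copy/paste errors in such a column can be caught automatically, because the last character of an ORCID ID is a check character computed with ISO 7064 MOD 11-2. The sketch below validates a pipe-separated ORCID_ID cell; the function names are assumptions made for illustration.

```python
import re


def orcid_check_char(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value):
    """Accept a bare ORCID ID or its https://orcid.org/ form and verify the checksum."""
    bare = re.sub(r"^https?://orcid\.org/", "", value.strip()).replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", bare):
        return False
    return orcid_check_char(bare[:15]) == bare[15]


def check_orcid_column(cell, sep="|"):
    """Map each entry of a separated ORCID_ID cell to True (valid) or False."""
    return {part.strip(): is_valid_orcid(part) for part in cell.split(sep) if part.strip()}


# The first entry is the ORCID ID quoted in this thread; the second has a mistyped last digit.
print(check_orcid_column("https://orcid.org/0000-0002-2119-4663 | 0000-0002-2119-4664"))
```

A checksum only shows that a string is well formed. It cannot confirm that the ID belongs to the person named in the neighbouring column.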

baskaufs · 2019-08-01T14:32:37Z

I am happy to read these comments on this interesting and important topic.

With respect to use of dwciri:recordedBy, the relevant specification is Section 2.5.1 of the Darwin Core RDF Guide. To paraphrase:

- The value of a dwciri: property will be a single resource identified by an IRI (a.k.a. URI, blank nodes without an IRI are also allowed).
- If there are multiple values for the property, it can be repeated for each value (see Example 20). However, this approach does not provide any easy way to indicate ordering, or the exact role of the person identified by the IRI.
- "Alternatively, a single triple can be used to describe the subject if the object is a single resource composed of component resources described using additional RDF triples.", i.e. there is a single value for the property and that value can represent a group. The composition of the group and relationship among the members of the group would be described by other machine-readable statements (not specified by the guide).

So I think it's this last option that we are talking about here: using dwciri:recordedBy to link to a description of a group of people and their relationships. There are probably a number of ways to create that description - the important thing is to get consensus on how we will all do it. I'm of the general opinion that if there's a way to create a DwC extension for something, there's probably an easy way to convert it to machine readable RDF or JSON-LD (but not necessarily the reverse). So it would make sense to me to develop the spreadsheet extension simultaneously with the graph model for linked data.

One final note about the dwciri: properties. Despite their name, the value of those terms doesn't have to be identified by an IRI. They can link to a blank (a.k.a. anonymous) node that is then subsequently described. See Example 16 in the guide. The important point here is that if we create a process for converting something like a DwC extension using spreadsheets to Linked Data, that doesn't necessarily require creating an infrastructure for minting and maintaining identifiers for groups of people. The links and relationships among the (hopefully ORCID ID-identified) people can be established without requiring that.
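
The two options can be contrasted in a short sketch, written here as JSON-LD built from plain Python dicts. Since the guide does not specify how a group is described, the `ex:` terms (`ex:member`, `ex:agent`, `ex:order`), the occurrence IRI and the function names are all hypothetical. The second ORCID ID is the fictitious sample researcher that ORCID uses in its own documentation.

```python
import json

CONTEXT = {
    "dwciri": "http://rs.tdwg.org/dwc/iri/",
    "ex": "https://example.org/terms/",  # hypothetical vocabulary for group structure
}


def repeated_property(occurrence_iri, agent_iris):
    """Option 1: repeat dwciri:recordedBy once per agent (no order, no roles)."""
    return {
        "@context": CONTEXT,
        "@id": occurrence_iri,
        "dwciri:recordedBy": [{"@id": iri} for iri in agent_iris],
    }


def group_as_blank_node(occurrence_iri, agent_iris):
    """Option 2: one dwciri:recordedBy value, a blank node describing the group."""
    members = [
        {"ex:agent": {"@id": iri}, "ex:order": position}
        for position, iri in enumerate(agent_iris, start=1)
    ]
    # The group object has no "@id", so JSON-LD treats it as a blank node:
    # no identifier has to be minted or maintained for the group itself.
    return {
        "@context": CONTEXT,
        "@id": occurrence_iri,
        "dwciri:recordedBy": {"ex:member": members},
    }


agents = [
    "https://orcid.org/0000-0002-2119-4663",  # ORCID ID quoted in this thread
    "https://orcid.org/0000-0002-1825-0097",  # ORCID's fictitious sample researcher
]
print(json.dumps(group_as_blank_node("https://example.org/occurrence/1", agents), indent=2))
```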

dshorthouse · 2019-08-01T14:40:00Z

> The important point here is that if we create a process for converting something like a DwC extension using spreadsheets to Linked Data, that doesn't necessarily require creating an infrastructure for minting and maintaining identifiers for groups of people.

Phew! Thank goodness. We may still need to declare one's role in the context of the action executed. Botanists have a primary collector and others listed in recordedBy sat in the truck :)~ We could use a "role" here to contain an integer that describes that pecking order.

baskaufs · 2019-08-01T15:08:21Z

As an example of an approach for handling the problem of ordering of machine-readable data, we can look at the Getty Thesaurus of Geographic Names (TGN) record for China. The TGN maintains a particular order in which names should be displayed, and also notes whether a name is preferred. This is kind of an analog for ordered lists of collectors or authors, where there is a special note of the primary collector or first author. In the RDF we can see that there is a skosxl:prefLabel link to each of the names, and the RDF describing those names includes a gvp:displayOrder property that has a positive integer as its value. For example, tgn_term:159-zh-Latn which has the literal form "Zhongguo" has displayOrder = 1 and we see it as the first item on the list on the human-readable page. You can also see that in the description of the subject resource tgn:1000111, there is the property:value pair gvp:prefLabelGVP tgn_term:159-zh-Latn and that's how we can know that "Zhongguo" should get labeled as "preferred" on the human-readable page.

The point here is that there are relatively simple approaches to making up for the deficiencies that RDF has in describing the order and special characteristics of items on a list. The TGN doesn't live natively as RDF - the RDF is generated from a relational database (I believe). But the Getty has nevertheless managed to expose a relatively large dataset as Linked Data and via a public SPARQL endpoint. With a bit of effort, we could, too.
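
Applied to collectors, the same idea might look like the sketch below: each agent carries an integer display order, and the list and the primary collector are recovered by sorting. The class `AgentRole`, its fields and the functions are hypothetical, modelled loosely on gvp:displayOrder rather than on any agreed DwC term.

```python
from dataclasses import dataclass


@dataclass
class AgentRole:
    name: str
    display_order: int  # positive integer; 1 marks the primary collector


def ordered_names(roles):
    """Return agent names in display order, whatever order the data arrived in."""
    return [role.name for role in sorted(roles, key=lambda r: r.display_order)]


def primary_collector(roles):
    """Return the name of the agent with the lowest display order."""
    return min(roles, key=lambda r: r.display_order).name


# Statements arrive unordered, as they would from an RDF graph.
roles = [
    AgentRole("Daniel H. Janzen", 2),
    AgentRole("Andy Deans", 1),
]
print("; ".join(ordered_names(roles)))
print(primary_collector(roles))
```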

iDigBioBot added the form submission label Jul 31, 2019

debpaul added the new label Jul 31, 2019

debpaul changed the title ~~Darwin Core Hour Input Form 7/31/2019 12:47:41~~ advice for sharing ORCIDs via dwc - question submitted via dwchour form 7/31/2019 12:47:41 Jul 31, 2019

tucotuco added the answered and term - Occurrence (pertaining to a term organized in the Occurrence class) labels Sep 6, 2019

tucotuco removed the new label Sep 2, 2020
