advice for sharing ORCIDs via dwc - question submitted via dwchour form 7/31/2019 12:47:41 #144
Hi @tdwg/dwc-qa @stanblum @dshorthouse Rod Page, @baskaufs: please see this great move toward a better standard of practice for sharing and documenting vouchering expectations and guidelines with researchers, in this tweet from entomologist Andy Deans at the Frost Museum (Penn State): https://twitter.com/FrostMuseum/status/1153732132591853570. In the guidelines, Andy shares a link to a sample data collection sheet he recommends. Note this string in the spreadsheet: "orchidID":"https://orcid.org/0000-0002-2119-4663". My question is about the o-r-c-h-i-d part: is this standard ("orchidID")?
Not standard. The issuing organization (facing a clear branding/identity challenge) is clear that these identifiers are intended to be referred to as 'ORCID IDs'.
So what recommendation do we make to Andy Deans for his protocol form? He's encouraging use of an ORCID ID, yay. Where does it go in DwC? Does this require dwciri, @baskaufs?
There's the ORCID branding issue (they prefer it be called ORCID ID when referring to the identifier, not the organization), but there's also the question of how best to express the content of data cells for our own uses. A key:value pair like the one in Andy's sample spreadsheet, placed in dwc:recordedBy or dwc:identifiedBy, would be buried because we don't expect it (perhaps many of them, as an array?) to be present there. It's unlikely anyone or any machine would act on these unless a consumer wrote a custom regex. We do have dwciri:recordedBy and dwciri:identifiedBy, a namespace for non-literal objects in which we CAN put content like https://orcid.org/0000-0002-2119-4663. That allows us to have many collectors per specimen record and subsequently permits something like a JSON-LD representation, e.g. https://bloodhound-tracker.net/occurrence/477976412.json. AFAIK, no one, including GBIF, is doing anything with these dwciri non-literal equivalents. But... we're slipping into 1:many territory here. Andy wants a single spreadsheet view to simplify the task for users and processors of his spreadsheet. I recommend that he stick with the literal strings, as is always done in recordedBy, e.g. "Andy Deans; Daniel H. Janzen". If he wants to help push along our need for formal, machine-readable recognition through ORCID IDs (or other identifiers), a spreadsheet is not the best place to capture this. As it happens, I have finally been (quietly) plodding along on a DwC-A extension for AgentActions to be used in an IPT. Anne Thessen @diatomsRcool would like to see this task completed as a product of the RDA/TDWG Interest Group. This is nowhere near ready for prime time, but you'll at least see where we're going at https://github.com/tdwg/attribution/tree/master/dwc. I'm hoping this will be done in time for a demo at the biodiversity_next pre-conference workshop on Authority Management of People Names, WT65: https://biodiversitynext.org/pre-conference/
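To make the literal vs. non-literal distinction concrete, here is a minimal sketch of a JSON-LD occurrence carrying both the dwc:recordedBy literal string and dwciri:recordedBy ORCID IRIs. The dwc: and dwciri: prefixes are the published Darwin Core namespaces; the function name, the occurrence IRI and the use of ORCID's public test record (Josiah Carberry) are illustrative assumptions, and this is not the Bloodhound output format.

```python
import json

# Published Darwin Core namespaces: dwc: for literal terms, dwciri: for IRI-valued terms.
CONTEXT = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "dwciri": "http://rs.tdwg.org/dwc/iri/",
}


def occurrence_jsonld(occurrence_iri, recorded_by, orcid_ids):
    """Build a JSON-LD dict with both the literal and the IRI form of recordedBy.

    recorded_by: the human-readable literal string, as shared today.
    orcid_ids:   bare ORCID IDs such as "0000-0002-1825-0097", one per agent.
    """
    return {
        "@context": CONTEXT,
        "@id": occurrence_iri,
        # Literal string, as recordedBy is normally shared.
        "dwc:recordedBy": recorded_by,
        # One node per agent; the JSON array accommodates the 1:many case.
        "dwciri:recordedBy": [{"@id": f"https://orcid.org/{oid}"} for oid in orcid_ids],
    }


doc = occurrence_jsonld(
    "https://example.org/occurrence/1",  # hypothetical occurrence IRI
    "Josiah Carberry",
    ["0000-0002-1825-0097"],  # ORCID's public test record
)
print(json.dumps(doc, indent=2))
```

Note that a plain JSON-LD array is interpreted as an unordered set, so this shape alone does not record which collector comes first.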
Thanks @dshorthouse. So, in other words, there's no good place right now inside DwC for Andy to put the ORCID. What do we suggest to him now? I see we need at least:
- [ ] capture the literal strings (eg "Andy Deans; Daniel H. Janzen")
- [ ] suggestions for Andy for storing the ORCID ID for each person in his own database, so he can share it when the AgentActions extension is available
- [ ] what to call the field in the meanwhile (not orchidID) in the spreadsheet Andy shares with students/researchers, etc.
(and note that with this "email" reply, I learned that Markdown is not supported via email. Hence, the "would-be" check boxes - haha).
It doesn't answer your questions, but iNaturalist and GBIF have some discussion on this here: gbif/occurrence#89, and a plan for an interim GBIF-namespace term. Today, iNaturalist's exported research-grade observations contain 93 unique ORCIDs across around 74,000 observations.
One of the tasks for the pre-conference workshop on people's names is to scope out the need for a TDWG task group on this subject. The question being, do we need to make changes to the existing standards to better support name information? It seems from this conversation that the answer is yes.
Thanks @MattBlissett @timrobertson100. I'm wondering if we need an "interim" protocol/method for TDWG standards. I get why GBIF came up with an interim solution using a GBIF-namespace term (we did something similar at iDigBio). But it seems that TDWG could have a protocol for doing just this when needed. (Something to discuss.)
To @adeans, note the above conversation relevant to your ORCID ID capture in your spreadsheet. Also see gbif/occurrence#89 and gbif/portal16#342 for some insights on ORCID ID data capture and potential use.
@MattBlissett I'm surprised that iNaturalist and GBIF have taken this route. While I think incorporating ORCID IDs in occurrence data is exactly what we want, a branded DwC look-alike is best avoided (why wasn't this term called "ORCID ID"?!). It may have set a precedent for Andy Deans's approach, and it confuses the DwC standard. We can perhaps get away with it for iNaturalist => GBIF because the former uses OAuth2, and the response from ORCID transparently provides the ORCID ID on the user's behalf. iNaturalist users need never know their ORCID ID; it's a horrible string to type. Plus, I'm assuming there will only ever be one agent in iNaturalist's recordedByOrcid. But what about that which makes iNaturalist observations "research grade"? Wouldn't it be great if the ORCID IDs of people who confirmed the identifications of others' observations also flowed to GBIF? Would you then make a 1:many identifiedByOrcid for the 3+ confirmed determinations? This may get messy in a real hurry. In Andy's case, users will need to copy/paste their ORCID IDs, and so mistakes will be made. Andy will also have to deal with 1:many in recordedBy and identifiedBy. And he'll then also have to deal with collector numbers when botanists want to play too. What about others who want credit for other ways specimens have been handled or prepared? I realize the above is messy and there isn't a solution now. But I think we had best do this with care. A DwC extension appears to be the best way to do this, even though the 1:many issue is not pleasing for a spreadsheet (tdwg/dwc#101).
Great insights @dshorthouse, thanks for elaborating on the (current) workflow and potential issues. See also my related comments in gbif/occurrence#89 (comment).
Cool, cool. Watching this and other spaces for recommendations. I think for now I will use names: "Andrew R. Deans | D. H. Janzen", etc.
To @adeans: don't stop collecting ORCID IDs, though! Just create a new column for now. And if possible, store these in your collection management software. Each person (Agent) in your CMS could have an ORCID ID. That way, when applications can make use of them, you'll have them to share. Maybe your column name for now is ORCID_ID (no spaces), and you can put multiple ORCID IDs in there too (also separated by |). (Yes, copy/paste errors might happen, as @dshorthouse points out, but that's a better problem to have than no ORCID_ID at all.) The other challenges (how to share them on export, for example) can be addressed one at a time. You can't share ORCID IDs if you aren't collecting them. See tdwg/dwc#101 for even more of the challenges surrounding gathering and using people IDs like ORCID IDs (or similar identifiers).
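Copy/paste mistakes in an ORCID_ID column can be caught mechanically, because the last character of an ORCID ID is an ISO 7064 MOD 11-2 check character (as documented by ORCID). A minimal sketch, assuming the pipe-separated ORCID_ID column suggested above and accepting either bare IDs or https://orcid.org/ IRIs; the function names are hypothetical, not part of any existing tool.

```python
import re

# Bare ORCID ID or its https://orcid.org/ IRI form; the last character may be X.
ORCID_PATTERN = re.compile(
    r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$", re.IGNORECASE
)


def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID ID."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def parse_orcid_cell(cell, sep="|"):
    """Split an ORCID_ID cell into normalized IRIs; raise ValueError on bad entries."""
    iris = []
    for raw in cell.split(sep):
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty slots such as a trailing separator
        match = ORCID_PATTERN.match(raw)
        if not match:
            raise ValueError(f"not an ORCID ID: {raw!r}")
        orcid = match.group(1).upper()
        digits = orcid.replace("-", "")
        if orcid_check_digit(digits[:-1]) != digits[-1]:
            raise ValueError(f"check digit mismatch, likely a typo: {raw!r}")
        iris.append(f"https://orcid.org/{orcid}")
    return iris


print(parse_orcid_cell("0000-0002-1825-0097 | https://orcid.org/0000-0002-2119-4663"))
# ['https://orcid.org/0000-0002-1825-0097', 'https://orcid.org/0000-0002-2119-4663']
```

The check character catches any single mistyped digit and any swap of two adjacent digits (for example, 0000-0002-1825-0079 is rejected), but it cannot tell you whether a valid ID belongs to the person named in recordedBy.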
I am happy to read these comments on this interesting and important topic. With respect to use of dwciri:recordedBy, the relevant specification is Section 2.5.1 of the Darwin Core RDF Guide. To paraphrase:
So I think it's this last option that we are talking about here: using dwciri:recordedBy to link to a description of a group of people and their relationships. There are probably a number of ways to create that description; the important thing is to get consensus on how we will all do it. I'm of the general opinion that if there's a way to create a DwC extension for something, there's probably an easy way to convert it to machine-readable RDF or JSON-LD (but not necessarily the reverse). So it would make sense to me to develop the spreadsheet extension simultaneously with the graph model for linked data. One final note about the dwciri: properties. Despite their name, the value of those terms doesn't have to be identified by an IRI. They can link to a blank (a.k.a. anonymous) node that is then subsequently described. See Example 16 in the guide. The important point here is that if we create a process for converting something like a DwC extension using spreadsheets to Linked Data, that doesn't necessarily require creating an infrastructure for minting and maintaining identifiers for groups of people. The links and relationships among the (hopefully ORCID ID-identified) people can be established without requiring that.
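The blank-node idea can be sketched in JSON-LD: a node object with no "@id" key is a blank node, so dwciri:recordedBy can point at a group description without anyone minting an identifier for the group. This sketch swaps in FOAF's foaf:Group and foaf:member purely as an illustrative vocabulary; it is not what the Darwin Core RDF Guide prescribes, and the function name and occurrence IRI are hypothetical.

```python
import json

CONTEXT = {
    "dwciri": "http://rs.tdwg.org/dwc/iri/",
    "foaf": "http://xmlns.com/foaf/0.1/",  # illustrative vocabulary choice
}


def collector_group(occurrence_iri, orcid_iris):
    """Link an occurrence to an anonymous (blank-node) group of collectors."""
    return {
        "@context": CONTEXT,
        "@id": occurrence_iri,
        "dwciri:recordedBy": {
            # No "@id" here, so JSON-LD treats this node as a blank node.
            "@type": "foaf:Group",
            # Each member is identified by an ORCID IRI.
            "foaf:member": [{"@id": iri} for iri in orcid_iris],
        },
    }


group_doc = collector_group(
    "https://example.org/occurrence/1",  # hypothetical occurrence IRI
    ["https://orcid.org/0000-0002-1825-0097", "https://orcid.org/0000-0002-2119-4663"],
)
print(json.dumps(group_doc, indent=2))
```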
Phew! Thank goodness. We may still need to declare one's role in the context of the action executed. Botanists have a primary collector, and others listed in recordedBy who sat in the truck :) We could use a "role" here to contain an integer that describes that pecking order.
As an example of an approach to handling the problem of ordering machine-readable data, we can look at the Getty Thesaurus of Geographic Names (TGN) record for China. The TGN maintains a particular order in which names should be displayed, and also notes whether a name is preferred. This is a kind of analog for ordered lists of collectors or authors, where there is a special note of the primary collector or first author. In the RDF we can see that there is a […] The point here is that there are relatively simple approaches to making up for RDF's deficiencies in describing the order and special characteristics of items in a list. The TGN doesn't live natively as RDF; the RDF is generated from a relational database (I believe). But the Getty has nevertheless managed to expose a relatively large dataset as Linked Data and via a public SPARQL endpoint. With a bit of effort, we could, too.
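In the spirit of the TGN's explicit display order and the integer "role" suggested above, one simple pattern is to give each agent-in-role node an explicit position, since a plain JSON-LD array becomes an unordered set in RDF. A minimal sketch: the ex: namespace and the ex:agent, ex:role and ex:order properties are hypothetical placeholders, not TDWG or Getty terms.

```python
CONTEXT = {
    "dwciri": "http://rs.tdwg.org/dwc/iri/",
    "ex": "https://example.org/terms/",  # hypothetical namespace, not a TDWG term
}


def ordered_collectors(occurrence_iri, agents):
    """agents: list of (orcid_iri, role) tuples in the order they should be displayed."""
    return {
        "@context": CONTEXT,
        "@id": occurrence_iri,
        "dwciri:recordedBy": [
            {
                "ex:agent": {"@id": iri},
                "ex:role": role,
                # Explicit integer position, so order survives conversion to RDF.
                "ex:order": position,
            }
            for position, (iri, role) in enumerate(agents, start=1)
        ],
    }


def display_order(doc):
    """Recover (agent IRI, role) pairs in display order, whatever order the nodes arrive in."""
    nodes = sorted(doc["dwciri:recordedBy"], key=lambda node: node["ex:order"])
    return [(node["ex:agent"]["@id"], node["ex:role"]) for node in nodes]


record = ordered_collectors(
    "https://example.org/occurrence/1",  # hypothetical occurrence IRI
    [
        ("https://orcid.org/0000-0002-1825-0097", "primary collector"),
        ("https://orcid.org/0000-0002-2119-4663", "collector"),
    ],
)
```

The design choice mirrors the TGN approach described above: rather than relying on list semantics, each item carries its own position, which also survives a round trip through a relational database.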
A user submitted this information via the Darwin Core Hour webform:
Timestamp: 7/31/2019 12:47:41
Please provide a topic of interest: How to put an ORCID in dwc for dwc:recordedBy
Are you capable of and interested in participating: Yes
Who else would you recommend to participate in the presentation: David Shorthouse, Rod Page, John Wieczorek, Quentin Groom, Steve Baskauf, Stan Blum
What resources can you point to: See https://docs.google.com/spreadsheets/d/1E9SZCb8Yvjf4xLlSDW6JHxV971eNV1CDcni8OaAGOFI/edit#gid=0 and this tweet https://twitter.com/FrostMuseum/status/1153732132591853570
Your name: Debbie Paul
Your GitHub username: @debpaul