recordedBy and identifiedByID using OrcID, Wikidata, Library of Congress #3623
Comments
Good suggestion. What's the timeframe on this, do you think? Is this part of the current DwC makeover? (Sorry, I didn't dig into the thread too much.) This could be part of Genna's work this summer as she's reviewing projects.
This is currently in the comment period for TDWG. https://www.tdwg.org/news/2021/public-review-of-darwin-core-maintenance-proposals/
#2141, although that doesn't seem to quite align with what's being proposed for DwC. Burying that in all of the places we might share Agents could get really big really fast. We should revisit #2131 in light of #2141 (comment) before we get too crazy with JSON, should our response be getting crazy with JSON.
This would be one way we could collaborate with @dshorthouse. If we pass an ORCID or Wikidata identifier in recordedByID and identifiedByID, could he magically assign our records to the correct people in Bionomia?
I like magic. Making magic happen would be great.
Yes! I like magic too. There are ~2.5M examples from Harvard, University of Oslo (HT @rukayaj), and elsewhere that now use

Besides ORCID and Wikidata, Bionomia also recognizes VIAF, ISNI, ZooBank person IDs, BHL creator IDs, and Library of Congress IDs if shared in

Addendum: Forgot to mention that if a Wikidata URI comes in, I also check for a DOB >120 years ago or a DOD there before it's allowed to slip through. This doesn't mean you cannot use Wikidata URIs for the living if you so desired, it's just one of Bionomia's soft rules. Another soft rule is that if an ORCID ID is used in
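The Wikidata soft rule in the addendum can be sketched roughly as below. This is not Bionomia's code: the function name, the `min_years` parameter, and the choice to reject a record with neither date are assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def wikidata_uri_passes(
    birth: Optional[date],
    death: Optional[date],
    today: date,
    min_years: int = 120,
) -> bool:
    """Hypothetical check: accept when a date of death exists, or when the
    date of birth is more than `min_years` years before `today`."""
    if death is not None:
        return True
    if birth is None:
        # Assumption: with neither date available, do not let the URI through.
        return False
    try:
        cutoff = today.replace(year=today.year - min_years)
    except ValueError:
        # `today` is 29 February and the cutoff year is not a leap year.
        cutoff = today.replace(year=today.year - min_years, day=28)
    return birth < cutoff
```

A caller would first look up the birth and death dates for the Wikidata entity and then pass them in; that lookup is deliberately left out of the sketch.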
Since Harvard is doing this already (Arctos), it should be a thing....
Teresa J. Mayfield-Meyer
From Paul at MCZ

Starting with where the information is stored: We added pairs of standardized fields to hold GUIDs for agent, taxonomy, and geog_auth_rec. One field holds the guid, the other field holds the authority. In AGENT this is AGENTGUID and AGENTGUID_GUID_TYPE: https://github.com/MCZbase/DDL/blob/master/TABLE/AGENT.sql

The behavior of these pairs of fields is controlled using values in a code table, CTGUID_TYPE: https://github.com/MCZbase/DDL/blob/master/TABLE/CTGUID_TYPE.sql

We allow two and only two guid authorities for agents, ORCID and VIAF. The corresponding entries in the code table are:

"GUID_TYPE","DESCRIPTION","APPLIES_TO","PLACEHOLDER","PATTERN_REGEX","RESOLVER_REGEX","RESOLVER_REPLACEMENT","SEARCH_URI"

The GUID controls are presented in the user interface for adding/editing as a set: one control to pick which guid authority is to be used (on selection, another control links to a search on that authority for the relevant entity), a control into which to paste the guid, which requires it to match the expected pattern, and a control which shows a current guid, linked out to the resolving authority. Thus we store a guid in a form that fits the pattern_regex for the selected guid type, and know from the resolver_regex/resolver_replacement how to translate the stored value into a resolvable reference.

The values of AGENTGUID for a collector and a determiner are mapped into FLAT as RECORDEDBYID and IDENTIFIEDBYID https://github.com/MCZbase/DDL/blob/cec4447d35c2bd44d07cec2b8cf470777568cb6b/TABLE/FLAT.sql#L147 by invoking a pair of functions, each of which returns the guid for an agent if that agent is the sole collector or the sole determiner: https://github.com/MCZbase/DDL/blob/master/FUNCTION/GET_SOLE_COLLECTOR_GUID.sql

This is done in UPDATE_FLAT, e.g.

And these two fields are carried into FILTERED_FLAT and DIGIR_FILTERED_FLAT, and queried from there in IPT and mapped onto dwc:recordedByID and dwc:identifiedByID in the Occurrence core.
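As a rough illustration of how a pattern_regex / resolver_regex / resolver_replacement triple can validate a stored guid and turn it into a resolvable reference, here is a minimal Python sketch. The regex values are assumptions chosen for the example, not the actual rows of CTGUID_TYPE (which are not reproduced in this thread), and `GuidType`, `is_valid`, and `resolve` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GuidType:
    guid_type: str
    pattern_regex: str         # what a stored guid must look like
    resolver_regex: str        # what to match in the stored value
    resolver_replacement: str  # how to rewrite it into a resolvable URI


# Illustrative values only; NOT the MCZbase code table entries.
ORCID = GuidType(
    guid_type="ORCID",
    pattern_regex=r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$",
    resolver_regex=r"^(.+)$",
    resolver_replacement=r"https://orcid.org/\1",
)


def is_valid(guid: str, guid_type: GuidType) -> bool:
    """True when the guid fits the pattern expected for its authority."""
    return re.search(guid_type.pattern_regex, guid) is not None


def resolve(guid: str, guid_type: GuidType) -> str:
    """Translate a stored guid into a resolvable reference."""
    if not is_valid(guid, guid_type):
        raise ValueError(f"{guid!r} does not match {guid_type.guid_type} pattern")
    return re.sub(guid_type.resolver_regex, guid_type.resolver_replacement, guid)


# 0000-0002-1825-0097 is the sample identifier used in ORCID's own documentation.
print(resolve("0000-0002-1825-0097", ORCID))
```

Keeping the patterns in data rather than in code means a new authority can be supported by adding a row, which appears to be the point of the code-table design described above.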
Neither dwc:recordedByID nor dwc:identifiedByID allows for multiplicity within the term, so in the flat Darwin Core of the Occurrence core in IPT, we chose to map the agent guids only if there was a single collector agent or a single determiner agent. It is a much larger task to get agent guids populated, so we decided that it was better to focus on filling in that information where we could do so cleanly than to worry about multiplicity in terms that aren't intended to handle multiple values. We haven't (yet) mapped the agent guid for the determiner into the identification history extension, but that would have the same concern: an identification row has one identifiedByID, which takes only a single value.

To handle multiplicity of agents, we could, but haven't yet, map multiple instances of recordedByID and identifiedByID into the (currently minimal, proof of concept) RDF representation of the occurrence that we provide via content negotiation if a mczbase.mcz.harvard.edu/guid/ IRI is requested with an accept header of text/turtle, application/rdf+xml, or application/ld+json having priority over text/html: https://github.com/MCZbase/MCZbase/blob/master/rdf/Occurrence.cfm

There we are free to follow the open world and repeat the recordedByID term for an occurrence. What we would likely do for a list of collector agents is return one dwc:recordedBy with the human readable string list of collectors as its value, and then a list of dwc:recordedByID properties for the Occurrence, one for each agent in the list of collectors that has a guid.

Other approaches are possible, particularly if you reference a guid authority (such as the HUH Botanist index) which mints guids for team agents, and then link to team agents as the collector. The repeated-property approach does risk not retaining the order of collectors. In the HUH, and I believe generally in Botany, the first collector in a sequence is treated as primary, and then the order of subsequent collectors in the list doesn't matter.

-Paul
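The two mappings Paul describes (a single guid for the flat Occurrence core, repeated properties in RDF) could look something like the sketch below. It is not MCZbase code: `Agent`, `sole_agent_guid`, and `occurrence_turtle` are hypothetical names, the occurrence IRI and collector names are placeholders, and writing the identifiers as string literals with a " | " separated name list is an assumption of this example.

```python
from typing import List, NamedTuple, Optional


class Agent(NamedTuple):
    name: str
    guid: Optional[str]  # full URI, or None when no identifier is known


def sole_agent_guid(agents: List[Agent]) -> Optional[str]:
    """Flat mapping: return a guid only when exactly one agent is listed."""
    if len(agents) == 1:
        return agents[0].guid
    return None


def _literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def occurrence_turtle(occurrence_iri: str, collectors: List[Agent]) -> str:
    """RDF mapping: one dwc:recordedBy, plus one dwc:recordedByID per guid."""
    names = " | ".join(agent.name for agent in collectors)
    pairs = [("dwc:recordedBy", _literal(names))]
    pairs += [("dwc:recordedByID", _literal(a.guid)) for a in collectors if a.guid]
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
    return (
        "@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .\n\n"
        f"<{occurrence_iri}>\n{body} .\n"
    )


# Placeholder agents; the identifiers are sample values, not real assignments.
collectors = [
    Agent("A. Collector", "https://orcid.org/0000-0002-1825-0097"),
    Agent("B. Collector", None),
    Agent("C. Collector", "http://www.wikidata.org/entity/Q5331679"),
]
print(sole_agent_guid(collectors))  # None, because there are three collectors
print(occurrence_turtle("https://example.org/guid/EXAMPLE:1", collectors))
```

Note that the RDF output carries no ordering between the repeated dwc:recordedByID values, which is the order-of-collectors risk mentioned above.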
Check out the CETAF Botany Pilot!
From the Bionomia webinar today, from D. Shorthouse: please do include full URIs for people identifiers in recordedByID and identifiedByID. Also, why can't we DO this?
See the definitions of these Darwin Core terms, which complement their non-ID (string-based) counterparts: http://rs.tdwg.org/dwc/terms/recordedByID http://rs.tdwg.org/dwc/terms/identifiedByID Full URIs for ORCID IDs are in the examples there, but if you were to use Wikidata, the "entity" URI is http://www.wikidata.org/entity/Q5331679 (note the absence of the 's' in http).
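For anyone holding bare identifiers, a small normalizer along these lines could produce the full URIs before export. It is only a sketch: `to_full_uri` is a hypothetical helper, it handles just ORCID and Wikidata, and the accepted input forms are assumptions.

```python
import re

# Bare ORCID iD, optionally already prefixed with the orcid.org URL.
ORCID_RE = re.compile(r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")
# Bare Q-number, or a Wikidata "entity" or "wiki" page URL.
WIKIDATA_RE = re.compile(
    r"^(?:https?://www\.wikidata\.org/(?:entity|wiki)/)?(Q[1-9]\d*)$"
)


def to_full_uri(identifier: str) -> str:
    """Return the full URI for an ORCID or Wikidata person identifier."""
    value = identifier.strip()
    match = ORCID_RE.match(value)
    if match:
        return f"https://orcid.org/{match.group(1)}"
    match = WIKIDATA_RE.match(value)
    if match:
        # The Wikidata entity URI uses http, not https.
        return f"http://www.wikidata.org/entity/{match.group(1)}"
    raise ValueError(f"Unrecognized person identifier: {identifier!r}")


print(to_full_uri("Q5331679"))
print(to_full_uri("0000-0002-1825-0097"))
```

Raising on anything unrecognized, rather than passing it through, keeps malformed values out of recordedByID and identifiedByID.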
Tagging the IPT mapping project. There is enthusiasm for this, and maybe it can be included in any updated mapping.
Merge --> #7348
See tdwg/dwc#102 (comment)
We should be prepared to share unique IDs for agents as much as possible. It would be a great project to have an intern add these to agents.