[Use Case]: Evaluating fitness of WorldFAIR for OHDSI/GIS #324

Open
4 tasks
kzollove opened this issue Feb 23, 2024 · 10 comments
Labels
Use Case A development-driving use case

Comments

@kzollove
Collaborator

kzollove commented Feb 23, 2024

Project Lead:

@jaygee-on-github

Purpose:

This is the specification we will be evaluating to determine its fitness to purpose:

DiscoverabilityDraftForZenodo.pdf

The specification proposes some metadata content that we can use to mark up any digital object for the purpose of discovery. The metadata content has been taken from many standards including Dublin Core, ISO19115-1, schema.org conventions from ESIPFed Science on Schema.org and Ocean Data net, DCAT, DCAT-AP, and FDO Kernel Attributes-2.0.

The specification maps this content into a set of JSON-LD nodes in a knowledge graph. Each node has a property and ultimately a value taken from the use case. The knowledge graph is machine readable and can be queried by a software agent. It can also be validated using SHACL rules in specific use cases.

One use case for this specification is a catalog of datasets. In this context the specification provides mark up and a knowledge graph at both the dataset and the variable levels. Variable level metadata can be more or less advanced.
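
A minimal sketch of what such a catalog entry could look like, assuming schema.org `Dataset` markup with variable-level metadata carried as `PropertyValue` nodes. The identifiers, names, and units are hypothetical placeholders and are not taken from the specification; the traversal is plain Python standing in for a real graph query.

```python
import json

# Hypothetical catalog entry: names and values are illustrative only.
dataset = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    "@id": "https://example.org/dataset/sdoh-demo",
    "name": "Example SDoH dataset",
    "description": "Placeholder description used for illustration.",
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "median_income", "unitText": "USD"},
        {"@type": "PropertyValue", "name": "pm25", "unitText": "ug/m3"},
    ],
}


def iter_nodes(node):
    """Yield every dict node in a JSON-LD document, depth first."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def variable_names(doc):
    """Return the names of all PropertyValue nodes (variable-level metadata)."""
    return [n["name"] for n in iter_nodes(doc) if n.get("@type") == "PropertyValue"]


print(json.dumps(dataset, indent=2))
print(variable_names(dataset))
```

A software agent would more likely load the graph into a triple store and query it there; the walk above only shows that dataset-level and variable-level nodes live in the same structure.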

Tasks:

@jaygee-on-github
Collaborator

jaygee-on-github commented Feb 24, 2024

In the tasks so far I didn't include development of an upper model based on Wild's exposome that we can use to classify all the catalog entries at the dataset level.

This appears in a presentation I made recently:

image

Here is the presentation

If we had an "upper model" that we could use as buckets in which to break out the datasets, then we would be positioned to create a catalog with three levels following the Arcus schema that the library science group developed at CHOP. In the Arcus model a catalog consists of one or more collections and a collection contains one or more series and a series consists of one or more files/datasets. INSPIRE has begun to engage with the Arcus group at CHOP at least conceptually.
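
The three-level hierarchy described above can be sketched as nested containers. This is only an illustration of the catalog / collection / series / file nesting; the class and field names are assumptions and do not come from the Arcus schema itself, and using "SDoH" and "Climate" as collection titles is a hypothetical choice of buckets.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    title: str
    files: list[str] = field(default_factory=list)  # files/datasets in the series


@dataclass
class Collection:
    title: str
    series: list[Series] = field(default_factory=list)


@dataclass
class Catalog:
    title: str
    collections: list[Collection] = field(default_factory=list)

    def datasets(self):
        """Yield (collection, series, file) for every file in the catalog."""
        for collection in self.collections:
            for series in collection.series:
                for name in series.files:
                    yield collection.title, series.title, name


# Hypothetical content: the buckets stand in for "upper model" classes.
catalog = Catalog(
    title="Demo catalog",
    collections=[
        Collection("SDoH", [Series("Income", ["income_2020.csv"])]),
        Collection("Climate", [Series("Air quality", ["pm25_2020.csv", "pm25_2021.csv"])]),
    ],
)

for entry in catalog.datasets():
    print(entry)
```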

Here is a presentation they recently made to INSPIRE:

@jaygee-on-github jaygee-on-github modified the milestones: Create a proof of concept knowledge graph for a catalog that includes both SDoHs, Create a proof of concept knowledge graph for a catalog that includes both SDoHs and climate data, Put together and demonstrate a toolset that can search the JSON-LD catalog knowledge graph, Infuse Gaia with metadata from the knowledge graph at both the dataset and variable levels Feb 24, 2024
@kzollove kzollove moved this to 🏃‍♀ In Progress in GIS Project Management Feb 29, 2024
@rtmill rtmill added the Use Case A development-driving use case label Mar 8, 2024
@kzollove
Collaborator Author

Jay, Doug, and Steve have built a schema.org JSON-LD from LinkML (in a different context). They will meet separately to detail that pipeline and will then update the task to push non-functional metadata into JSON-LD after this meeting.

Other apps that run around that process may also be helpful. We will detail these (DB schemas, documentation).

DB tables/ schemas for capturing this metadata can be generated from schema.org JSON-LD. Natural language descriptions can be generated to describe these tables
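
As a rough illustration of deriving a table from a JSON-LD instance, the sketch below turns the scalar properties of one schema.org record into a SQLite table. This is an assumption about how such a step could work, not the pipeline the group built; the record, table naming rule, and all-`TEXT` column typing are hypothetical simplifications.

```python
import sqlite3

# Hypothetical flat JSON-LD record used only for illustration.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "description": "Placeholder description.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}


def table_ddl(doc):
    """Build a CREATE TABLE statement with one TEXT column per scalar property."""
    columns = [k for k, v in doc.items() if not k.startswith("@") and isinstance(v, str)]
    table = doc["@type"].lower()
    cols_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    return table, columns, f'CREATE TABLE "{table}" ({cols_sql})'


table, columns, ddl = table_ddl(record)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)

# Insert the record itself so the generated table can be queried.
placeholders = ", ".join("?" for _ in columns)
conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', [record[c] for c in columns])

print(ddl)
print(conn.execute(f'SELECT name FROM "{table}"').fetchall())
```

Nested nodes and repeated values would need their own tables and foreign keys, which this sketch leaves out.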

Doug is exploring using graphs to analyze these

@kzollove kzollove changed the title Evaluating fitness of WorldFAIR for OHDSI/GIS [Use Case] Evaluating fitness of WorldFAIR for OHDSI/GIS Mar 29, 2024
@kzollove
Collaborator Author

This Use Case is contributing directly to the GIS WG by developing an authoring environment for discovery metadata that will go into the staging database alongside catalog entries.

  • Meeting Monday; we should have a contribution directly on GitHub, possibly before next week's meeting

@kzollove kzollove changed the title [Use Case] Evaluating fitness of WorldFAIR for OHDSI/GIS [Use Case]: Evaluating fitness of WorldFAIR for OHDSI/GIS Mar 29, 2024
@jaygee-on-github
Collaborator

jaygee-on-github commented Apr 2, 2024

@kzollove, we met on Tuesday. Tim, Doug Fils, Arofan Gregory and Jay were in attendance. We discussed metadata entry using YAML and forms. Tim demonstrated a recently developed DataCite form called DataCite Fabrica. We discussed middleware that would take us to JSON-LD and schema.org. Candidates included LinkML and RML.io technologies.

Doug is going to put together a preliminary proposal working with Tim and the various approaches Tim has either used or wants to consider for the metadata entry. I will check with Doug later this week before our Friday meeting on 4/5 to find out our ETA on the proposal.

@jaygee-on-github
Collaborator

@kzollove and @martyalvarez and @AEW0330 and @tibbben and @rtmill, we would like to present next week. We have two candidate authoring solutions. Both will support YAML or spreadsheet input and JSON-LD output right now at the dataset level but extensible to the variable level.

The output is an empty instance of schema.org JSON-LD that can be aligned with any standard (more or less).

In one candidate the mapping is embedded in code (probably Python, if I recall). In the other candidate the mapping is declarative.

We might want to talk about the maintainability of the two candidates.
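
To make the difference between the two styles concrete, here is a small sketch of the same dataset-level mapping written both ways. It is not either candidate: the input keys, the `MAPPING` table, and both function names are hypothetical, and the declarative engine is a toy stand-in for tools such as LinkML or RML.

```python
# Input as it might arrive from a YAML file or one spreadsheet row (hypothetical keys).
entry = {"title": "Example dataset", "summary": "Placeholder text.", "tags": "air quality; income"}


# Style 1: the mapping lives in code.
def to_jsonld_embedded(row):
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": row["title"],
        "description": row["summary"],
        "keywords": [t.strip() for t in row["tags"].split(";")],
    }


# Style 2: the mapping is data; a generic engine applies it.
MAPPING = {
    "title": {"property": "name"},
    "summary": {"property": "description"},
    "tags": {"property": "keywords", "split": ";"},
}


def to_jsonld_declarative(row, mapping):
    doc = {"@context": "https://schema.org/", "@type": "Dataset"}
    for source_key, rule in mapping.items():
        value = row[source_key]
        if "split" in rule:
            value = [part.strip() for part in value.split(rule["split"])]
        doc[rule["property"]] = value
    return doc


# Both styles produce the same JSON-LD for this entry.
assert to_jsonld_embedded(entry) == to_jsonld_declarative(entry, MAPPING)
print(to_jsonld_declarative(entry, MAPPING))
```

On maintainability, adding a field in the first style means editing code, while in the second it means adding one entry to the mapping table, at the cost of maintaining the engine that interprets it.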

@jaygee-on-github
Collaborator

The design for the output schema.org is a little open-ended as a feature. We have experience with and are interested in following the Science on Schema.org metadata guidance endorsed by the ESIP Partner Assembly a couple of years back. This guidance is remarkably cross-domain.

The guidance can be found here. Note that some of the guidance is experimental, developed to address a few special use cases. We are thinking the experimental guidance may apply.

@kzollove
Collaborator Author

kzollove commented May 3, 2024

@jaygee-on-github, once you find a time that works for you and Doug Fils (and whoever else should be present), please let us know and @martyalvarez can help set up the presentation on this work.

My preference is for a Friday meeting, but will join whenever! Thanks for all your work on this.

@fils

fils commented May 13, 2024

Look forward to talking about these on the scheduled call.

Obviously YAML to JSON-LD (RDF) is doable, but so is CSV or just tabular data to RDF. I've been exploring RML (https://rml.io/), which allows for a declarative mapping from tabular (or structured) data to RDF. This would let people work in spreadsheets if they like and if that maps better to their current data model.
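
A rough sketch of the tabular-to-RDF idea, using only the Python standard library. It does not use RML; the `COLUMN_TO_PREDICATE` table merely stands in for a declarative mapping document, and the CSV columns and the `https://example.org/` IRI base are hypothetical.

```python
import csv
import io

CSV_TEXT = """id,name,description
ds1,Example dataset,Placeholder text
ds2,Second dataset,Another placeholder
"""

BASE = "https://example.org/dataset/"  # hypothetical IRI base
SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Column -> predicate table, standing in for a declarative mapping document.
COLUMN_TO_PREDICATE = {"name": SCHEMA + "name", "description": SCHEMA + "description"}


def literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rows_to_ntriples(csv_text):
    """Emit one rdf:type triple plus one triple per mapped column for each row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{SCHEMA}Dataset> .")
        for column, predicate in COLUMN_TO_PREDICATE.items():
            triples.append(f"{subject} <{predicate}> {literal(row[column])} .")
    return triples


for line in rows_to_ntriples(CSV_TEXT):
    print(line)
```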

A forms based approach could also be used. Things like https://www.kobotoolbox.org/ are also possible alternatives to classic Google Forms.

Connecting such transforms with validation via SHACL is another topic that might be of interest.
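
To show where validation would sit after a transform, here is a plain-Python stand-in for the kind of check a SHACL shape expresses. This is not SHACL and does not use a SHACL engine; the `SHAPE` rules only mimic minimum-count constraints on properties, and the required properties chosen here are assumptions.

```python
# Each rule loosely mirrors a SHACL property shape: a path plus a minimum count.
SHAPE = [
    {"path": "name", "min_count": 1},
    {"path": "description", "min_count": 1},
    {"path": "variableMeasured", "min_count": 1},
]


def count_values(doc, path):
    """Count the values of a property, treating a non-list value as one value."""
    value = doc.get(path)
    if value is None:
        return 0
    return len(value) if isinstance(value, list) else 1


def validate(doc, shape):
    """Return a list of human-readable violations (empty means conforming)."""
    return [
        f"{rule['path']}: expected at least {rule['min_count']}, found {count_values(doc, rule['path'])}"
        for rule in shape
        if count_values(doc, rule["path"]) < rule["min_count"]
    ]


good = {"@type": "Dataset", "name": "Example", "description": "Text", "variableMeasured": ["pm25"]}
bad = {"@type": "Dataset", "name": "Example"}

print(validate(good, SHAPE))
print(validate(bad, SHAPE))
```

A real setup would express the same constraints as SHACL shapes and run them against the RDF output, so the rules stay declarative and shareable.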

I'll work up examples for the May 17th call.

@fils

fils commented Jun 13, 2024

@kzollove @jaygee-on-github just FYI, we finally published the latest version of the document referenced in the original post on this thread. You can find it here: https://zenodo.org/records/11236871

During the editing of this document I was always keeping in mind how I would connect the UNESCO Ocean InfoHub (OIH) work to these guidelines. Part of the group's follow-on work is to start looking at implementation examples and documenting those. So, I'm happy to look them over in the context of this work as well as OIH.

Note that the guidance scopes the use of https://schema.org/StatisticalVariable along with the standard https://schema.org/variableMeasured.
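
A minimal sketch of how the two could sit side by side in one record, assuming `variableMeasured` holds both a `StatisticalVariable` and a plain `PropertyValue`. The property IRIs under `https://example.org/` and the variable names are hypothetical, and the exact pattern the guidance recommends may differ.

```python
import json

dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "variableMeasured": [
        {
            "@type": "StatisticalVariable",
            "name": "Median household income",
            # Hypothetical IRIs; a real record would point at published definitions.
            "measuredProperty": {"@id": "https://example.org/property/income"},
            "statType": {"@id": "https://example.org/stat/median"},
        },
        {"@type": "PropertyValue", "name": "pm25", "unitText": "ug/m3"},
    ],
}


def variables_by_type(doc):
    """Group variable names by their schema.org type."""
    grouped = {}
    for variable in doc.get("variableMeasured", []):
        grouped.setdefault(variable["@type"], []).append(variable["name"])
    return grouped


print(json.dumps(variables_by_type(dataset), indent=2))
```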

I am also meeting with the OBIS group (https://obis.org/) next week to talk about how we could align some of the discrete grid approaches we are working on. OBIS is developing what they call speciesgrids (https://github.com/iobis/speciesgrids), and I have been working on something similar to generate resources like the following.

image

I am hoping we can generate these products in line with the CODATA recommendations.

@jaygee-on-github
Collaborator

jaygee-on-github commented Jun 13, 2024 via email
