Adhere to JSON-LD / codemeta standards #282

zyzzyxdonta · 2024-11-19T10:43:26Z

Each step of the hermes workflow produces output files in the .hermes directory. Most of these are either not valid JSON-LD or contain errors (e.g., mixing up authors and their email addresses).

harvest/cff_contexts.json: Not JSON-LD. I don't know what this is used for.

harvest/cff.json: Not a valid JSON-LD file because @context fields are lists instead of objects, @type fields have the wrong type, and the indexed list syntax is not understood.

harvest/toml_contexts.json / harvest/toml.json have the same problems.

process/hermes.json: Valid JSON-LD, but the authors were mixed up. E.g., Michael and Stephan swapped name attributes, and so did Oliver and Jeffrey. It is also not valid codemeta/Schema.org, because the url attribute doesn't contain the URL as a string but is an object that holds the URL as an ID, i.e. {"@id": "https://software-metadata.pub"} instead of "https://software-metadata.pub".
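For the url problem, a minimal post-processing sketch follows. It assumes the affected properties are known up front (the URL_PROPERTIES set and the collapse_id_refs helper are hypothetical, not part of hermes) and replaces a pure node reference of the form {"@id": ...} with the bare IRI string. Compacting the document against a context that declares these terms with "@type": "@id" would likely be the more standard way to arrive at the same string form.

```python
import json

# Hypothetical: properties that codemeta/Schema.org consumers expect as plain strings.
URL_PROPERTIES = {"url", "codeRepository"}


def collapse_id_refs(node):
    """Recursively replace {"@id": iri} with iri for keys in URL_PROPERTIES."""
    if isinstance(node, list):
        return [collapse_id_refs(item) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        # Only collapse pure node references, i.e. objects whose sole key is "@id".
        if key in URL_PROPERTIES and isinstance(value, dict) and set(value) == {"@id"}:
            result[key] = value["@id"]
        else:
            result[key] = collapse_id_refs(value)
    return result


doc = {"@type": "SoftwareSourceCode", "url": {"@id": "https://software-metadata.pub"}}
print(json.dumps(collapse_id_refs(doc)))
```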

process/tags.json: I don't think this is supposed to be JSON-LD. But the occasional @type or @id make it seem like it is.


led02 · 2024-11-19T10:54:13Z

Related to #153...

xxx.json + xxx_contexts.json = JSON-LD. Writing them out as two separate files is a leftover from an older, even more broken data model.
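If the two files pair up as described, recombining them into a single document could look like the sketch below. It assumes that the *_contexts.json file holds a value usable directly as "@context" (an object, an IRI, or an array of these); combine_with_context and load_as_jsonld are hypothetical helpers. Whether the combined result is valid JSON-LD still depends on the problems listed in the first comment.

```python
import json
from pathlib import Path


def combine_with_context(data, context):
    """Attach `context` as "@context" to a harvested data object.

    Assumption: `context` can be used as-is as a JSON-LD @context value.
    Any "@context" already present in `data` is replaced.
    """
    body = {key: value for key, value in data.items() if key != "@context"}
    return {"@context": context, **body}


def load_as_jsonld(data_path, context_path):
    """Hypothetical loader for a pair such as harvest/cff.json + harvest/cff_contexts.json."""
    data = json.loads(Path(data_path).read_text(encoding="utf-8"))
    context = json.loads(Path(context_path).read_text(encoding="utf-8"))
    return combine_with_context(data, context)
```

Usage would be along the lines of load_as_jsonld(".hermes/harvest/cff.json", ".hermes/harvest/cff_contexts.json").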

In particular, the tags.json contents should be inlined and moved into our own namespace to make the output more standard-conforming. However, I'm not sure whether we can achieve full JSON-LD compliance (e.g., because we want and need that meta-metadata).

However, there should be an export / deposition target that ensures standard conformance.

led02 · 2024-11-19T11:20:07Z

... and yes, the mix-up in process/hermes.json is known, and I believe there is already a patch for it in my internal branch for the data model refactoring. The problem was introduced during some refactoring of the plug-in interfaces. Previously, the datasets were ordered in a preprocessing step, which allowed simply merging the author lists from Git and CFF. That step is no longer performed, so the merge must happen not at the list level but at the author level (i.e., first identifying the matching entry in the author list, then merging the datasets).
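The difference between merging at the list level and at the author level can be sketched as follows. The matching rule (same email address or same name, case-insensitive) and the merge policy (existing values win, missing fields are filled in) are assumptions for illustration; author_keys and merge_authors are hypothetical and not the actual hermes implementation.

```python
def author_keys(author):
    """Hypothetical matching keys for an author entry: normalized email and name."""
    keys = []
    if author.get("email"):
        keys.append(("email", author["email"].strip().lower()))
    if author.get("name"):
        keys.append(("name", author["name"].strip().lower()))
    return keys


def merge_authors(primary, secondary):
    """Merge two author lists entry by entry instead of position by position."""
    merged = [dict(author) for author in primary]
    index = {}
    for entry in merged:
        for key in author_keys(entry):
            index.setdefault(key, entry)

    for author in secondary:
        # First identify the matching entry, independent of list order ...
        match = next((index[key] for key in author_keys(author) if key in index), None)
        if match is None:
            match = dict(author)
            merged.append(match)
        else:
            # ... then merge the two records; existing values take precedence.
            for field, value in author.items():
                match.setdefault(field, value)
        for key in author_keys(match):
            index.setdefault(key, match)
    return merged
```

Merging positionally instead (e.g., zipping the two lists) would combine whichever entries happen to share an index, which would be consistent with the swapped names reported in the first comment.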

zyzzyxdonta · 2024-11-19T11:40:52Z

Maybe we can store meta-metadata in a different file.

main JSON-LD file:

s1 p1 o1 .
s2 p2 o2 .
...

meta-metadata file:

x1 rdf:type rdf:Statement .
x1 rdf:subject s1 .
x1 rdf:predicate p1 .
x1 rdf:object o1 .
x1 hermes:provenance [ ... ] .

x2 rdf:type rdf:Statement .
x2 rdf:subject s2 .
x2 rdf:predicate p2 .
x2 rdf:object o2 .
x2 hermes:provenance [ ... ] .
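The reified layout above is regular enough to generate mechanically. The sketch below emits it in Turtle syntax for triples whose terms are already written as Turtle (prefixed names, literals). The blank-node labels _:x1, _:x2 and so on, the reify helper, and the hermes:source term in the example are hypothetical; hermes:provenance is the placeholder from this comment, and prefix declarations are omitted as in the example.

```python
def reify(triples, provenance):
    """Return Turtle lines describing each (s, p, o) triple as an rdf:Statement.

    `triples` holds terms already in Turtle syntax, e.g. ("ex:s1", "ex:p1", "ex:o1").
    `provenance` maps a triple to a Turtle term for its provenance (hypothetical format).
    Prefix declarations (rdf:, hermes:, ex:) are left out.
    """
    lines = []
    for n, (s, p, o) in enumerate(triples, start=1):
        stmt = f"_:x{n}"  # blank node standing for the n-th statement
        lines += [
            f"{stmt} rdf:type rdf:Statement .",
            f"{stmt} rdf:subject {s} .",
            f"{stmt} rdf:predicate {p} .",
            f"{stmt} rdf:object {o} .",
        ]
        if (s, p, o) in provenance:
            lines.append(f"{stmt} hermes:provenance {provenance[(s, p, o)]} .")
    return lines


triples = [("ex:s1", "ex:p1", "ex:o1"), ("ex:s2", "ex:p2", "ex:o2")]
prov = {("ex:s1", "ex:p1", "ex:o1"): '[ hermes:source "CITATION.cff" ]'}
print("\n".join(reify(triples, prov)))
```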

led02 · 2024-11-19T11:45:56Z

Well, that's what tags.json currently does. However, this gets complicated very quickly...

We want to store meta-metadata especially for the "objects" and mostly not for the subjects, hence we need some way of referencing them.
Currently, we also store "alternatives" (i.e., values that cannot be algorithmically merged into the existing values) that themselves have their own metadata.

On top of that, handling this (e.g., in a curation UI) is much more complex than just having the metadata in place already...

zyzzyxdonta · 2024-11-19T12:09:59Z

I was thinking that the provenance data is attached to statements. If there are multiple (conflicting) statements, these could be stored as alternatives, something like this:

s p o1 .
s p o2 .

With meta-metadata:

x1 rdf:type rdf:Statement .
x1 rdf:subject s .
x1 rdf:predicate p .
x1 rdf:object o1 .
x1 hermes:provenance [ ... ] .

x2 rdf:type rdf:Statement .
x2 rdf:subject s .
x2 rdf:predicate p .
x2 rdf:object o2 .
x2 hermes:provenance [ ... ] .

y rdf:type hermes:Alternatives .
y hermes:option x1 .
y hermes:option x2 .
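Detecting which reified statements belong in a hermes:Alternatives node could be done by grouping them on subject and predicate, as sketched below. hermes:Alternatives and hermes:option are the terms proposed in this comment, used here as placeholders, and find_alternatives is hypothetical. The sketch also assumes every predicate is single-valued, which does not hold for properties such as author, where several objects are expected rather than conflicting.

```python
from collections import defaultdict


def find_alternatives(statements):
    """Group reified statements that share subject and predicate.

    `statements` maps a statement id (e.g. "x1") to its (s, p, o) triple.
    Returns Turtle lines declaring one hermes:Alternatives node per conflict.
    """
    groups = defaultdict(list)
    for stmt_id, (s, p, o) in statements.items():
        groups[(s, p)].append(stmt_id)

    lines = []
    for n, ids in enumerate(groups.values(), start=1):
        if len(ids) < 2:
            continue  # a single value is not a conflict
        alt = f"_:alt{n}"
        lines.append(f"{alt} rdf:type hermes:Alternatives .")
        lines.extend(f"{alt} hermes:option {stmt_id} ." for stmt_id in ids)
    return lines


statements = {"x1": ("s", "p", "o1"), "x2": ("s", "p", "o2"), "x3": ("s", "q", "o3")}
print("\n".join(find_alternatives(statements)))
```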

(We could also switch from JSON-LD to N-Quads and store main metadata and meta-metadata as two different graphs. That's sort of in-place ;-))
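For the N-Quads variant, each line simply carries a fourth term naming its graph. The sketch below writes metadata and meta-metadata into two named graphs; the graph IRIs under https://example.org/ and the to_nquads helper are placeholders, and all terms are assumed to already be valid N-Triples terms (IRIs in angle brackets, quoted literals, blank-node labels).

```python
# Placeholder graph IRIs; hermes does not define these.
METADATA_GRAPH = "<https://example.org/hermes/metadata>"
PROVENANCE_GRAPH = "<https://example.org/hermes/provenance>"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def to_nquads(triples, graph):
    """Serialize (s, p, o) triples as N-Quads lines inside the named graph `graph`."""
    return "".join(f"{s} {p} {o} {graph} .\n" for s, p, o in triples)


software = "<https://example.org/software>"
name = "<http://schema.org/name>"

metadata = [(software, name, '"hermes"')]
# Reified copy of the metadata triple, kept in the second graph.
provenance = [
    ("_:x1", f"<{RDF}type>", f"<{RDF}Statement>"),
    ("_:x1", f"<{RDF}subject>", software),
    ("_:x1", f"<{RDF}predicate>", name),
    ("_:x1", f"<{RDF}object>", '"hermes"'),
]
print(to_nquads(metadata, METADATA_GRAPH) + to_nquads(provenance, PROVENANCE_GRAPH), end="")
```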

zyzzyxdonta · 2024-11-19T13:39:19Z

I played around with some stuff. One thing that will definitely be difficult is mismatches that span different fields, e.g.,
when codemeta has information about a person's name, but CITATION.cff has information about the person's first and last names. So a hermes:Alternatives must have hermes:Option entries that may each reference multiple statements.
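One way to model this is to let each option hold a list of statements instead of a single one, as in the data-structure sketch below. The class names mirror the hermes:Alternatives / hermes:Option idea but are hypothetical, provenance is reduced to a source file name, and schema:givenName / schema:familyName stand in for the first and last names from CITATION.cff.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str  # provenance, simplified to the file the value was harvested from


@dataclass
class Option:
    """One consistent reading of the data; it may span several statements."""
    statements: list[Statement]


@dataclass
class Alternatives:
    """Competing options for the same piece of information."""
    options: list[Option] = field(default_factory=list)


person = "_:person"
from_codemeta = Option([Statement(person, "schema:name", '"Hans Wurst"', "codemeta.json")])
from_cff = Option([
    Statement(person, "schema:givenName", '"Hans"', "CITATION.cff"),
    Statement(person, "schema:familyName", '"Wurst"', "CITATION.cff"),
])
conflict = Alternatives([from_codemeta, from_cff])
```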

led02 · 2024-11-19T14:43:17Z

So my idea so far was something like the following:

{
  "person": {
    "@type": "Person",
    "name": { "@value": "Hans Wurst", "hermes:prov": ... },
    "firstname": { "@value": "Hans", "hermes:prov": ... },
    "hermes:alt": [
      { "@type": "Person", "name": "Hannes Wurst", "hermes:prov": ... }
    ]
  }
}

zyzzyxdonta · 2024-11-19T15:45:40Z

I don't think that's allowed.

A value object MUST NOT contain any other keys that expand to an IRI or keyword.

-- https://www.w3.org/TR/json-ld/#value-objects

Maybe we can work around this by setting firstname, name etc. to placeholder values (or the values of the first option in hermes:alt) or not setting them at all.
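A variant of this workaround is to keep the current values in place as plain JSON-LD and move everything hermes-specific into a separate structure. The sketch below does that for the structure proposed above; it assumes every key with the hermes: prefix is meta-metadata, and both split_meta and the path-keyed sidecar format (including the "source" key in the example) are hypothetical.

```python
def split_meta(node, path="$", sidecar=None):
    """Return (clean, sidecar): `clean` has all "hermes:*" keys removed.

    Assumption: every key starting with "hermes:" is meta-metadata. `sidecar`
    maps the JSON path of each affected node to the keys removed from it.
    """
    if sidecar is None:
        sidecar = {}
    if isinstance(node, list):
        clean = [split_meta(item, f"{path}[{i}]", sidecar)[0] for i, item in enumerate(node)]
        return clean, sidecar
    if not isinstance(node, dict):
        return node, sidecar
    meta = {key: value for key, value in node.items() if key.startswith("hermes:")}
    if meta:
        sidecar[path] = meta
    clean = {
        key: split_meta(value, f"{path}.{key}", sidecar)[0]
        for key, value in node.items()
        if not key.startswith("hermes:")
    }
    return clean, sidecar


doc = {
    "person": {
        "@type": "Person",
        "name": {"@value": "Hans Wurst", "hermes:prov": {"source": "codemeta.json"}},
        "hermes:alt": [{"@type": "Person", "name": "Hannes Wurst"}],
    }
}
clean, sidecar = split_meta(doc)
```

The cleaned document then only contains ordinary value objects, while the sidecar could be written to a separate file.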

led02 · 2024-11-19T15:50:47Z

Yes, I'm aware of that. But is it required to be fully conforming in our internal data model...

But you are right when it comes to the Software CaRD report. It would be nice to follow a standard there.

Damn, I need to think further 😭

zyzzyxdonta · 2024-11-19T15:55:44Z

There are definitely libraries that will complain about this 🤷🏻‍♂️

https://json-ld.org/playground/#startTab=tab-expanded&json-ld=%7B%22%40type%22%3A%22Person%22%2C%22schema%3Aname%22%3A%7B%22%40value%22%3A%22Hans%20Wurst%22%2C%22hermes%3Aprov%22%3A%7B%7D%7D%2C%22schema%3Afirstname%22%3A%7B%22%40value%22%3A%22Hans%22%2C%22hermes%3Aprov%22%3A%7B%7D%7D%7D&context=%7B%7D
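A rough local check for the value-object rule quoted above could look like the sketch below. It only approximates a JSON-LD processor: it does not expand terms against the @context, so it reports every key in a value object other than the keywords the spec allows there (@value, @type, @language, @direction, @index, @context), rather than only keys that expand to an IRI or keyword. invalid_value_objects is a hypothetical helper.

```python
ALLOWED_VALUE_KEYS = {"@value", "@type", "@language", "@direction", "@index", "@context"}


def invalid_value_objects(node, path="$"):
    """Yield (path, extra_keys) for value objects with keys outside ALLOWED_VALUE_KEYS.

    Approximation: terms are not expanded, so every extra key is reported, whereas
    a JSON-LD processor only rejects keys that expand to an IRI or keyword.
    """
    if isinstance(node, list):
        for i, item in enumerate(node):
            yield from invalid_value_objects(item, f"{path}[{i}]")
    elif isinstance(node, dict):
        if "@value" in node:
            extra = sorted(set(node) - ALLOWED_VALUE_KEYS)
            if extra:
                yield path, extra
        else:
            for key, value in node.items():
                yield from invalid_value_objects(value, f"{path}.{key}")


# The document from the playground link above.
doc = {
    "@type": "Person",
    "schema:name": {"@value": "Hans Wurst", "hermes:prov": {}},
    "schema:firstname": {"@value": "Hans", "hermes:prov": {}},
}
for path, keys in invalid_value_objects(doc):
    print(path, keys)
```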

zyzzyxdonta · 2024-11-19T15:59:02Z

is it required to be fully conforming in our internal data model...

Files written into a directory don't seem that internal to me 😁

led02 · 2024-11-19T16:07:43Z

But it is a hidden directory and hence not so public.

zyzzyxdonta · 2024-11-20T10:05:20Z

It might be hidden but it is the interface to use for everyone who writes plugins for hermes...

zyzzyxdonta added the labels "bug" (Something isn't working) and "Hackathon24" (issues that should be fixed in or before Hackathon 24 in December) on Nov 19, 2024.
