
Adhere to JSON-LD / codemeta standards #282

Open
zyzzyxdonta opened this issue Nov 19, 2024 · 13 comments
Labels
bug Something isn't working Hackathon24 Issues that should be fixed in or until the Hackathon 24 (December)

Comments

@zyzzyxdonta
Contributor

Each step of the hermes workflow produces output files in the .hermes directory. Most of these are either not valid JSON-LD or contain errors (e.g., authors paired with the wrong email addresses).

harvest/cff_contexts.json: Not JSON-LD. I don't know what this is used for.

harvest/cff.json: Not a valid JSON-LD file because the @context fields are lists instead of objects, the @type fields have the wrong type, and the indexed list syntax is not understood.

harvest/toml_contexts.json / harvest/toml.json have the same problems.

process/hermes.json: Valid JSON-LD, but the authors are mixed up. E.g., Michael and Stephan have swapped name attributes, as have Oliver and Jeffrey. It is also not valid codemeta/Schema.org, because the url attribute does not contain the URL as a string but an object that holds the URL as an ID, i.e. {"@id": "https://software-metadata.pub"} instead of "https://software-metadata.pub".
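A minimal sketch of a post-processing step that flattens such {"@id": ...} values back into plain strings. The function name normalize_urls, the set of affected keys, and the assumption that process/hermes.json has been loaded as a plain dict are illustrative only, not part of hermes:

```python
import json
from typing import Any, Iterable


def normalize_urls(node: Any, keys: Iterable[str] = ("url",)) -> Any:
    """Recursively replace {"@id": "<iri>"} values of URL-like keys with plain strings.

    Hypothetical helper: only single-valued entries of exactly {"@id": ...} are rewritten.
    """
    keys = tuple(keys)
    if isinstance(node, list):
        return [normalize_urls(item, keys) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key in keys and isinstance(value, dict) and set(value) == {"@id"}:
            # {"@id": "https://..."} becomes "https://..."
            result[key] = value["@id"]
        else:
            result[key] = normalize_urls(value, keys)
    return result


if __name__ == "__main__":
    doc = {"@type": "SoftwareSourceCode", "url": {"@id": "https://software-metadata.pub"}}
    print(json.dumps(normalize_urls(doc)))
```

In JSON-LD terms, the cleaner fix is probably to compact the output against a context that declares url with "@type": "@id"; compaction then yields the plain string form as well.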

process/tags.json: I don't think this is supposed to be JSON-LD, but the occasional @type or @id makes it look like it is.

@zyzzyxdonta zyzzyxdonta added bug Something isn't working Hackathon24 Issues that should be fixed in or until the Hackathon 24 (December) labels Nov 19, 2024
@led02
Member

led02 commented Nov 19, 2024

Related to #153...

xxx.json + xxx_contexts.json = JSON-LD... Writing them out to two files is a leftover from an older, even more broken data model.

In particular, the tags.json contents should be inlined and use our own namespace to make the output more standard-conforming. However, I'm not sure whether we can achieve full JSON-LD compliance (e.g., because we want and need that meta-metadata).

However, there should be an export / deposition target that ensures standard conformance.

@led02
Member

led02 commented Nov 19, 2024

... and yes, the mismatch in process/hermes.json is known, and I believe there is already a patch for it in my internal branch for the data model refactoring. The problem was introduced during some refactoring of the plug-in interfaces... Previously, the datasets were ordered in a preprocessing step, which allowed us to simply merge the author lists from Git and CFF. However, this step is no longer performed, so the merging must not be done at the list level but at the author level (i.e., first identifying the matching entry in the author list and then merging the datasets).
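A rough sketch of what author-level merging could look like: each incoming author is first matched against the existing list (here by e-mail address or, failing that, by whitespace-normalized name) and only then merged field by field. The matching rule, the name merge_authors, and the "existing values win" policy are assumptions for illustration; the actual patch may work differently.

```python
from typing import Dict, List


def _identity_keys(author: dict) -> List[str]:
    """Hypothetical identity keys: e-mail address first, then the normalized name."""
    keys = []
    email = author.get("email")
    if email:
        keys.append("email:" + email.strip().lower())
    name = author.get("name")
    if name:
        keys.append("name:" + " ".join(name.split()).lower())
    return keys


def merge_authors(primary: List[dict], secondary: List[dict]) -> List[dict]:
    """Merge two author lists entry by entry instead of position by position."""
    merged = [dict(author) for author in primary]  # shallow copies, inputs stay untouched
    index: Dict[str, dict] = {}
    for author in merged:
        for key in _identity_keys(author):
            index.setdefault(key, author)

    for author in secondary:
        keys = _identity_keys(author)
        match = next((index[k] for k in keys if k in index), None)
        if match is None:
            # No counterpart in the primary list: keep the author as a new entry.
            match = dict(author)
            merged.append(match)
        else:
            # Only fill in fields the primary source does not provide.
            for field, value in author.items():
                match.setdefault(field, value)
        for key in keys:
            index.setdefault(key, match)
    return merged
```

Because matching happens per author, the order in which Git and CFF list their authors no longer matters.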

@zyzzyxdonta
Contributor Author

zyzzyxdonta commented Nov 19, 2024

Maybe we can store meta-metadata in a different file.

main JSON-LD file:

s1 p1 o1 .
s2 p2 o2 .
...

meta-metadata file:

x1 rdf:type rdf:Statement .
x1 rdf:subject s1 .
x1 rdf:predicate p1 .
x1 rdf:object o1 .
x1 hermes:provenance [ ... ] .

x2 rdf:type rdf:Statement .
x2 rdf:subject s2 .
x2 rdf:predicate p2 .
x2 rdf:object o2 .
x2 hermes:provenance [ ... ] .
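A small sketch of how the two files could be generated from a list of triples. Terms are kept as prefixed strings exactly as in the listing above (a real Turtle file would also need @prefix declarations), statement nodes are blank nodes, and reify, to_lines, and the single shared provenance value are hypothetical simplifications; in practice each statement would carry its own provenance.

```python
from itertools import count
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def reify(triples: Iterable[Triple], provenance: str) -> Tuple[List[Triple], List[Triple]]:
    """Return (main triples, meta-metadata triples) using RDF reification."""
    main: List[Triple] = []
    meta: List[Triple] = []
    ids = count(1)
    for s, p, o in triples:
        main.append((s, p, o))
        x = f"_:x{next(ids)}"  # blank node standing for the statement itself
        meta.extend([
            (x, "rdf:type", "rdf:Statement"),
            (x, "rdf:subject", s),
            (x, "rdf:predicate", p),
            (x, "rdf:object", o),
            (x, "hermes:provenance", provenance),
        ])
    return main, meta


def to_lines(triples: Iterable[Triple]) -> str:
    """Serialize triples one per line in the 's p o .' notation used above."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


if __name__ == "__main__":
    main, meta = reify([("s1", "p1", "o1"), ("s2", "p2", "o2")], "_:prov")
    print(to_lines(main))
    print(to_lines(meta))
```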

@led02
Member

led02 commented Nov 19, 2024

Well, that's what tags.json currently does. However, this gets complicated very quickly...

  • We want to store metadata mainly for the "objects" and mostly not for the subjects, hence we need some way of referencing them.
  • Currently, we also store "alternatives" (i.e., values that cannot be algorithmically merged into the existing values) that themselves have their own metadata.

On top of that, handling this (e.g., in a curation UI) is far more complex than just having the metadata in place already...

@zyzzyxdonta
Contributor Author

zyzzyxdonta commented Nov 19, 2024

I was thinking that the provenance data would be attached to statements. If there are multiple (conflicting) statements, these could be stored as alternatives, something like this:

s p o1 .
s p o2 .

With meta-metadata:

x1 rdf:type rdf:Statement .
x1 rdf:subject s .
x1 rdf:predicate p .
x1 rdf:object o1 .
x1 hermes:provenance [ ... ] .

x2 rdf:type rdf:Statement .
x2 rdf:subject s .
x2 rdf:predicate p .
x2 rdf:object o2 .
x2 hermes:provenance [ ... ] .

y rdf:type hermes:Alternatives .
y hermes:option x1 .
y hermes:option x2 .
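Assuming the reified statements are kept in a dict from statement node to triple, grouping conflicting statements into hermes:Alternatives nodes could look roughly like this. Treating "same subject and single-valued predicate, different objects" as a conflict is a simplification, and find_alternatives and its single_valued parameter are hypothetical; without that restriction, legitimately multi-valued properties such as author lists would be flagged too.

```python
from collections import defaultdict
from itertools import count
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def find_alternatives(statements: Dict[str, Triple], single_valued: Iterable[str]) -> List[Triple]:
    """Emit hermes:Alternatives nodes for single-valued properties with conflicting objects."""
    single = set(single_valued)
    by_sp: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for stmt_id, (s, p, _o) in statements.items():
        if p in single:
            by_sp[(s, p)].append(stmt_id)

    out: List[Triple] = []
    counter = count(1)
    for ids in by_sp.values():
        if len({statements[i][2] for i in ids}) < 2:
            continue  # all sources agree, nothing to choose between
        y = f"_:alt{next(counter)}"
        out.append((y, "rdf:type", "hermes:Alternatives"))
        out.extend((y, "hermes:option", i) for i in ids)
    return out
```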

(We could also switch from JSON-LD to N-Quads and store main metadata and meta-metadata as two different graphs. That's sort of in-place ;-))
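As a rough illustration of the N-Quads variant: both graphs end up in one file, and each quad carries its graph name as the fourth term. The graph IRIs (urn:example:...), the Literal wrapper, and to_nquads are placeholders; note that N-Quads requires full IRIs rather than prefixed names such as hermes:provenance.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper marking a plain string literal."""
    value: str


Term = Union[str, Literal]


def term(t: Term) -> str:
    """Serialize an IRI as <...>, keep blank nodes (_:...) as-is, quote literals."""
    if isinstance(t, Literal):
        # json.dumps emits escapes (\", \\, \n, \uXXXX, ...) that N-Quads also accepts.
        return json.dumps(t.value, ensure_ascii=False)
    if t.startswith("_:"):
        return t
    return f"<{t}>"


def to_nquads(triples: Iterable[Tuple[Term, Term, Term]], graph: str) -> str:
    """Serialize triples into one named graph, one quad per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} <{graph}> .\n" for s, p, o in triples)


if __name__ == "__main__":
    data = [("https://example.org/sw", "http://schema.org/name", Literal("hermes"))]
    print(to_nquads(data, "urn:example:metadata"), end="")
```

Meta-metadata triples would be written the same way with a second graph name (e.g. a hypothetical urn:example:provenance) and appended to the same file.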

@zyzzyxdonta
Contributor Author

zyzzyxdonta commented Nov 19, 2024

I played around with some stuff. One thing that will definitely be difficult is mismatches that span different fields, e.g.
when codemeta has information about a person's name, but CITATION.cff has information about the person's first and last name. So a hermes:Alternatives must have hermes:Option nodes that can each reference multiple statements.
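A sketch of options that span several statements: the codemeta-style schema:name statement forms one option, and the schema:givenName/schema:familyName pair (as it might be mapped from CITATION.cff) forms the other. The "given + space + family" comparison is a naive heuristic that fails for many real names, and name_options is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def name_options(statements: Dict[str, Triple], subject: str) -> Optional[List[List[str]]]:
    """Group a person's name statements into options; return None if they agree.

    Option 1: the statement(s) giving schema:name.
    Option 2: the schema:givenName and schema:familyName statements taken together.
    """
    def ids_for(predicate: str) -> List[str]:
        return [i for i, (s, p, _o) in statements.items() if s == subject and p == predicate]

    full = ids_for("schema:name")
    given = ids_for("schema:givenName")
    family = ids_for("schema:familyName")
    if not full or not (given and family):
        return None  # only one representation present, nothing to compare

    full_value = statements[full[0]][2]
    split_value = f"{statements[given[0]][2]} {statements[family[0]][2]}"
    if full_value == split_value:
        return None  # both representations agree
    return [full, given + family]
```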

@led02
Member

led02 commented Nov 19, 2024

So my idea so far was something like the following:

{
  "person": {
    "@type": "Person",
    "name": { "@value": "Hans Wurst", "hermes:prov": ... },
    "firstname": { "@value": "Hans", "hermes:prov": ... },
    "hermes:alt": [
      { "@type": "Person", "name": "Hannes Wurst", "hermes:prov": ... }
    ]
  }
}

@zyzzyxdonta
Contributor Author

zyzzyxdonta commented Nov 19, 2024

I don't think that's allowed.

A value object MUST NOT contain any other keys that expand to an IRI or keyword.

-- https://www.w3.org/TR/json-ld/#value-objects
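To catch this early, a conservative check could flag every value-object key other than the keywords JSON-LD 1.1 permits there (@value, @type, @language, @direction, @index, @context), plus the forbidden combination of @type with @language or @direction. It is stricter than the spec, since a key that expands to neither an IRI nor a keyword would actually be tolerated; value_object_problems is a hypothetical helper.

```python
from typing import List

ALLOWED_KEYS = {"@value", "@type", "@language", "@direction", "@index", "@context"}


def value_object_problems(obj: dict) -> List[str]:
    """List reasons why `obj` is not a valid JSON-LD value object (conservative check)."""
    problems: List[str] = []
    if "@value" not in obj:
        problems.append("missing @value")
    if "@type" in obj and ("@language" in obj or "@direction" in obj):
        problems.append("@type cannot be combined with @language or @direction")
    for key in sorted(obj):
        if key not in ALLOWED_KEYS:
            # Stricter than the spec: "hermes:prov" expands to an IRI once a
            # "hermes" prefix is defined, which makes the value object invalid.
            problems.append(f"unexpected key {key!r}")
    return problems
```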

Maybe we can work around this by setting firstname, name etc. to placeholder values (or the values of the first option in hermes:alt) or not setting them at all.
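A minimal sketch of the "values of the first option" variant: the person node carries plain property values taken from the first option, while every option, including its provenance, is kept as a node object under hermes:alt. Node objects, unlike value objects, may carry additional properties such as hermes:prov. with_alternatives and the provenance layout are hypothetical.

```python
from copy import deepcopy
from typing import List


def with_alternatives(options: List[dict]) -> dict:
    """Build a node whose plain values come from the first option; keep all options under hermes:alt."""
    if not options:
        raise ValueError("at least one option is required")
    # Top-level values are copied from the first option, without its provenance.
    node = {k: deepcopy(v) for k, v in options[0].items() if k != "hermes:prov"}
    # Every option stays available (with provenance) as a separate node object.
    node["hermes:alt"] = deepcopy(options)
    return node
```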

@led02
Member

led02 commented Nov 19, 2024

Yes, I'm aware of that. However, is it required to be fully conforming in our internal data model...

But you are right when it comes to the Software CaRD report. It would be nice to follow a standard there.

Damn, I need to think further 😭

@zyzzyxdonta
Contributor Author

is it required to be fully conforming in out internal data model...

Files written into a directory don't seem that internal to me 😁

@led02
Member

led02 commented Nov 19, 2024

But it is a hidden directory and hence not so public.

@zyzzyxdonta
Contributor Author

It might be hidden, but it is the interface that everyone who writes plugins for hermes has to use...
