How to structure a data dump as a DataFeed? #579
-
@ivanmicetic, for your IDP resources I would expect the JSON-LD file to contain an array of entries that correspond to your pages. It might look something like the following. I've taken a snippet of two of your entries; note that I've changed the dataset to an object rather than a string:
[
{
"@context": "https://schema.org",
"includedInDataset": {
"@id":"https://disprot.org/#2021-08",
"@type": "Dataset",
"name": "DisProt (August 2021)"
},
"@type": "Protein",
"@id": "https://disprot.org/DP00004",
"http://purl.org/dc/terms/conformsTo": {
"@id": "https://bioschemas.org/profiles/Protein/0.11-RELEASE",
"@type": "CreativeWork"
},
"identifier": "https://identifiers.org/disprot:DP00004",
"sameAs": "http://purl.uniprot.org/uniprot/P49913",
"name": "Cathelicidin antimicrobial peptide"
},
{
"@context": "https://schema.org",
"includedInDataset": {
"@id": "https://disprot.org/#2021-08",
"@type": "Dataset",
"name": "DisProt (August 2021)"
},
"@type": "Protein",
"@id": "https://disprot.org/DP00072",
"http://purl.org/dc/terms/conformsTo": {
"@id": "https://bioschemas.org/profiles/Protein/0.11-RELEASE",
"@type": "CreativeWork"
},
"identifier": "https://identifiers.org/disprot:DP00072",
"sameAs": "http://purl.uniprot.org/uniprot/Q8WZ42",
"name": "Titin"
}
]
To reduce the load on consumers and the size of the file, we could use the JSON-LD flatten function, which would result in the following. Note that there is only one node each for the profile and the dataset, rather than that information being repeated in every entry.
{
"@graph": [
{
"@id": "https://bioschemas.org/profiles/Protein/0.11-RELEASE",
"@type": "http://schema.org/CreativeWork"
},
{
"@id": "https://disprot.org/#2021-08",
"@type": "http://schema.org/Dataset",
"http://schema.org/name": "DisProt (August 2021)"
},
{
"@id": "https://disprot.org/DP00004",
"@type": "http://schema.org/Protein",
"http://purl.org/dc/terms/conformsTo": {
"@id": "https://bioschemas.org/profiles/Protein/0.11-RELEASE"
},
"http://schema.org/identifier": "https://identifiers.org/disprot:DP00004",
"http://schema.org/includedInDataset": {
"@id": "https://disprot.org/#2021-08"
},
"http://schema.org/name": "Cathelicidin antimicrobial peptide",
"http://schema.org/sameAs": {
"@id": "http://purl.uniprot.org/uniprot/P49913"
}
},
{
"@id": "https://disprot.org/DP00072",
"@type": "http://schema.org/Protein",
"http://purl.org/dc/terms/conformsTo": {
"@id": "https://bioschemas.org/profiles/Protein/0.11-RELEASE"
},
"http://schema.org/identifier": "https://identifiers.org/disprot:DP00072",
"http://schema.org/includedInDataset": {
"@id": "https://disprot.org/#2021-08"
},
"http://schema.org/name": "Titin",
"http://schema.org/sameAs": {
"@id": "http://purl.uniprot.org/uniprot/Q8WZ42"
}
}
]
}
So in essence you need to create a JSON-LD entry for each protein, put them together in an array, and then use the JSON-LD flatten function.
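The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `PROTEINS` records and the helper names `protein_entry` and `hoist_nodes` are hypothetical, and `hoist_nodes` is a simplified stand-in for real JSON-LD flattening. It only moves nested nodes that carry an `@id` to the top level; unlike a real JSON-LD processor (for example the `flatten` function in the PyLD library) it does not expand terms to full IRIs.

```python
import json

# Hypothetical input records; in practice these would come from your own database.
PROTEINS = [
    {"id": "DP00004", "uniprot": "P49913", "name": "Cathelicidin antimicrobial peptide"},
    {"id": "DP00072", "uniprot": "Q8WZ42", "name": "Titin"},
]

DATASET = {
    "@id": "https://disprot.org/#2021-08",
    "@type": "Dataset",
    "name": "DisProt (August 2021)",
}
PROFILE = {
    "@id": "https://bioschemas.org/profiles/Protein/0.11-RELEASE",
    "@type": "CreativeWork",
}


def protein_entry(record):
    """Build one Protein entry shaped like the entries in the array example."""
    return {
        "@context": "https://schema.org",
        "includedInDataset": dict(DATASET),
        "@type": "Protein",
        "@id": f"https://disprot.org/{record['id']}",
        "http://purl.org/dc/terms/conformsTo": dict(PROFILE),
        "identifier": f"https://identifiers.org/disprot:{record['id']}",
        "sameAs": f"http://purl.uniprot.org/uniprot/{record['uniprot']}",
        "name": record["name"],
    }


def hoist_nodes(entries):
    """Simplified stand-in for JSON-LD flattening (assumption, not the real algorithm).

    Every nested object with an "@id" becomes a single top-level node and is
    replaced by a reference; keys are NOT expanded to full IRIs.
    """
    graph = {}

    def visit(node):
        node_id = node["@id"]
        merged = graph.setdefault(node_id, {"@id": node_id})
        for key, value in node.items():
            if key in ("@id", "@context"):
                continue
            if isinstance(value, dict) and "@id" in value:
                visit(value)
                merged[key] = {"@id": value["@id"]}
            else:
                merged[key] = value

    for entry in entries:
        visit(entry)
    return {"@context": "https://schema.org", "@graph": list(graph.values())}


if __name__ == "__main__":
    entries = [protein_entry(p) for p in PROTEINS]
    print(json.dumps(hoist_nodes(entries), indent=2))
```

For a real dump, passing the array to a proper JSON-LD processor is the safer route, since it also resolves the context and produces the expanded keys shown in the flattened example.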
-
Using the DisProt sample data that we collected for developing the IDP-KG, I have generated this sample JSON-LD file. Hopefully that shows what should be in a DataFeed file.
-
One suggestion is to make the dump an RO-Crate, which also uses flattened JSON-LD. Basically, the only change from the example you pasted would be to declare a root dataset that it would also be good to use the (otherwise deprecated) See https://gist.github.com/stain/a7143d5276b927571a68f493cd388836 based on @AlasdairGray's sample data above. Changes on top. It may seem strange that from the RO-Crate the One thing you may notice as I reflattened with https://w3id.org/ro/crate/1.1/context is that our JSON-LD context is based on schema.org 10.0, so unmapped keys like (If you flatten with
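As a rough illustration of what declaring a root dataset could involve, the sketch below prepends an RO-Crate 1.1 metadata descriptor and a root dataset node to an already flattened list of nodes. The function name `as_ro_crate` and the sample name and description are hypothetical, and this is not taken from the gist. RO-Crate 1.1 also expects further root dataset properties such as `datePublished` and `license`, which are left out here for brevity.

```python
def as_ro_crate(graph_nodes, name, description):
    """Wrap flattened JSON-LD nodes with an RO-Crate 1.1 descriptor and root dataset.

    Hypothetical helper for illustration; a complete crate needs more
    root dataset metadata (e.g. datePublished, license).
    """
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},  # points at the root dataset
    }
    root_dataset = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root_dataset, *graph_nodes],
    }


if __name__ == "__main__":
    import json

    nodes = [{"@id": "https://disprot.org/DP00004", "@type": "Protein", "name": "Cathelicidin antimicrobial peptide"}]
    crate = as_ro_crate(nodes, "DisProt sample dump", "Sample protein markup as an RO-Crate")
    print(json.dumps(crate, indent=2))
```

The result would be saved as `ro-crate-metadata.json` alongside the dump.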
-
The Schema.org community have created a proposal for exchanging markup data as a DataFeed. This feed can be made up of one or many files which should be stored at a well known location.
What the proposal does not specify is what should be in the file(s).