Provide a MassBank schema.org DataFeed #357

Open
sneumann opened this issue Aug 17, 2022 · 9 comments

@sneumann
Member

Hi,
the 2022 Biohackathon has project 23 to consume schema.org DataFeeds. @albangaignard or @AlasdairGray could point us to what's needed to provide our existing schema markup in such a form.
Yours, Steffen

@sneumann
Member Author

There is more information in BioSchemas/specifications#579
Yours, Steffen

@sneumann
Member Author

We have a RecordExporter command line tool to convert from the MassBank text records to HTML like in the web application:
https://github.com/MassBank/MassBank-web/blob/main/MassBank-Project/MassBank-lib/src/main/scripts/RecordExporter

To create the JSON dump we can either take the exported HTML and extract the <script type="application/ld+json"> element, or @meier-rene could even add a command-line switch such as --json-only to export only that.

Yours, Steffen

@sneumann
Member Author

I love command line tools:

MassBank-web/MassBank-Project/MassBank-lib/target/MassBank-lib/MassBank-lib/bin/Inspector  \
  IPB_Halle/MSBNK-IPB_Halle-PB005803.txt /dev/stdout \
  | xmllint -html --xpath 'string(//html/head/script[@type = "application/ld+json"]/text())' - 2> /dev/null

and of course also for the entire MassBank-data (not super-fast, though...)

find . -name "MSBNK-IPB_Halle-PB00048*.txt" -exec sh -c "/vol/massbank/src/MassBank-web/MassBank-Project/MassBank-lib/target/MassBank-lib/MassBank-lib/bin/Inspector  {} /dev/stdout | xmllint -html --xpath 'string(//html/head/script[@type = \"application/ld+json\"]/text())' - 2> /dev/null " \; | jq -s 'add' >DataDump.json

@sneumann
Member Author

sneumann commented Nov 23, 2022

jq is quite strict about proper JSON. SMILES with stereochemistry can pose a problem inside JSON, because the backslash is the JSON escape character:
"smiles": "C1CC2=C(C(=CC=C2)O)OC3=CC=CC(=C3)/C=C\C4=CC(=C(C(=C4)O)O)OC5C=CC1C=C5"
gives parse error: Invalid escape at line ..., as already mentioned by Tobias in #316 (comment).

So to massage the output we need
sed -e 's#\\#\\\\#g' DataDump.json | jq -s 'add' >DataDump.jsonld
assuming that the SMILES are the only place that contains a \ (the g flag is needed because a line can contain more than one).
This might not be necessary after fixing #316.
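
The escaping step can be sketched as a small filter. This is only a sketch under the assumption stated above, that SMILES strings are the only place where a backslash occurs; the function name escape_backslashes and the sample line are made up for illustration. The g flag makes sed double every backslash on a line instead of only the first one.

```shell
#!/bin/sh
# Double every backslash on stdin so that jq accepts the result as JSON.
# Assumption: backslashes occur only inside SMILES strings; a valid JSON
# escape such as \" or \n elsewhere would be corrupted by this filter.
escape_backslashes() {
  sed -e 's#\\#\\\\#g'
}

# Example: a made-up line with two stereo bonds written with a backslash.
printf '%s\n' '{"smiles": "C/C=C\C/C=C\C"}' | escape_backslashes
```

Once #316 is fixed at the source, a filter like this would double the already escaped backslashes, so it should be dropped at that point.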

@sneumann
Member Author

sneumann commented Dec 6, 2022

Thanks to @meier-rene we now have a DataDump created via

MassBank-web/MassBank-Project/MassBank-lib/target/MassBank-lib/MassBank-lib/bin/Msbnk2JSONLD -o MassBank-2006.06.jsonld $(ls -d MassBank-data/* )

A sample is available from
https://msbi.ipb-halle.de/~sneumann/MassBank-2006.06.jsonld
Yours, Steffen

@sneumann
Member Author

Now that we know how to create a Data Feed, we need to serve it.
According to https://schema.org/docs/feeds.html this goes into
/.well-known/feeddata-general with the option to split if it becomes too large.
That data feed should be created upon data import.
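
One possible shape for the served file is a schema.org DataFeed object whose dataFeedElement holds the exported records. The following is a minimal sketch, assuming the dump is a single well-formed JSON array; wrap_datafeed is a hypothetical helper and not part of MassBank-lib.

```shell
#!/bin/sh
# Wrap a JSON array read from stdin into a schema.org DataFeed object.
# Assumption: stdin is one well-formed JSON array of records.
wrap_datafeed() {
  printf '{"@context":"https://schema.org","@type":"DataFeed","dataFeedElement":'
  cat
  printf '}\n'
}

# Example with a one-element array standing in for the real dump.
printf '[{"@type":"Dataset"}]' | wrap_datafeed
```

With the real dump this could be run as `wrap_datafeed < DataDump.jsonld > feeddata-general`, with the output placed wherever the web server maps /.well-known/.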

@sneumann
Member Author

We should also find a way to express the MassBank-data version (git commit hash and/or the release) of a DataFeed.
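
One way to carry this information, sketched under assumptions: schema.org DataFeed inherits the version property from CreativeWork, so the release tag could go there, while the commit hash could go into identifier. Using identifier for the commit and the helper name feed_header are assumptions for illustration, not something agreed in this thread.

```shell
#!/bin/sh
# Print a DataFeed header carrying a release tag and a commit hash.
# $1: release tag, $2: commit hash. Both values are inserted verbatim,
# so they must not contain characters that need JSON escaping.
feed_header() {
  printf '{"@context":"https://schema.org","@type":"DataFeed","version":"%s","identifier":"%s"}\n' "$1" "$2"
}

# Example with placeholder values; in a MassBank-data checkout they could
# come from: git describe --tags --always   and   git rev-parse HEAD
feed_header "RELEASE-TAG" "COMMIT-HASH"
```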

@tsufz
Member

tsufz commented Feb 20, 2023

/.well-known/ is located in the Apache root to serve the Let's Encrypt challenge. The feeddata-general file would be created in the TomEE root. Once that is implemented, we can redirect requests for it from Apache to the TomEE root.

@sneumann
Member Author

The DataDump is now created as part of the release process with

./MassBank-lib/target/MassBank-lib/MassBank-lib/bin/RecordExporter -f jsonld -o MassBank.json ../../MassBank-data/*

and the files go to the GitHub releases:
https://github.com/MassBank/MassBank-data/releases/latest/download/MassBank.json
We do not have that file on the running MassBank server yet.
Yours,
Steffen
