New output: convert BODS JSON to RDF using GraphDB #4

Open
1 task done
StephenAbbott opened this issue Apr 13, 2022 · 5 comments
@StephenAbbott
Member

StephenAbbott commented Apr 13, 2022

We received an offer from Cos at Blue Anvil to extend the BODS data analysis tools by reusing their code (https://github.com/blueanvil/bods-rdf) to convert BODS data from the Register, or any other source, and ingest it into an RDF repository.

@cosmin-marginean

cosmin-marginean commented Jun 26, 2022

Some of my initial thoughts on BODS-to-RDF integration and some challenges to consider.

  1. I'm assuming that OpenOwnership will provide and host the RDF format for download "atomically", correct? (That is, an RDF-format register dataset will be available for each published BODS JSON register dataset.)
  2. The conversion is a long-running process (hours), and its running time is expected to grow with the register size.
  3. When integrating this, we should also consider providing the RDF format for individual registers, not just the combined register (see "Update BODS data analysis tools to offer Denmark, Slovakia and the UK BO registers as separate sources", #11).
  4. The conversion code at BODS-RDF (https://github.com/blueanvil/bods-rdf) is written in Kotlin (JVM), so there are several ways to integrate it, each with different implications:
    • 4.1 Integrate the code as a library in a processing pipeline running on the JVM. This will require JVM coding and JVM processes in the OpenOwnership pipeline.
    • 4.2 Run the Gradle build to produce .ttl files for BODS data from JSONL format. This only requires a JVM 11+ to be available in the stack.
    • 4.3 Rewrite this in one of the Flatterer languages and integrate it there. As this seems to be Python/Rust, we won't be able to assist with the implementation, so we'd need someone with experience in these languages (we'll obviously assist with the conceptual elements). However, I'd assume this would be the preferred/sane approach?
  5. The RDF vocabularies should probably be generated and provided as deliverables together with the RDF dataset. This is a one-off task per BODS schema release that can be done simply with Gradle/JVM (Blue Anvil can do that periodically). Alternatively, it can be integrated with one of the options above.
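
To make option 4.3 more concrete, here is a minimal Python sketch of what a JSONL-to-Turtle conversion could look like. It is not the bods-rdf implementation: the two namespaces are hypothetical placeholders (the real vocabulary comes from the bods-rdf project), the function names are made up for illustration, and only a handful of BODS 0.2 fields (`statementID`, `statementType`, `name`, `subject`, `interestedParty`) are mapped. It also assumes statement IDs contain no characters that are illegal in an IRI.

```python
import json

# Hypothetical namespaces, for illustration only. The real vocabulary is
# defined by the bods-rdf project and is not reproduced here.
VOCAB = "http://example.org/bods/vocab#"
RES = "http://example.org/bods/resource/"


def literal(text):
    """Escape a Python string as a Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def iri(base, local):
    """Build a full IRI reference; assumes `local` is already IRI-safe."""
    return f"<{base}{local}>"


def statement_to_triples(stmt):
    """Map one BODS statement (a dict) to a list of Turtle triple lines."""
    subject = iri(RES, stmt["statementID"])
    kind = stmt["statementType"]
    triples = [f"{subject} a {iri(VOCAB, kind)} ."]
    if kind == "entityStatement" and stmt.get("name"):
        triples.append(f"{subject} {iri(VOCAB, 'name')} {literal(stmt['name'])} .")
    elif kind == "ownershipOrControlStatement":
        # The entity that is owned or controlled.
        target = stmt.get("subject", {}).get("describedByEntityStatement")
        if target:
            triples.append(f"{subject} {iri(VOCAB, 'subject')} {iri(RES, target)} .")
        # The owner may be described by an entity or a person statement.
        party = stmt.get("interestedParty", {})
        owner = party.get("describedByEntityStatement") or party.get(
            "describedByPersonStatement"
        )
        if owner:
            triples.append(
                f"{subject} {iri(VOCAB, 'interestedParty')} {iri(RES, owner)} ."
            )
    return triples


def jsonl_to_turtle(lines):
    """Convert an iterable of JSONL lines to Turtle, one statement at a time."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield from statement_to_triples(json.loads(line))
```

Because `jsonl_to_turtle` is a generator that handles one line at a time, it can be fed an open file handle and its output written straight to a `.ttl` file, which keeps memory use flat even for a register-sized input. A production version would more likely use an RDF library for serialization than hand-built strings.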

@StephenAbbott
Member Author

Thanks @cosmin-marginean for the comprehensive feedback. I'm just back from holidays and catching up with updates. I'm due to work with our team on updates to the data analysis tools in August, and will be in touch as soon as possible.

@StephenAbbott
Member Author

Bear in mind the related discussion in openownership/data-standard#121.

@StephenAbbott
Member Author

From @cosmin-marginean:

There is a Downloads section here which contains info on all BODS RDF datasets: https://github.com/cosmin-marginean/kbods/tree/main/kbods-rdf

I'm exporting these when I get a chance (once a month or so) and happy to host them in my S3 for now, so if you want to link to these feel free to do so.

I also have a short bash script to produce them if you ever want to include these in the registry pipeline on your side (it takes a couple of hours to run, though, and needs about 50 GB of disk space).
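
Given the run time and disk requirement mentioned above, a pipeline that adopts such a script might check free space before starting a multi-hour job. The following is only a sketch of that pre-flight check in Python: the function name is hypothetical, the 50 GB threshold is the approximate figure quoted above, and the export script itself is not shown here.

```python
import shutil

# Approximate requirement quoted in the comment above; adjust as needed.
REQUIRED_GB = 50


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3
```

A caller would typically run this against the output directory and abort early with a clear message if it returns `False`, rather than failing partway through the export.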
