
Put together and demonstrate a toolset that can search the JSON-LD catalog knowledge graph #327

Open
kzollove opened this issue Feb 29, 2024 · 2 comments

@kzollove
Collaborator

Put together and demonstrate a toolset that can search the JSON-LD catalog knowledge graph. This could include the use of an LLM. Right now, asking an LLM to explore a knowledge graph is bleeding edge, but it won't be in six months to a year. Here is the state of the art as of maybe nine months ago: Knowledge Graphs + Large Language Models = The ability for users to ask their own questions? We can probably develop some best practices for chain-of-thought prompting that help an LLM in our use case, i.e. querying a catalog of datasets, first at the dataset level and then at the variable level.
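A minimal sketch of the "dataset level first, then variable level" narrowing, assuming each catalog entry is a schema.org Dataset in JSON-LD (already loaded as a Python dict) with name, description, and a variableMeasured list of PropertyValue objects. Plain keyword overlap stands in for the LLM's judgement at each stage; the catalog entries and the helper names (tokens, search_datasets, search_variables) are hypothetical, for illustration only.

```python
import re

def tokens(text: str) -> set[str]:
    # Lower-case alphanumeric word tokens; underscores split words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search_datasets(catalog: list[dict], question: str) -> list[dict]:
    # Stage 1: keep datasets whose name or description shares a word with the question.
    q = tokens(question)
    return [d for d in catalog
            if q & tokens(d.get("name", "") + " " + d.get("description", ""))]

def search_variables(datasets: list[dict], question: str) -> list[tuple[str, str]]:
    # Stage 2: search variableMeasured only inside the datasets kept in stage 1.
    q = tokens(question)
    hits = []
    for d in datasets:
        for v in d.get("variableMeasured", []):
            if q & tokens(v.get("name", "") + " " + v.get("description", "")):
                hits.append((d["name"], v["name"]))
    return hits

# Hypothetical two-entry catalog, for illustration only.
catalog = [
    {"@type": "Dataset", "name": "Malaria cohort",
     "description": "Longitudinal malaria surveillance in children",
     "variableMeasured": [
         {"@type": "PropertyValue", "name": "parasite_density",
          "description": "Parasites per microlitre of blood"},
         {"@type": "PropertyValue", "name": "age_months",
          "description": "Child age in months"}]},
    {"@type": "Dataset", "name": "Air quality",
     "description": "Urban PM2.5 monitoring",
     "variableMeasured": [
         {"@type": "PropertyValue", "name": "pm25",
          "description": "Fine particulate matter"}]},
]

question = "Which malaria datasets record child age?"
candidates = search_datasets(catalog, question)
hits = search_variables(candidates, question)
print(hits)  # [('Malaria cohort', 'age_months')]
```

In a chain-of-thought setup, each stage would become its own prompt step (first "which datasets are relevant?", then "which of their variables answer the question?"), which keeps the variable-level context small.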

@fils

fils commented May 13, 2024

I just wanted to add a few thoughts to this issue. Moving from a collection of JSON-LD files to an RDF-based KG is fairly easy at this point. The biggest issue is the presence of identically labeled blank nodes (e.g. _:b0 appearing in many files) if you simply convert each JSON-LD file to RDF (such as N-Quads) and combine the results. This is resolved if you feed the results into a triplestore, since it maps blank nodes to internally unique identifiers on ingest.
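A minimal sketch of the collision and of one way around it when not using a triplestore: blank-node labels are only meaningful within one document, so each file's labels get a per-file prefix before the outputs are concatenated. This is roughly what a triplestore does internally on ingest. The regex is deliberately naive (it assumes "_:" never appears inside a literal), and the file contents and helper names are hypothetical.

```python
import re

# Naive N-Triples blank-node pattern; assumes "_:" never appears inside a
# literal and that labels use only letters, digits, and underscores.
BNODE = re.compile(r"_:([A-Za-z0-9_]+)")

def relabel(ntriples: str, doc_id: str) -> str:
    """Prefix each blank-node label with a per-document tag so labels from
    different files cannot collide after concatenation (assumes doc_id
    values are distinct and contain no underscore)."""
    return BNODE.sub(lambda m: f"_:{doc_id}_{m.group(1)}", ntriples)

def bnode_labels(ntriples: str) -> set[str]:
    return {f"_:{label}" for label in BNODE.findall(ntriples)}

# Two hypothetical files converted separately: each uses _:b0 for a different person.
file_a = ('<https://example.org/ds/1> <https://schema.org/creator> _:b0 .\n'
          '_:b0 <https://schema.org/name> "Alice" .\n')
file_b = ('<https://example.org/ds/2> <https://schema.org/creator> _:b0 .\n'
          '_:b0 <https://schema.org/name> "Bob" .\n')

naive = file_a + file_b          # Alice and Bob collapse into a single node
merged = relabel(file_a, "a") + relabel(file_b, "b")

print(len(bnode_labels(naive)))   # 1
print(len(bnode_labels(merged)))  # 2
```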

So you can, for example, use Oxigraph at the command line and simply feed the JSON-LD into it. A script like https://github.com/gleanerio/nabu/blob/df-dev/scripts/jsonldLoader.sh can do this. Note that the script reads from an object store, but it could easily be modified to work from a local directory.

The Nabu program (https://github.com/gleanerio/nabu/blob/df-dev/docs/README.md) is designed to generate complete graphs from JSON-LD, taking into account blank nodes and some other edge cases. Its output can then be loaded into a triplestore or queried locally with Jena or packages like Oxigraph or KuzuDB.

With respect to leveraging an LLM, it is possible to connect the graph to LLM-based RAG (retrieval-augmented generation) approaches, and there are many examples of this. Another approach is to have the LLM generate SPARQL or other query languages. See https://python.langchain.com/v0.1/docs/integrations/graphs/ for some examples in LangChain.
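A minimal sketch of the "LLM writes the SPARQL" approach, assuming the catalog uses schema.org terms. It is not LangChain's API: call_llm is a hypothetical placeholder that returns a canned query so the example runs offline, and the prompt wording is an assumption. The guard reflects one reasonable design choice, namely to run generated queries only if they are read-only SELECT/ASK queries.

```python
import re

SCHEMA_HINT = """Prefixes: PREFIX schema: <https://schema.org/>
Classes: schema:Dataset
Properties: schema:name, schema:description, schema:variableMeasured"""

def build_prompt(question: str) -> str:
    # Give the model the vocabulary it may use and ask for the query only.
    return ("You translate questions into SPARQL 1.1 SELECT queries.\n"
            f"{SCHEMA_HINT}\n"
            "Return only the query, with no explanation.\n"
            f"Question: {question}\nSPARQL:")

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model client; returns a canned answer.
    return ("PREFIX schema: <https://schema.org/>\n"
            "SELECT ?name WHERE { ?d a schema:Dataset ; schema:name ?name }")

# SPARQL Update keywords; matching them anywhere is conservative (may reject some valid reads).
UPDATE_KEYWORDS = re.compile(
    r"\b(INSERT|DELETE|LOAD|CLEAR|DROP|CREATE|ADD|MOVE|COPY)\b", re.IGNORECASE)

def is_safe_select(query: str) -> bool:
    # Strip leading PREFIX declarations, then require SELECT/ASK and no update keywords.
    body = re.sub(r"(?im)^\s*PREFIX\s+\S+\s*<[^>]*>\s*", "", query).lstrip()
    return bool(re.match(r"(?i)(SELECT|ASK)\b", body)) and not UPDATE_KEYWORDS.search(body)

prompt = build_prompt("List the names of all datasets")
query = call_llm(prompt)
if not is_safe_select(query):
    raise ValueError("generated query rejected")
print(query)
```

The accepted query would then be sent to whatever store holds the graph (for example one of the triplestores mentioned above).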

If you are interested, I'd be happy to share more examples or write up a short document like the one I did for the UN Oceans community here: https://github.com/gleanerio/archetype/blob/master/networks/oceans/README.md

@jaygee-on-github
Collaborator

@kzollove, @martyalvarez, @AEW0330: perhaps we need to save this discussion for another day. I would like this task to be one in which we use an augmented LLM, in line with this article, to explore the catalog instead of a query language.

This doesn't seem so far-fetched now, since this is exactly what APHRC and CODATA are planning to do in another project called Data Science Without Borders. There we are building a catalog of the research that "pathfinders" are engaged in. Each research project in the catalog is represented as a schema.org JSON-LD MedicalObservationalStudy. We are just now starting to explore how to augment an LLM and use it to query a collection of MedicalObservationalStudy knowledge graphs.
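A minimal sketch of what "augmenting" an LLM over such a catalog could look like: retrieve the study records most relevant to a question and place them in the prompt as context. Bag-of-words cosine similarity is an assumed, deliberately simple retriever (embeddings would be the more likely choice in practice), the two study records are made up, and only the name and description properties are used.

```python
import json
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    # Term counts over lower-case alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # missing keys count as 0
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(studies: list[dict], question: str, k: int = 1) -> list[dict]:
    # Rank study records by similarity of their name + description to the question.
    q = vectorize(question)
    return sorted(studies,
                  key=lambda s: cosine(q, vectorize(s["name"] + " " + s["description"])),
                  reverse=True)[:k]

def augmented_prompt(studies: list[dict], question: str, k: int = 1) -> str:
    # Put the retrieved JSON-LD records in front of the question as grounding context.
    context = "\n".join(json.dumps(s) for s in retrieve(studies, question, k))
    return f"Answer using only these study records:\n{context}\n\nQuestion: {question}"

# Hypothetical MedicalObservationalStudy records, for illustration only.
studies = [
    {"@type": "MedicalObservationalStudy", "name": "Urban diabetes cohort",
     "description": "Type 2 diabetes incidence among adults in Nairobi"},
    {"@type": "MedicalObservationalStudy", "name": "Maternal nutrition study",
     "description": "Dietary intake of pregnant women in rural clinics"},
]

question = "Which studies look at diabetes in Nairobi?"
prompt = augmented_prompt(studies, question)
print(prompt)
```

The resulting prompt would then go to the model; with many records, k controls how much of the catalog is put into context.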


3 participants