Allows you to run SPARQL chunks in R Markdown files. Also provides inline functions to send a SPARQL query to an endpoint and retrieve data in data frame or list form.

aourednik/SPARQLchunks

SPARQLchunks

Lifecycle: stable

Coding in R is useless without interesting research questions, and even the best questions remain unanswered without data. RStudio provides a number of convenient ways to access data, among them the ability to write SQL code chunks in R Markdown, run them, and assign the query result directly to a variable of your choice. No such thing is available yet for SPARQL queries, the ones that let you navigate the gigantic knowledge graphs that embody the conscience of the semantic web. This is where the SPARQLchunks package steps in.

This package allows you to query SPARQL endpoints in two different ways:

  1. It allows you to run SPARQL chunks in R Markdown files.
  2. It provides inline functions to send SPARQL queries to a user-defined endpoint and retrieve data in data frame form (sparql2df) or list form (sparql2list).

Endpoints can be reached from behind corporate firewalls on Windows machines thanks to automatic proxy detection. See Execute SPARQL chunks in R Markdown.

Installation

Most users can install the package by running this command:

remotes::install_github("aourednik/SPARQLchunks", build_vignettes = TRUE)

If you are behind a corporate firewall on a Windows machine, direct access to GitHub might be blocked. If that is the case, run this installation code instead:

proxy_url <- curl::ie_get_proxy_for_url("https://github.com")
httr::set_config(httr::use_proxy(proxy_url))
remotes::install_url("https://github.com/aourednik/SPARQLchunks/archive/refs/heads/master.zip", build_vignettes = TRUE)

Use

To use the full potential of the package you need to load the library and tell knitr that a SPARQL engine exists:

library(SPARQLchunks)
knitr::knit_engines$set(sparql = SPARQLchunks::eng_sparql)

Once you have done so, you can run SPARQL chunks:

Chunks

Retrieve a data frame

output.var: the name of the data frame you want to store the results in

endpoint: the URL of the SPARQL endpoint

autoproxy: whether to attempt automatic proxy detection

auth: authentication information for the SPARQL endpoint (an httr authentication object, optional)
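
The auth option expects an httr authentication object. As a minimal sketch, such an object could be built with httr::authenticate, assuming the endpoint uses basic authentication. The variable name my_auth, the environment variable names, and the fallback credentials are hypothetical placeholders, not part of the package:

```r
library(httr)

# Placeholder credentials (assumption): read from environment variables
# so that secrets are not hard-coded in the R Markdown document.
my_auth <- httr::authenticate(
  user = Sys.getenv("SPARQL_USER", unset = "demo_user"),
  password = Sys.getenv("SPARQL_PASSWORD", unset = "demo_password"),
  type = "basic"
)

# httr stores authentication settings in an object of class "request".
class(my_auth)
```

Since knitr evaluates chunk options as R expressions, an object created this way in an earlier R chunk could then be referenced in the SPARQL chunk header as auth=my_auth.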

Example 1 (Swiss administration endpoint)

```{sparql output.var="queryres_df", endpoint="https://lindas.admin.ch/query"}
PREFIX schema: <http://schema.org/>
SELECT * WHERE {
  ?sub a schema:DataCatalog .
  ?subtype a schema:DataType .
}
```

Example 2 (Uniprot endpoint)

Note the attempt at automatic proxy detection (autoproxy=TRUE).

```{sparql output.var="tes5", endpoint="https://sparql.uniprot.org/sparql", autoproxy=TRUE}
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?taxon
FROM <http://sparql.uniprot.org/taxonomy>
WHERE {
	?taxon a up:Taxon .
} LIMIT 500
```

Example 3 (Wikidata endpoint)

```{sparql output.var="res.df", endpoint="https://query.wikidata.org/sparql"}
SELECT DISTINCT ?item ?itemLabel ?country ?countryLabel ?linkTo ?linkToLabel
WHERE {
    ?item wdt:P1142 ?linkTo .
    ?linkTo wdt:P31 wd:Q12909644 .
    VALUES ?type { wd:Q7278  wd:Q24649 }
    ?item wdt:P31 ?type .
    ?item wdt:P17 ?country .
    MINUS { ?item wdt:P576 ?abolitionDate }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
```

Retrieve a list

output.var: the name of the list you want to store the results in

endpoint: the URL of the SPARQL endpoint

output.type: when set to "list", retrieves a list (tree structure) instead of a data frame

autoproxy: whether to attempt automatic proxy detection

```{sparql output.var="queryres_list", endpoint="https://lindas.admin.ch/query", output.type="list"}
PREFIX schema: <http://schema.org/>
SELECT * WHERE {
  ?sub a schema:DataCatalog .
  ?subtype a schema:DataType .
}
```

Inline code

The inline functions sparql2df and sparql2list both have the same pair of arguments: a SPARQL endpoint and a SPARQL query. Queries can be multi-line:

endpoint <- "https://lindas.admin.ch/query"
query <- "PREFIX schema: <http://schema.org/>
  SELECT * WHERE {
  ?sub a schema:DataCatalog .
  ?subtype a schema:DataType .
}"
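
Because the query is an ordinary character string, it can also be assembled from R variables. The following is a minimal sketch using base R only; the helper name build_class_query and its arguments are hypothetical and not part of SPARQLchunks:

```r
# Hypothetical helper: build a SELECT query for resources of a given
# schema.org class, capped by a LIMIT clause.
build_class_query <- function(class_name, limit = 100L) {
  stopifnot(is.character(class_name), length(class_name) == 1L)
  sprintf(
    "PREFIX schema: <http://schema.org/>
SELECT ?sub WHERE {
  ?sub a schema:%s .
} LIMIT %d",
    class_name, as.integer(limit)
  )
}

# Build a query for at most 10 data catalogs and print it for inspection.
query_catalogs <- build_class_query("DataCatalog", limit = 10)
cat(query_catalogs)
```

The resulting string can be passed as the query argument to sparql2df or sparql2list in the same way as the hand-written query above.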

Retrieve a data frame

result_df <- sparql2df(endpoint,query)

The same, with an attempt at automatic proxy detection:

result_df <- sparql2df(endpoint,query,autoproxy=TRUE)

Retrieve a list

result_list <- sparql2list(endpoint,query)

The same, with an attempt at automatic proxy detection:

result_list <- sparql2list(endpoint,query,autoproxy=TRUE)
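
Endpoints can be slow or unreachable, so it may be useful to guard the inline calls. The sketch below assumes that sparql2df and sparql2list signal an ordinary R error when a request fails, which this README does not confirm. The wrapper name safe_query is hypothetical:

```r
# Hypothetical wrapper: call a query function and return NULL (with a
# warning) instead of stopping the whole document when it raises an error.
safe_query <- function(query_fun, endpoint, query, ...) {
  tryCatch(
    query_fun(endpoint, query, ...),
    error = function(e) {
      warning("SPARQL query failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Usage with the package (requires network access, so left commented out):
# result_df <- safe_query(SPARQLchunks::sparql2df, endpoint, query, autoproxy = TRUE)
# if (is.null(result_df)) message("No data retrieved; check the endpoint URL.")
```

Passing the query function as an argument keeps the wrapper usable with both sparql2df and sparql2list.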
