tutorial.org

Tutorial

Welcome to the SCKAN tutorial!

This tutorial will walk you through how to use this interface to query SCKAN via the SPARQL and Cypher query languages.

Navigation

You can open and close headings (lines that start with * that use a larger font) by clicking on them with the mouse or by hitting tab when the cursor is on the heading line. This also works for ordered and descriptive lists which you might toggle by accident.

You can also open and close source blocks by clicking on the #+begin_src line, #+end_src line, or any of the #+header:, #+caption:, or #+name: lines associated with the block. tab works in this case as well.

Queries

For a written introduction see the manual on Working with Source Code.

Running queries

There are two ways to run queries.

  1. Hit C-c C-c to run the source block under the cursor. C-c C-c is done by holding Ctrl and then hitting the c key twice.
  2. Hit the F5 key to run the block that is closest to the cursor.

You can try out both methods on the blocks below. If everything is working you will see #+RESULTS:

SELECT ?x WHERE { VALUES ?x {"hello world"} }

RETURN "hello world"

The SCKAN query interface is configured to run sparql and cypher queries without prompting. However, you might encounter a block in another language that will not run automatically, such as this one.

"hello world"

See the manual section on Evaluating Code Blocks for more details.

Modifying queries

You can edit any source block as plain text.

Try editing "hello world" to read "foo bar" and then run the block.

SELECT ?x WHERE { VALUES ?x {"hello world"} }

Setting limits

Sometimes it is useful to be able to limit the number of results.

SPARQL

Try increasing the limit beyond 1000 and then try removing the limit statement from the query.

SELECT * WHERE { ?s ?p ?o }
LIMIT ?limit

Cypher

The Cypher endpoint that is used for SCKAN does not allow the LIMIT keyword. Thus there is a slight difference in how you set the limit for a Cypher query.

MATCH (g:Class) RETURN g

Parameterizing queries

Sometimes you want to pass input variables to a query so that it can be called as a function without having to change the text of the query.

You can assign values to variables using the :var header.

For more details see the manual on Using Header Arguments.

SPARQL

You have already worked with variables in the previous section to set LIMIT for the SPARQL query via the variable ?limit. Any variable in a SPARQL block (name starting with ?) can be parameterized to have a specific value. You should not parameterize variables used in the SELECT statement as it will break the query.

Try running the block and then changing the value to another curie, such as TEMP:there.

SELECT ?x WHERE { VALUES ?x { ?myvar } }

By default, variables are interpreted literally as IRIs, CURIEs, or numbers. If you want to pass a string you can use the literal function or include the escaped quotes explicitly.

SELECT ?x WHERE { VALUES ?x { ?myvar } }

See the reference section on SPARQL variables for more details.

Cypher

Here is an example of using a variable in a Cypher block to query for all the parts of a given anatomical entity. The default is the peripheral nervous system.

MATCH path = (region:Class{iri: $region_id})<-[:BFO:0000050*1..]-(:Class)
RETURN path

Naming blocks

Before queries can be called with different parameters you need to name them.

You can name any source block by adding a #+name: line directly above the #+begin_src line.

SELECT ?x WHERE { VALUES ?x { "This string is inside a named block!" } }

Try creating a new block (C-c C-, s for the adventurous) and giving it a name.

Calling blocks

It is possible to run a query with different parameters without changing the original query.

Try running C-c C-c on the #+call: line here.

Here is the block that we just ran. We changed the assignment of ?myvar from "default value" to "new value" when we ran it via #+call:.

SELECT ?x WHERE { VALUES ?x { ?myvar } }

You can also try writing and running a new #+call: line for the block that you named in the previous section.

Reference

Saving to the host system

If you want to use this docker image for more than basic exploration, the best approach is to dump the docker image files to a folder on the host system and then make that folder accessible to the container.

You can combine saving to the host system with docker start, since docker start also restores things such as the window layout. See Saving and restoring.

Mounting the sckan folder from the host

If you have not yet dumped the sckan folder see the next section first.

Running with --mount type=bind,source=sckan,destination=/home/user/sckan will mount the host sckan folder over the container sckan folder. Make sure you dump first or you will get an error. If for any reason the folder is empty the container will not start correctly.

# run the image with host sckan mounted

## linux

docker run --mount type=bind,source=sckan,destination=/home/user/sckan --volumes-from sckan-data -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -it tgbugs/musl:kg-release-user

## macos

docker run --mount type=bind,source=sckan,destination=/home/user/sckan --volumes-from sckan-data -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=host.docker.internal:0 -it tgbugs/musl:kg-release-user

## windows

docker run --mount type=bind,source=sckan,destination=/home/user/sckan --volumes-from sckan-data -e DISPLAY=host.docker.internal:0 -it tgbugs/musl:kg-release-user
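Because a missing or empty sckan folder prevents the container from starting correctly, it can help to check the folder before running any of the commands above. The following is a minimal sketch; sckan_dir_ready is a hypothetical helper name and is not part of the SCKAN image.

```shell
# Hypothetical helper (not shipped with the image): succeed only when the
# given directory exists and contains at least one entry.
sckan_dir_ready() {
  dir=${1:-sckan}
  [ -d "${dir}" ] || return 1
  # ls -A lists every entry except . and .., so empty output means empty folder
  [ -n "$(ls -A "${dir}")" ]
}

# Check the folder in the current directory before attempting the bind mount.
if sckan_dir_ready sckan; then
  echo "sckan folder found, safe to mount"
else
  echo "sckan folder missing or empty, dump it first"
fi
```

If the check fails, dump the folder as described in the next section and then retry.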

Dumping the sckan folder

Dump the sckan folder to the host file system.

cid=$(docker ps -lqf ancestor=tgbugs/musl:kg-release-user)

if [ -z "${cid}" ]; then
  ncid=$(docker create tgbugs/musl:kg-release-user)
  cid=${ncid}
fi

if [ -d sckan ]; then
  echo error: sckan folder already exists
else
  docker cp ${cid}:/home/user/sckan sckan
fi

if [ -n "${ncid}" ]; then
  docker rm -v ${ncid}
fi

Updating the sckan folder

Update the sckan folder on the host without overwriting scratch.org.

cid=$(docker ps -lqf ancestor=tgbugs/musl:kg-release-user)

if [ -z "${cid}" ]; then
  ncid=$(docker create tgbugs/musl:kg-release-user)
  cid=${ncid}
fi

td=$(mktemp -d --tmpdir=$(pwd))
docker cp ${cid}:/home/user/sckan ${td}/sckan

if [ -n "${ncid}" ]; then
  docker rm -v ${ncid}
fi

if [ -d sckan ]; then
  nowish=$(date +%Y-%m-%dT%H%M)
  mkdir sckan/${nowish}
  rm ${td}/sckan/scratch.org  # explicit ${td} to avoid any risk of rming the wrong one
  pushd ${td}
  find sckan/ -maxdepth 1 -type f -exec mv ../{} ../sckan/${nowish}/ \; -exec mv {} ../{} \;
  popd
else
  mv ${td}/sckan sckan
fi

rm -r ${td}

Saving and restoring

By default docker containers are not deleted on exit. This means that if you accidentally exit the SCKAN interface your work will still be saved. If you force quit the container or if Emacs does not have a chance to shut down properly some work might be lost.

The full command to restart the most recent container that was run from the kg-release-user image is as follows.

docker container start --attach --interactive \
$(docker ps --latest --quiet --filter ancestor=tgbugs/musl:kg-release-user)

When restarting a container it should take you back to the exact state where you left off. If you want a clean slate you can start a new container from scratch by calling docker run as you did the first time you started the SCKAN image.
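The restart command fails in a confusing way when no container from the image exists yet, because the inner docker ps prints nothing. One possible way to handle that case is sketched below; sckan_restart is a hypothetical wrapper name, and the docker invocations are the same ones shown above.

```shell
# Hypothetical wrapper around the restart command shown above.
sckan_restart() {
  image=tgbugs/musl:kg-release-user
  # Look up the most recent container created from the image, in any state.
  cid=$(docker ps --latest --quiet --filter ancestor="${image}")
  if [ -z "${cid}" ]; then
    echo "no container from ${image} found, use docker run instead" >&2
    return 1
  fi
  docker container start --attach --interactive "${cid}"
}
```

Calling sckan_restart with no arguments then either reattaches to the previous session or tells you to start a fresh container.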

Transferring work

If you have queries that you have saved inside a SCKAN docker container it is possible to copy those files to the host system.

For example, to copy scratch.org to the host you can run the following.

docker cp $(docker ps -lqf ancestor=tgbugs/musl:kg-release-user):/home/user/scratch.org scratch-backup.org
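If you back up regularly, a fixed file name such as scratch-backup.org is overwritten on every copy. The sketch below writes to a timestamped name instead and refuses to overwrite an existing file; sckan_backup_scratch is a hypothetical helper name, and the container path is the one used in the command above.

```shell
# Hypothetical helper: copy scratch.org out of the most recent container
# into a timestamped file, or into the file name given as the first argument.
sckan_backup_scratch() {
  image=tgbugs/musl:kg-release-user
  dest=${1:-scratch-$(date +%Y-%m-%dT%H%M).org}
  cid=$(docker ps -lqf ancestor="${image}")
  if [ -z "${cid}" ]; then
    echo "error: no container from ${image} found" >&2
    return 1
  fi
  if [ -e "${dest}" ]; then
    echo "error: ${dest} already exists" >&2
    return 1
  fi
  # Print the destination name on success so callers can record it.
  docker cp "${cid}:/home/user/scratch.org" "${dest}" && echo "${dest}"
}
```

Running sckan_backup_scratch with no arguments uses the timestamped default name; pass a file name to choose the destination yourself.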

Keybinds

| C-c     | copy              |
| C-v     | paste             |
| C-z     | undo              |
| C-y C-Z | redo              |
| C-s     | save              |
| C-f     | find              |
| C-q     | save and quit     |
| f5      | run nearest block |
| C-c C-c | run block or call |

Navigating images

Cypher query results can be rendered as images using the :file header argument.

When you click on an image link feh will open the image in a new window.

Left click and drag pans; middle click and drag zooms.

When in feh you can hit the q key to quit.

You can click this example image link to open feh and see how it works.

See the feh manual section on mouse actions for more details.

Org mode

The query interface for SCKAN is a computational notebook written in Org mode. An extensive manual is built in.

For general help you can type F1 F1 or C-h C-h.

Variables

The :var header can be provided on the #+begin_src line, or it can be provided on a separate #+header: line. The last assignment of the variable on the first line on which it appears (be it header or begin_src) takes priority. You could think of this as a top-right rule for priority.

Said another way, if a variable is defined multiple times on the same line the last instance takes priority. This makes it possible to temporarily shadow variable bindings without having to reorder the contents of the line. Similarly new header lines can be added above other lines and will shadow bindings without having to reorder lines.

SPARQL

In some cases it is possible to parameterize a variable in a query without using VALUES, but in other cases VALUES is required, in which case the variable must always be supplied.

For example, if you only need to constrain one node in a subgraph and do not need to ensure exact equality of, e.g., all objects, you can assign the constraining variable and move on.

SELECT DISTINCT ?s ?p ?o WHERE {
  ?s ?p ?constraint .
  ?s ?p ?o .
  ?o rdfs:subClassOf+ UBERON:0001062 .
  FILTER (!isBlank(?s) && !isBlank(?o))
} LIMIT 5

If you need exact equality, then you should use VALUES and must always pass a value for the variable.

SELECT DISTINCT ?s ?p ?o WHERE {
  VALUES ?s { ?constraint }
  ?s ?p ?o .
} LIMIT 5

Cypher

There are some suspected bugs or irregularities in how variables are handled in Cypher blocks. If one way of writing a variable does not work, try one of the other variants.

For example, some curied forms such as "ilxtr:neuron-type-keast-5" will not be expanded correctly, in which case you should use the fully expanded form instead. For that example the expanded form is "http://uri.interlex.org/tgbugs/readable/neuron-type-keast-5".

In other places, such as when specifying keys to look up properties, you must use `${var}` instead of $var. In short, if one variant doesn't work then try another, and if the curie doesn't work, try the expanded iri.

MATCH
(neugrp:NamedIndividual{`${key_id}`: "dynamic"})
RETURN neugrp
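When a curie is not expanded correctly, the expanded iri has to be written out by hand. A small lookup such as the following can produce it; this is only a sketch, expand_curie is a hypothetical helper name, and the single ilxtr mapping is taken from the example given earlier in this section. Other prefixes would need to be added by hand.

```shell
# Hypothetical helper: expand a curie to a full iri using a small,
# hand-maintained prefix table. Only the ilxtr prefix is known from this
# tutorial; anything else is passed through unchanged.
expand_curie() {
  case "$1" in
    ilxtr:*) printf '%s\n' "http://uri.interlex.org/tgbugs/readable/${1#ilxtr:}" ;;
    *)       printf '%s\n' "$1" ;;
  esac
}

expand_curie ilxtr:neuron-type-keast-5
# prints http://uri.interlex.org/tgbugs/readable/neuron-type-keast-5
```

The printed iri can then be pasted into the :var header in place of the curie.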

Environment

Default settings

By default the query settings for the files in this docker image come from queries.org and are set via the #+setupfile: ./queries.org line near the start of each file.

You can override those settings locally or change them in queries.org.

If you change a #+property: line in queries.org you will need to run C-c C-c on the #+setupfile: ./queries.org line of other files in order for the changes to propagate.

Query endpoints

By default this docker image is configured to work with instances of Blazegraph and SciGraph running inside the image on localhost.

The endpoints can be changed to query other endpoints by modifying either of these lines in queries.org or by setting them locally.

#+property: header-args:sparql :url http://localhost:9999/blazegraph/sparql
#+property: header-args:cypher :scigraph http://localhost:9000/scigraph

In this way it is possible to use this kg-release-user image without the sckan:latest image. It is also possible to use the Org files directly if you have Emacs installed on your local system.
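Before pointing the Org files at a different SPARQL endpoint, it can be useful to confirm that the endpoint answers at all. The sketch below sends the hello world query from the start of this tutorial with curl, using the standard SPARQL protocol form of a POST with a url-encoded query parameter. sparql_ping is a hypothetical helper name, and the default url is only reachable where Blazegraph is listening on localhost, for example inside the container.

```shell
# Hypothetical helper: send one tiny SPARQL query to an endpoint and print
# the raw JSON response. Fails if the endpoint is unreachable or returns
# an HTTP error status.
sparql_ping() {
  url=${1:-http://localhost:9999/blazegraph/sparql}
  curl --silent --fail \
       --header 'Accept: application/sparql-results+json' \
       --data-urlencode 'query=SELECT ?x WHERE { VALUES ?x {"hello world"} }' \
       "${url}"
}
```

Pass a different url as the first argument to test another endpoint before editing queries.org.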

SciGraph Cypher

The SciGraph cypher endpoint uses a dialect of cypher that is slightly different from that of Neo4j. The additions are documented in the wiki at https://github.com/SciGraph/SciGraph/wiki/Cypher-language-extension.

In addition, the cypher execute endpoint has been modified to return json structured as if it came from the SciGraph /dynamic/ cypher endpoints to make it easier to develop new dynamic queries. This means that at the moment RETURN "hello world" returns no values since SciGraph will not find graph elements in the query result. This will be fixed in some future release.

Bootstrap

Local Variables