Welcome to the SCKAN tutorial!
This tutorial will walk you through how to use this interface to query SCKAN via the SPARQL and Cypher query languages.
You can open and close headings (lines that start with *
that use a
larger font) by clicking on them with the mouse or by hitting tab
when
the cursor is on the heading line. This also works for ordered and
descriptive lists which you might toggle by accident.
You can also open and close source blocks by clicking on the
#+begin_src
line, #+end_src
line, or any of the #+header:
,
#+caption:
, or #+name:
lines associated with the block. tab
works in this case as well.
For a written introduction see the manual on Working with Source Code.
There are two ways to run queries.
- Hit
C-c C-c
to run the source block under the cursor.C-c C-c
is done by holdingCtrl
and then hitting thec
key twice. - Hit the
F5
key to run the block that is closest to the cursor.
You can try out both methods on the blocks below.
If everything is working you will see #+RESULTS:
SELECT ?x WHERE { VALUES ?x {"hello world"} }
RETURN "hello world"
The SCKAN query interface is configured to run sparql and cypher queries without prompting. However you might encounter a block in another language that will not run automatically, such this one.
"hello world"
See the manual section on Evaluating Code Blocks for more details.
You can edit any source block as plain text.
Try editing "hello world"
to read "foo bar"
and then run the block.
SELECT ?x WHERE { VALUES ?x {"hello world"} }
Sometimes it is useful to be able to limit the number of results.
Try increasing the limit beyond 1000 and then try removing the limit statement from the query.
SELECT * WHERE { ?s ?p ?o }
LIMIT ?limit
The Cypher endpoint that is used for SCKAN does not allow the LIMIT keyword. Thus there is a slight difference in how you set the limit for a Cypher query.
MATCH (g:Class) RETURN g
Sometimes you want pass input variables to a query so that it can be called as a function without having to change the text of the query.
You can assign values to variables using the :var
header.
For more details see the manual on Using Header Arguments.
You have already worked with variables in the previous section to set
LIMIT
for the SPARQL query via the variable ?limit
. Any variable
in a SPARQL block (name starting with ?
) can be parameterized to
have a specific value. You should not parameterize variables used in
the SELECT
statement as it will break the query.
Try running the block and then changing the value to another curie,
such as TEMP:there
.
SELECT ?x WHERE { VALUES ?x { ?myvar } }
By default variables are interpreted literally as IRIs, CURIES, or
numbers. If you want to pass a string you can use the literal
function
or include the escaped quotes explicitly.
SELECT ?x WHERE { VALUES ?x { ?myvar } }
See the reference section on SPARQL variables for more details.
Here is an example of using a variable in a Cypher block to query for all the parts of a given anatomical entity. The default is the peripheral nervous system.
MATCH path = (region:Class{iri: $region_id})<-[:BFO:0000050*1..]-(:Class)
RETURN path
Queries can be called with different parameters you need to name them.
You can name any source block by adding a #+name:
line directly above
the #+begin_src
line.
SELECT ?x WHERE { VALUES ?x { "This string is inside a named block!" } }
Try creating a new block (C-c C-, s
for the adventurous) and giving it a name.
It is possible to run a query with different parameters without changing the original query.
Try running C-c C-c
on the #+call:
line here.
Here is the block that we just ran. We changed the assignment of
?myvar
from "default value"
to "new value"
when we ran it via #+call:
.
SELECT ?x WHERE { VALUES ?x { ?myvar } }
You can also try writing and running a new #+call:
line for the block
that you renamed in the previous section.
If you want to use this docker image for more than basic exploration, the best approach is to dump the docker image files to a folder on the host system and then make that folder accessible to the container.
You can save to the host system and docker start
together since docker
start
also restores things window layout. See Saving and restoring.
If you have not yet dumped the sckan
folder see the next section first.
Running with --mount type=bind,source=sckan,destination=/home/user/sckan
will mount the host sckan
folder over the container sckan
folder. Make
sure you dump first or you will get an error. If for any reason the
folder is empty the container will not start correctly.
# run the image with host sckan mounted
## linux
docker run --mount type=bind,source=sckan,destination=/home/user/sckan --volumes-from sckan-data -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -it tgbugs/musl:kg-release-user
## macos
docker run --mount type=bind,source=sckan,destination=/home/user/sckan --volumes-from sckan-data -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=host.docker.internal:0 -it tgbugs/musl:kg-release-user
## windows
docker run --mount type=bind,source=sckan,destination=/home/user/sckan --volumes-from sckan-data -e DISPLAY=host.docker.internal:0 -it tgbugs/musl:kg-release-user
Dump the sckan
folder to the host file system.
cid=$(docker ps -lqf ancestor=tgbugs/musl:kg-release-user)
if [ -z "${cid}" ]; then
ncid=$(docker create tgbugs/musl:kg-release-user)
cid=${ncid}
fi
if [ -d sckan ]; then
echo error: sckan folder already exists
else
docker cp ${cid}:/home/user/sckan sckan
fi
if [ -n "${ncid}" ]; then
docker rm -v ${ncid}
fi
Update the sckan
folder on the host without overwriting scratch.org
.
cid=$(docker ps -lqf ancestor=tgbugs/musl:kg-release-user)
if [ -z "${cid}" ]; then
ncid=$(docker create tgbugs/musl:kg-release-user)
cid=${ncid}
fi
td=$(mktemp -d --tmpdir=$(pwd))
docker cp ${cid}:/home/user/sckan ${td}/sckan
if [ -n "${ncid}" ]; then
docker rm -v ${ncid}
fi
if [ -d sckan ]; then
nowish=$(date +%Y-%m-%dT%H%M)
mkdir sckan/${nowish}
rm ${td}/sckan/scratch.org # explicit ${td} to avoid any risk of rming the wrong one
pushd ${td}
find sckan/ -maxdepth 1 -type f -exec mv ../{} ../sckan/${nowish}/ \; -exec mv {} ../{} \;
popd
else
mv ${td}/sckan sckan
fi
rm -r ${td}
By default docker containers are not deleted on exit. This means that if you accidentally exit the SCKAN interface your work will still be saved. If you force quit the image or if Emacs does not have a chance to shut down properly some work might be lost.
The full command to restart the most recent container that was run
from the kg-release-user
image is as follows.
docker container start --attach --interactive \
$(docker ps --latest --quiet --filter ancestor=tgbugs/musl:kg-release-user)
When restarting a container it should take you back to the exact state
where you left off. If you want a clean slate you can run a new image
from scratch by calling docker run
as you did the first time you
started the SCKAN image.
If you have queries that you have saved inside a SCKAN docker container it is possible to copy those files to the host system.
For example, to copy scratch.org
to the host you can run the following.
docker cp $(docker ps -lqf ancestor=tgbugs/musl:kg-release-user):/home/user/scratch.org scratch-backup.org
C-c | copy |
C-v | paste |
C-z | undo |
C-y C-Z | redo |
C-s | save |
C-f | find |
C-q | save and quit |
f5 | run nearest block |
C-c C-c | run block or call |
Cypher query results can be rendered as images using the :file
header argument.
When you click on a an image link feh
will open the image in a new window.
Left click pans and middle click drag zooms.
When in feh
you can hit the q
key to quit.
You can click this example image link to open feh
and see how it works.
See the feh manual section on mouse actions for more details.
The query interface for SCKAN is a computational notebook written in Org mode. An extensive manual is built in.
For general help you can type F1 F1
or C-h C-h
.
The :var
header can be provided on the #+begin_src
line, or it can
provided on a separate #+header:
line. The last assignment of the
variable on the first line on which it appears (be it header or begin_src)
takes priority. You could think of this as a top-right rule for priority.
Said another way, if a variable is defined multiple times on the same line the last instance takes priority. This makes it possible to temporarily shadow variable bindings without having to reorder the contents of the line. Similarly new header lines can be added above other lines and will shadow bindings without having to reorder lines.
In some cases it is possible to parameterize a variable in a query
without using VALUES
but in other cases VALUES
is require, in which
case the variable must always be supplied.
For example, sometimes you only need to constrain one node in a subgraph and don’t need to ensure exact equality of e.g. all objects, then you can assign the constraining variable and move on.
SELECT DISTINCT ?s ?p ?o WHERE {
?s ?p ?constraint .
?s ?p ?o .
?o rdfs:subClassOf+ UBERON:0001062 .
FILTER (!isBlank(?s) && !isBlank(?o))
} LIMIT 5
If you need exact equality, then you should use VALUES
and
must always pass a value for the variable.
SELECT DISTINCT ?s ?p ?o WHERE {
VALUES ?s { ?constraint }
?s ?p ?o .
} LIMIT 5
There are some suspected bugs/irregularities in some part of handling of variables in Cypher blocks. If one way of writing a variable does not work try one of the other variants.
For example, some curied forms such as "ilxtr:neuron-type-keast-5"
will not be expanded correctly, in which case you should use the fully
expanded form and then it will work. In the example above
"http://uri.interlex.org/tgbugs/readable/neuron-type-keast-5"
.
In other places, such as for specifying keys to look up properties you
must use `${var}`
instead of $var
. In short, one variant doesn’t work
then try another, and if the curie doesn’t work, try the expanded iri.
MATCH
(neugrp:NamedIndividual{`${key_id}`: "dynamic"})
RETURN neugrp
By default the query settings for the files in this docker image come
from queries.org
and are set via the #+setupfile: ./queries.org
line
near the start of each file.
You can override those settings locally or change them in queries.org
.
If you make a change a #+property:
line in queries.org
you will need
to run C-c C-c
on the #+setupfile: ./queries.org
line of other files
in order for the changes to propagate.
By default this docker image is configured to work with instances of Blazegraph and SciGraph running inside the image on localhost.
The endpoints can be changed to query other endpoints by modifying
either of these lines in queries.org
or by setting them locally.
#+property: header-args:sparql :url http://localhost:9999/blazegraph/sparql
#+property: header-args:cypher :scigraph http://localhost:9000/scigraph
In this way it is possible to use this kg-release-user
image without
the sckan:latest
image. It is also possible to use the Org files
directly if you have Emacs installed on your local system.
The SciGraph cypher endpoint uses a dialect of cypher that is slightly different than Neo4j. The additions are documented in the wiki at https://github.com/SciGraph/SciGraph/wiki/Cypher-language-extension.
In addition, the cypher execute
endpoint has been modified to return
json structured as if it came from the SciGraph /dynamic/
cypher
endpoints to make it easier to develop new dynamic queries. This means
that at the moment RETURN "hello world"
returns no values since
SciGraph will not find graph elements in the query result. This will
be fixed in some future release.