-
Notifications
You must be signed in to change notification settings - Fork 39
Queries
We provide the queries below as EDN data. For your convenience, we have included a java file and a clojure file containing the corresponding query API calls.
You can query the database from Java:
package datomic.samples.mbrainz;
import java.util.*;
import datomic.Connection;
import datomic.Database;
import static datomic.Peer.*;
public class Query {
public static void main(String[] args) {
String uri = "datomic:dev://localhost:4334/mbrainz-1968-1973";
Connection conn = connect(uri);
Database db = conn.db();
Collection<List<Object>> result = q("[:find ?id ?type ?gender" +
":in $ ?name " +
":where [?e :artist/name ?name] " +
"[?e :artist/gid ?id]" +
"[?e :artist/type ?teid]" +
"[?teid :db/ident ?type]" +
"[?e :artist/gender ?geid]" +
"[?geid :db/ident ?gender]]",
db, "Janis Joplin");
System.out.println(result);
}
}
or query it from Clojure:
(require '[datomic.api :as d])
(def uri "datomic:dev://localhost:4334/mbrainz-1968-1973")
(def conn (d/connect uri))
(def db (d/db conn))
(d/q '[:find ?id ?type ?gender
:in $ ?name
:where
[?e :artist/name ?name]
[?e :artist/gid ?id]
[?e :artist/type ?teid]
[?teid :db/ident ?type]
[?e :artist/gender ?geid]
[?geid :db/ident ?gender]]
db
"Janis Joplin")
The first parameter to the Peer.q()
function is the query itself.
The remaining arguments are the data sources to be queried. They are
bound, in the order they are passed, to the names in the :in
clause
of the query, according to the following rules:
- Names starting with (or consisting of only)
$
bind database values - Names starting with (or consisting of only)
%
bind rule sets - Names starting with
?
bind data literals (e.g. the string "Janis Joplin" in the example above). - Any number of data sources can be passed into a query
Now we're ready to start the fun stuff.
Let's start with a simple query:
Query:
[:find ?title
:in $ ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?title]]
Query args:
db, "John Lennon"
Sample results:
["Meat City"]
["Intuition"]
["Radio Play"]
["Dizzy Miss Lizzy"]
["Sunday Bloody Sunday"]
["I Found Out"]
["It's So Hard"]
["Nutopian International Anthem"]
["Jealous Guy"]
["Imagine"]
...
The results themselves are returned as a java.util.HashSet
containing tuples of the results specified in the :find
clause, in
this case just a 1-tuple containing the track title.
Run the query for a second time, and you'll notice it executes much faster. Datomic has a local cache in each Peer, which warms up as we execute queries. These queries perform better as the JVM heap gets bigger, up to 4GB, at which size the JVM starts to have its well-known long GC pauses.
If you're not familiar with John Lennon's post-Beatles work, many of the titles might not be familiar. Let's figure out what albums these tracks are on, and in what year those albums were released. To figure out how to navigate from track to album release, and figure out what attributes are available on each entity, you can refer to the Schema.
Query:
[:find ?title ?album ?year
:in $ ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?title]
[?m :medium/tracks ?t]
[?r :release/media ?m]
[?r :release/name ?album]
[?r :release/year ?year]]
Query args:
db, "John Lennon"
Sample results:
["New York City" "Some Time in New York City" 1972]
[["Open Your Box" "Power to the People" 1971]]
["Happy X-Mas (War Is Over)" "Rock Christmas, Volume 5" 1996]
["Imagine" "Imagine" 1971]
["Sisters, O Sisters" "Some Time in New York City" 1972]
["The Luck of the Irish" "Some Time in New York City" 1973]
["Well (Baby Please Don't Go)" "Live Jam" 1973]
["We're All Water" "Some Time in New York City" 1972]
...
Most of these results are after the break up of the Beatles in 1970. Can we limit the results to only include releases before 1970?
So far we've only seen where clauses that look like:
[?entity :some/attr ?value]
The full version of these clauses is:
[?entity ?attribute ?value ?transaction-time]
and they allow us to specify or query for these various aspects of our stored datoms.
However, Datomic also allows us to execute arbitrary code as part of our queries. To figure out which tracks were released before John Lennon's death, we can use a Java method or Clojure function.
Query:
[:find ?title ?album ?year
:in $ ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?title]
[?m :medium/tracks ?t]
[?r :release/media ?m]
[?r :release/name ?album]
[?r :release/year ?year]
[(< ?year 1970)]]
Query args:
db, "John Lennon"
Sample results:
["John & Yoko" "Unfinished Music No. 3: Wedding Album" 1969]
["Radio Play" "Unfinished Music No. 2: Life With the Lions" 1969]
["No Bed for Beatle John" "Unfinished Music No. 2: Life With the Lions" 1969]
["Two Minutes Silence" "Unfinished Music No. 2: Life With the Lions" 1969]
...
Here we use the <
function to filter the results to only the
John Lennon albums released before 1970.
To reuse query logic across many queries, you can create and use rules like the following:
[;; Given ?t bound to track entity-ids, binds ?r to the corresponding
;; set of album release entity-ids
[(track-release ?t ?r)
[?m :medium/tracks ?t]
[?r :release/media ?m]]]
This rule is called track-release
, and as defined in the first
clause when given a bound set of track eids ?t
, will bind ?r
to
the eids of the releases on which the tracks appear. So we can
re-write the John Lennon's albums query above as:
Query:
[:find ?title ?album ?year
:in $ % ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?title]
(track-release ?t ?r)
[?r :release/name ?album]
[?r :release/year ?year]]
Query args:
db, rules, "John Lennon"
Sample results:
["My Mummy's Dead" "John Lennon/Plastic Ono Band" 1970]
["The Luck of the Irish" "Some Time in New York City" 1972]
["Nutopian International Anthem" "Mind Games" 1973]
["Amsterdam" "Unfinished Music No. 3: Wedding Album" 1969]
["Happy Xmas (War Is Over)" "Happy Xmas (War Is Over)" 1972]
...
Notice the results are the same as the rule-less version above.
The rule set goes in as %
, and when you supply it, you can invoke
the rules by name. It is idiomatic to use round brackets for rule
invocations, and square brackets for other clauses.
For the queries that follow, when rules
is specified as a query arg,
it refers to the set of example rules in Appendix below.
Let's find all of the tracks with the word "always" in the title.
What are the titles, artists, album names, and release years of all tracks having the word "always" in their titles?
Query:
[:find ?title ?artist ?album ?year
:in $ % ?search
:where
(track-search ?search ?track)
(track-info ?track ?title ?artist ?album ?year)]
Query args:
db, rules, "always"
Sample results:
["She Always Leaves Me Laughing" "The Turtles" "Turtle Soup" 1969]
["Theme Three: The Grass Is Always Greener" "Colosseum" "Valentyne Suite" 1969]
["Always, Always" "Dolly Parton" "Always, Always / No Need to Hurry Home" 1969]
["Always Coming Back to You" "Scott Walker" "Joanna" 1968]
["The Best Girls Don't Always Win" "Betty Wright" "My First Time Around" 1968]
["Always Something There to Remind Me" "Terry Baxter & His Orchestra" "The Best of '70" 1971]
["You Know I'll Always Love You" "Budgie" "Never Turn Your Back on a Friend" 1973]
...
Using Datomic joins and rules, we can do graph walks to determine who collaborated with whom. We can also pass in data literals of various shapes, as described in the "Bindings" section of the query docs.
Query:
[:find ?aname ?aname2
:in $ % [?aname ...]
:where (collab ?aname ?aname2)]
Query args:
db, rules, ["John Lennon" "Paul McCartney" "George Harrison" "Ringo Starr"]
Sample results:
["John Lennon" "The Plastic Ono Band"]
["George Harrison" "Ravi Shankar"]
["John Lennon" "Yoko Ono"]
["George Harrison" "Bob Dylan"]
["Paul McCartney" "Linda McCartney"]
What about collaborators of collaborators? Let's find Paul McCartney's collaboration network to a depth of 2.
Who either directly collaborated with Paul McCartney, or collaborated with one of his collaborators?
Query:
[:find ?aname2
:in $ % ?aname
:where (collab-net-2 ?aname ?aname2)]
Query args:
db, rules, "George Harrison"
Sample results:
["George Harrison" "Ravi Shankar"]
["George Harrison" "Ali Akbar Khan"]
["George Harrison" "Bob Dylan"]
We used graph-walk rules to find the network above, but we could just as easily chain queries together, passing the results of one query into another query.
Query:
[:find ?aname2
:in $ % [[?aname]]
:where (collab ?aname ?aname2)]
Query args:
First time:
db, rules, [["Diana Ross"]]
Second time:
db, rules, resultsOfFirstQuery
Sample Results:
First time:
["Marvin Gaye"]
Second time:
["Diana Ross"]
["Tammi Terrell"]
Query:
[:find ?aname ?tname
:in $ ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?tname]
[(!= "Outro" ?tname)]
[(!= "[outro]" ?tname)]
[(!= "Intro" ?tname)]
[(!= "[intro]" ?tname)]
[?t2 :track/name ?tname]
[?t2 :track/artists ?a2]
[(!= ?a2 ?a)]
[?a2 :artist/name ?aname]]
Query args:
db, "Bill Withers"
Sample results:
["Ken Boothe" "Ain't No Sunshine"]
["Grover Washington, Jr." "Ain't No Sunshine"]
["Sivuca" "Ain't No Sunshine"]
["Frederick Knight" "Lean on Me"]
["Grover Washington, Jr." "Lean on Me"]
...
Some of the queries above use a rule set (supplied to query as %
).
The example rules contain basic lookups, joins, and a graph-walking
rule for finding the collaboration network of an artist to a depth of
4. You could increase the depth by adding more rules according to the
recursive pattern. The rules are generated from the functions in
rules.clj,
are included as an
EDN file
in the project, and are also listed here:
[;; Given ?t bound to track entity-ids, binds ?r to the corresponding
;; set of album release entity-ids
[(track-release ?t ?r)
[?m :medium/tracks ?t]
[?r :release/media ?m]]
;; Supply track entity-ids as ?t, and the other parameters will be
;; bound to the corresponding information about the tracks
[(track-info ?t ?track-name ?artist-name ?album ?year)
[?t :track/name ?track-name]
[?t :track/artists ?a]
[?a :artist/name ?artist-name]
(track-release ?t ?r)
[?r :release/name ?album]
[?r :release/year ?year]]
;; Supply ?a (artist entity-ids) and and integer ?max track duration,
;; and ?t, ?len will be bound to track entity-ids and lengths
;; (respectively) of tracks shorter than the given ?max
[(short-track ?a ?t ?len ?max)
[?t :track/artists ?a]
[?t :track/duration ?len]
[(< ?len ?max)]]
;; Fulltext search on track. Supply the query string ?q, and ?track
;; will be bound to entity-ids of tracks whose title matches the
;; search.
[(track-search ?q ?track)
[(fulltext $ :track/name ?q) [[?track ?tname]]]]
;; Generic transitive network walking, used by collaboration network
;; rule below
;; Supply:
;; ?e1 -- an entity-id
;; ?attr -- an attribute ident
;; and ?e2 will be bound to entity-ids such that ?e1 and ?e2 are both
;; values of the given attribute for some entity (?x)
[(transitive-net-1 ?attr ?e1 ?e2)
[?x ?attr ?e1]
[?x ?attr ?e2]
[(!= ?e1 ?e2)]]
;; Same as transitive-net-1, but search one more level of depth. We
;; define this rule twice, once for each case, and the rule
;; represents the union of the two cases:
;; - The entities are directly related via the attribute
;; - The entities are related to the given depth (in this case 2) via the attribute
[(transitive-net-2 ?attr ?e1 ?e2)
(transitive-net-1 ?attr ?e1 ?e2)]
[(transitive-net-2 ?attr ?e1 ?e2)
(transitive-net-1 ?attr ?e1 ?x)
(transitive-net-1 ?attr ?x ?e2)
[(!= ?e1 ?e2)]]
;; Same as transitive-net-2 but to depth 3
[(transitive-net-3 ?attr ?e1 ?e2)
(transitive-net-1 ?attr ?e1 ?e2)]
[(transitive-net-3 ?attr ?e1 ?e2)
(transitive-net-2 ?attr ?e1 ?x)
(transitive-net-2 ?attr ?x ?e2)
[(!= ?e1 ?e2)]]
;; Same as transitive-net-2 but to depth 4
[(transitive-net-4 ?attr ?e1 ?e2)
(transitive-net-1 ?attr ?e1 ?e2)]
[(transitive-net-4 ?attr ?e1 ?e2)
(transitive-net-3 ?attr ?e1 ?x)
(transitive-net-3 ?attr ?x ?e2)
[(!= ?e1 ?e2)]]
;; Artist collaboration graph-walking rules, based on generic
;; graph-walk rule above
;; Supply an artist name as ?artist-name-1, an ?artist-name-2 will be
;; bound to the names of artists who directly collaborated with the
;; artist(s) having that name
[(collab ?artist-name-1 ?artist-name-2)
[?a1 :artist/name ?artist-name-1]
(transitive-net-1 :track/artists ?a1 ?a2)
[?a2 :artist/name ?artist-name-2]]
;; Alias for collab
[(collab-net-1 ?artist-name-1 ?artist-name-2)
(collab ?artist-name-1 ?artist-name-2)]
;; Collaboration network walk to depth 2
[(collab-net-2 ?artist-name-1 ?artist-name-2)
[?a1 :artist/name ?artist-name-1]
(transitive-net-2 :track/artists ?a1 ?a2)
[?a2 :artist/name ?artist-name-2]]
;; Collaboration network walk to depth 3
[(collab-net-3 ?artist-name-1 ?artist-name-2)
[?a1 :artist/name ?artist-name-1]
(transitive-net-3 :track/artists ?a1 ?a2)
[?a2 :artist/name ?artist-name-2]]
;; Collaboration network walk to depth 4
[(collab-net-4 ?artist-name-1 ?artist-name-2)
[?a1 :artist/name ?artist-name-1]
(transitive-net-4 :track/artists ?a1 ?a2)
[?a2 :artist/name ?artist-name-2]]]