-
Notifications
You must be signed in to change notification settings - Fork 39
Queries
We provide the queries below as EDN data. For your convenience, a java file and a clojure file containing the corresponding query API calls are included with the project.
You can query the database from Java:
package datomic.samples.mbrainz;
import java.util.*;
import datomic.Connection;
import datomic.Database;
import static datomic.Peer.*;
public class Query {
public static void main(String[] args) {
String uri = "datomic:free://localhost:4334/mbrainz";
Connection conn = connect(uri);
Database db = conn.db();
Collection<List<Object>> result = q("[:find ?id ?type ?gender" +
":in $ ?name" +
":where [?e :artist/name ?name]" +
"[?e :artist/gid ?id]" +
"[?e :artist/type ?type]" +
"[?e :artist/gender ?gender]]",
db, "Lupe Fiasco");
System.out.println(result);
}
}
or from Clojure:
(require '[datomic.api :as d])
(def uri "datomic:free://localhost:4334/mbrainz")
(def conn (d/connect uri))
(def db (d/db conn))
(d/q '[:find ?id ?type ?gender
:in $ ?name
:where
[?e :artist/name ?name]
[?e :artist/gid ?id]
[?e :artist/type ?type]
[?e :artist/gender ?gender]]
db
"Lupe Fiasco")
The first parameter to the Peer.q()
function is the query itself.
The remaining arguments are the data sources to be queried. They are
bound, in the order they are passed, to the names in the :in
clause
of the query, according to the following rules:
- Names starting with (or consisting of only)
$
bind database values - Names starting with (or consisting of only)
%
bind rulesets - Names starting with
?
bind data literals (e.g. the string "Lupe Fiasco" in the example above). - Any number of data sources can be passed into a query
Now we're ready to start the fun stuff.
Let's start with a simple query:
Query:
[:find ?title
:in $ ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?title]]
Query args:
db, "John Lennon"
Sample results:
["I Don't Want to Be a Soldier, I Don't Want to Die - Sax Overdub (7-4-71 NYC)"]
["Real Love (Speech Removed) (Short version)"]
["Hound Dog"]
["San Francisco Bay Blues"]
["Tower Records Spot"]
["Move Over Ms. L."]
["Gimme Some Truth (remix)"]
["Meeting at the Indica Gallery"]
["How Do You Sleep? (alternate vocal)"]
["Clean Up Time"]
The results themselves are returned as a java.util.HashSet
containing tuples of the results specified in the :find
clause, in
this case just a 1-tuple containing the track title.
Run the query for a second time, and you'll notice it executes much faster. Datomic has a local cache in each Peer, which warms up as we execute queries. These queries perform better as the JVM heap gets bigger, up to 4GB, at which size the JVM starts to have its well-known long GC pauses.
If you're not familiar with John Lennon's post-Beatles work, many of the titles might not be familiar. Let's figure out what albums these tracks are on, and in what year those albums were released. To figure out how to navigate from track to album release, and figure out what attributes are available on each entity, you can refer to the Schema.
Query:
[:find ?title ?album ?year
:in $ ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?title]
[?m :medium/tracks ?t]
[?r :release/media ?m]
[?r :release/name ?album]
[?r :release/year ?year]]
Query args:
db, "John Lennon"
Sample results:
["Whatever Gets You Thru the Night" "Shaved Fish" 2007]
["Bring on the Lucie (Freda People)" "The Complete Lost Lennon Tapes, Volume 9" 1997]
["Happy X-Mas (War Is Over)" "Rock Christmas, Volume 5" 1996]
["Power to the People" "John Lennon" 2009]
["Aisumasen (I'm Sorry)" "The Complete Lost Lennon Tapes, Volume 22" 1998]
["How?" "Lennon" 1990]
["A Hard Day's Night" "Hey Jude / Yesterday" 2000]
["Real Life" "The Complete Lost Lennon Tapes, Volume 15" 1998]
["Instant Karma! (We All Shine on)" "The Very Best of Legends" 2005]
["Oh Yoko! - Acoustic Demo (Mid 1969)" "Imagine: All the Outtakes" 1994]
Neat, all posthumous releases in this sample. I wonder which of his albums were released before his death.
So far we've only seen where clauses that look like:
[?entity :some/attr ?value]
The full version of these clauses is:
[?entity ?attribute ?value ?transaction-time]
and they allow us to specify or query for these various aspects of our stored datoms.
However, Datomic also allows us to execute arbitrary code as part of our queries. To figure out which tracks were released before John Lennon's death, we can use a Java method or Clojure function.
What are the titles, album names, and release years of the John Lennon tracks released before or during 1980?
Query:
[:find ?title ?album ?year
:in $ ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?title]
[?m :medium/tracks ?t]
[?r :release/media ?m]
[?r :release/name ?album]
[?r :release/year ?year]
[(<= ?year 1980)]]
Query args:
db, "John Lennon"
Sample results:
["John & Yoko" "Unfinished Music No. 3: Wedding Album" 1969]
["Radio Play" "Unfinished Music No. 2: Life With the Lions" 1969]
["New York City" "Some Time in New York City" 1973]
["Hard Times Are Over" "Double Fantasy" 1980]
["New York City" "Some Time in New York City" 1972]
["You Can't Catch Me" "Rock 'n' Roll" 1975]
["We're All Water" "Some Time in New York City" 1972]
["Whatever Gets You Thru the Night" "Walls and Bridges" 1974]
["It's So Hard" "Imagine" 1971]
["Gimme Some Truth" "Imagine" 1971]
Here we use the <=
function to filter the results to only the
John Lennon albums released before or during 1980.
To reuse query logic across many queries, you can create and use rules like the following:
[;; Given ?t bound to track entity-ids, binds ?r to the corresponding
;; set of album release entity-ids
[(track-release ?t ?r)
[?m :medium/tracks ?t]
[?r :release/media ?m]]]
This rule is called track-release
, and as defined in the first
clause when given a bound set of track eids ?t
, will bind ?r
to
the eids of the releases on which the tracks appear. So we can
re-write the John Lennon's albums query above as:
Query:
[:find ?title ?album ?year
:in $ % ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?title]
(track-release ?t ?r)
[?r :release/name ?album]
[?r :release/year ?year]]
Query args:
db, rules, "John Lennon"
Sample results:
["Whatever Gets You Thru the Night" "Shaved Fish" 2007]
["Bring on the Lucie (Freda People)" "The Complete Lost Lennon Tapes, Volume 9" 1997]
["Happy X-Mas (War Is Over)" "Rock Christmas, Volume 5" 1996]
["Power to the People" "John Lennon" 2009]
["Aisumasen (I'm Sorry)" "The Complete Lost Lennon Tapes, Volume 22" 1998]
["How?" "Lennon" 1990]
["A Hard Day's Night" "Hey Jude / Yesterday" 2000]
["Real Life" "The Complete Lost Lennon Tapes, Volume 15" 1998]
["Instant Karma! (We All Shine on)" "The Very Best of Legends" 2005]
["Oh Yoko! - Acoustic Demo (Mid 1969)" "Imagine: All the Outtakes" 1994]
Notice the results are the same as the rule-less version above.
The rule set goes in as %
, and when you supply it, you can invoke
the rules by name. It is idiomatic to use round brackets for rule
invocations, and square brackets for other clauses.
For the queries that follow, when rules
is specified as a query arg,
it refers to the set of example rules in Appendix 1 below.
That reminds me, do I have the full collection of the Beatles' albums?
Query:
[:find ?artist ?rname ?type
:in $ ?aname
:where
[?a :artist/name ?aname]
[?ar :abstractRelease/artists ?a]
[?ar :abstractRelease/name ?rname]
[?ar :abstractRelease/artistCredit ?artist]
[?ar :abstractRelease/type ?type-e]
[?type-e :db/ident ?type]]
Query args:
db, "The Beatles"
Sample results:
["The Beatles" "Michelle" :release.type/album]
["The Beatles" "Another Sessions... Plus" :release.type/other]
["The Beatles" "Documents, Volume 2" :release.type/other]
["The Beatles" "West Coast Invasion" :release.type/other]
["The Beatles" "Past Masters, Volume Two" :release.type/album]
["The Beatles" "Golden Best 20, Volume 1" :release.type/album]
["The Beatles" "Interview Picture Disc, Volume 2" :release.type/other]
["The Beatles" "And I Love Her" :release.type/single]
["The Beatles" "Sgt. Pepper’s Lonely Hearts Club Band / With a Little Help From My Friends" :release.type/single]
["The Beatles" "Ultra Rare Trax, Volume 7" :release.type/other]
Let's find all of the tracks with the word "zombie" in the title!
years of all tracks having the word "zombie" in their titles?
Query:
[:find ?title ?artist ?album ?year
:in $ % ?search
:where
(track-search ?search ?track)
(track-info ?track ?title ?artist ?album ?year)]
Query args:
db, rules, "zombie"
Sample results:
["Was a Zombie" "Twisted in Graves" "Deepsix vs. Twisted in Graves" 2007]
["Zombie" "The Cranberries" "Unplugged and Exposed" 2001]
["Zombie Ritual" "Japanische Kampfhörspiele" "Sektion JaKa" 1998]
["Zombie (A-Team mix)" "Ororo" "Maxi Power, Volume 6" 1995]
["Go Go Zombie" "Under a Nightmare" "Valley of the Scarecrow" 2004]
["Zombie" "Disarm" "Really Fast, Volume 3" 1986]
["Zombie Jig" "Benga" "Newstep" 2006]
["Blood Zombie 2007" "Horror of 59" "The Golden Age of Sin" 2007]
["What You Need (Zombie Nation remix)" "Tiga" "Ministry of Sound: Mashed 5" 2009]
["Sex Zombie" "Red Aunts" "Drag" 1993]
Using Datomic joins and rules, we can do graph walks to determine who collaborated with whom. We can also pass in data literals of various shapes, as described in the "Bindings" section of the query docs.
Query:
[:find ?aname ?aname2
:in $ % [?aname ...]
:where (collab ?aname ?aname2)]
Query args:
db, rules, ["John Lennon" "Paul McCartney" "George Harrison" "Ringo Starr"]
Sample results:
["George Harrison" "Richie Havens"]
["John Lennon" "Yoko Ono"]
["George Harrison" "Tracy Chapman"]
["John Lennon" "Elephant's Memory"]
["George Harrison" "The Clancy Brothers"]
["George Harrison" "Kris Kristofferson"]
["John Lennon" "Elton John"]
["Ringo Starr" "Quincy Jones"]
["George Harrison" "Chrissie Hynde"]
["George Harrison" "Ron Wood"]
What about collaborators of collaborators? Let's find Paul McCartney's collaboration network to a depth of 2.
Who either directly collaborated with Paul McCartney, or collaborated with one of his collaborators?
Query:
[:find ?aname2
:in $ % ?aname
:where (collab-net-2 ?aname ?aname2)]
Query args:
db, rules, "Paul McCartney"
Sample results:
["Andy Fairweather-Low"]
["Peter Tosh"]
["The Powerhouse"]
["The Blockheads"]
["The Count Basie Orchestra"]
["Faheem Mazhar"]
["Rod Stewart"]
["Lee Ritenour"]
["Smokey Robinson"]
["Savage Garden"]
We used graph-walk rules to find the network above, but we could just as easily chain queries together, passing the results of one query into another query.
Who either directly collaborated with Paul McCartney, or collaborated with one of his collaborators?
Query:
[:find ?aname2
:in $ % [[?aname]]
:where (collab ?aname ?aname2)]
Query args:
First time:
db, rules, [["Paul McCartney"]]
Second time:
db, rules, resultsOfFirstQuery
Sample Results:
First time:
["Wings"]
["The Christians"]
["Tony Bennett"]
["Joe Cocker"]
["Lulu"]
["Gerry Marsden"]
["George Martin"]
["Stock Aitken Waterman"]
["George Benson"]
["Denny Laine"]
Second time:
["Andy Fairweather-Low"]
["Peter Tosh"]
["The Powerhouse"]
["The Count Basie Orchestra"]
["Faheem Mazhar"]
["Rod Stewart"]
["Lee Ritenour"]
["Smokey Robinson"]
["Savage Garden"]
["Joe Lovano"]
Let's play Battle of the Bands. We can specify an attribute on which to compare two or more artists, and use aggregation to see who's has the most. Now you can settle those bar room debates with data!
Query:
[:find ?aname (count ?e)
:with ?a
:in $ ?criterion [?aname ...]
:where
[?a :artist/name ?aname]
[?e ?criterion ?a]]
Query args:
db, :abstractRelease/artists, ["Jay-Z", "Beyoncé Knowles"]
Result:
[["Beyoncé Knowles" 44] ["Jay-Z" 78]]
You could do the same thing with :track/artists
as the criterion to
see who has more tracks.
Query:
[:find ?aname ?tname
:in $ ?artist-name
:where
[?a :artist/name ?artist-name]
[?t :track/artists ?a]
[?t :track/name ?tname]
[(!= "Outro" ?tname)]
[(!= "[outro]" ?tname)]
[(!= "Intro" ?tname)]
[(!= "[intro]" ?tname)]
[?t2 :track/name ?tname]
[?t2 :track/artists ?a2]
[(!= ?a2 ?a)]
[?a2 :artist/name ?aname]]
Query args:
db, "The Who"
Sample results:
["Big Sean" "You"]
["Karaoke" "My Way"]
["The Slags" "Water"]
["Special Patrol" "I Don't Mind"]
["Phillip Boa and the Voodooclub" "Fire"]
["Day of the Sword" "After the Fire"]
["DJ Resist" "Sunrise"]
["The California Aggie Marching Band-uh!" "You Better You Bet"]
["Amazing Journey" "I'm Free"]
["Rainbow Brite" "Jingle Bells"]
Some of the queries above use a rule set (supplied to query as %
).
The example rules contain basic lookups, joins, and a graph-walking
rule for finding the collaboration network of an artist to a depth of 4.
You could increase the depth by adding more rules according to the
recursive pattern. For an example of generating such a graph-walk
ruleset programmatically, see
rules.clj
These rules are as follows:
[;; Given ?t bound to track entity-ids, binds ?r to the corresponding
;; set of album release entity-ids
[(track-release ?t ?r)
[?m :medium/tracks ?t]
[?r :release/media ?m]]
;; Supply track entity-ids as ?t, and the other parameters will be
;; bound to the corresponding information about the tracks
[(track-info ?t ?track-name ?artist-name ?album ?year)
[?t :track/name ?track-name]
[?t :track/artists ?a]
[?a :artist/name ?artist-name]
(track-release ?t ?r)
[?r :release/name ?album]
[?r :release/year ?year]]
;; Supply ?a (artist entity-ids) and and integer ?max track duration,
;; and ?t, ?len will be bound to track entity-ids and lengths
;; (respectively) of tracks shorter than the given ?max
[(short-track ?a ?t ?len ?max)
[?t :track/artists ?a]
[?t :track/duration ?len]
[(< ?len ?max)]]
;; Fulltext search on track. Supply the query string ?q, and ?track
;; will be bound to entity-ids of tracks whose title matches the
;; search.
[(track-search ?q ?track)
[(fulltext $ :track/name ?q) [[?track ?tname]]]]
;; Generic transitive network walking, used by collaboration network
;; rule below
;; Supply:
;; ?e1 -- an entity-id
;; ?attr -- an attribute ident
;; and ?e2 will be bound to entity-ids such that ?e1 and ?e2 are both
;; values of the given attribute for some entity (?x)
[(transitive-net-1 ?attr ?e1 ?e2)
[?x ?attr ?e1]
[?x ?attr ?e2]
[(!= ?e1 ?e2)]]
;; Same as transitive-net-1, but search one more level of depth. We
;; define this rule twice, once for each case, and the rule
;; represents the union of the two cases:
;; - The entities are directly related via the attribute
;; - The entities are related to the given depth (in this case 2) via the attribute
[(transitive-net-2 ?attr ?e1 ?e2)
(transitive-net-1 ?attr ?e1 ?e2)]
[(transitive-net-2 ?attr ?e1 ?e2)
(transitive-net-1 ?attr ?e1 ?x)
(transitive-net-1 ?attr ?x ?e2)
[(!= ?e1 ?e2)]]
;; Same as transitive-net-2 but to depth 3
[(transitive-net-3 ?attr ?e1 ?e2)
(transitive-net-1 ?attr ?e1 ?e2)]
[(transitive-net-3 ?attr ?e1 ?e2)
(transitive-net-2 ?attr ?e1 ?x)
(transitive-net-2 ?attr ?x ?e2)
[(!= ?e1 ?e2)]]
;; Same as transitive-net-2 but to depth 4
[(transitive-net-4 ?attr ?e1 ?e2)
(transitive-net-1 ?attr ?e1 ?e2)]
[(transitive-net-4 ?attr ?e1 ?e2)
(transitive-net-3 ?attr ?e1 ?x)
(transitive-net-3 ?attr ?x ?e2)
[(!= ?e1 ?e2)]]
;; Artist collaboration graph-walking rules, based on generic
;; graph-walk rule above
;; Supply an artist name as ?artist-name-1, an ?artist-name-2 will be
;; bound to the names of artists who directly collaborated with the
;; artist(s) having that name
[(collab ?artist-name-1 ?artist-name-2)
[?a1 :artist/name ?artist-name-1]
(transitive-net-1 :track/artists ?a1 ?a2)
[?a2 :artist/name ?artist-name-2]]
;; Alias for collab
[(collab-net-1 ?artist-name-1 ?artist-name-2)
(collab ?artist-name-1 ?artist-name-2)]
;; Collaboration network walk to depth 2
[(collab-net-2 ?artist-name-1 ?artist-name-2)
[?a1 :artist/name ?artist-name-1]
(transitive-net-2 :track/artists ?a1 ?a2)
[?a2 :artist/name ?artist-name-2]]
;; Collaboration network walk to depth 3
[(collab-net-3 ?artist-name-1 ?artist-name-2)
[?a1 :artist/name ?artist-name-1]
(transitive-net-3 :track/artists ?a1 ?a2)
[?a2 :artist/name ?artist-name-2]]
;; Collaboration network walk to depth 4
[(collab-net-4 ?artist-name-1 ?artist-name-2)
[?a1 :artist/name ?artist-name-1]
(transitive-net-4 :track/artists ?a1 ?a2)
[?a2 :artist/name ?artist-name-2]]]