Quick and dirty SQL/JDBC implementation for Blueprints.
Written solely for the sake of it.
Current caveats:
- Performs no optimisations whatsoever, such as property caching
- Leaks resources unless you cast the _Iterable_s to _CloseableIterable_s and close them explicitly.
- Untested with databases other than H2 and Postgres
- Only supports primitive types and strings as property values
- Mmmn, probably lots more I haven’t thought of…
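The resource leak mentioned in the caveats can be avoided by closing each result once iteration has finished. Below is a minimal Java sketch of that pattern. It assumes the iterables returned by the graph implement Blueprints' CloseableIterable, as the caveat describes. The nested CloseableIterable interface is only a local stand-in so the example compiles without Blueprints on the classpath, and IterableCloser, closeIfPossible and countAndClose are hypothetical names, not part of this library.

```java
public class IterableCloser {

    /** Local stand-in for com.tinkerpop.blueprints.CloseableIterable (assumed to have this shape). */
    public interface CloseableIterable<T> extends Iterable<T> {
        void close();
    }

    /** Closes the iterable if it is closeable; returns true if close() was called. */
    public static boolean closeIfPossible(Iterable<?> iterable) {
        if (iterable instanceof CloseableIterable) {
            ((CloseableIterable<?>) iterable).close();
            return true;
        }
        return false;
    }

    /** Counts the elements and always releases the underlying resources afterwards. */
    public static int countAndClose(Iterable<?> iterable) {
        int count = 0;
        try {
            for (Object ignored : iterable) {
                count++;
            }
        } finally {
            // Runs even if iteration throws, so the result is never left open.
            closeIfPossible(iterable);
        }
        return count;
    }
}
```

With the real library, the stand-in interface would be replaced by the Blueprints one, and the argument would be something like the result of g.getVertices().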
To try this out in the Gremlin REPL, you need to put some jars on the Gremlin-Groovy standalone classpath:
- blueprints-sql-graph-1.0.jar (this thing)
- Your database driver, e.g. postgresql-9.1-901.jdbc4.jar or h2.jar
If you’re using a Gremlin distribution compiled from GitHub, these extra jars need to be copied to:
$GREMLIN_HOME/gremlin-groovy/target/gremlin-groovy-2.4.0-SNAPSHOT-standalone/lib/
gremlin> import com.tinkerpop.blueprints.impls.sql.SqlGraph
gremlin> g = new SqlGraph(["sql.datasource.class": "org.h2.jdbcx.JdbcDataSource",
"sql.datasource.url": "jdbc:h2:mem:test"])
gremlin> g.createSchemaIfNeeded()
gremlin> v1 = g.addVertex()
gremlin> v1.addEdge("knows", g.addVertex())
gremlin> g.commit()
For server DBs you first have to create an empty database, with a username/password. The graph can be loaded thusly:
gremlin> import com.tinkerpop.blueprints.impls.sql.SqlGraph
==> ...
gremlin> g = new SqlGraph(["sql.datasource.class": "org.postgresql.ds.PGSimpleDataSource",
"sql.datasource.serverName": "example.com", "sql.datasource.databaseName": "blueprints", "sql.datasource.user": "...",
"sql.datasource.password": "..."])
gremlin> g.createSchemaIfNeeded()
==>null
The constructor takes a single map that specifies the class of the datasource, any additional properties of the
datasource, and optionally the names of the tables to be used for storage:
- sql.datasource.class – the name of the datasource class from some JDBC driver on the classpath
- sql.datasource.* – any properties of the datasource can be passed using this prefix. E.g. sql.datasource.portNumber, sql.datasource.serverName, etc. See the documentation of the datasource for the list of available properties.
- sql.verticesTable – the name of the vertices table. Defaults to “vertices”.
- sql.edgesTable – the name of the edges table. Defaults to “edges”.
- sql.vertexPropertiesTable – the name of the table for vertex properties. Defaults to “vertex_properties”.
- sql.edgePropertiesTable – the name of the table for edge properties. Defaults to “edge_properties”.
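To make the key layout concrete, here is a small Java sketch of how such a map could be assembled and read. It only illustrates the keys and defaults listed above and is not SqlGraph's actual parsing code. SqlGraphConfig, tableName, dataSourceProperties and exampleConfig are hypothetical names, and "my_vertices" is a made-up table name.

```java
import java.util.HashMap;
import java.util.Map;

public class SqlGraphConfig {

    private static final String DATASOURCE_PREFIX = "sql.datasource.";

    /** Returns the configured table name, or the documented default when the key is absent. */
    public static String tableName(Map<String, String> config, String key, String defaultName) {
        String value = config.get(key);
        return value != null ? value : defaultName;
    }

    /** Collects the sql.datasource.* entries with the prefix stripped, skipping the class name. */
    public static Map<String, String> dataSourceProperties(Map<String, String> config) {
        Map<String, String> properties = new HashMap<String, String>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(DATASOURCE_PREFIX) && !key.equals(DATASOURCE_PREFIX + "class")) {
                properties.put(key.substring(DATASOURCE_PREFIX.length()), entry.getValue());
            }
        }
        return properties;
    }

    /** Example configuration: in-memory H2 with a custom (hypothetical) vertices table name. */
    public static Map<String, String> exampleConfig() {
        Map<String, String> config = new HashMap<String, String>();
        config.put("sql.datasource.class", "org.h2.jdbcx.JdbcDataSource");
        config.put("sql.datasource.url", "jdbc:h2:mem:test");
        config.put("sql.verticesTable", "my_vertices");
        return config;
    }
}
```

In this sketch, tableName(config, "sql.edgesTable", "edges") falls back to "edges" because the example map does not set that key, while the vertices table resolves to the custom name.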
This is more of a toy than a serious attempt at a performant graph database backed by an RDBMS. As such, no
optimisations are done at either the schema level or in the codebase, beyond declaring indexes on frequently accessed
columns.