Node.js driver for Neo4j, a graph database.
This driver aims to be the most robust, comprehensive, and battle-tested driver available. It's run in production by FiftyThree to power the popular iOS app Paper.
Note: if you're still on Neo4j 1.x, you'll need to use node-neo4j v1.
Note: node-neo4j v2 is a ground-up rewrite with an entirely new API. If you're currently using node-neo4j v1, here's the migration guide.
- Cypher queries, parameters, batching, and transactions
- Arbitrary HTTP requests, for custom Neo4j plugins
- Custom headers, for high availability, application tracing, query logging, and more
- Precise errors, for robust error handling from the start
- Configurable connection pooling, for performance tuning & monitoring
- Thorough test coverage with >100 tests
- Continuously integrated against multiple versions of Node.js and Neo4j
npm install neo4j --save
var neo4j = require('neo4j');
var db = new neo4j.GraphDatabase('http://username:password@localhost:7474');
db.cypher({
query: 'MATCH (user:User {email: {email}}) RETURN user',
params: {
email: '[email protected]',
},
}, callback);
function callback(err, results) {
if (err) throw err;
var result = results[0];
if (!result) {
console.log('No user found.');
} else {
var user = result['user'];
console.log(user);
}
};
Yields e.g.:
{
"_id": 12345678,
"labels": [
"User",
"Admin"
],
"properties": {
"name": "Alice Smith",
"email": "[email protected]",
"emailVerified": true,
"passwordHash": "..."
}
}
See node-neo4j-template for a more thorough example.
Connect to a running Neo4j instance by instantiating the GraphDatabase class:
var neo4j = require('neo4j');
// Shorthand:
var db = new neo4j.GraphDatabase('http://username:password@localhost:7474');
// Full options:
var db = new neo4j.GraphDatabase({
url: 'http://localhost:7474',
auth: {username: 'username', password: 'password'},
// ...
});
Options:
- url (required): the base URL to the Neo4j instance, e.g. 'http://localhost:7474'. This can include auth credentials (e.g. 'http://username:password@localhost:7474'), but doesn't have to.
- auth: optional auth credentials; either a 'username:password' string, or a {username, password} object. If present, this takes precedence over any credentials in the url.
- headers: optional custom HTTP headers to send with every request. These can be overridden per request. Node-Neo4j defaults to sending a User-Agent identifying itself, but this can be overridden too.
- proxy: optional URL to a proxy. If present, all requests will be routed through the proxy.
- agent: optional http.Agent instance, for custom socket pooling.
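As a minimal sketch of the full-options form, the helper below assembles a constructor options object from environment-style settings. The function name graphDatabaseOptions, the setting names NEO4J_URL, NEO4J_AUTH and NEO4J_PROXY, and the X-App-Name header are all hypothetical; only the option keys (url, auth, headers, proxy) come from the list above.

```javascript
// Hypothetical helper: build GraphDatabase options from a settings object
// (e.g. process.env). Only the keys url, auth, headers and proxy are
// documented options; everything else here is illustrative.
function graphDatabaseOptions(env) {
    if (!env.NEO4J_URL) {
        throw new Error('NEO4J_URL is required');
    }

    var options = {
        url: env.NEO4J_URL,
        headers: {'X-App-Name': 'my-app'},  // hypothetical default header
    };

    // If set, auth takes precedence over credentials embedded in the URL.
    if (env.NEO4J_AUTH) {
        options.auth = env.NEO4J_AUTH;      // 'username:password' string
    }
    if (env.NEO4J_PROXY) {
        options.proxy = env.NEO4J_PROXY;
    }
    return options;
}

// Usage (assumes the neo4j module is installed):
// var db = new neo4j.GraphDatabase(graphDatabaseOptions(process.env));
```

Keeping credentials in auth rather than in the URL makes it easier to log the URL without leaking the password.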
Once you have a GraphDatabase instance, you can make queries and more.
Most operations are asynchronous, which means they take a callback. Node-Neo4j callbacks are of the standard (error[, results]) form.
Async control flow can get pretty tricky, so it's highly recommended to use a flow control library or tool, like async or Streamline.
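If you prefer promises to a flow control library, the callback form can also be wrapped by hand. This is a sketch rather than part of node-neo4j: cypherAsync is a hypothetical helper, and it assumes only that db.cypher takes (options, callback) as described above.

```javascript
// Hypothetical wrapper: adapt the (err, results) callback form to a Promise.
function cypherAsync(db, options) {
    return new Promise(function (resolve, reject) {
        db.cypher(options, function (err, results) {
            if (err) {
                reject(err);
            } else {
                resolve(results);
            }
        });
    });
}

// Usage, given a GraphDatabase instance `db`:
// cypherAsync(db, {query: 'MATCH (user:User) RETURN user'})
//     .then(function (results) { console.log(results.length); })
//     .catch(function (err) { console.error(err); });
```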
To make a Cypher query, simply pass the string query, any query parameters, and a callback to receive the error or results.
db.cypher({
query: 'MATCH (user:User {email: {email}}) RETURN user',
params: {
email: '[email protected]',
},
}, callback);
It's extremely important to pass params separately. If you concatenate them into the query, you'll be vulnerable to injection attacks, and Neo4j performance will suffer as well. (Note that parameters can't be used for labels, property names, and relationship types, as those things determine the query plan. Docs »)
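When a label genuinely has to vary at runtime, one cautious approach is to check it against a fixed allow-list before building the query, while still passing every value as a param. The sketch below assumes a hypothetical ALLOWED_LABELS list and a hypothetical findByEmailQuery helper; neither is part of node-neo4j.

```javascript
// Labels can't be passed as params, so validate them against a fixed
// allow-list before interpolating. ALLOWED_LABELS is a hypothetical list.
var ALLOWED_LABELS = ['User', 'Admin'];

function findByEmailQuery(label, email) {
    if (ALLOWED_LABELS.indexOf(label) === -1) {
        throw new Error('Unexpected label: ' + label);
    }
    return {
        // Only the vetted label is concatenated; the value stays a param.
        query: 'MATCH (n:' + label + ' {email: {email}}) RETURN n',
        params: {email: email},
    };
}

// Usage: db.cypher(findByEmailQuery('Admin', email), callback);
```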
Cypher queries always return a list of results (like SQL rows), with each result having common properties (like SQL columns). Thus, query results passed to the callback are always an array (even if it's empty), and each result in the array is always an object (even if it's empty).
function callback(err, results) {
if (err) throw err;
var result = results[0];
if (!result) {
console.log('No user found.');
} else {
var user = result['user'];
console.log(user);
}
};
If the query results include nodes or relationships, Node and Relationship instances are returned for them. These instances encapsulate {_id, labels, properties} for nodes, and {_id, type, properties, _fromId, _toId} for relationships, but they can be used just like normal objects.
{
"_id": 12345678,
"labels": [
"User",
"Admin"
],
"properties": {
"name": "Alice Smith",
"email": "[email protected]",
"emailVerified": true,
"passwordHash": "..."
}
}
(The _id properties refer to Neo4j's internal IDs. These can be convenient for debugging, but their usage otherwise, especially externally, is discouraged.)
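One way to keep internal IDs (and sensitive fields) out of anything you expose externally is to copy only the properties you want. A minimal sketch, assuming a hypothetical toPublicUser helper and the node shape shown above:

```javascript
// Hypothetical helper: expose a node's properties without the internal _id
// or the password hash. Works on anything shaped like
// {_id, labels, properties}.
function toPublicUser(node) {
    var out = {};
    Object.keys(node.properties).forEach(function (key) {
        if (key !== 'passwordHash') {
            out[key] = node.properties[key];
        }
    });
    return out;
}

// Usage: console.log(toPublicUser(result['user']));
```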
If you don't need to know Neo4j IDs, node labels, or relationship types, you can pass lean: true to get back just properties, for a potential performance gain.
db.cypher({
query: 'MATCH (user:User {email: {email}}) RETURN user',
params: {
email: '[email protected]',
},
lean: true,
}, callback);
{
"name": "Alice Smith",
"email": "[email protected]",
"emailVerified": true,
"passwordHash": "..."
}
Other options:
- headers: optional custom HTTP headers to send with this query. These will add onto the default GraphDatabase headers, but also override any that overlap.
You can also make multiple Cypher queries within a single network request, by passing a queries array rather than a single query string.
Query params (and optionally lean) are then specified per query, so the elements in the array are {query, params[, lean]} objects. (Other options like headers remain "global" for the entire request.)
db.cypher({
queries: [{
query: 'MATCH (user:User {email: {email}}) RETURN user',
params: {
email: '[email protected]',
},
}, {
query: 'MATCH (task:WorkerTask) RETURN task',
lean: true,
}, {
query: 'MATCH (task:WorkerTask) DELETE task',
}],
headers: {
'X-Request-ID': '1234567890',
},
}, callback);
The callback then receives an array of query results, one per query.
function callback(err, batchResults) {
if (err) throw err;
var userResults = batchResults[0];
var taskResults = batchResults[1];
var deleteResults = batchResults[2];
// User results:
var userResult = userResults[0];
if (!userResult) {
console.log('No user found.');
} else {
var user = userResult['user'];
console.log('User %s (%s) found.', user._id, user.properties.name);
}
// Worker task results:
if (!taskResults.length) {
console.log('No worker tasks to process.');
} else {
taskResults.forEach(function (taskResult) {
var task = taskResult['task'];
console.log('Processing worker task %s...', task.operation);
});
}
// Delete results (shouldn’t have returned any):
assert.equal(deleteResults.length, 0);
};
Importantly, batch queries execute (a) sequentially and (b) transactionally: they all succeed, or they all fail. If you don't need them to be transactional, it can often be better to parallelize separate db.cypher calls instead.
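A rough sketch of the parallel alternative, without a flow control library: the hypothetical cypherParallel helper below issues one db.cypher call per query and collects the results in input order. Unlike a batch, nothing here is transactional, so some queries may succeed even when another fails.

```javascript
// Hypothetical helper: run independent queries as separate db.cypher calls.
// Calls back once, with the first error or with results in input order.
function cypherParallel(db, queryList, callback) {
    var results = new Array(queryList.length);
    var remaining = queryList.length;
    var failed = false;

    if (remaining === 0) {
        return callback(null, results);
    }

    queryList.forEach(function (options, i) {
        db.cypher(options, function (err, queryResults) {
            if (failed) return;          // an error was already reported
            if (err) {
                failed = true;
                return callback(err);
            }
            results[i] = queryResults;
            remaining -= 1;
            if (remaining === 0) {
                callback(null, results);
            }
        });
    });
}
```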
You can also batch multiple Cypher queries into a single transaction across multiple network requests. This can be useful when application logic needs to run in between related queries (e.g. for domain-aware cascading deletes), or Neo4j state needs to be coordinated with side effects (e.g. writes to another data store). The queries will all succeed or fail together.
To do this, begin a new transaction, make Cypher queries within that transaction, and then ultimately commit the transaction or roll it back.
var tx = db.beginTransaction();
function makeFirstQuery() {
tx.cypher({
query: '...',
params: {...},
}, makeSecondQuery);
}
function makeSecondQuery(err, results) {
if (err) throw err;
// ...some application logic...
tx.cypher({
query: '...',
params: {...},
}, finish);
}
function finish(err, results) {
if (err) throw err;
// ...some application logic...
tx.commit(done); // or tx.rollback(done);
}
function done(err) {
if (err) throw err;
// At this point, the transaction has been committed.
}
makeFirstQuery();
The transactional cypher method supports everything the normal cypher method does (e.g. lean, headers, and batch queries). In addition, you can pass commit: true to auto-commit the transaction (and save a network request) if the query succeeds.
function makeSecondQuery(err, results) {
if (err) throw err;
// ...some application logic...
tx.cypher({
query: '...',
params: {...},
commit: true,
}, done);
}
function done(err) {
if (err) throw err;
// At this point, the transaction has been committed.
}
Importantly, transactions allow only one query at a time. To help preempt errors, you can inspect the state of the transaction, e.g. whether it's open for queries or not.
// Initially, transactions are open:
assert.equal(tx.state, tx.STATE_OPEN);
// Making a query...
tx.cypher({
query: '...',
params: {...},
}, callback);
// ...will result in the transaction being pending:
assert.equal(tx.state, tx.STATE_PENDING);
// All other operations (making another query, committing, etc.)
// are rejected while the transaction is pending:
assert.throws(tx.renew.bind(tx));
function callback(err, results) {
// When the query returns, the transaction is likely open again,
// but it could be committed if `commit: true` was specified,
// or it could have been rolled back automatically (by Neo4j)
// if there was an error:
assert.notEqual([
tx.STATE_OPEN, tx.STATE_COMMITTED, tx.STATE_ROLLED_BACK
].indexOf(tx.state), -1); // i.e. tx.state is in this array
}
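Because a failed query may or may not leave the transaction open, error handlers can check the state before rolling back. The sketch below assumes a hypothetical rollbackIfOpen helper; it relies only on tx.state, tx.STATE_OPEN and tx.rollback as described above.

```javascript
// Hypothetical helper: after a failed query, roll back only if the
// transaction is still open, then pass the original error along.
function rollbackIfOpen(tx, originalErr, callback) {
    if (tx.state !== tx.STATE_OPEN) {
        // Already rolled back (or committed/expired); nothing to undo.
        return callback(originalErr);
    }
    tx.rollback(function (rollbackErr) {
        // Report the original failure; any rollback error is ignored here,
        // though a real app might want to log it.
        callback(originalErr);
    });
}

// Usage inside a query callback:
// if (err) return rollbackIfOpen(tx, err, done);
```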
Finally, open transactions expire after some period of inactivity. This period is configurable in Neo4j, but it defaults to 60 seconds today. Transactions renew automatically on every query, but if you need to, you can inspect transactions' expiration times and renew them manually.
// Only open transactions (not already expired) can be renewed:
assert.equal(tx.state, tx.STATE_OPEN);
assert.notEqual(tx.state, tx.STATE_EXPIRED);
console.log('Before:', tx.expiresAt, '(in', tx.expiresIn, 'ms)');
tx.renew(function (err) {
if (err) throw err;
console.log('After:', tx.expiresAt, '(in', tx.expiresIn, 'ms)');
});
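For long-running application logic between queries, it can help to renew only when the transaction is close to expiring. A minimal sketch, assuming a hypothetical renewIfExpiringSoon helper, and assuming the transaction has already made at least one query so that expiresIn is a number:

```javascript
// Hypothetical helper: renew an open transaction when it has less than
// `minRemainingMs` left; otherwise call back immediately.
function renewIfExpiringSoon(tx, minRemainingMs, callback) {
    if (tx.state !== tx.STATE_OPEN) {
        return callback(new Error('Transaction is not open.'));
    }
    if (tx.expiresIn >= minRemainingMs) {
        return callback(null);   // plenty of time left
    }
    tx.renew(callback);
}

// Usage: renew if fewer than 10 seconds remain (an illustrative threshold).
// renewIfExpiringSoon(tx, 10000, function (err) { /* ... */ });
```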
The full state diagram putting this all together:
Most node-neo4j operations support passing in custom headers for the underlying HTTP requests. The GraphDatabase constructor also supports passing in default headers for every operation.
This can be useful to achieve a variety of features, such as:
- Logging individual queries
- Tracing application requests
- Splitting master/slave traffic (see High Availability below)
None of these things are supported out-of-the-box by Neo4j today, but all can be handled by a server (e.g. Apache or Nginx) or load balancer (e.g. HAProxy or Amazon ELB) in front.
For example, at FiftyThree, our Cypher requests look effectively like this (though we abstract and encapsulate these things with higher-level helpers):
db.cypher({
query: '...',
params: {...},
headers: {
// Identify the query via a short, human-readable name.
// This is what we log in HAProxy for every request,
// since all Cypher calls have the same HTTP path,
// and this is friendlier than the entire query.
'X-Query-Name': 'User_getUnreadNotifications',
// This tells HAProxy to send this query to the master (even
// though it's a read), as we require strong consistency here.
// See the High Availability section below.
'X-Consistency': 'strong',
// This is a concatenation of upstream services' request IDs
// along with a randomly generated one of our own.
// We log this header on all our servers, so we can trace
// application requests through our entire stack.
// TODO: Link to Heroku article on this!
'X-Request-Ids': '123,456,789'
},
}, callback);
You might also find custom headers helpful for custom Neo4j plugins.
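The kind of higher-level helper mentioned above might look roughly like this. It is a sketch, not FiftyThree's actual code: queryHeaders is a hypothetical function, and the header names are simply the ones from the example, which only mean something to your own proxy or load balancer.

```javascript
var nodeCrypto = require('crypto');

// Hypothetical helper: build per-query headers for db.cypher.
function queryHeaders(queryName, upstreamRequestIds, needsStrongConsistency) {
    // Append a randomly generated ID of our own to the upstream ones.
    var ownId = nodeCrypto.randomBytes(8).toString('hex');
    var headers = {
        'X-Query-Name': queryName,
        'X-Request-Ids': upstreamRequestIds.concat(ownId).join(','),
    };
    if (needsStrongConsistency) {
        headers['X-Consistency'] = 'strong';
    }
    return headers;
}

// Usage:
// db.cypher({
//     query: '...',
//     headers: queryHeaders('User_getUnreadNotifications', ['123', '456'], true),
// }, callback);
```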
Neo4j Enterprise supports running multiple instances of Neo4j in a single "High Availability" (HA) cluster. Neo4j's HA uses a master-slave setup, so slaves typically lag behind the master by a small delay (tunable in Neo4j).
There are multiple ways to interface with an HA cluster from node-neo4j, but the recommended route is to place a load balancer in front (e.g. HAProxy or Amazon ELB). You can then point node-neo4j to the load balancer's endpoint.
var db = new neo4j.GraphDatabase({
url: 'https://username:[email protected]:1234',
});
You'll still want to split traffic between the master and the slaves (e.g. reads to slaves, writes to master), in order to distribute load and improve performance. You can achieve this in several ways:
- Create separate GraphDatabase instances with different urls to the load balancer (e.g. different host, port, or path). The load balancer can inspect the URL to route queries appropriately.
- Use the same, single GraphDatabase instance, but send a custom header to let the load balancer know where the query should go. This is what we do at FiftyThree, and what's shown in the custom header example above.
- Have the load balancer derive the target automatically, e.g. by inspecting the Cypher query. This isn't recommended. =)
With this setup, you should find node-neo4j usage with an HA cluster to be seamless.
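For the first approach (separate instances), a thin wrapper can make the routing decision explicit at each call site. The sketch below assumes a hypothetical makeRouter helper; masterDb and slaveDb stand in for two GraphDatabase instances pointed at different load balancer URLs.

```javascript
// Hypothetical router over two GraphDatabase-like objects.
function makeRouter(masterDb, slaveDb) {
    return {
        // Writes, and reads needing strong consistency, go to the master.
        write: function (options, callback) {
            return masterDb.cypher(options, callback);
        },
        // Reads that tolerate replication lag go to the slaves.
        read: function (options, callback) {
            return slaveDb.cypher(options, callback);
        },
    };
}

// Usage:
// var router = makeRouter(masterDb, slaveDb);
// router.read({query: 'MATCH (user:User) RETURN user'}, callback);
```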
If you need functionality beyond Cypher, you can make direct HTTP requests to Neo4j. This can be useful for legacy APIs (e.g. traversals), custom plugins (e.g. neo4j-spatial), or even future APIs before node-neo4j implements them.
db.http({
method: 'GET',
path: '/db/data/node/12345678',
// ...
}, callback);
function callback(err, body) {
if (err) throw err;
console.log(body);
}
{
"_id": 12345678,
"labels": [
"User",
"Admin"
],
"properties": {
"name": "Alice Smith",
"email": "[email protected]",
"emailVerified": true,
"passwordHash": "..."
}
}
By default:
- The callback receives just the response body (not the status code or headers);
- Any nodes and relationships in the body are transformed to Node and Relationship instances (like cypher); and
- 4xx and 5xx responses are treated as errors.
You can alternatively pass raw: true for more control, in which case:
- The callback receives the entire response (with an additional body property);
- Nodes and relationships are not transformed into Node and Relationship instances (but the body is still parsed as JSON); and
- 4xx and 5xx responses are not treated as errors.
db.http({
method: 'GET',
path: '/db/data/node/12345678',
raw: true,
}, callback);
function callback(err, resp) {
if (err) throw err;
assert.equal(resp.statusCode, 200);
assert.equal(typeof resp.headers, 'object');
console.log(resp.body);
}
{
"self": "http://localhost:7474/db/data/node/12345678",
"labels": "http://localhost:7474/db/data/node/12345678/labels",
"properties": "http://localhost:7474/db/data/node/12345678/properties",
// ...
"metadata": {
"id": 12345678,
"labels": [
"User",
"Admin"
]
},
"data": {
"name": "Alice Smith",
"email": "[email protected]",
"emailVerified": true,
"passwordHash": "..."
}
}
Other options:
- headers: optional custom HTTP headers to send with this request. These will add onto the default GraphDatabase headers, but also override any that overlap.
- body: an optional request body, e.g. for POST and PUT requests. This gets serialized to JSON.
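As an illustration of the body option, the sketch below adds a label to a node through Neo4j's legacy REST API. The addLabel helper is hypothetical, and the endpoint shape (POST to /db/data/node/{id}/labels with a JSON array of label names) is an assumption about the Neo4j version you run, so check it against your server's REST documentation.

```javascript
// Hypothetical helper: add a label to a node over the legacy REST API.
// Assumes POST /db/data/node/{id}/labels accepts a JSON array of labels.
function addLabel(db, nodeId, label, callback) {
    return db.http({
        method: 'POST',
        path: '/db/data/node/' + encodeURIComponent(nodeId) + '/labels',
        body: [label],   // node-neo4j serializes this to JSON
    }, callback);
}

// Usage: addLabel(db, 12345678, 'Admin', function (err) { /* ... */ });
```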
Requests and responses can also be streamed for maximum performance. The http method returns a Request.js instance, which is a DuplexStream combining both the writable request stream and the readable response stream.
(Request.js provides a number of benefits over the native HTTP ClientRequest and IncomingMessage classes, e.g. proxy support, gzip decompression, simpler writes, and a single unified 'error' event.)
If you want to stream the request, be sure not to pass a body option. And if you want to stream the response (without having it buffer in memory), be sure not to pass a callback. You can stream the request without streaming the response, and vice versa.
Streaming the response implies the raw option above: nodes and relationships are not transformed (as even JSON isn't parsed), and 4xx and 5xx responses are not treated as errors.
var req = db.http({
method: 'GET',
path: '/db/data/node/12345678',
});
req.on('error', function (err) {
// Handle the error somehow. The default behavior is:
throw err;
});
req.on('response', function (resp) {
assert.equal(resp.statusCode, 200);
assert.equal(typeof resp.headers, 'object');
assert.equal(typeof resp.body, 'undefined');
});
var body = '';
req.on('data', function (chunk) {
body += chunk;
});
req.on('end', function () {
body = JSON.parse(body);
console.log(body);
});
To achieve robustness in your app, it's vitally important to handle errors precisely. Neo4j supports this nicely by returning semantic and precise error codes.
There are multiple levels of detail, but the high-level classifications are a good granularity for decision-making:
- ClientError (likely a bug in your code, but possibly invalid user input)
- DatabaseError (a bug in Neo4j)
- TransientError (occasionally expected; can/should retry)
node-neo4j translates these classifications to named Error subclasses. That means there are two ways to detect Neo4j errors:
// `instanceof` is recommended:
err instanceof neo4j.TransientError
// `name` works too, though:
err.name === 'neo4j.TransientError'
These error instances also have a neo4j property with semantic data inside. E.g. Cypher errors have data of the form {code, message}:
{
"code": "Neo.TransientError.Transaction.DeadlockDetected",
"message": "LockClient[83] can't wait on resource ...",
"stackTrace": "..."
}
Other types of errors (e.g. managing schema) may have different forms of neo4j data:
{
"exception": "BadInputException",
"fullname": "org.neo4j.server.rest.repr.BadInputException",
"message": "Unable to add label, see nested exception.",
"stackTrace": [...],
"cause": {...}
}
Finally, malformed (non-JSON) responses from Neo4j (rare) will have neo4j set to the raw response string, while native Node.js errors (e.g. DNS errors) will be propagated in their original form, to avoid masking unexpected issues.
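Since the neo4j property can take several shapes, code that inspects it should check the type first. A minimal sketch, assuming a hypothetical neo4jErrorId helper that covers the three cases described above:

```javascript
// Hypothetical helper: pull a short identifier out of err.neo4j, whatever
// form it takes.
function neo4jErrorId(err) {
    var data = err.neo4j;
    if (data && typeof data === 'object') {
        // Cypher errors carry `code`; some other endpoints carry `exception`.
        return data.code || data.exception || null;
    }
    if (typeof data === 'string') {
        return 'non-JSON response';
    }
    return null;    // native Node.js error, e.g. a DNS failure
}

// Usage: console.error('Query failed (%s)', neo4jErrorId(err));
```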
Putting all this together, you now have the tools to handle Neo4j errors precisely! For example, we have helpers similar to these at FiftyThree:
// A query or transaction failed. Should we retry it?
// We check this in a retry loop, with proper backoff, etc.
// http://aseemk.com/talks/advanced-neo4j#/50
function shouldRetry(err) {
// All transient errors are worth retrying, of course.
if (err instanceof neo4j.TransientError) {
return true;
}
// If the database is unavailable, it's probably failing over.
// We expect it to come back up quickly, so worth retrying also.
if (isDbUnavailable(err)) {
return true;
}
// There are a few other non-transient Neo4j errors we special-case.
// Important: this assumes we don't have bugs in our code that would trigger
// these errors legitimately.
if (typeof err.neo4j === 'object' && (
// If a failover happened when we were in the middle of a transaction,
// the new instance won't know about that transaction, so we re-do it.
err.neo4j.code === 'Neo.ClientError.Transaction.UnknownId' ||
// These are current Neo4j bugs we occasionally hit with our queries.
err.neo4j.code === 'Neo.ClientError.Statement.EntityNotFound' ||
err.neo4j.code === 'Neo.DatabaseError.Statement.ExecutionFailure')) {
return true;
}
return false;
}
// Is this error due to Neo4j being down, failing over, etc.?
// This is a separate helper because we also retry less aggressively in this case.
function isDbUnavailable(err) {
// If we're unable to connect, we see these particular Node.js errors.
// https://nodejs.org/api/errors.html#errors_common_system_errors
// E.g. http://stackoverflow.com/questions/17245881/node-js-econnreset
if ((err.syscall === 'getaddrinfo' && err.code === 'ENOTFOUND') ||
(err.syscall === 'connect' && err.code === 'EHOSTUNREACH') ||
(err.syscall === 'read' && err.code === 'ECONNRESET')) {
return true;
}
// We load balance via HAProxy, so if Neo4j is unavailable, HAProxy responds
// with 5xx status codes.
// node-neo4j sees this and translates to a DatabaseError, but the body is
// HTML, not JSON, so the `neo4j` property is simply the HTML string.
if (err instanceof neo4j.DatabaseError && typeof err.neo4j === 'string' &&
err.neo4j.match(/(502 Bad Gateway|503 Service Unavailable)/)) {
return true;
}
return false;
}
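The retry loop that calls a helper like shouldRetry is not shown above, so here is one possible shape for it. This is a sketch under stated assumptions, not FiftyThree's implementation: withRetries and backoffDelay are hypothetical names, and the 100 ms base and 5 s cap are illustrative values only.

```javascript
// Exponential backoff with a cap: base, 2*base, 4*base, ... up to maxMs.
function backoffDelay(attempt, baseMs, maxMs) {
    return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical retry loop. `operation` is a function (callback) that runs the
// query or transaction; `isRetryable` is a predicate like shouldRetry above.
function withRetries(operation, isRetryable, maxAttempts, callback) {
    var attempt = 0;

    function tryOnce() {
        operation(function (err, results) {
            attempt += 1;
            if (!err) {
                return callback(null, results);
            }
            if (attempt >= maxAttempts || !isRetryable(err)) {
                return callback(err);
            }
            // Wait longer after each failed attempt before trying again.
            setTimeout(tryOnce, backoffDelay(attempt - 1, 100, 5000));
        });
    }

    tryOnce();
}

// Usage:
// withRetries(function (cb) {
//     db.cypher({query: '...'}, cb);
// }, shouldRetry, 5, callback);
```

If a whole transaction is being retried, the operation should begin a fresh transaction on each attempt rather than reuse the failed one.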
In addition to all of the above, node-neo4j also embeds the most useful information from neo4j into the message and stack properties, so you don't need to do anything special to log meaningful, actionable, and debuggable errors. (E.g. Node's native logging of errors, both via console.log and on uncaught exceptions, includes this info.)
Here are some example snippets from real-world stack traces:
neo4j.ClientError: [Neo.ClientError.Statement.ParameterMissing] Expected a parameter named email
neo4j.ClientError: [Neo.ClientError.Schema.ConstraintViolation] Node 15 already exists with label User and property "email"=[15]
neo4j.DatabaseError: [Neo.DatabaseError.Statement.ExecutionFailure] scala.MatchError: (email,null) (of class scala.Tuple2)
at <Java stack first, to include in Neo4j bug report>
at <then Node.js stack...>
neo4j.DatabaseError: 502 Bad Gateway response for POST /db/data/transaction/commit: "<html><body><h1>502 Bad Gateway</h1>\nThe server returned an invalid or incomplete response.\n</body></html>\n"
neo4j.TransientError: [Neo.TransientError.Transaction.DeadlockDetected] LockClient[1150] can't wait on resource RWLock[NODE(196), hash=2005718009] since => LockClient[1150] <-[:HELD_BY]- RWLock[NODE(197), hash=1180589294] <-[:WAITING_FOR]- LockClient[1149] <-[:HELD_BY]- RWLock[NODE(196), hash=2005718009]
Precise and helpful error reporting is one of node-neo4j's greatest strengths. We hope it helps your app run smoothly!
(TODO)
(TODO)
- change password
- get labels, etc.
Questions, comments, or other general discussion? Google Group »
Bug reports or feature requests? GitHub Issues »
You can also try Gitter, Stack Overflow or Slack (sign up here).
Copyright © 2016 Aseem Kishore and contributors.
This library is licensed under the Apache License, Version 2.0.