
thingdom/node-neo4j


Node-Neo4j


Node.js driver for Neo4j, a graph database.

This driver aims to be the most robust, comprehensive, and battle-tested driver available. It's run in production by FiftyThree to power the popular iOS app Paper.

Note: if you're still on Neo4j 1.x, you'll need to use node-neo4j v1.

Note: node-neo4j v2 is a ground-up rewrite with an entirely new API. If you're currently using node-neo4j v1, here's the migration guide.

Features

Installation

npm install neo4j --save

Example

var neo4j = require('neo4j');
var db = new neo4j.GraphDatabase('http://username:password@localhost:7474');

db.cypher({
    query: 'MATCH (user:User {email: {email}}) RETURN user',
    params: {
        email: 'alice@example.com',
    },
}, callback);

function callback(err, results) {
    if (err) throw err;
    var result = results[0];
    if (!result) {
        console.log('No user found.');
    } else {
        var user = result['user'];
        console.log(user);
    }
};

Yields e.g.:

{
    "_id": 12345678,
    "labels": [
        "User",
        "Admin"
    ],
    "properties": {
        "name": "Alice Smith",
        "email": "alice@example.com",
        "emailVerified": true,
        "passwordHash": "..."
    }
}

See node-neo4j-template for a more thorough example.

Basics

Connect to a running Neo4j instance by instantiating the GraphDatabase class:

var neo4j = require('neo4j');

// Shorthand:
var db = new neo4j.GraphDatabase('http://username:password@localhost:7474');

// Full options:
var db = new neo4j.GraphDatabase({
    url: 'http://localhost:7474',
    auth: {username: 'username', password: 'password'},
    // ...
});

Options:

  • url (required): the base URL to the Neo4j instance, e.g. 'http://localhost:7474'. This can include auth credentials (e.g. 'http://username:password@localhost:7474'), but doesn't have to.

  • auth: optional auth credentials; either a 'username:password' string, or a {username, password} object. If present, this takes precedence over any credentials in the url.

  • headers: optional custom HTTP headers to send with every request. These can be overridden per request. Node-Neo4j defaults to sending a User-Agent identifying itself, but this can be overridden too.

  • proxy: optional URL to a proxy. If present, all requests will be routed through the proxy.

  • agent: optional http.Agent instance, for custom socket pooling.
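To illustrate the precedence rule between auth and credentials embedded in url, here is a small standalone helper. This is a hypothetical sketch (resolveAuth is not part of node-neo4j) built on Node's own URL class; the driver's actual parsing may differ in details.

```javascript
var URL = require('url').URL;

// Hypothetical helper: work out which credentials a GraphDatabase-style
// options object would end up using. `auth` (string or object) wins over
// any credentials embedded in `url`.
function resolveAuth(opts) {
    var auth = opts.auth;

    if (typeof auth === 'string') {
        // 'username:password' form; split on the first colon only,
        // so passwords may themselves contain colons.
        var i = auth.indexOf(':');
        if (i === -1) {
            return {username: auth, password: ''};
        }
        return {username: auth.slice(0, i), password: auth.slice(i + 1)};
    }
    if (auth && typeof auth === 'object') {
        return {username: auth.username, password: auth.password};
    }

    var parsed = new URL(opts.url);
    if (parsed.username) {
        // URL components are percent-encoded, so decode them.
        return {
            username: decodeURIComponent(parsed.username),
            password: decodeURIComponent(parsed.password),
        };
    }
    return null;  // no credentials at all
}

console.log(resolveAuth({url: 'http://alice:secret@localhost:7474'}));
console.log(resolveAuth({url: 'http://alice:secret@localhost:7474', auth: 'bob:other'}));
```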

Once you have a GraphDatabase instance, you can make queries and more.

Most operations are asynchronous, which means they take a callback. Node-Neo4j callbacks are of the standard (error[, results]) form.

Async control flow can get pretty tricky, so it's highly recommended to use a flow control library or tool, like async or Streamline.
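Because callbacks follow the standard (error[, results]) form, one alternative to a flow-control library is to adapt a method to Promises with Node's built-in util.promisify. A minimal sketch, assuming a stand-in fakeDb object so it runs without a Neo4j server; with a real instance you would pass db.cypher.bind(db) instead. It relies only on the callback convention, not on any other node-neo4j behavior.

```javascript
var util = require('util');

// Stand-in for a GraphDatabase instance (hypothetical): cypher() takes an
// options object and a standard (err, results) callback, asynchronously.
var fakeDb = {
    cypher: function (opts, callback) {
        setImmediate(function () {
            if (!opts.query) {
                return callback(new Error('query is required'));
            }
            callback(null, [{user: {email: opts.params.email}}]);
        });
    },
};

// bind() keeps `this` pointing at the db object once the method is detached.
var cypherAsync = util.promisify(fakeDb.cypher.bind(fakeDb));

cypherAsync({
    query: 'MATCH (user:User {email: {email}}) RETURN user',
    params: {email: 'alice@example.com'},
}).then(function (results) {
    console.log(results.length ? results[0].user : 'No user found.');
}, function (err) {
    console.error('Query failed:', err.message);
});
```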

Cypher

To make a Cypher query, pass the query string, any query parameters, and a callback to receive the error or results.

db.cypher({
    query: 'MATCH (user:User {email: {email}}) RETURN user',
    params: {
        email: 'alice@example.com',
    },
}, callback);

It's extremely important to pass params separately. If you concatenate them into the query, you'll be vulnerable to injection attacks, and Neo4j performance will suffer as well. (Note that parameters can't be used for labels, property names, and relationship types, as those things determine the query plan. Docs »)
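Since labels, property names, and relationship types can't be parameters, any that come from user input must be validated before they go into the query string. One common approach is a whitelist, sketched below; ALLOWED_LABELS and buildFindQuery are hypothetical names, and the {email} placeholder syntax simply mirrors the examples above.

```javascript
// Hypothetical whitelist of labels the application is willing to query by.
var ALLOWED_LABELS = ['User', 'Admin', 'WorkerTask'];

// Returns {query, params} ready for db.cypher(). The label is checked against
// the whitelist (it can't be a parameter); the email stays a real parameter,
// so its value never becomes part of the query text.
function buildFindQuery(label, email) {
    if (ALLOWED_LABELS.indexOf(label) === -1) {
        throw new Error('Unsupported label: ' + label);
    }
    return {
        query: 'MATCH (n:' + label + ' {email: {email}}) RETURN n',
        params: {email: email},
    };
}

console.log(buildFindQuery('User', 'alice@example.com').query);
```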

Cypher queries always return a list of results (like SQL rows), with each result having common properties (like SQL columns). Thus, query results passed to the callback are always an array (even if it's empty), and each result in the array is always an object (even if it's empty).

function callback(err, results) {
    if (err) throw err;
    var result = results[0];
    if (!result) {
        console.log('No user found.');
    } else {
        var user = result['user'];
        console.log(user);
    }
};
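Because results are always an array of row objects, a small helper for the common "zero or one row" case can keep callbacks like the one above short. This is a hypothetical convenience function, not part of node-neo4j:

```javascript
// Returns the value of `column` in the first result row, or null when the
// query matched nothing. Relies only on the array-of-objects shape.
function firstColumn(results, column) {
    if (!results.length) {
        return null;
    }
    var row = results[0];
    if (!(column in row)) {
        // Usually means the RETURN clause used a different name.
        throw new Error('No column named ' + column + ' in results');
    }
    return row[column];
}

console.log(firstColumn([], 'user'));
console.log(firstColumn([{user: {name: 'Alice'}}], 'user'));
```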

If the query results include nodes or relationships, Node and Relationship instances are returned for them. These instances encapsulate {_id, labels, properties} for nodes, and {_id, type, properties, _fromId, _toId} for relationships, but they can be used just like normal objects.

{
    "_id": 12345678,
    "labels": [
        "User",
        "Admin"
    ],
    "properties": {
        "name": "Alice Smith",
        "email": "alice@example.com",
        "emailVerified": true,
        "passwordHash": "..."
    }
}

(The _id properties refer to Neo4j's internal IDs. These can be convenient for debugging, but their usage otherwise — especially externally — is discouraged.)

If you don't need to know Neo4j IDs, node labels, or relationship types, you can pass lean: true to get back just properties, for a potential performance gain.

db.cypher({
    query: 'MATCH (user:User {email: {email}}) RETURN user',
    params: {
        email: 'alice@example.com',
    },
    lean: true,
}, callback);
{
    "name": "Alice Smith",
    "email": "alice@example.com",
    "emailVerified": true,
    "passwordHash": "..."
}

Other options:

  • headers: optional custom HTTP headers to send with this query. These will add onto the default GraphDatabase headers, but also override any that overlap.
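The header behavior described here (per-query headers added on top of the GraphDatabase defaults, winning where they overlap) is essentially a shallow merge. A sketch of those semantics with a hypothetical mergeHeaders helper; it lowercases names because HTTP header names are case-insensitive, which may not match exactly how node-neo4j itself treats differing case.

```javascript
// Hypothetical illustration: per-query headers are added on top of the
// defaults and override any that overlap. Names are lowercased so that
// 'X-Team' and 'x-team' count as the same header.
function mergeHeaders(defaults, overrides) {
    var merged = {};
    [defaults || {}, overrides || {}].forEach(function (headers) {
        Object.keys(headers).forEach(function (name) {
            merged[name.toLowerCase()] = headers[name];
        });
    });
    return merged;
}

var defaults = {'User-Agent': 'my-app/1.0', 'X-Team': 'core'};
console.log(mergeHeaders(defaults, {'x-team': 'search', 'X-Query-Name': 'User_find'}));
```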

Batching

You can also make multiple Cypher queries within a single network request, by passing a queries array rather than a single query string.

Query params (and optionally lean) are then specified per query, so the elements in the array are {query, params[, lean]} objects. (Other options like headers remain "global" for the entire request.)

db.cypher({
    queries: [{
        query: 'MATCH (user:User {email: {email}}) RETURN user',
        params: {
            email: 'alice@example.com',
        },
    }, {
        query: 'MATCH (task:WorkerTask) RETURN task',
        lean: true,
    }, {
        query: 'MATCH (task:WorkerTask) DELETE task',
    }],
    headers: {
        'X-Request-ID': '1234567890',
    },
}, callback);

The callback then receives an array of query results, one per query.

var assert = require('assert');

function callback(err, batchResults) {
    if (err) throw err;

    var userResults = batchResults[0];
    var taskResults = batchResults[1];
    var deleteResults = batchResults[2];

    // User results:
    var userResult = userResults[0];
    if (!userResult) {
        console.log('No user found.');
    } else {
        var user = userResult['user'];
        console.log('User %s (%s) found.', user._id, user.properties.name);
    }

    // Worker task results:
    if (!taskResults.length) {
        console.log('No worker tasks to process.');
    } else {
        taskResults.forEach(function (taskResult) {
            var task = taskResult['task'];
            console.log('Processing worker task %s...', task.operation);
        });
    }

    // Delete results (shouldn’t have returned any):
    assert.equal(deleteResults.length, 0);
};

Importantly, batch queries execute (a) sequentially and (b) transactionally: they all succeed, or they all fail. If you don't need them to be transactional, it can often be better to parallelize separate db.cypher calls instead.
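For independent queries that don't need to be transactional, running separate db.cypher calls in parallel might look like the sketch below. It uses Promises and util.promisify, with a hypothetical fakeDb stand-in so it runs without a server. Unlike a batch, one query failing does not undo the others.

```javascript
var util = require('util');

// Hypothetical stand-in for a GraphDatabase; replies after a random delay.
var fakeDb = {
    cypher: function (opts, callback) {
        setTimeout(function () {
            callback(null, [{query: opts.query}]);
        }, Math.random() * 20);
    },
};
var cypherAsync = util.promisify(fakeDb.cypher.bind(fakeDb));

var queries = [
    {query: 'MATCH (user:User {email: {email}}) RETURN user', params: {email: 'alice@example.com'}},
    {query: 'MATCH (task:WorkerTask) RETURN task', lean: true},
];

// Each query is a separate request, so they run concurrently. Promise.all
// keeps results in input order and rejects as soon as any query fails, but
// queries that already succeeded are NOT rolled back.
// (Wrap in a function: passing cypherAsync to map() directly would also
// pass map's index and array arguments through to cypher().)
Promise.all(queries.map(function (q) {
    return cypherAsync(q);
})).then(function (allResults) {
    allResults.forEach(function (results, i) {
        console.log('Query %d returned %d row(s)', i, results.length);
    });
});
```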

Transactions

You can also batch multiple Cypher queries into a single transaction across multiple network requests. This can be useful when application logic needs to run in between related queries (e.g. for domain-aware cascading deletes), or Neo4j state needs to be coordinated with side effects (e.g. writes to another data store). The queries will all succeed or fail together.

To do this, begin a new transaction, make Cypher queries within that transaction, and then ultimately commit the transaction or roll it back.

var tx = db.beginTransaction();

function makeFirstQuery() {
    tx.cypher({
        query: '...',
        params: {...},
    }, makeSecondQuery);
}

function makeSecondQuery(err, results) {
    if (err) throw err;
    // ...some application logic...
    tx.cypher({
        query: '...',
        params: {...},
    }, finish);
}

function finish(err, results) {
    if (err) throw err;
    // ...some application logic...
    tx.commit(done);  // or tx.rollback(done);
}

function done(err) {
    if (err) throw err;
    // At this point, the transaction has been committed.
}

makeFirstQuery();

The transactional cypher method supports everything the normal cypher method does (e.g. lean, headers, and batch queries). In addition, you can pass commit: true to auto-commit the transaction (and save a network request) if the query succeeds.

function makeSecondQuery(err, results) {
    if (err) throw err;
    // ...some application logic...
    tx.cypher({
        query: '...',
        params: {...},
        commit: true,
    }, done);
}

function done(err) {
    if (err) throw err;
    // At this point, the transaction has been committed.
}

Importantly, transactions allow only one query at a time. To help preempt errors, you can inspect the state of the transaction, e.g. whether it's open for queries or not.

// Initially, transactions are open:
assert.equal(tx.state, tx.STATE_OPEN);

// Making a query...
tx.cypher({
    query: '...',
    params: {...},
}, callback);

// ...will result in the transaction being pending:
assert.equal(tx.state, tx.STATE_PENDING);

// All other operations (making another query, committing, etc.)
// are rejected while the transaction is pending:
assert.throws(tx.renew.bind(tx));

function callback(err, results) {
    // When the query returns, the transaction is likely open again,
    // but it could be committed if `commit: true` was specified,
    // or it could have been rolled back automatically (by Neo4j)
    // if there was an error:
    assert.notEqual([
        tx.STATE_OPEN, tx.STATE_COMMITTED, tx.STATE_ROLLED_BACK
    ].indexOf(tx.state), -1);   // i.e. tx.state is in this array
}

Finally, open transactions expire after some period of inactivity. This period is configurable in Neo4j, but it defaults to 60 seconds today. Transactions renew automatically on every query, but if you need to, you can inspect transactions' expiration times and renew them manually.

// Only open transactions (not already expired) can be renewed:
assert.equal(tx.state, tx.STATE_OPEN);
assert.notEqual(tx.state, tx.STATE_EXPIRED);

console.log('Before:', tx.expiresAt, '(in', tx.expiresIn, 'ms)');
tx.renew(function (err) {
    if (err) throw err;
    console.log('After:', tx.expiresAt, '(in', tx.expiresIn, 'ms)');
});
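If a transaction must stay open across a long pause (e.g. while waiting on another service), one option is to renew it on a timer shortly before it expires. The sketch below is hypothetical: keepAlive and marginMs are invented names, and it assumes only the state, STATE_OPEN, STATE_PENDING, expiresIn, and renew(callback) members shown above. A stub transaction stands in for a real one so it runs standalone.

```javascript
// Hypothetical helper: renew `tx` roughly `marginMs` before it expires, over
// and over, until stop() is called or the transaction is no longer open.
function keepAlive(tx, marginMs, onError) {
    var timer = null;
    var stopped = false;

    function schedule(delay) {
        if (!stopped) {
            timer = setTimeout(tick, Math.max(0, delay));
        }
    }

    function tick() {
        if (tx.state === tx.STATE_PENDING) {
            // A query is in flight (and will renew the transaction itself);
            // renew() would be rejected now, so check back shortly.
            return schedule(100);
        }
        if (tx.state !== tx.STATE_OPEN) {
            return;  // committed, rolled back, or expired: nothing to keep alive
        }
        tx.renew(function (err) {
            if (err) {
                return onError(err);
            }
            schedule(tx.expiresIn - marginMs);
        });
    }

    schedule(tx.expiresIn - marginMs);

    return function stop() {
        stopped = true;
        clearTimeout(timer);
    };
}

// Stub exposing the same members, so the sketch runs without Neo4j.
var stubTx = {
    STATE_OPEN: 'open',
    STATE_PENDING: 'pending',
    state: 'open',
    renewals: 0,
    expiresAt: new Date(Date.now() + 300),
    get expiresIn() {
        return this.expiresAt - Date.now();
    },
    renew: function (callback) {
        this.renewals++;
        this.expiresAt = new Date(Date.now() + 300);
        setImmediate(callback, null);
    },
};

var stop = keepAlive(stubTx, 100, function (err) {
    console.error('Renewal failed:', err);
});

setTimeout(function () {
    stop();
    console.log('Renewed %d time(s) in 1 second', stubTx.renewals);
}, 1000);
```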

The full state diagram putting this all together:

(Figure: Neo4j transaction state diagram)
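As a rough textual companion, the lifecycle described in this section can be written as a transition table. This is a sketch inferred from the prose above, not taken from the diagram or the library source; the state and event names are illustrative.

```javascript
// Transitions inferred from the text above (hypothetical names).
var TRANSITIONS = {
    open: {
        query: 'pending',     // tx.cypher()
        commit: 'pending',    // tx.commit()
        rollback: 'pending',  // tx.rollback()
        renew: 'pending',     // tx.renew()
        timeout: 'expired',   // inactivity past Neo4j's configured timeout
    },
    pending: {
        succeeded: 'open',          // a normal query or renewal returned
        committed: 'committed',     // commit, or a query with commit: true
        rolledBack: 'rolled back',  // rollback, or (possibly) a query error
    },
    // Terminal states: nothing further is allowed.
    committed: {},
    'rolled back': {},
    expired: {},
};

// Returns the next state, or throws if the event isn't allowed right now
// (e.g. making a second query while one is pending).
function next(state, event) {
    var target = TRANSITIONS[state][event];
    if (!target) {
        throw new Error('Cannot ' + event + ' while ' + state);
    }
    return target;
}

console.log(next(next('open', 'query'), 'succeeded'));
```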

Headers

Most node-neo4j operations support passing in custom headers for the underlying HTTP requests. The GraphDatabase constructor also supports passing in default headers for every operation.

This can be useful to achieve a variety of features, such as:

  • Logging individual queries
  • Tracing application requests
  • Splitting master/slave traffic (see High Availability below)

None of these things are supported out-of-the-box by Neo4j today, but all can be handled by a server (e.g. Apache or Nginx) or load balancer (e.g. HAProxy or Amazon ELB) in front.

For example, at FiftyThree, our Cypher requests look effectively like this (though we abstract and encapsulate these things with higher-level helpers):

db.cypher({
    query: '...',
    params: {...},
    headers: {
        // Identify the query via a short, human-readable name.
        // This is what we log in HAProxy for every request,
        // since all Cypher calls have the same HTTP path,
        // and this is friendlier than the entire query.
        'X-Query-Name': 'User_getUnreadNotifications',

        // This tells HAProxy to send this query to the master (even
        // though it's a read), as we require strong consistency here.
        // See the High Availability section below.
        'X-Consistency': 'strong',

        // This is a concatenation of upstream services' request IDs
        // along with a randomly generated one of our own.
        // We log this header on all our servers, so we can trace
        // application requests through our entire stack.
        // TODO: Link to Heroku article on this!
        'X-Request-Ids': '123,456,789'
    },
}, callback);
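The X-Request-Ids value above is described as upstream request IDs plus a randomly generated one. A minimal sketch of building such a value, assuming the upstream IDs arrive in the same comma-separated header on the incoming request (that convention and the buildRequestIds name are hypothetical):

```javascript
var crypto = require('crypto');

// Appends a fresh random ID to whatever upstream IDs came in, producing a
// comma-separated chain that every hop can log.
function buildRequestIds(incomingHeaderValue) {
    var ids = (incomingHeaderValue || '')
        .split(',')
        .map(function (id) { return id.trim(); })
        .filter(Boolean);
    ids.push(crypto.randomBytes(8).toString('hex'));  // 16 hex characters
    return ids.join(',');
}

// Prints '123,456,' followed by a new random 16-character hex ID.
console.log(buildRequestIds('123,456'));
```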

You might also find custom headers helpful for custom Neo4j plugins.

High Availability

Neo4j Enterprise supports running multiple instances of Neo4j in a single "High Availability" (HA) cluster. Neo4j's HA uses a master-slave setup, so slaves typically lag behind the master by a small delay (tunable in Neo4j).

There are multiple ways to interface with an HA cluster from node-neo4j, but the recommended route is to place a load balancer in front (e.g. HAProxy or Amazon ELB). You can then point node-neo4j to the load balancer's endpoint.

var db = new neo4j.GraphDatabase({
    url: 'https://username:password@neo4j-lb.example.com:1234',
});

You'll still want to split traffic between the master and the slaves (e.g. reads to slaves, writes to master), in order to distribute load and improve performance. You can achieve this in several ways:

  • Create separate GraphDatabase instances with different urls to the load balancer (e.g. different host, port, or path). The load balancer can inspect the URL to route queries appropriately.

  • Use the same, single GraphDatabase instance, but send a custom header to let the load balancer know where the query should go. This is what we do at FiftyThree, and what's shown in the custom header example above.

  • Have the load balancer derive the target automatically, e.g. by inspecting the Cypher query. This isn't recommended. =)
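For the first option, a sketch of keeping two instances and choosing one per query. The URLs, the /read and /write paths, and the pickDb helper are all hypothetical; plain objects stand in for GraphDatabase instances so the snippet runs on its own.

```javascript
// In real code these would be GraphDatabase instances, e.g.
//   new neo4j.GraphDatabase({url: 'https://lb.example.com/write', ...})
// Plain objects are used here so the sketch runs without node-neo4j.
var writeDb = {name: 'write', url: 'https://lb.example.com/write'};
var readDb = {name: 'read', url: 'https://lb.example.com/read'};

// Writes, and reads that need strong consistency (e.g. read-your-writes),
// go to the master; everything else can go to a slave.
function pickDb(opts) {
    return (opts.write || opts.strong) ? writeDb : readDb;
}

console.log(pickDb({write: true}).name);
console.log(pickDb({}).name);
console.log(pickDb({strong: true}).name);
```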

With this setup, you should find node-neo4j usage with an HA cluster to be seamless.

HTTP / Plugins

If you need functionality beyond Cypher, you can make direct HTTP requests to Neo4j. This can be useful for legacy APIs (e.g. traversals), custom plugins (e.g. neo4j-spatial), or even future APIs before node-neo4j implements them.

db.http({
    method: 'GET',
    path: '/db/data/node/12345678',
    // ...
}, callback);

function callback(err, body) {
    if (err) throw err;
    console.log(body);
}
{
    "_id": 12345678,
    "labels": [
        "User",
        "Admin"
    ],
    "properties": {
        "name": "Alice Smith",
        "email": "alice@example.com",
        "emailVerified": true,
        "passwordHash": "..."
    }
}

By default:

  • The callback receives just the response body (not the status code or headers);
  • Any nodes and relationships in the body are transformed to Node and Relationship instances (like cypher); and
  • 4xx and 5xx responses are treated as errors.

You can alternatively pass raw: true for more control, in which case:

  • The callback receives the entire response (with an additional body property);
  • Nodes and relationships are not transformed into Node and Relationship instances (but the body is still parsed as JSON); and
  • 4xx and 5xx responses are not treated as errors.
db.http({
    method: 'GET',
    path: '/db/data/node/12345678',
    raw: true,
}, callback);

function callback(err, resp) {
    if (err) throw err;
    assert.equal(resp.statusCode, 200);
    assert.equal(typeof resp.headers, 'object');
    console.log(resp.body);
}
{
    "self": "http://localhost:7474/db/data/node/12345678",
    "labels": "http://localhost:7474/db/data/node/12345678/labels",
    "properties": "http://localhost:7474/db/data/node/12345678/properties",
    // ...
    "metadata": {
        "id": 12345678,
        "labels": [
            "User",
            "Admin"
        ]
    },
    "data": {
        "name": "Alice Smith",
        "email": "alice@example.com",
        "emailVerified": true,
        "passwordHash": "..."
    }
}
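With raw: true, checking the status code is up to you. One reason to want that: treating a 404 for a missing node as "not found" rather than as an error. A hypothetical helper along those lines (the name nodeOrNull and the error shape are assumptions, and simpler than node-neo4j's own error classes):

```javascript
// Hypothetical: given a raw response ({statusCode, headers, body}), return
// null for 404, throw for any other 4xx/5xx, and otherwise return the body.
function nodeOrNull(resp) {
    if (resp.statusCode === 404) {
        return null;
    }
    if (resp.statusCode >= 400) {
        var err = new Error('Unexpected ' + resp.statusCode + ' response from Neo4j');
        err.statusCode = resp.statusCode;
        err.body = resp.body;
        throw err;
    }
    return resp.body;
}

// Example responses shaped like the raw callback's `resp` argument.
console.log(nodeOrNull({statusCode: 200, headers: {}, body: {data: {name: 'Alice Smith'}}}).data.name);
console.log(nodeOrNull({statusCode: 404, headers: {}, body: {}}));
```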

Other options:

  • headers: optional custom HTTP headers to send with this request. These will add onto the default GraphDatabase headers, but also override any that overlap.

  • body: an optional request body, e.g. for POST and PUT requests. This gets serialized to JSON.

Requests and responses can also be streamed for maximum performance. The http method returns a Request.js instance, which is a duplex stream combining both the writable request stream and the readable response stream.

(Request.js provides a number of benefits over the native HTTP ClientRequest and IncomingMessage classes, e.g. proxy support, gzip decompression, simpler writes, and a single unified 'error' event.)

If you want to stream the request, be sure not to pass a body option. And if you want to stream the response (without having it buffer in memory), be sure not to pass a callback. You can stream the request without streaming the response, and vice versa.

Streaming the response implies the raw option above: nodes and relationships are not transformed (as even JSON isn't parsed), and 4xx and 5xx responses are not treated as errors.

var req = db.http({
    method: 'GET',
    path: '/db/data/node/12345678',
});

req.on('error', function (err) {
    // Handle the error somehow. The default behavior is:
    throw err;
});

req.on('response', function (resp) {
    assert.equal(resp.statusCode, 200);
    assert.equal(typeof resp.headers, 'object');
    assert.equal(typeof resp.body, 'undefined');
});

var body = '';

req.on('data', function (chunk) {
    body += chunk;
});

req.on('end', function () {
    body = JSON.parse(body);
    console.log(body);
});
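One caveat with the string concatenation above: if chunks arrive as Buffers, body += chunk decodes each one separately, which can garble a multi-byte UTF-8 character that straddles two chunks. A safer variant collects the Buffers and decodes once at the end. The sketch below (collectJson is a hypothetical name) is demonstrated on a generic readable stream built with Readable.from (Node 12+) rather than a live Neo4j request.

```javascript
var Readable = require('stream').Readable;

// Collects every chunk from a readable stream and parses the result as JSON
// once the stream ends. Decoding only at the end keeps multi-byte UTF-8
// characters intact even when they straddle a chunk boundary.
function collectJson(stream, callback) {
    var chunks = [];
    var done = false;

    function finish(err, value) {
        if (done) return;  // report at most once
        done = true;
        callback(err, value);
    }

    stream.on('data', function (chunk) {
        chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    });
    stream.on('error', finish);
    stream.on('end', function () {
        var value;
        try {
            value = JSON.parse(Buffer.concat(chunks).toString('utf8'));
        } catch (err) {
            return finish(err);
        }
        finish(null, value);
    });
}

// Demo: split "é" (two bytes in UTF-8) across two chunks.
var bytes = Buffer.from('{"name":"Zoé"}', 'utf8');
var splitAt = bytes.indexOf(0xc3) + 1;  // just after the first byte of "é"
var stream = Readable.from([bytes.subarray(0, splitAt), bytes.subarray(splitAt)]);

collectJson(stream, function (err, body) {
    if (err) throw err;
    console.log(body.name);
});
```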

Errors

To achieve robustness in your app, it's vitally important to handle errors precisely. Neo4j supports this nicely by returning semantic and precise error codes.

There are multiple levels of detail, but the high-level classifications are a good granularity for decision-making:

  • ClientError (likely a bug in your code, but possibly invalid user input)
  • DatabaseError (a bug in Neo4j)
  • TransientError (occasionally expected; can/should retry)

node-neo4j translates these classifications to named Error subclasses. That means there are two ways to detect Neo4j errors:

// `instanceof` is recommended:
err instanceof neo4j.TransientError

// `name` works too, though:
err.name === 'neo4j.TransientError'

These error instances also have a neo4j property with semantic data inside. E.g. Cypher errors have data of the form {code, message}:

{
    "code": "Neo.TransientError.Transaction.DeadlockDetected",
    "message": "LockClient[83] can't wait on resource ...",
    "stackTrace": "..."
}
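Cypher error codes follow a Neo.<Classification>.<Category>.<Title> pattern, as in the example above, so the classification can also be read directly from err.neo4j.code. A small hypothetical helper that does so can be handy when logging or aggregating errors by category:

```javascript
// Splits a Neo4j status code like
// 'Neo.TransientError.Transaction.DeadlockDetected' into its parts.
// Returns null for anything that doesn't look like a status code
// (e.g. a non-Cypher error shape, or a raw HTML string).
function parseStatusCode(code) {
    if (typeof code !== 'string') {
        return null;
    }
    var parts = code.split('.');
    if (parts.length !== 4 || parts[0] !== 'Neo') {
        return null;
    }
    return {classification: parts[1], category: parts[2], title: parts[3]};
}

var parsed = parseStatusCode('Neo.TransientError.Transaction.DeadlockDetected');
console.log(parsed.classification, parsed.category, parsed.title);
// -> TransientError Transaction DeadlockDetected
```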

Other types of errors (e.g. managing schema) may have different forms of neo4j data:

{
    "exception": "BadInputException",
    "fullname": "org.neo4j.server.rest.repr.BadInputException",
    "message": "Unable to add label, see nested exception.",
    "stackTrace": [...],
    "cause": {...}
}

Finally, malformed (non-JSON) responses from Neo4j (rare) will have neo4j set to the raw response string, while native Node.js errors (e.g. DNS errors) will be propagated in their original form, to avoid masking unexpected issues.

Putting all this together, you now have the tools to handle Neo4j errors precisely! For example, we have helpers similar to these at FiftyThree:

// A query or transaction failed. Should we retry it?
// We check this in a retry loop, with proper backoff, etc.
// http://aseemk.com/talks/advanced-neo4j#/50
function shouldRetry(err) {
    // All transient errors are worth retrying, of course.
    if (err instanceof neo4j.TransientError) {
        return true;
    }

    // If the database is unavailable, it's probably failing over.
    // We expect it to come back up quickly, so worth retrying also.
    if (isDbUnavailable(err)) {
        return true;
    }

    // There are a few other non-transient Neo4j errors we special-case.
    // Important: this assumes we don't have bugs in our code that would trigger
    // these errors legitimately.
    if (typeof err.neo4j === 'object' && (
        // If a failover happened when we were in the middle of a transaction,
        // the new instance won't know about that transaction, so we re-do it.
        err.neo4j.code === 'Neo.ClientError.Transaction.UnknownId' ||
        // These are current Neo4j bugs we occasionally hit with our queries.
        err.neo4j.code === 'Neo.ClientError.Statement.EntityNotFound' ||
        err.neo4j.code === 'Neo.DatabaseError.Statement.ExecutionFailure')) {
            return true;
    }

    return false;
}

// Is this error due to Neo4j being down, failing over, etc.?
// This is a separate helper because we also retry less aggressively in this case.
function isDbUnavailable(err) {
    // If we're unable to connect, we see these particular Node.js errors.
    // https://nodejs.org/api/errors.html#errors_common_system_errors
    // E.g. http://stackoverflow.com/questions/17245881/node-js-econnreset
    if ((err.syscall === 'getaddrinfo' && err.code === 'ENOTFOUND') ||
        (err.syscall === 'connect' && err.code === 'EHOSTUNREACH') ||
        (err.syscall === 'read' && err.code === 'ECONNRESET')) {
            return true;
    }

    // We load balance via HAProxy, so if Neo4j is unavailable, HAProxy responds
    // with 5xx status codes.
    // node-neo4j sees this and translates to a DatabaseError, but the body is
    // HTML, not JSON, so the `neo4j` property is simply the HTML string.
    if (err instanceof neo4j.DatabaseError && typeof err.neo4j === 'string' &&
        err.neo4j.match(/(502 Bad Gateway|503 Service Unavailable)/)) {
            return true;
    }

    return false;
}
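The retry loop mentioned in the shouldRetry comment isn't shown. A minimal sketch of one, assuming exponential backoff with random jitter; retryWithBackoff, its options, and its default delays are hypothetical choices, and the predicate is passed in so it can be the shouldRetry above. For a transaction, operation should re-run the whole transaction from the start. A fake operation that fails twice stands in for a real query.

```javascript
// Calls `operation(callback)` until it succeeds, `shouldRetry(err)` says no,
// or `maxAttempts` is reached. Waits up to baseMs, 2*baseMs, 4*baseMs, ...
// (capped at maxMs) between attempts, with random jitter.
function retryWithBackoff(operation, shouldRetry, opts, callback) {
    var maxAttempts = opts.maxAttempts || 5;
    var baseMs = opts.baseMs || 100;
    var maxMs = opts.maxMs || 5000;
    var attempt = 0;

    function tryOnce() {
        attempt++;
        operation(function (err, result) {
            if (!err) {
                return callback(null, result);
            }
            if (attempt >= maxAttempts || !shouldRetry(err)) {
                return callback(err);
            }
            var delay = Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
            // "Full jitter": wait a random time in [0, delay).
            setTimeout(tryOnce, Math.random() * delay);
        });
    }

    tryOnce();
}

// Demo with a fake operation that fails twice with a "transient" error.
var calls = 0;
function flakyQuery(callback) {
    calls++;
    setImmediate(function () {
        if (calls < 3) {
            var err = new Error('deadlock');
            err.transient = true;
            return callback(err);
        }
        callback(null, 'ok after ' + calls + ' attempts');
    });
}

retryWithBackoff(flakyQuery, function (err) { return err.transient === true; },
    {baseMs: 10}, function (err, result) {
        if (err) throw err;
        console.log(result);  // -> ok after 3 attempts
    });
```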

In addition to all of the above, node-neo4j also embeds the most useful information from neo4j into the message and stack properties, so you don't need to do anything special to log meaningful, actionable, and debuggable errors. (E.g. Node's native logging of errors, both via console.log and on uncaught exceptions, includes this info.)

Here are some example snippets from real-world stack traces:

neo4j.ClientError: [Neo.ClientError.Statement.ParameterMissing] Expected a parameter named email

neo4j.ClientError: [Neo.ClientError.Schema.ConstraintViolation] Node 15 already exists with label User and property "email"=[15]

neo4j.DatabaseError: [Neo.DatabaseError.Statement.ExecutionFailure] scala.MatchError: (email,null) (of class scala.Tuple2)
    at <Java stack first, to include in Neo4j bug report>
    at <then Node.js stack...>

neo4j.DatabaseError: 502 Bad Gateway response for POST /db/data/transaction/commit: "<html><body><h1>502 Bad Gateway</h1>\nThe server returned an invalid or incomplete response.\n</body></html>\n"

neo4j.TransientError: [Neo.TransientError.Transaction.DeadlockDetected] LockClient[1150] can't wait on resource RWLock[NODE(196), hash=2005718009] since => LockClient[1150] <-[:HELD_BY]- RWLock[NODE(197), hash=1180589294] <-[:WAITING_FOR]- LockClient[1149] <-[:HELD_BY]- RWLock[NODE(196), hash=2005718009]

Precise and helpful error reporting is one of node-neo4j's greatest strengths. We hope it helps your app run smoothly!

Tuning

(TODO)

Management

(TODO)

  • change password
  • get labels, etc.

Help

Questions, comments, or other general discussion? Google Group »

Bug reports or feature requests? GitHub Issues »

You can also try Gitter, Stack Overflow or Slack (sign up here).

Contributing

See CONTRIBUTING.md »

History

See CHANGELOG.md »

License

Copyright © 2016 Aseem Kishore and contributors.

This library is licensed under the Apache License, Version 2.0.