Merge pull request #112 from jjg/depreciate_superblock
Depreciate superblock
Jason Gullickson committed Nov 12, 2015
2 parents 6958661 + 7d3ec9f commit 7b10f2c
Showing 27 changed files with 700 additions and 3,778 deletions.
133 changes: 18 additions & 115 deletions README.md
@@ -4,7 +4,9 @@ jsfs
A general-purpose, deduplicating filesystem with a REST interface, jsfs is intended to provide low-level filesystem functions for Javascript applications. Additional functionality (private file indexes, token lockers, centralized authentication, etc.) are deliberately avoided here and will be implemented in a modular fashion on top of jsfs.

#STATUS
JSFS 3.x introduces breaking changes to the REST API compared to earlier versions, as well as to the on-disk components (blocks, metadata, etc.). I'll be releasing an upgrade tool at some point to make migration from 2.x JSFS systems easier, but for now be aware that there is no direct upgrade path at this time.
Based on field testing with large storage pools (>1TB), JSFS 4.x features a complete overhaul of the storage pool architecture. As a result of these changes there are significant performance improvements, and storage pool size is no longer constrained by available memory. Unfortunately some features have been deprecated out of necessity, at least temporarily (if these features are needed, they are still available in the 3.0 release).

The 4.x series server is not compatible with 3.x pools, so a migration utility (`migrate_superblock.js`) has been included in the `tools` directory.

#REQUIREMENTS
* Node.js
@@ -17,41 +19,11 @@ JSFS 3.x introduces breaking changes to the REST API compared to earlier version

If you don't like storing the data in the same directory as the code (smart), edit config.js to change the location where data (blocks) are stored and restart jsfs for the changes to take effect.

JSFS can now store blocks across physical disk boundaries, which is useful when you need to store more data than a single disk can hold. At the moment JSFS simply distributes these blocks as evenly as possible across all configured storage devices. There are thoughts about supporting configurations that provide redundancy through the use of multiple storage devices, but for now you'll want to make sure the devices have their own redundancy (or a good backup), as losing a storage device will cause data loss just like it would with a single device.

In addition to multiple locations, you can specify the maximum amount of data that can be stored per location. The default config sets this very low (1024 bytes) so you can see what happens when you run out of space and respond accordingly. Usage and capacity statistics are logged to the console periodically so you can keep an eye on usage before you hit the limit.

##Peers (EXPERIMENTAL)
Peering is experimental and has some known issues. It should not be used in production unless you really know what you're doing.

Preliminary federation support has been added to allow one JSFS server to replicate data to a remote server. This is currently one-way only, so it is most useful for two scenarios:

###Redundancy
By configuring a peer, files stored to one server will be made available at all servers configured as peers. The blocks of the file are replicated in parallel and the remote server's superblock is updated as well, making the file available from the remote server as well as the local one.

Files at the remote peer are stored using the local server's fully-qualified JSFS namespace, so to access them, either the remote server will need to receive requests for the original server's name (via DNS or host file changes), or the fully-qualified JSFS name will need to be used in addition to the remote server's hostname (details can be found in this issue: https://github.com/jjg/jsfs/issues/66).

###Improving network performance
A JSFS server running locally (or on a local network) can publish data to a remote peer (across a WAN connection, for example) potentially faster than POSTing files directly to the remote server. The first reason is that JSFS federation works at the block level, cutting large files into blocks that can be transmitted in parallel, which ends up being faster across constrained links due to MTU limits, etc. The second reason is that the local JSFS server will deduplicate the data before transmitting it, therefore only sending unique blocks across the WAN connection.
Additional storage locations can be specified to allow the JSFS pool to span physical devices. In this configuration JSFS will spread the stored blocks evenly across multiple devices (inode files will be written to all devices for redundancy).

*Note: to use only the deduplicating front-end features of peering, leave the `STORAGE_LOCATIONS` array empty.*
It's important to note that configuring multiple storage devices does not provide redundancy for the data stored in the pool. If a storage device becomes unavailable, and a file is requested that is composed of blocks on the missing device, the file will be corrupt. If the device is restored, or the blocks that were stored on the device are added to the remaining device, JSFS will automatically return to delivering the undamaged files.
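As a rough illustration of spreading blocks evenly across locations, consider round-robin placement by block index. This is an assumption made for the sketch, not jsfs's actual placement algorithm, and the paths are hypothetical:

````
// Hypothetical even distribution of blocks across storage locations.
var STORAGE_LOCATIONS = [
  { path: './blocks1/' },
  { path: './blocks2/' }
];

// Round-robin: consecutive blocks land on alternating devices.
function locationForBlock(blockIndex) {
  return STORAGE_LOCATIONS[blockIndex % STORAGE_LOCATIONS.length];
}
````

Under a scheme like this, losing one device removes roughly half the blocks of every large file, which is why multiple locations add capacity but not redundancy.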

Peers can be configured by adding a host to the `PEERS` array in the config.js file as shown below:

````
module.exports = {
  STORAGE_LOCATIONS:[
    {"path":"./blocks/","capacity":100000000000}
  ],
  PEERS:[
    {"host":"host.domain.com","port":443}
  ],
  BLOCK_SIZE: 1048576,
  LOG_LEVEL: 0,
  SERVER_PORT: 7302,
};
````
JSFS expects to talk to peers over SSL. If you want to use non-secure connections you'll have to modify the code for now.
Future versions of JSFS may include an option to use multiple storage locations for the purpose of redundancy.

#API

@@ -64,50 +36,37 @@ Tokens are more ephemeral, and any number of them can be generated to grant vary
jsfs uses several parameters to control access to objects and how they are stored. These values can also be supplied as request headers by adding a leading "x-" and changing "_" to "-" (`access_token` becomes `x-access-token`). Headers are preferred to querystring parameters because they are less likely to collide but both function the same.
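The parameter-to-header mapping described above is mechanical; a minimal sketch (the helper name is mine, not part of jsfs):

````
// Convert a jsfs querystring parameter name to its header form:
// prepend "x-" and replace underscores with hyphens.
function paramToHeader(param) {
  return 'x-' + param.replace(/_/g, '-');
}
````

So `access_token` becomes `x-access-token`, and `replacement_access_key` becomes `x-replacement-access-key`.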

###private
By default all objects stored in jsfs are public and will be accessible via `GET` request and show up in directory listings. If the `private` parameter is set to `true` a valid `access_key` or `access_token` must be supplied to access the object.

*NOTE: as `private` objects are not included in directory listings it is up to the client to keep track of them and their associated keys.*

###encrypted
Don't use encryption for now, it needs more testing in light of recent changes.
Set this parameter to `true` to encrypt data before it is stored on disk. Once enabled, decryption happens automatically on `GET` requests and additional modifications via `PUT` will be encrypted as well.

*NOTE: encryption increases CPU utilization and potentially reduces deduplication performance, so use it only when necessary.*

###version
This parameter allows you to retrieve a specific version of a file when making a `GET` request. Each time a `PUT` request is made to an existing URL a new version is created, and an `x-version` header is returned if the `PUT` request is successful. A list of versions can be displayed by appending a `/` to the end of the URL for a file, which will return a JSON array of versions for the specified file.
By default all objects stored in jsfs are public and will be accessible via any `GET` request. If the `private` parameter is set to `true` a valid `access_key` or `access_token` must be supplied to access the object.

###access_key
Specifying a valid access_key authorizes the use of all supported HTTP verbs and is required for requests to change the `access_key` or generate `access_token`s. When a new object is stored jsfs will generate an `access_key` automatically and return it in the response to a `POST` request. Additionally the client can supply a custom `access_key` by supplying this parameter to the initial `POST` request.
Specifying a valid access_key authorizes the use of all supported HTTP verbs and is required for requests to change the `access_key` or generate `access_token`s. When a new object is stored, jsfs will generate an `access_key` automatically if one is not specified and return the generated key in the response to a `POST` request.

An `access_key` can be changed by supplying the current `access_key` along with the `replacement_access_key` parameter. This will cause any existing `access_token`s to become invalid.

*NOTE: changing the `access_key` of an encrypted object is currently unsupported!*

###access_token
An `access_token` must be provided to execute any request on a `private` object, and is required for `PUT` and `DELETE` if an `access_key` is not supplied.

####Generating access_tokens
Currently there are two types of `access_token`s: durable and temporary. Both are generated by creating a string that describes what access is granted and then using SHA1 to generate a hash of this string, but the format and use are a little different.
Currently there are two types of `access_token`: durable and temporary. Both are generated by creating a string that describes what access is granted and then using SHA1 to generate a hash of this string, but the format and use are a little different.

Durable `access_token`s are generated by concatenating an object's `access_key` with the HTTP verb that the token will be used for.

Example to grant GET access:

"077785b5e45418cf6caabdd686719813fb22e3ce" + "GET"

This string is then hashed with SHA1 and can be used to perform a GET request for the object whose `access_key` was used to generate the token.
This string is then hashed with SHA1 and can be used to perform a `GET` request for the object whose `access_key` was used to generate the token.

To make a temporary token for this same object, concatenate the `access_key` with the HTTP verb and the expiration time in epoch format (milliseconds since midnight, 01/01/1970 UTC):

"077785b5e45418cf6caabdd686719813fb22e3ce" + "GET" + "1424877559581"

This string is then hashed with SHA1 and supplied as a parameter or header with the request, along with an additional parameter named `expires` which is set to match the expiration time used above. When jsfs receives the request it generates the same token based on the stored `access_key`, the HTTP method of the incoming request and the supplied `expires` parameter to validate the `access_token`.
This string is then hashed with SHA1 and supplied as a parameter or header with the request, along with an additional parameter named `expires` which is set to match the expiration time used above. When the jsfs server receives the request, it generates the same token based on the stored `access_key`, the HTTP method of the incoming request and the supplied `expires` parameter to validate the `access_token`.

*NOTE: all `access_token`s can be immediately invalidated by changing an object's `access_key`; however, if individual `access_token`s need to be invalidated, a pattern of requesting new, temporary tokens before each request is recommended.*

##POST
Stores a new object at the specified URL. If the object exists, jsfs returns `405 method not allowed`.
Stores a new object at the specified URL. If the object exists and the `access_key` is not provided, jsfs returns `405 method not allowed`.

###EXAMPLE
Request:
@@ -171,67 +130,11 @@ This means that you can point DNS records for `foo.com` and `bar.com` to the sam
This also means that `GET http://foo.com:7302/files/baz.txt` and `GET http://bar.com:7302/files/baz.txt` do not return the same file, however if you need to access a file stored via a different host you can reach it using its absolute address (in this case, `http://bar.com:7302/.com.foo/files/baz.txt`).
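As the example address shows, the fully-qualified name is just the hostname with its dot-separated labels reversed and a leading dot. A sketch (the helper is mine, not an actual jsfs routine):

````
// Map a hostname like "foo.com" to its fully-qualified jsfs
// namespace, ".com.foo", by reversing the dot-separated labels.
function fullyQualifiedName(host) {
  return '.' + host.split('.').reverse().join('.');
}
````

This is what makes `http://bar.com:7302/.com.foo/files/baz.txt` reach a file that was stored via `foo.com`.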

##GET
Retrieves the object stored at the specified URL. If the file does not exist a `404 not found` is returned. If the URL ends with a trailing slash `/`, a directory listing of non-private files stored at the specified location is returned.
Retrieves the object stored at the specified URL. If the file does not exist a `404 not found` is returned.

###EXAMPLE
Request (directory):

curl http://localhost:7302/music/

Response:

````
[
  {
    "url": "/localhost/music/Brinstar.mp3",
    "created": 1424878242595,
    "version": 0,
    "private": false,
    "encrypted": false,
    "fingerprint": "fde752ca6541c16ec626a3cf6e45e835cfd9db9b",
    "access_key": "fde752ca6541c16ec626a3cf6e45e835cfd9db9b",
    "content_type": "application/x-www-form-urlencoded",
    "file_size": 7678080,
    "block_size": 1048576,
    "blocks": [
      {
        "block_hash": "610f0b4c20a47b4162edc224db602a040cc9d243",
        "last_seen": "./blocks/"
      },
      {
        "block_hash": "60a93a7c97fd94bb730516333f1469d101ae9d44",
        "last_seen": "./blocks/"
      },
      {
        "block_hash": "62774a105ffc5f57dcf14d44afcc8880ee2fff8c",
        "last_seen": "./blocks/"
      },
      {
        "block_hash": "14c9c748e3c67d8ec52cfc2e071bbe3126cd303a",
        "last_seen": "./blocks/"
      },
      {
        "block_hash": "8697c9ba80ef824de9b0e35ad6996edaa6cc50df",
        "last_seen": "./blocks/"
      },
      {
        "block_hash": "866581c2a452160748b84dcd33a2e56290f1b585",
        "last_seen": "./blocks/"
      },
      {
        "block_hash": "6c1527902e873054b36adf46278e9938e642721c",
        "last_seen": "./blocks/"
      },
      {
        "block_hash": "10938182cd5e714dacb893d6127f8ca89359fec7",
        "last_seen": "./blocks/"
      }
    ]
  }
]
````

Request (file):
Request:

curl -o Brinstar.mp3 http://localhost:7302/music/Brinstar.mp3

@@ -247,7 +150,9 @@ Request:
curl -X PUT -H "x-access-key: 7092bee1ac7d4a5c55cb5ff61043b89a6e32cf71" --data-binary @Brinstar.mp3 "http://localhost:7302/music/Brinstar.mp3"

Result:
`HTTP 200`
`HTTP 206`

*note: `POST` and `PUT` can actually be used interchangeably, but HTTP conventions recommend using them as described here.*

##DELETE
Removes the file at the specified URL. This method requires authorization, so requests must include a valid `x-access-key` or `x-access-token` header for the specified file. If the token is not supplied or is invalid, `401 unauthorized` is returned. If the file does not exist `405 method not allowed` is returned.
@@ -258,7 +163,7 @@ Request:
curl -X DELETE -H "x-access-token: 7092bee1ac7d4a5c55cb5ff61043b89a6e32cf71" "http://localhost:7302/music/Brinstar.mp3"

Response
`HTTP 200` if successful.
`HTTP 206` if successful.

##HEAD
Returns status and header information for the specified URL.
@@ -279,5 +184,3 @@ Content-Length: 7678080
Date: Wed, 25 Feb 2015 15:43:03 GMT
Connection: keep-alive
````

*NOTE: some response information above may be removed in later versions of jsfs, in particular the `blocks` section as it's not directly useful to clients.*
80 changes: 80 additions & 0 deletions boot.js
@@ -0,0 +1,80 @@
// This script will boot server.js with the number of workers
// specified in WORKER_COUNT.
//
// The master will respond to SIGHUP, which will trigger
// restarting all the workers and reloading the app.

var cluster = require('cluster');
var workerCount = process.env.WORKER_COUNT || 4;

// Defines what each worker needs to run
cluster.setupMaster({ exec: 'server.js' });

// Gets the count of active workers
function numWorkers() { return Object.keys(cluster.workers).length; }

var stopping = false;

// Forks off the workers unless the server is stopping
function forkNewWorkers() {
  if (!stopping) {
    for (var i = numWorkers(); i < workerCount; i++) { cluster.fork(); }
  }
}

// A list of workers queued for a restart
var workersToStop = [];

// Stops a single worker
// Gives 60 seconds after disconnect before SIGTERM
function stopWorker(worker) {
  console.log('stopping', worker.process.pid);
  worker.disconnect();
  var killTimer = setTimeout(function() {
    worker.kill();
  }, 60000);

  // Ensure we don't stay up just for this setTimeout
  killTimer.unref();
}

// Tell the next worker queued to restart to disconnect
// This will allow the process to finish its work
// for 60 seconds before sending SIGTERM
function stopNextWorker() {
  var i = workersToStop.pop();
  var worker = cluster.workers[i];
  if (worker) stopWorker(worker);
}

// Stops all the workers at once
function stopAllWorkers() {
  stopping = true;
  console.log('stopping all workers');
  for (var id in cluster.workers) {
    stopWorker(cluster.workers[id]);
  }
}

// Worker is now listening on a port
// Once it is ready, we can signal the next worker to restart
cluster.on('listening', stopNextWorker);

// A worker has disconnected either because the process was killed
// or we are processing the workersToStop array restarting each process
// In either case, we will fork any workers needed
cluster.on('disconnect', forkNewWorkers);

// HUP signal sent to the master process to start restarting all the workers sequentially
process.on('SIGHUP', function() {
  console.log('restarting all workers');
  workersToStop = Object.keys(cluster.workers);
  stopNextWorker();
});

// Kill all the workers at once
process.on('SIGTERM', stopAllWorkers);

// Fork off the initial workers
forkNewWorkers();
console.log('app master', process.pid, 'booted');
10 changes: 4 additions & 6 deletions config.ex
@@ -1,13 +1,11 @@
module.exports = {
STORAGE_LOCATIONS:[
{"path":"./blocks/","capacity":4294967296}
],
PEERS:[
//{"host":"host.domain.com","port":7302}
{"path":"./blocks/"}
//{"path":"./blocks1/"},
//{"path":"./blocks2/"}
],
BLOCK_SIZE: 1048576,
LOG_LEVEL: 0,
SERVER_PORT: 7302,
REQUEST_TIMEOUT: 30, // minutes,
ABEND_ON_MISSING_SUPERBLOCK: false
REQUEST_TIMEOUT: 30 // minutes
};
35 changes: 0 additions & 35 deletions dedupe_frontend_notes.md

This file was deleted.
