update the report
lucarin91 committed Apr 15, 2016
1 parent e0780d5 commit efcb725
Showing 8 changed files with 76 additions and 75 deletions.
Binary file modified REPORT.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion report/cover.tex
@@ -8,7 +8,7 @@
\textsc{\Large Distributed Enabling Platforms}\\[2.5cm]

\HRule\\[0.4cm]
{\huge \bfseries PAD-FS}\\[0.4cm]
{\huge \bfseries PAD-FS}\\[0.5cm]
{\huge A distributed persistent data storage}\\[0.4cm]
\HRule\\[3cm]
\textsc{Project Report}\\[0.4cm]
12 changes: 6 additions & 6 deletions report/references.md
@@ -6,7 +6,7 @@

- [edwardcapriolo/gossip](https://github.com/edwardcapriolo/gossip),to implement the gossip protocol between the servers.
- [MapDB](http://www.mapdb.org/), to implement the persistent storage.
- [FasterXML/jackson](https://github.com/FasterXML/jackson), to easilly convert java class to json.
- [FasterXML/jackson](https://github.com/FasterXML/jackson), to easily convert Java class to JSON.
- [JUnit](http://junit.org/), to the test the project.
- [Spring](https://spring.io/), to implement the restful API.
- [JCommander](http://jcommander.org/), to parse the argument given to the programs.
@@ -15,20 +15,20 @@

## Nodejs and javascript libraries:{-}

- [Angularjs](https://angularjs.org/), to implement the one page site of the MonitorWebApp.
- [AngularJS](https://angularjs.org/), to implement the one page site of the MonitorWebApp.
- [Bootstrap](http://getbootstrap.com/), for he graphics of the MonitorWebApp.
- [nwjs](http://nwjs.io/), to transform the webapp to a native app for Mac Windows an Linux.
- [NW.js](http://nwjs.io/), to transform the webapp to a native app for Mac Windows an Linux.


## Build tools:{-}

- [Gradle](https://gradle.org/), to build all the project and manage the dependencies.
- [gradle-docker](https://github.com/Transmode/gradle-docker),the docker plugin for gradle.
- [jitpack](https://jitpack.io), to bulid java library from github.
- [gradle-docker](https://github.com/Transmode/gradle-docker),the docker plug-in for Gradle.
- [JitPack](https://jitpack.io), to build Java library from github.
- [npm](https://www.npmjs.com/), to manage the dependency of MonitorWebApp.
- [nw-builder](https://github.com/nwjs/nw-builder) to build MonitorWebApp for the different operation system.


## Docker images:{-}

- [java](https://hub.docker.com/_/java/) a docker image with the openjdk.
- [java](https://hub.docker.com/_/java/) a Docker image with the openJDK.
23 changes: 12 additions & 11 deletions report/sections/how.md
@@ -1,19 +1,19 @@
# How to use
It is possible to use the distributed file-system in the thread version for a single machine, the multi server for a cluster of machine or using the docker container either in a single machine or in cluster.
# User Guide
It is possible to use the distributed file system in the multi-threaded version for a single machine, the multi- server for a cluster of machines or using the *Docker* container either in a single machine or in cluster.

The simplest way to use the file-system is to download the last release and run it with the only requirement of Java8. In this way the application can be used either in the thread version with the `app-<version>.jar` or in the cluster version with the storage node `core-<version>.jar` and the front node `api-<version>.jar`.
The simplest way to use the file system is to download the last release and run it with the only requirement of Java8. In this way, the application can be used either in the multi-thread version with the `app-<version>.jar` or in the cluster version with the storage node `core-<version>.jar` and the front node `api-<version>.jar`.
In the release is also possible to find the the MonitorWebApp for Linux, MacOS and Windows.


**Requirements**:

- Java8
- *Nodejs/npm (optional only for the MonitorWebApp)*
- *Docker>=10 (optional only for the docker version of the file-system)*
- *Docker>=10 (optional only for the Docker version of the file system)*


## Thread version
It can be build with:
It can be built with:
```bash
./gradlew app:build
```
@@ -35,7 +35,7 @@ java -jar -N 10 -n 2 -gport 3000 -mport 2000"
```
## Single server
It can be build with:
It can be built with:
```bash
./gradlew core:build api:build
```
@@ -70,16 +70,17 @@ To build the docker image of the front node and the storage node run
./gradlew core:build core:docker api:build api:docker
```
Now is possible to execute a demo by run the following perl script:
Now is possible to execute a demo by running the following Perl script:
```bash
perl start-docker.pl <number of storage node :default 5>
```
To manually run a file-system node you have to create a new docker network with the command:
To manually run a file system node you have to create a new docker network with the command:
```bash
docker network create --subnet=172.18.0.1/16 fs-net
```bash
then to start a two node file-system run
```
then to start a two node file system
```bash
docker run -d \
--net fs-net \
@@ -102,7 +103,7 @@ docker run -d \
```
## MonitorWebApp
The webapp can be used with one of the release version for the different OS or run with the nodejs interpreter with the following command:
The web-app can be used with one of the released versions for the different OS's or run with the Node.js interpreter with the following command:
```bash
./gradlew webapp:run
```
14 changes: 7 additions & 7 deletions report/sections/introduction.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# Introduction
The aim of the project is to create a weakly consist distributed file-system by the use of gossiping, consistent hashing and vector clocks.
The aim of the project is to create a weakly consistent distributed file system by using gossiping, consistent hashing and vector clocks.

The communication between node exploit the Java socket mechanism so it can be execute in different ways: on a single machine with threads, on a cluster of servers or in virtual containers using *docker*.
The communication between nodes exploits the Java socket mechanism, thus it can be executed in different ways: on a single machine with threads, on a cluster of servers or in virtual containers using *Docker*.

The file-system is implemented as a map with a string key and a number or string value with the following operations:
The file system is implemented as a key value map of type $\langle string, string \cup number \rangle$ with the following operations:

- **add**(key, value), add only if the key is not present.
- **get**(key), get the value of the key if present.
- **update**(key, value), update the key with the new value only if the key already exists.
- **remove**(key), remove the key if present.
- **add**(key, value) that adds the pair only if the key is not present;
- **get**(key) returning the value of the key if present;
- **update**(key, value) that updates the key with the new value only if the key already exists;
- **remove**(key) that removes the key if present;
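
The four operations listed in this hunk can be illustrated with a small single-node sketch. It is only an assumption of how the semantics might look in Java: the class name `KeyValueSketch`, the boolean return values and the use of `Optional` are hypothetical and are not taken from the PAD-FS sources, and the sketch ignores distribution, replication and vector clocks entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical single-node sketch of the four map operations; not the PAD-FS code. */
class KeyValueSketch {
    // Values are restricted to strings and numbers, as in the report.
    private final Map<String, Object> data = new HashMap<>();

    /** Adds the pair only if the key is not present; returns true on success. */
    boolean add(String key, Object value) {
        requireStringOrNumber(value);
        return data.putIfAbsent(key, value) == null;
    }

    /** Returns the value of the key, or an empty Optional if the key is absent. */
    Optional<Object> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    /** Replaces the value only if the key already exists; returns true on success. */
    boolean update(String key, Object value) {
        requireStringOrNumber(value);
        return data.replace(key, value) != null;
    }

    /** Removes the key if present; returns true if something was removed. */
    boolean remove(String key) {
        return data.remove(key) != null;
    }

    private static void requireStringOrNumber(Object value) {
        if (!(value instanceof String) && !(value instanceof Number)) {
            throw new IllegalArgumentException("value must be a string or a number");
        }
    }
}
```

In this sketch a second `add` on the same key and an `update` on a missing key both fail without changing the map, which mirrors the "only if" conditions in the list above.
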
38 changes: 19 additions & 19 deletions report/sections/logic.md
Original file line number Diff line number Diff line change
@@ -1,38 +1,38 @@
# Logical Structure

![Project logical structure](./img/pad-logic.png)

The file-system is composed by two fundamental parts:
The file system is composed by two fundamental parts:

- the front-end, that provides the external access to the file system through a Restful JSON API;
- the storage system itself where the data are stored and managed.

- the front-end, that provides the external access to the file-system through a Restful json API,
- the storage system its self where the data are stored and manage.
The former component does not keep any track of the data stored in the system. It only knows the servers, thanks to the gossiping protocol.

The first part doesn't have any information of the data store in the system. It only knows the servers, thanks to the gossiping protocol.

## Communication system
All the internal communication are done with the UDP transport protocol, to avoid the overhead of the TCP and assuming a reliable network between the servers.
All the internal communication relies upon the UDP transport protocol to avoid the overhead of the TCP and assuming a reliable network between the servers exists.

All the nodes, either the front-end and the storage one, use the gossip protocol to update the list of the servers involved in the file-system.
So that each node has two services running on different ports, one for the gossip protocol and one for receiving messages from the other nodes.
All the nodes, either the front-end and the storage one, exploit gossiping to update the list of the servers involved in the file system.
According to that, each node runs two services on different ports: one in charge of the gossiping protocol, the other for receiving messages from the other nodes.

When a new request arrives to a front node it is sent to a random storage node from its list and than it wait 5 second for an acknowledgement that the request is correctly served or an error, otherwise it assume that something goes wrong.
When a new request arrives to a front-end node it is sent to a random storage node from its list and then it waits 5 seconds for an acknowledgement that the request has been correctly served or not. If no message is received, it is assumed that something went wrong.

## Storage protocol
All the storage nodes use consistent hashing to assign a key value to a given server with the following strategy:
All the storage nodes use consistent hashing to assign a key-value pair to a given server with the following strategy:

- a server is master for all the keys with lower or equal hash value.
- each key is replicated to a fixed number of next server in the consistent hash.
- a server is master for all the keys with lower or equal hash value;
- each key is replicated to a fixed number of subsequence servers in the consistent hash ring;
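
The placement strategy in the two bullets above can be sketched as a sorted ring. This is a minimal illustration under stated assumptions: the class `HashRingSketch`, its method names and the use of explicit integer ring positions are hypothetical, and how PAD-FS actually hashes keys and server ids is not shown in this diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring; positions are supplied by the caller. */
class HashRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int backups; // number of backup servers per key (assumed parameter)

    HashRingSketch(int backups) {
        this.backups = backups;
    }

    void addServer(int position, String serverId) {
        ring.put(position, serverId);
    }

    /** Returns the master first, followed by the backup servers in ring order. */
    List<String> ownersFor(int keyHash) {
        List<String> owners = new ArrayList<>();
        if (ring.isEmpty()) {
            return owners;
        }
        // Master: first server whose position is >= the key hash, wrapping around.
        Integer pos = ring.ceilingKey(keyHash);
        if (pos == null) {
            pos = ring.firstKey();
        }
        int wanted = Math.min(backups + 1, ring.size());
        while (owners.size() < wanted) {
            owners.add(ring.get(pos));
            // Backups: the next servers on the ring, wrapping around at the end.
            pos = ring.higherKey(pos);
            if (pos == null) {
                pos = ring.firstKey();
            }
        }
        return owners;
    }
}
```

With servers at positions 10, 20 and 30 and one backup, a key hashing to 15 would be mastered by the server at 20 and backed up by the server at 30; a key hashing above 30 wraps around to the server at 10.
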

The system use a single master storage protocol without consensus, so the value is written or read without waiting for an acknowledgement from the backup's servers.
The system use a single master storage protocol without consensus, so that the value is written or read without waiting for an acknowledgement from the backup servers.

Each time a new server turn on it immediately became master for the keys with a lower hash and a backup server for the keys owned by the previous servers. So after their neighbors discovered it, they either send the keys that it has to manage or the keys that it has to keep for backups.
Each time a new server turns on it immediately become master for the keys with a lower hash and the backup server for the keys owned by the previous server. So after its neighbors discovered it, they either send the keys that it has to manage or the keys that it has to keep for backup.

Within the data it is also added a vector clock to keep trace of with server update the value. The vector clock is implemented using a map where the key is the server id and for the value a counter, in this way all the serves that don't have a key are considered zero.
Within the data it is also added a vector clock to keep track of which server updated the value. The vector clock is implemented using a map where the key is the server id and the value is a counter; the servers that are not present in the map are considered with a 0 counter.

So each time a server update a value as a master it increment the counter with its id inside the object, and foreword this new vector with the key value to its backups server.
So each time a server updates a value as a master it increments the counter with its id inside the object, and forwards this new vector with the key-value to its backup servers.

This vector clock is used every time two version of a value are founded, after some key management, to decide with is the newer. If two unconfrontable version of the value are founded the node server create value with the two different version and put the `conflict` flag to true.
This vector clock is used every time two version of a value must be compared to decide which is the most recent. If two uncomparable versions of a value are found, the server node creates a value with the two different versions and sets the `conflict` flag.

At this point where a user try to get that key, it receive all the conflict version and it can decide with one it consider the correct newer version by done an update operation.
At this point when a user attempts to get a key, he/she receives all the conflicting versions and can decide which one is the correct version by performing an update operation.

After the update the server resolve the conflict and merge all the vector clock for of all the value, in this way if at same time one of this old values are founded it will be discarded.
After the update the server resolves the conflict and merges all the vector clocks together. For so, subsequent incoming old values for a key will be discarded.
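
The vector clock handling described in this hunk can be sketched as follows. The code is an illustration only, assuming a plain `Map<String, Integer>` from server id to counter as the text describes; the class name `VectorClockSketch`, the `Order` enum and the method names are hypothetical and do not come from the PAD-FS sources.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical vector clock helpers; missing server ids count as 0. */
class VectorClockSketch {
    enum Order { EQUAL, BEFORE, AFTER, CONCURRENT }

    /** Returns a copy of the clock with the counter of serverId incremented by one. */
    static Map<String, Integer> increment(Map<String, Integer> clock, String serverId) {
        Map<String, Integer> next = new HashMap<>(clock);
        next.merge(serverId, 1, Integer::sum);
        return next;
    }

    /** Compares a against b; CONCURRENT means the two versions cannot be ordered. */
    static Order compare(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> ids = new HashSet<>(a.keySet());
        ids.addAll(b.keySet());
        boolean aAhead = false;
        boolean bAhead = false;
        for (String id : ids) {
            int ca = a.getOrDefault(id, 0);
            int cb = b.getOrDefault(id, 0);
            if (ca > cb) aAhead = true;
            if (cb > ca) bAhead = true;
        }
        if (aAhead && bAhead) return Order.CONCURRENT;
        if (aAhead) return Order.AFTER;
        if (bAhead) return Order.BEFORE;
        return Order.EQUAL;
    }

    /** Element-wise maximum of the two clocks, as used when resolving a conflict. */
    static Map<String, Integer> merge(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> out = new HashMap<>(a);
        b.forEach((id, counter) -> out.merge(id, counter, Integer::max));
        return out;
    }
}
```

In this sketch, two clocks that were incremented by different servers from the same starting point compare as `CONCURRENT`, which corresponds to the case where the `conflict` flag is set. The merged clock is at least as large as either input in every entry, so any older version that arrives later compares as `BEFORE` and can be discarded.
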