(a framework for managing Sensitive Data)
The central idea behind "Ginema" is to decouple sensitive data from the domain model classes so that federated search can be performed against different datasets and cloud resources. Sensitive data is stored in separate hash-based structures and serialized using specific strategies that are transparent to the domain model.
In other words, any field of a domain object that contains sensitive data is replaced with a hash referencing a serialized structure, potentially saved in another system.
This decoupling makes it possible to store application data in the cloud and sensitive data locally or in a separate structure, merging them when necessary, or to distribute the sensitive data of a particular domain object independently.
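The decoupling described above can be pictured with a minimal sketch. It is not Ginema code: `SensitiveStore` and `Customer` are hypothetical stand-ins, and a random UUID is assumed as the reference (the JSON example in this README uses UUID-shaped keys). The domain object keeps only the reference, while the value lives in a separate store and is merged back on demand.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative only: these classes are hypothetical stand-ins, not Ginema types.
final class DecouplingSketch {

    /** Stand-in for a separate structure holding sensitive values keyed by an opaque reference. */
    static final class SensitiveStore {
        private final Map<String, String> values = new HashMap<>();

        /** Stores the value and returns the reference that replaces it in the domain object. */
        String put(String sensitiveValue) {
            String ref = UUID.randomUUID().toString();
            values.put(ref, sensitiveValue);
            return ref;
        }

        /** Resolves a reference; returns null if this store does not hold it. */
        String resolve(String ref) {
            return values.get(ref);
        }
    }

    /** Domain object as it could be kept in the cloud: a reference, never the sensitive value. */
    static final class Customer {
        final String customerNumber; // non-sensitive, safe to store anywhere
        final String ibanRef;        // reference to the sensitive IBAN

        Customer(String customerNumber, String ibanRef) {
            this.customerNumber = customerNumber;
            this.ibanRef = ibanRef;
        }
    }

    public static void main(String[] args) {
        SensitiveStore localStore = new SensitiveStore();
        Customer customer = new Customer("C-1001", localStore.put("iban account"));

        // What leaves the premises contains no sensitive value.
        System.out.println("cloud copy: " + customer.customerNumber + " / " + customer.ibanRef);
        // Merging happens only where the store is available.
        System.out.println("merged locally: " + localStore.resolve(customer.ibanRef));
    }
}
```

Anyone holding only the `Customer` object learns nothing about the IBAN; the merge requires access to the store.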
Getting started:
The framework is based on a declarative approach, which consists of annotating a domain model in the following way:
import java.util.Date;

@SensitiveDataRoot(name = "simpleDomainObject")
public class SimpleDomainObject {
    private SensitiveDataID id;
    // Sensitive values are wrapped in SensitiveDataField
    private SensitiveDataField<String> name;
    private SensitiveDataField<Date> dateOfBirth;
    // Nested domain object
    private SimpleDomainObject child;
}
The framework supports the process of:
- Serialization
- Deserialization
The simplest operation supported is extracting the sensitive data fields from an object:
SensitiveDataHolder extract =
    SensitiveDataExtractor.extractSensitiveData(object);
The opposite operation is populating an object with sensitive data:
SensitiveDataEnricher.enrich(sensitiveDataHolder, object);
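To make the extract/enrich round trip concrete, here is a self-contained sketch of the general technique using plain reflection. It does not show how Ginema implements these calls: the `@Sensitive` annotation, the `Person` class, and the use of a plain `Map` as the holder are all assumptions made for illustration, as is the choice to clear the field on extraction.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hypothetical annotation and classes, not the Ginema API.
final class ExtractEnrichSketch {

    /** Hypothetical marker for fields that must not stay in the domain object. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Sensitive {}

    static final class Person {
        String nickname;        // not sensitive, stays in the object
        @Sensitive String name; // moved to the holder on extraction
    }

    /** Copies every @Sensitive field into a map (the "holder") and clears it on the object. */
    static Map<String, Object> extract(Object target) {
        Map<String, Object> holder = new HashMap<>();
        try {
            for (Field field : target.getClass().getDeclaredFields()) {
                if (field.isAnnotationPresent(Sensitive.class)) {
                    field.setAccessible(true);
                    holder.put(field.getName(), field.get(target));
                    field.set(target, null); // assumption: the object keeps no copy
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return holder;
    }

    /** Writes the values from the holder back into the matching @Sensitive fields. */
    static void enrich(Map<String, Object> holder, Object target) {
        try {
            for (Field field : target.getClass().getDeclaredFields()) {
                if (field.isAnnotationPresent(Sensitive.class)
                        && holder.containsKey(field.getName())) {
                    field.setAccessible(true);
                    field.set(target, holder.get(field.getName()));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.nickname = "jd";
        person.name = "John Doe";

        Map<String, Object> holder = extract(person);
        System.out.println("after extract: name=" + person.name + ", holder=" + holder);

        enrich(holder, person);
        System.out.println("after enrich: name=" + person.name);
    }
}
```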
Extracting a “SensitiveDataHolder” converts the sensitive data stored in the domain object into a JSON structure with a specific schema, which also preserves the type of each stored value.
The serialization mechanism is based on a data-independent JSON structure of the kind used with Apache Hadoop and, more generally, in Big Data scenarios (Apache Avro).
An example of “SensitiveDataHolder” JSON:
{
  "id":"4aaf83c4-b781-4a47-abac-4c822e2989c9",
  "domain":"domain",
  "dates":{
    "87e2ffbd-43bb-4f78-9610-a81be22f9daf":{
      "name":"87e2ffbd-43bb-4f78-9610-a81be22f9daf",
      "value":1457889366167
    }
  },
  "strings":{
    "87e2ffbd-43bb-4f78-9610-a81be22f9daf":{
      "name":"87e2ffbd-43bb-4f78-9610-a81be22f9daf",
      "value":"iban account"
    }
  },
  "longs":null,
  "integers":null,
  "floats":null,
  "doubles":null,
  "bytes":null,
  "booleans":null
}
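The layout above groups values by type, with one map per type keyed by a field identifier. A simplified in-memory model of that idea might look like the sketch below. `HolderSketch` is hypothetical and is not the actual `SensitiveDataHolder` class; it covers only `strings` and `dates`, drops the inner `name`/`value` wrapper, and assumes that date values are epoch milliseconds, which is what the example value appears to be.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of the layout shown above; not the actual Ginema classes.
final class HolderSketch {
    final String id = UUID.randomUUID().toString();
    final String domain;
    // One map per value type, keyed by field identifier, as in the JSON example.
    final Map<String, String> strings = new LinkedHashMap<>();
    final Map<String, Long> dates = new LinkedHashMap<>(); // assumed epoch milliseconds

    HolderSketch(String domain) {
        this.domain = domain;
    }

    /** Stores a string value and returns the generated field identifier. */
    String putString(String value) {
        String fieldId = UUID.randomUUID().toString();
        strings.put(fieldId, value);
        return fieldId;
    }

    /** Stores a date as epoch milliseconds and returns the generated field identifier. */
    String putDate(Date value) {
        String fieldId = UUID.randomUUID().toString();
        dates.put(fieldId, value.getTime());
        return fieldId;
    }

    String getString(String fieldId) {
        return strings.get(fieldId);
    }

    /** Rebuilds the Date from its stored milliseconds, or returns null if unknown. */
    Date getDate(String fieldId) {
        Long millis = dates.get(fieldId);
        return millis == null ? null : new Date(millis);
    }

    public static void main(String[] args) {
        HolderSketch holder = new HolderSketch("domain");
        String ibanId = holder.putString("iban account");
        String birthId = holder.putDate(new Date(1457889366167L));
        System.out.println(holder.getString(ibanId));
        System.out.println(holder.getDate(birthId).getTime());
    }
}
```

Keeping one map per type means a reader of the serialized form knows how to rebuild each value without inspecting the domain class.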
The backing JSON schema can be seen at the URL:
Figure Domain Object ER
Encryption and cloud scenario
Once the data is decoupled from the domain model, it can be encrypted, distributed autonomously, and stored in different systems.
The advantages of this approach include the following:
- Application data can reside in any cloud, while client and sensitive data can be stored in a client-resident structure or similar.
- Sensitive data can be distributed using a mechanism similar to the one used for mail encryption (PGP).
- The enricher mechanism can be extended to languages other than Java, since the backing JSON schema is standard.
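As an illustration of the PGP-like distribution mentioned above, the sketch below encrypts a serialized holder with a one-time AES key and wraps that key with the recipient's RSA public key. It uses only standard JDK classes and is an assumption about how such a scheme could look: it is not the Ginema crypto module, and it does not produce the OpenPGP message format. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Illustrative hybrid encryption with JDK classes only; names are hypothetical.
final class HybridEncryptionSketch {
    private static final String RSA_WRAP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    /** What would be shipped to the recipient: wrapped session key, nonce, ciphertext. */
    static final class Envelope {
        final byte[] wrappedKey;
        final byte[] iv;
        final byte[] ciphertext;

        Envelope(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {
            this.wrappedKey = wrappedKey;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    static KeyPair newRecipientKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static Envelope encrypt(String json, PublicKey recipient) {
        try {
            // Fresh AES session key and 96-bit nonce for every message.
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey sessionKey = keyGen.generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE, sessionKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(json.getBytes(StandardCharsets.UTF_8));

            // Only the holder of the RSA private key can recover the session key.
            Cipher rsa = Cipher.getInstance(RSA_WRAP);
            rsa.init(Cipher.WRAP_MODE, recipient);
            return new Envelope(rsa.wrap(sessionKey), iv, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(Envelope envelope, PrivateKey recipient) {
        try {
            Cipher rsa = Cipher.getInstance(RSA_WRAP);
            rsa.init(Cipher.UNWRAP_MODE, recipient);
            SecretKey sessionKey =
                (SecretKey) rsa.unwrap(envelope.wrappedKey, "AES", Cipher.SECRET_KEY);

            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.DECRYPT_MODE, sessionKey, new GCMParameterSpec(128, envelope.iv));
            return new String(aes.doFinal(envelope.ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair recipient = newRecipientKeyPair();
        Envelope envelope = encrypt("{\"strings\":{\"f1\":\"iban account\"}}", recipient.getPublic());
        System.out.println(decrypt(envelope, recipient.getPrivate()));
    }
}
```

The envelope can travel over an untrusted channel or sit in cloud storage, because reading it requires the recipient's private key.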
Figure Sensitive Data Merging on the cloud
Structure of the Project
- Ginema api: core API for enriching and storage
- Ginema crypto: module to support encryption (standard RSA, elliptic curve, PGP)
- Ginema Server: a REST server to support sensitive data distribution
Further development
Current development is aimed at creating a sensitive data server that exposes a fast REST API to retrieve sensitive data and to search it using advanced (Lucene-based) queries.
The other stream of work is to support efficient federated search against sensitive and non-sensitive data, providing a unified search interface.
Technologies:
Java 1.8, Maven 3.1.1, Spring Boot