Many people like you and me feel unable to cope with the many privacy notices we have to deal with on a daily basis. Our data is important to us, and indeed defines our online identity. Privacy Butler helps you understand any privacy notice. Simply tell Privacy Butler which kinds of data processing are a “no go” for you. It converts the privacy notice into icons that show you immediately whether your desired data protection standard is met.
This is a concept-stage project that was started at the Swiss Legal Tech 2018 hackathon in Zürich, Switzerland. The original challenge idea can be found here (PDF).
Currently our prototype has a minimal, functioning user interface. There is a basic welcome screen, followed by the configuration of tracking preferences:
Upon clicking the computer icon, a text field appears which accepts the URL of a privacy policy. If the result is not compliant with the user's preferences, the analysis is shown:
However, if the policy matches the user's selections, only a confirmation screen appears:
You can see a screencast of the hackathon demo here.
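The compliance check behind these two screens can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual algorithm: the category names and the `check_policy` helper are hypothetical, and it assumes the analysis step has already reduced the policy text to a set of detected practices.

```python
from dataclasses import dataclass, field

# Hypothetical category names; the prototype's real identifiers may differ.
CATEGORIES = ("geotracking", "thirdparties", "profiling")


@dataclass
class CheckResult:
    compliant: bool
    violations: list = field(default_factory=list)


def check_policy(no_go, detected):
    """Compare the user's "no go" categories with practices found in a policy.

    no_go:    categories the user refuses (assumed to come from the preferences screen).
    detected: categories the analysis found in the policy text (assumed input).
    """
    # A violation is any practice that is both refused and detected.
    violations = sorted(set(no_go) & set(detected))
    return CheckResult(compliant=not violations, violations=violations)


result = check_policy({"geotracking", "profiling"}, {"profiling", "thirdparties"})
print(result.compliant, result.violations)
```

A non-empty `violations` list would correspond to the analysis screen, an empty one to the confirmation screen.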
Technical notes about our prototype solution:
The backend uses Java with Spring Boot 2 and communicates with the Google Cloud Natural Language API. You have to create the credentials yourself in order to be able to communicate with Google Cloud.
Find the backend project files here and instructions to get started in the legal-hackathon-backend folder.
The frontend uses TypeScript with Angular 6, plus Material Design for styling.
You can find the frontend project files and build instructions in legal-hackathon-frontend.
There is also a static HTML launch page defined in index.html, with resources in the web folder. The design template used is HTML5 UP, with jQuery and FontAwesome.
In the markup of the HTML page we have a simple proposal on using schema in META tags to publish web site policy in machine readable form, e.g.:
<meta name="privacy:geotracking" value="no" />
<meta name="privacy:thirdparties" value="no" />
<meta name="privacy:profiling" value="no" />
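To illustrate how a tool might consume such tags, here is a minimal sketch using only the Python standard library. The `PrivacyMetaParser` class and `read_privacy_meta` function are hypothetical names, not part of this project. The sketch reads the `value` attribute used in the proposal above and, as an assumption, falls back to `content`, the attribute that standard HTML `meta` elements use.

```python
from html.parser import HTMLParser


class PrivacyMetaParser(HTMLParser):
    """Collect <meta name="privacy:..." value="..."> declarations from a page."""

    def __init__(self):
        super().__init__()
        self.declarations = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("privacy:"):
            # "value" follows the proposal; "content" is an assumed fallback.
            value = attributes.get("value", attributes.get("content"))
            self.declarations[name[len("privacy:"):]] = value


def read_privacy_meta(html):
    """Return a dict such as {"geotracking": "no"} for the given HTML text."""
    parser = PrivacyMetaParser()
    parser.feed(html)
    return parser.declarations


page = """
<head>
  <meta name="privacy:geotracking" value="no" />
  <meta name="privacy:thirdparties" value="no" />
  <meta name="privacy:profiling" value="no" />
</head>
"""
print(read_privacy_meta(page))
```

A client could compare the resulting dictionary with the user's preferences without having to analyze the full policy text.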
More projects similar to this one can be found in our reading list below.
We used the Google Cloud Natural Language API in this project for rapid analysis of policy texts. See Quickstart, NL Samples, and Java samples for Google Cloud Platform.
You will need to obtain a developer key from the Cloud API console to use our current backend.
We also ran a short machine learning classification experiment using an open dataset of opt-out policies from usableprivacy.org in the Keras.io deep learning environment. The results can be seen in a Python notebook made with Jupyter, in the ml subfolder.
The dataset used in the experiment above was one of those recommended by Pribot.org, a project that was a major motivation for our work here. Many thanks to Dr. Harkous for feedback on our concept during the hackathon.
We also considered using IBM Watson (see Fredrik Stenbeck's comparison) and OpenNLP at Apache.
Further reading, in no particular order:
- Pribot - an EPFL research project - visualizes and responds to questions about privacy policies.
- Privacy Bot gathers, persists and analyzes privacy policies, based on the PrivacyGuide paper (Tesfay et al., 2018).
- Terms of Service; Didn't Read crowdsources reviews of online policies.
- Privacy Badger from EFF is a browser plugin that learns to block trackers, with a policy compliance mechanism.
- PrivacyCheck is a Google Chrome browser plugin that automatically summarizes and visualizes online privacy policies.
Additional documentation that could be useful to developers of policy tools:
- Privacy Shield lists companies that comply with data protection regulations, along with policy summaries.
- A Domain Ontology For Online Privacy is a research paper that suggests improvements to policy statements.
- Data Privacy Vocabularies and Controls is a new Community Group at the W3C that aims to define a taxonomy of privacy terms.
- Ontology for Data Privacy Policy proposes a definition for publishing privacy policies as Linked Data.
- GDPRtEXT is a representation of the text of GDPR using Linked Data.
- Open Badges are tools from Mozilla for online learning, which could in theory be adapted for privacy preferences.
A couple of introductory articles on relevant topics in Machine Learning: