This project facilitates the greater part of an initiative to gather information about the usage of accessibility settings on mobile phones. The process is currently used for iOS and Android across several organisations. Ultimately the results of the different organisations can be combined to create insights into accessibility setting usage by mobile users. Q42 currently provides this data to the collaborating organisations using dashboards. The data is also used to supply https://appt.org with relevant accessibility statistics.
The entire process consists of three main components.
The mobile library component is embedded in applications and reads the user's accessibility settings. Currently the following two libraries are being used:
A Flask API that handles requests sent by the mobile library. It performs basic validation of the requests and passes the request body in its entirety to the correct Firestore collection.
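As a rough illustration, such an endpoint could look like the minimal sketch below; the route, field names, and collection names are assumptions, not the actual implementation.

```python
# Minimal sketch of the API component; route and names are hypothetical.
from flask import Flask, request, abort
from google.cloud import firestore

app = Flask(__name__)
db = firestore.Client()

# Hypothetical mapping from platform to Firestore collection.
COLLECTIONS = {"ios": "measurements-ios", "android": "measurements-android"}

@app.route("/measurements/<platform>", methods=["POST"])
def store_measurement(platform):
    if platform not in COLLECTIONS:
        abort(404)
    body = request.get_json(silent=True)
    # Basic validation: the request must at least contain a current measurement.
    if not body or "currentMeasurement" not in body:
        abort(400)
    # The request body is stored in its entirety in the corresponding collection.
    db.collection(COLLECTIONS[platform]).add(body)
    return "", 201
```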
A nightly Vertex AI pipeline job is executed, triggered by Cloud Scheduler via a Cloud Function. The pipeline job is compiled using Kubeflow.
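A sketch of such a trigger function, assuming the google-cloud-aiplatform client; the template path, region, and parameter handling are placeholders and not taken from the actual code.

```python
# Sketch of a Cloud Function that submits the nightly pipeline run.
# Template path, pipeline root and region are illustrative assumptions.
from google.cloud import aiplatform

def run_pipeline(request):
    settings = request.get_json()  # pipeline settings posted by Cloud Scheduler
    aiplatform.init(project=settings["project_name"], location="europe-west1")
    job = aiplatform.PipelineJob(
        display_name="accessibility-nightly",
        template_path="gs://<pipeline-bucket>/pipeline_ios.json",  # compiled Kubeflow pipeline
        pipeline_root="gs://<pipeline-bucket>/pipeline-runs",
        parameter_values=settings,
    )
    job.submit()
    return "submitted", 200
```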
First the Firestore export function is called to export data to Cloud Storage. Afterwards this export is imported into BigQuery. Once the data is in a BigQuery table, it is cast to the correct types and stored in a typed updates table.
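A sketch of the import and cast steps using the BigQuery Python client; the bucket, dataset, table, and column names are illustrative assumptions.

```python
# Sketch: load a Firestore export into BigQuery and cast it into a typed table.
from google.cloud import bigquery

client = bigquery.Client()

# A Firestore export can be loaded with the DATASTORE_BACKUP source format.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.load_table_from_uri(
    "gs://<export-bucket>/<export-prefix>/all_namespaces/kind_measurements/all_namespaces_kind_measurements.export_metadata",
    "<project>.<dataset>.raw_updates",
    job_config=load_config,
).result()

# Cast the raw columns to their proper types into the typed updates table.
client.query("""
    CREATE OR REPLACE TABLE `<project>.<dataset>.typed_updates` AS
    SELECT CAST(isBoldTextEnabled AS BOOL) AS isBoldTextEnabled,
           CAST(fontScale AS FLOAT64) AS fontScale
    FROM `<project>.<dataset>.raw_updates`
""").result()
```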
Data is gathered fully anonymously. To accomplish this, we refrain from using unique identifiers or other privacy-sensitive information. To be able to apply updates to the aggregation table, we've introduced the concept of current and previous measurements. Individual update requests always contain a current measurement and optionally a previous measurement. When inserting a new event, a hash is made of both the current and the previous measurement.
When applying an update, the current measurement can be inserted immediately. For the previous measurement, the hash is used to delete the first corresponding row from the events table, as sketched below.
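A minimal sketch of how such a hash could be computed; the exact fields and the choice of hash function are assumptions.

```python
# Sketch: hash a measurement so a later "previous measurement" can be matched
# and removed without any user identifier. Fields and hash function are assumed.
import hashlib
import json

def measurement_hash(measurement: dict) -> str:
    # Serialise with sorted keys so identical settings always yield the same hash.
    canonical = json.dumps(measurement, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

update = {
    "currentMeasurement": {"fontScale": 1.3, "isBoldTextEnabled": True},
    "previousMeasurement": {"fontScale": 1.0, "isBoldTextEnabled": True},
}

current_hash = measurement_hash(update["currentMeasurement"])    # insert a row with this hash
previous_hash = measurement_hash(update["previousMeasurement"])  # delete the first row with this hash
```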
Based on the aggregated data, certain interpretations can be made. In our case we have added two properties that capture the most common values for certain Android devices, which serve as an estimate of the default content size. Once the default content size has been estimated, one can determine whether a given data event has a content size larger or smaller than that default.
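A sketch of this interpretation step, assuming the default is taken as the most common value per device model; the field names are illustrative.

```python
# Sketch: estimate the default content size per Android model as the most
# common observed value, then label events relative to that default.
from collections import Counter, defaultdict

def default_font_scales(events: list[dict]) -> dict[str, float]:
    scales_per_model = defaultdict(list)
    for event in events:
        scales_per_model[event["model"]].append(event["fontScale"])
    # The most common value per model is taken as that model's default.
    return {model: Counter(scales).most_common(1)[0][0]
            for model, scales in scales_per_model.items()}

def label_event(event: dict, defaults: dict[str, float]) -> str:
    default = defaults.get(event["model"])
    if default is None or event["fontScale"] == default:
        return "default"
    return "larger" if event["fontScale"] > default else "smaller"
```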
Once the aforementioned stages have finished, a series of cleanup components cleans up the different data stores.
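As an illustration only, a cleanup step could look roughly like the sketch below; the collection and prefix names are assumptions, and the real components may clean up more selectively.

```python
# Sketch: remove processed Firestore documents and intermediate export files.
from google.cloud import firestore, storage

def cleanup(collection_name: str, bucket_name: str, export_prefix: str) -> None:
    db = firestore.Client()
    # Delete the documents that were exported and processed by the pipeline.
    for doc in db.collection(collection_name).stream():
        doc.reference.delete()

    # Remove the Firestore export files that were imported into BigQuery.
    bucket = storage.Client().bucket(bucket_name)
    for blob in bucket.list_blobs(prefix=export_prefix):
        blob.delete()
```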
- Vertex AI Pipelines: https://console.cloud.google.com/vertex-ai/training/training-pipelines
- BigQuery: https://console.cloud.google.com/bigquery
- Scheduler: https://console.cloud.google.com/cloudscheduler
- Functions: https://console.cloud.google.com/functions/list
- Storage:
  - Pipeline run data: https://console.cloud.google.com/storage/browser/
  - Firestore exports: https://console.cloud.google.com/storage/browser/
- Alerting: https://console.cloud.google.com/monitoring/alerting/policies
Several data stores are used, which require some setup.
Create a Google Cloud project
Create a collection to store requests. We use one collection for iOS and one for Android. We work with one organisation that maintains their own Firestore collection; we have therefore added distinct behaviour to our API and pipeline to also incorporate data from this collection. This especially impacts the authentication process.
- Create a storage container in Google Cloud Storage.
- Create a settings.json file in the root of the project which contains two keys, `project_name` and `storage_bucket`, set to the name of your Google Cloud project and the created storage bucket respectively (see the example after this list).
- Create an events table per operating system per organisation
- Create an aggregation table per operating system per organisation
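A settings.json could look like this, with placeholder values:

```json
{
  "project_name": "<your-gcp-project>",
  "storage_bucket": "<your-storage-bucket>"
}
```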
Documentation on how to deploy the API can be found in the readme in the api directory.
Deploy a cron job per operating system. For more information, take a look at the readme in the pipelines directory.
- Go to Cloud Scheduler
- Create a new job
- Configure timings and such as desired
- Under "Configure the execution":
  - Set the URL to the Cloud Function that was created by deploying the cron job
  - Place the pipeline settings in the body. (An empty example can be found in pipelines/config. Documentation for the parameters can be found in pipelines/pipelines/pipeline_ios.py)
- Johan Huijkman (Accessibility Engineer) - [email protected]
- Leonard Punt (Pipeline Engineer) - [email protected]
- Joris Bruil (Pipeline Engineer) - [email protected]