Google BigQuery is a data warehousing platform with an SQL interface that allows fast, query-based access to data.
The RIPE Atlas network measurement platform conducts network measurements and makes the results available via the RIPE Atlas API and bulk downloads.
The API provides some opportunities to filter data, but cannot offer significant compute cycles for calculation. To provide more scope for computation and analysis of this data, we are now storing RIPE Atlas data in Google BigQuery.
For background information on the service, please refer to https://labs.ripe.net/tools/.
In particular, note that your usage of this data falls under the RIPE Atlas Terms and Conditions.
To get started, you need a Google account and a project to run queries under. More information here:
If you just want to jump in: the public datasets are viewable from our public project:
https://console.cloud.google.com/bigquery?project=ripencc-atlas
Initially, we will offer two datasets: samples and measurements.
The samples dataset contains six tables with a static, 1% sample of recent measurement results. The tables are:
- ripencc-atlas.samples.dns (schema)
- ripencc-atlas.samples.http (schema)
- ripencc-atlas.samples.ntp (schema)
- ripencc-atlas.samples.ping (schema)
- ripencc-atlas.samples.sslcert (schema)
- ripencc-atlas.samples.traceroute (schema)
These tables let you test the service on trivial data volumes and quickly get a feel for what the data contains.
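As a minimal sketch of how the table names above fit together, the following Python helper builds the fully qualified name for a table and a simple row-count query against it. The project, dataset, and table names come from the list above; the helper names `table_ref` and `row_count_query` are hypothetical, and the query deliberately avoids naming any columns, since the schemas are not reproduced here.

```python
# Project and table names are taken from the list above.
PROJECT = "ripencc-atlas"
DATASETS = ("samples", "measurements")
MEASUREMENT_TYPES = ("dns", "http", "ntp", "ping", "sslcert", "traceroute")


def table_ref(dataset: str, measurement_type: str) -> str:
    """Return a fully qualified name such as ripencc-atlas.samples.ping.

    Hypothetical helper for illustration; not part of any RIPE Atlas tooling.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if measurement_type not in MEASUREMENT_TYPES:
        raise ValueError(f"unknown measurement type: {measurement_type}")
    return f"{PROJECT}.{dataset}.{measurement_type}"


def row_count_query(dataset: str, measurement_type: str) -> str:
    """Build a query string that counts the rows in one table or view."""
    # Backticks quote the full identifier; the project ID contains a hyphen.
    return f"SELECT COUNT(*) AS row_count FROM `{table_ref(dataset, measurement_type)}`"


if __name__ == "__main__":
    # Print one query per sample table, ready to paste into the BigQuery console.
    for measurement_type in MEASUREMENT_TYPES:
        print(row_count_query("samples", measurement_type))
```

The printed queries can be pasted into the BigQuery console. Starting with the samples dataset keeps the data volume small while you explore.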
The measurements dataset contains six public views that are continuously updated with public RIPE Atlas measurement results. Schemas are identical to the samples tables.
The six views are:
- ripencc-atlas.measurements.dns
- ripencc-atlas.measurements.http
- ripencc-atlas.measurements.ntp
- ripencc-atlas.measurements.ping
- ripencc-atlas.measurements.sslcert
- ripencc-atlas.measurements.traceroute
These views contain measurement results starting from 1 January 2020.
The following documents are intended to help bootstrap folks into querying this data, but they by no means cover everything you can do:
- Determine minimum RTT from any probe to any measured target
- Subclauses and iterating on query building
- Cost efficiency: the important options to minimise or estimate query costs
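To illustrate the cost-estimation point in the list above: BigQuery can report how many bytes a query would process before you run it (a dry run), and on-demand cost scales with that figure. The sketch below converts a byte count into an estimated cost. The function name `estimate_cost` is hypothetical, and the price per TiB is a parameter you must supply from Google's current pricing; no rate is assumed here.

```python
TIB = 2 ** 40  # number of bytes in one tebibyte


def estimate_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Estimate on-demand query cost from the bytes a query would process.

    Hypothetical helper: price_per_tib is an assumption supplied by the
    caller, and billing details such as minimums or free tiers are ignored.
    """
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    if price_per_tib < 0:
        raise ValueError("price_per_tib must be non-negative")
    return bytes_processed / TIB * price_per_tib


if __name__ == "__main__":
    # Example with a made-up price of 5.0 per TiB and a 250 GiB scan.
    scanned = 250 * 2 ** 30
    print(f"{estimate_cost(scanned, 5.0):.4f}")
```

Comparing the estimate for a sample table against the matching measurements view is a quick way to see how much a query would scan before committing to it.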
We're hoping, after some time using this data, to learn more about how best to structure it, what tables would be useful for us to generate on your behalf, and how you're using it all.
If you have comments, questions, suggestions, or problems to report, please email [email protected]