CaptainEmerson edited this page Sep 18, 2014 · 9 revisions

The Problem

The Eclipse Usage Data Collector (UDC) data set: huge possibilities, right? Lots of interesting research uses it, from estimating how often people use refactoring tools to extracting process models.

Since the data is freely available, the sky's the limit, eh? Not so fast. Have you considered that:

  • It'll take you a while to download and process the data. There's about 200GB of CSVs.
  • The data's a bit dirty, with unclosed quotes, for instance.
  • Once you've got the data set up, is your database going to be able to run queries fast enough? In the past, we've set up a MySQL database with the data, optimized the indices, and it still took hours to complete some queries.
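If you do decide to process the raw CSVs yourself, the "unclosed quotes" problem can be spotted before loading anything into a database. Here's a minimal Python sketch, assuming the quote character is a plain double quote and that a well-formed line contains an even number of them. The helper name `find_unbalanced_lines` and the sample rows (including the header layout) are made up for illustration; they're not part of any UDC tooling.

```python
import io


def find_unbalanced_lines(lines):
    """Return 1-based numbers of lines containing an odd number of double quotes."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # An odd count means some quote on this line is never closed.
        if line.count('"') % 2 == 1:
            bad.append(number)
    return bad


# Hypothetical sample; the real UDC files may be laid out differently.
sample = io.StringIO(
    'userId,what,kind,bundleId,bundleVersion,description,time\n'
    '335213,started,bundle,org.eclipse.equinox.app,1.1.0,"org.eclipse.equinox.app",1239957498477\n'
    '335213,started,bundle,org.eclipse.equinox.common,3.4.0,"org.eclipse.equinox.common,1239957498477\n'
)

# Prints the line numbers that need a closer look.
print(find_unbalanced_lines(sample))
```

This check is deliberately crude: a quoted field that legitimately spans several lines would also be flagged, so treat the result as a list of lines to inspect rather than a list of lines to drop.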

The Solution

We've put up the entire public data set on Google BigQuery, which enables you to query and process this data quickly using Google's infrastructure. This document gives a couple of examples of how you can access and use this data.

Prerequisites

You'll need a Google account and an empty BigQuery-enabled project.

Example 1: The BigQuery Web Interface

The easiest way to run some quick queries is to go straight to the BigQuery web interface. There, click "Compose Query" and type in:

select count(*) from [udc-data:udc.all];

You should get "2323233101" -- that's about 2.3 billion records. Let's take a quick look at what the data actually looks like:

select * from [udc-data:udc.all] LIMIT 5;

Row  userId  what     kind    bundleId                                    bundleVersion         description                                 time
1    335213  started  bundle  org.eclipse.equinox.app                     1.1.0.v20080421-2006  org.eclipse.equinox.app                     1239957498477
2    335213  started  bundle  org.eclipse.equinox.common                  3.4.0.v20080421-2006  org.eclipse.equinox.common                  1239957498477
3    335213  started  bundle  org.eclipse.equinox.frameworkadmin          1.0.2.R34x_v20081007  org.eclipse.equinox.frameworkadmin          1239957498477
4    335213  started  bundle  org.eclipse.equinox.frameworkadmin.equinox  1.0.4.v20080930       org.eclipse.equinox.frameworkadmin.equinox  1239957498477
5    335213  started  bundle  org.eclipse.equinox.p2.core                 1.0.4.v20081112-1019  org.eclipse.equinox.p2.core                 1239957498477
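The time values in these rows look like milliseconds since the Unix epoch. That's an assumption based on their magnitude, not something this page confirms. Under that assumption, here's a small Python sketch for turning them into readable timestamps; the function name `udc_time_to_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Start of the Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def udc_time_to_utc(millis):
    """Convert an epoch-milliseconds value to a timezone-aware UTC datetime.

    Assumes the UDC time column holds milliseconds since 1970-01-01 UTC.
    """
    # Integer timedelta arithmetic avoids floating-point rounding.
    return EPOCH + timedelta(milliseconds=millis)


# The time value shared by the five sample rows.
print(udc_time_to_utc(1239957498477).isoformat())
```

For the sample value this gives a moment on April 17, 2009, which fits the 2008-era bundle versions in the same rows.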

Example 2:

Other Solutions

Thanks

Thanks to Wayne Beaton and Mohsen Vakilian. Google. How to Cite.
