Home
The Eclipse Usage Data Collector (UDC). Huge possibilities, right? There's lots of interesting research that uses it, from estimates of how often people use refactoring tools to extracting process models.
Since the data is freely available, the sky's the limit, eh? Not so fast. Have you considered that:
- It'll take you a while to download and process the data. There's about 200GB of CSVs.
- The data's a bit dirty, with unclosed quotes, for instance.
- Once you've got the data set up, is your database going to be able to run queries fast enough? In the past, we've set up a MySQL database with the data, optimized the indices, and it still took hours to complete some queries.
We've put up the entire public data set on Google BigQuery, which enables you to query and process this data quickly using Google's infrastructure. This document gives a couple of examples of how you can access and use this data.
You'll need a Google account, and then you'll need to create an empty BigQuery-enabled project.
The easiest way to run some quick queries is to go straight to the web interface. Then, click "Compose Query", and type in
select count(*) from [udc-data:udc.all];
You should get "2323233101" -- that's about 2.3 billion records. Let's take a quick look at what the data actually looks like:
select * from [udc-data:udc.all] LIMIT 5;
| Row | userId | what | kind | bundleId | bundleVersion | description | time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 335213 | started | bundle | org.eclipse.equinox.app | 1.1.0.v20080421-2006 | org.eclipse.equinox.app | 1239957498477 |
| 2 | 335213 | started | bundle | org.eclipse.equinox.common | 3.4.0.v20080421-2006 | org.eclipse.equinox.common | 1239957498477 |
| 3 | 335213 | started | bundle | org.eclipse.equinox.frameworkadmin | 1.0.2.R34x_v20081007 | org.eclipse.equinox.frameworkadmin | 1239957498477 |
| 4 | 335213 | started | bundle | org.eclipse.equinox.frameworkadmin.equinox | 1.0.4.v20080930 | org.eclipse.equinox.frameworkadmin.equinox | 1239957498477 |
| 5 | 335213 | started | bundle | org.eclipse.equinox.p2.core | 1.0.4.v20081112-1019 | org.eclipse.equinox.p2.core | 1239957498477 |
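The `time` values in these sample rows look like Unix timestamps in milliseconds, though that is an assumption on our part rather than something the table itself states. If it holds, a minimal Python sketch for turning one row into a readable date could look like this (the `SAMPLE_ROW` dict and the `event_time_utc` helper are made up for illustration):

```python
from datetime import datetime, timezone

# First sample row from the query output, copied into a plain dict.
SAMPLE_ROW = {
    "userId": 335213,
    "what": "started",
    "kind": "bundle",
    "bundleId": "org.eclipse.equinox.app",
    "bundleVersion": "1.1.0.v20080421-2006",
    "description": "org.eclipse.equinox.app",
    "time": 1239957498477,
}


def event_time_utc(row):
    """Return the row's `time` value as a timezone-aware UTC datetime.

    Assumption: `time` counts milliseconds since 1970-01-01 00:00:00 UTC.
    """
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(row["time"], 1000)
    whole_seconds = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole_seconds.replace(microsecond=millis * 1000)


if __name__ == "__main__":
    # Prints 2009-04-17T08:38:18.477+00:00
    print(event_time_utc(SAMPLE_ROW).isoformat(timespec="milliseconds"))
```

Under that assumption, the first five events all happened at the same instant in April 2009, which fits a batch of bundles starting together when Eclipse launches.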
Thanks to Wayne Beaton and Mohsen Vakilian. Google. How to Cite.