Andreas Pieper edited this page Mar 18, 2014 · 22 revisions

Creating interactive circular plots for the web using D3

D3 is a JavaScript library for creating beautiful, interactive visualisations in HTML. Although not tied to the Web per se, it is predominantly used for data-driven manipulation of Web content, especially SVG documents embedded in HTML. D3 is the fourth in a line of visualisation toolkits; its precursors are Prefuse (Java, 2005), Flare (ActionScript, 2007) and Protovis (JavaScript, 2009), in all of which the authors of D3 played a leading role.

D3 was created in 2010 by Mike Bostock and sponsored by his employer, The New York Times. It has since attracted wide attention and is used in many settings, above all for data visualisation. It has gained considerable traction in the relatively new discipline of data journalism.

D3 and SVG

Scalable Vector Graphics (SVG) is an XML-based vector image format with support for interactivity and animation. Embedding SVG inside HTML documents makes it possible to create shapes such as circles, Bézier curves and rectangles dynamically. Like HTML, SVG can be scripted through the DOM API and styled with CSS. D3 builds on this to create complex interactive graphics in the browser.

Data Processing

To minimize computation on the client, the raw data is preprocessed into a structure that the JavaScript code generating the circular plot can consume directly. While CSV is a good fit for exchanging tabular data, JSON (JavaScript Object Notation) can hold complex, nested structures and is parsed natively by JavaScript. The output of the transform step is therefore JSON.

Preprocessing is a two-step affair:

  • Filter out countries with very small migration flows
  • Transform into a D3-optimized JSON structure

The first step is to remove countries with very small migration flows. Otherwise the graphic contains too many elements and becomes unresponsive. Too many fine shapes also make the plot hard to read and distract from the important flows. The countries to remove are defined in a separate CSV file.

The interchange CSV looks like this:

originregion_name,destinationregion_name,origin_iso,origin_name,destination_iso,destination_name,countryflow_1990,countryflow_1995,countryflow_2000,countryflow_2005
North America,North America,CAN,Canada,CAN,Canada,0,0,0,0
North America,North America,CAN,Canada,USA,United States,1509,190436,238,28
North America,North America,USA,United States,CAN,Canada,56108,635,84430,96074
North America,North America,USA,United States,USA,United States,0,0,0,0

The CSV file defining visible countries is formatted like this:

iso,show
USA,1
FIN,0

In this example, USA will be shown, whereas FIN will be hidden.

The result of the first step is identical to the input CSV, except that every row whose origin_iso or destination_iso is marked with 0 in the country filter CSV is removed.
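The filter step could look roughly like the following JavaScript sketch. It is not the library's own code: parseCsv and filterFlows are hypothetical names, the parser assumes simple CSV without quoted fields, and countries missing from the filter file are assumed to be kept.

```javascript
// Minimal CSV parser for the simple files shown above.
// Assumption: a header row, comma separators and no quoted fields.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split(/\r?\n/);
  const keys = header.split(",");
  return lines.map((line) => {
    const values = line.split(",");
    return Object.fromEntries(keys.map((key, i) => [key, values[i]]));
  });
}

// Step 1: drop every flow whose origin or destination is marked show=0.
// Countries that do not appear in the filter file are kept.
function filterFlows(flowsCsv, countriesCsv) {
  const hidden = new Set(
    parseCsv(countriesCsv)
      .filter((country) => country.show === "0")
      .map((country) => country.iso)
  );
  return parseCsv(flowsCsv).filter(
    (row) => !hidden.has(row.origin_iso) && !hidden.has(row.destination_iso)
  );
}
```

With the sample filter file above, every row involving FIN would be dropped, while rows between CAN and USA are kept unchanged.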

The output of step one is then used as the input of the second step, the compile step. This step creates a data structure that the JavaScript running on the client can consume quickly and efficiently.

The resulting final JSON looks like this:

{
  "regions": [0, 3, 36, 61, 74, 88, 96, 101, 110, 113],
  "names": [
    "North America",
    "Canada",
    "United States",
    "Africa",
    "Angola",
    ...
    "Venezuela"
  ],
  "matrix": {
    "2005": [
      [ 139950, ... 8621 ],
      [ 51564, ... 458 ],
      ...
    ],
    "1990": [
      ...
    ]
  }
}
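A minimal sketch of how such a structure could be compiled from the filtered rows follows. It assumes that names lists each region followed by its countries, that regions holds the index of each region name within names, and that each year's matrix is square over names, with entry [i][j] holding the flow from name i to name j and region entries holding the sums of their countries' flows. The function name compile is hypothetical, and the real library may lay out the matrix differently.

```javascript
// Hypothetical compile step: filtered CSV rows (objects keyed by column name)
// -> { regions, names, matrix }. Assumes every country belongs to one region.
function compile(rows) {
  // 1. Collect regions and their countries in order of first appearance.
  const countriesByRegion = new Map(); // region name -> Map(iso -> country name)
  const addCountry = (region, iso, name) => {
    if (!countriesByRegion.has(region)) countriesByRegion.set(region, new Map());
    countriesByRegion.get(region).set(iso, name);
  };
  for (const r of rows) {
    addCountry(r.originregion_name, r.origin_iso, r.origin_name);
    addCountry(r.destinationregion_name, r.destination_iso, r.destination_name);
  }

  // 2. Lay out names: each region name followed by its countries' names.
  const names = [];
  const regions = [];
  const regionIndex = new Map(); // region name -> index in names
  const countryIndex = new Map(); // iso code -> index in names
  for (const [region, countries] of countriesByRegion) {
    regionIndex.set(region, names.length);
    regions.push(names.length);
    names.push(region);
    for (const [iso, name] of countries) {
      countryIndex.set(iso, names.length);
      names.push(name);
    }
  }

  // 3. One square matrix per year column ("countryflow_1990" -> "1990").
  const prefix = "countryflow_";
  const years = Object.keys(rows[0] || {})
    .filter((key) => key.startsWith(prefix))
    .map((key) => key.slice(prefix.length));
  const n = names.length;
  const matrix = {};
  for (const year of years) {
    const m = Array.from({ length: n }, () => new Array(n).fill(0));
    for (const r of rows) {
      const value = Number(r[prefix + year]);
      const from = [regionIndex.get(r.originregion_name), countryIndex.get(r.origin_iso)];
      const to = [regionIndex.get(r.destinationregion_name), countryIndex.get(r.destination_iso)];
      // Add the flow to every region/country combination, so region rows and
      // columns hold aggregates that stay usable when only one side is expanded.
      for (const i of from) for (const j of to) m[i][j] += value;
    }
    matrix[year] = m;
  }
  return { regions, names, matrix };
}
```

Filling the mixed region/country cells as well is a design choice of this sketch: it lets the client show flows between an expanded region's countries and a collapsed region without recomputing anything.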

To reduce the number of chords displayed at any time, the data is aggregated into region flows. The graph starts collapsed, and the user can expand a region by clicking on it to see the individual country flows.

At most two regions are expanded at any time; when the user expands a third region, the first one collapses. To achieve this, each region's aggregated flows are stored in the flow matrix, followed by the flows of its countries. The regions index records where each region starts, so expanding a region means displaying the flows in the matrix between that region's index and the next region's index. The names list holds the region and country names used for the labels.
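The expand/collapse bookkeeping can be sketched as follows, assuming region k's countries occupy the indices from regions[k] + 1 up to the next region's index (or the end of names for the last region), and that an expanded region is shown as its countries instead of its aggregate. visibleIndices, expandRegion and subMatrix are hypothetical helpers, not part of D3 or the library.

```javascript
// Indices into `names` (and rows/columns of the matrix) that are currently
// visible, given the list of expanded region positions.
function visibleIndices(regions, namesLength, expanded) {
  const indices = [];
  regions.forEach((start, k) => {
    const end = k + 1 < regions.length ? regions[k + 1] : namesLength;
    if (expanded.includes(k)) {
      for (let i = start + 1; i < end; i++) indices.push(i); // country rows
    } else {
      indices.push(start); // aggregated region row
    }
  });
  return indices;
}

// Keep at most `maxExpanded` regions open; expanding another one
// collapses the region that was expanded first.
function expandRegion(expanded, k, maxExpanded = 2) {
  if (expanded.includes(k)) return expanded;
  const next = [...expanded, k];
  return next.length > maxExpanded ? next.slice(next.length - maxExpanded) : next;
}

// Cut the square sub-matrix for the visible indices out of one year's matrix.
function subMatrix(matrix, indices) {
  return indices.map((i) => indices.map((j) => matrix[i][j]));
}
```

For the JSON above, expanding region 0 would replace the "North America" row with the rows for "Canada" and "United States", while all other regions stay aggregated.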

An implementation of these tasks as well as a description and usage instructions can be found in the Circular Migration Plot Library.

Extending D3

While D3 provides a helpful layout for generating chords, the original implementation had to be extended to fit the requirements of migration flow charts.

One major difference between the chord provided by D3 and the one used in migration flow charts is that migration flow charts display two directed chords, one for each direction (in and out), whereas D3's chord layout draws a single chord per pair whose two ends encode the flows in either direction.
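One way to obtain directed chords is sketched below. It is a hypothetical layout, neither D3's own d3.layout.chord nor necessarily what the library does: each group arc is sized by its outflows plus its inflows, and every non-zero flow gets its own chord whose two ends have the same angular width. The function name directedChordLayout and the padding parameter are assumptions of this sketch.

```javascript
// Hypothetical directed chord layout: every non-zero matrix[i][j] becomes a
// separate chord from group i to group j, with equally wide ends.
function directedChordLayout(matrix, padding = 0.02) {
  const n = matrix.length;
  // A group's arc must hold its outflows (row sum) and inflows (column sum).
  const totals = matrix.map((row, i) =>
    row.reduce((sum, v) => sum + v, 0) + matrix.reduce((sum, r) => sum + r[i], 0)
  );
  const grandTotal = totals.reduce((sum, v) => sum + v, 0);
  if (grandTotal === 0) return { groups: [], chords: [] };
  const k = (2 * Math.PI - padding * n) / grandTotal; // radians per unit of flow

  // Lay the group arcs around the circle, separated by `padding` radians.
  const groups = [];
  let angle = 0;
  for (let i = 0; i < n; i++) {
    groups.push({ index: i, value: totals[i], startAngle: angle, endAngle: angle + totals[i] * k });
    angle += totals[i] * k + padding;
  }

  // Fill each group's arc with outgoing chord ends first...
  const cursor = groups.map((g) => g.startAngle);
  const chords = [];
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      const value = matrix[i][j];
      if (value <= 0) continue;
      const width = value * k;
      chords.push({
        value,
        source: { index: i, startAngle: cursor[i], endAngle: cursor[i] + width },
        target: { index: j },
      });
      cursor[i] += width;
    }
  }
  // ...then with incoming chord ends.
  for (const chord of chords) {
    const t = chord.target;
    const width = chord.value * k;
    t.startAngle = cursor[t.index];
    t.endAngle = cursor[t.index] + width;
    cursor[t.index] += width;
  }
  return { groups, chords };
}
```

Keeping the value on each chord object is also a convenient place to hang the data needed for tooltips.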

A chord is a shape that displays a single flow: two arcs on the circle connected by two Bézier curves. The other difference from the original D3 chord is that a chord ends on a slightly smaller radius than the main circle, which indicates the direction of the flow. To display tooltips and flow numbers on mouse hover, the necessary data was added to the chords.
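The shape can be sketched as an SVG path generator. Like D3's chord shape, it joins the two arcs with quadratic Bézier curves through the centre of the circle; the targetInset parameter, an assumption of this sketch, pulls the target arc onto a slightly smaller radius. directedChordPath, polar and fmt are hypothetical names, and the input objects only need startAngle and endAngle in radians.

```javascript
// Point at angle `a` on a circle of radius `r`, in D3's convention:
// 0 at 12 o'clock, increasing clockwise, y axis pointing down.
function polar(r, a) {
  return [r * Math.sin(a), -r * Math.cos(a)];
}

// Format a point for SVG path data with two decimals.
const fmt = ([x, y]) => `${x.toFixed(2)},${y.toFixed(2)}`;

// SVG path for one directed chord: source arc on the outer radius, curve
// through the centre, target arc on the smaller radius, curve back, close.
function directedChordPath(source, target, radius, targetInset) {
  const targetRadius = radius - targetInset;
  // Clockwise arc (sweep flag 1); large-arc flag set for spans over 180 degrees.
  const arc = (r, a0, a1) =>
    `A${r},${r} 0 ${a1 - a0 > Math.PI ? 1 : 0},1 ${fmt(polar(r, a1))}`;
  const sourceStart = fmt(polar(radius, source.startAngle));
  const targetStart = fmt(polar(targetRadius, target.startAngle));
  return (
    `M${sourceStart}` +
    arc(radius, source.startAngle, source.endAngle) +
    `Q0,0 ${targetStart}` +
    arc(targetRadius, target.startAngle, target.endAngle) +
    `Q0,0 ${sourceStart}Z`
  );
}
```

The resulting string can be assigned to the d attribute of an SVG path element, for example via D3's attr method.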

The modified chord layout, along with the extended chord shape, can be found in the lib/ folder of the Circular Migration Plot Library.

Conclusion / Summary

TODO

  • Open Web technologies made interactive charts possible and widely used
  • Interactive visualisations sometimes make data explorable in the first place