Andreas Pieper edited this page Mar 18, 2014 · 22 revisions

Creating interactive circular plots for the web using D3

D3 is a JavaScript library for creating beautiful, interactive visualisations in HTML. Although not tied to the Web per se, it is predominantly used for data-driven manipulation of Web content, especially SVG documents embedded in HTML. D3 is the fourth iteration of a visualisation library; its precursors are Prefuse (Java, 2005), Flare (ActionScript, 2007) and Protovis (JavaScript, 2009), in all of which the author of D3 had a leading role.

D3 was created in 2010 by Mike Bostock and sponsored by his employer, The New York Times. It has since received great attention and is used in various scenarios, especially for data visualisation. It has gained considerable traction in the relatively new discipline of data journalism.

D3 and SVG

Scalable Vector Graphics (SVG) is an XML-based vector image format with support for interactivity and animation. Embedding SVG inside HTML documents makes it possible to dynamically create shapes such as circles, Bézier curves and rectangles. Like HTML, SVG can be scripted via the DOM API and styled with CSS. With D3 it is possible to build complex interactive graphics in the browser.

Data Processing

To minimize computation on the client, raw data is preprocessed and transformed into a structure well suited to JavaScript, which in turn generates the visible circular plot. While the CSV format is a good fit for tabular data interchange, JSON (JavaScript Object Notation) can hold complex structured data and is natively parseable in JavaScript programs. The output of the transform step is therefore JSON-formatted.

Preprocessing is a two-step affair:

  • Filter out countries with too small migration flows
  • Transform into a D3-optimized JSON structure

The first step is to remove countries which have very small migration flows. Otherwise the graphic contains too many elements and becomes unresponsive. Too many fine shapes also make the plot difficult to read and distract from the important flows. The countries to remove are defined in a separate CSV file.

The interchange CSV looks like this:

originregion_name,destinationregion_name,origin_iso,origin_name,destination_iso,destination_name,countryflow_1990,countryflow_1995,countryflow_2000,countryflow_2005
North America,North America,CAN,Canada,CAN,Canada,0,0,0,0
North America,North America,CAN,Canada,USA,United States,1509,190436,238,28
North America,North America,USA,United States,CAN,Canada,56108,635,84430,96074
North America,North America,USA,United States,USA,United States,0,0,0,0

The CSV file defining visible countries is formatted like this:

iso,show
USA,1
FIN,0

In this example, USA will be shown, whereas FIN will be hidden.

The result of the first step has the same format as the input CSV, except that every row whose origin_iso or destination_iso value is marked with a 0 in the countries filter CSV is removed.
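The filter step described above could be sketched as follows. This is a minimal illustration, not the code from the Circular Migration Plot Library: parseCsv and filterFlows are hypothetical helper names, the CSV parsing assumes fields without quotes or embedded commas, and countries that do not appear in the filter CSV are assumed to stay visible.

```javascript
// Hypothetical helper: parse a simple CSV (no quoted fields) into row objects
// keyed by the column names of the header line.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split(/\r?\n/);
  const columns = header.split(',');
  return lines.map(line => {
    const cells = line.split(',');
    const row = {};
    columns.forEach((name, i) => { row[name] = cells[i]; });
    return row;
  });
}

// Remove every flow row whose origin or destination is marked with show = 0.
// Assumption: countries missing from the filter CSV are kept.
function filterFlows(flowRows, filterRows) {
  const hidden = new Set(
    filterRows.filter(r => r.show === '0').map(r => r.iso)
  );
  return flowRows.filter(
    r => !hidden.has(r.origin_iso) && !hidden.has(r.destination_iso)
  );
}

// Usage with a reduced set of columns:
const flows = parseCsv(
  'origin_iso,destination_iso,countryflow_2005\n' +
  'CAN,USA,28\n' +
  'USA,FIN,5\n'
);
const visibility = parseCsv('iso,show\nUSA,1\nFIN,0\n');
const kept = filterFlows(flows, visibility); // only the CAN to USA row remains
```

Collecting the hidden ISO codes in a Set keeps the per-row check cheap, which matters little for a one-off preprocessing run but keeps the intent readable.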

The output of step one is then used as the input of the second step, the compile step. This step creates a data structure which the JavaScript running on the client can consume quickly and efficiently.

The resulting final JSON looks like this:

{
  "regions": [0, 3, 36, 61, 74, 88, 96, 101, 110, 113],
  "names": [
    "North America",
    "Canada",
    "United States",
    "Africa",
    "Angola",
    ...
    "Venezuela"
  ],
  "matrix": {
    "2005": [
      [ 139950, ... 8621 ],
      [ 51564, ... 458 ],
      ...
    ],
    "1990": [
      ...
    ]
  }
}
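One way to produce a structure of this shape from the filtered rows is sketched below. It is an illustration under assumptions, not the library's compile step: the function name compile is hypothetical, rows are assumed to be objects keyed by the CSV column names, names are ordered by first appearance, and the cells that mix a region with a country are assumed to hold summed flows as well.

```javascript
// Sketch: build { regions, names, matrix } from filtered flow rows.
function compile(rows, years) {
  // Collect the countries of each region in order of first appearance.
  const countriesByRegion = new Map();
  const add = (region, country) => {
    if (!countriesByRegion.has(region)) countriesByRegion.set(region, []);
    const list = countriesByRegion.get(region);
    if (!list.includes(country)) list.push(country);
  };
  rows.forEach(r => {
    add(r.originregion_name, r.origin_name);
    add(r.destinationregion_name, r.destination_name);
  });

  // names: each region followed by its countries.
  // regions: the position of every region inside names.
  const names = [];
  const regions = [];
  const position = new Map(); // prefixed keys avoid region/country name clashes
  for (const [region, countries] of countriesByRegion) {
    position.set('region:' + region, names.length);
    regions.push(names.length);
    names.push(region);
    countries.forEach(country => {
      position.set('country:' + country, names.length);
      names.push(country);
    });
  }

  // One square matrix per year, indexed like names.
  const matrix = {};
  years.forEach(year => {
    const m = names.map(() => names.map(() => 0));
    rows.forEach(r => {
      const flow = Number(r['countryflow_' + year]) || 0;
      const fromCountry = position.get('country:' + r.origin_name);
      const toCountry = position.get('country:' + r.destination_name);
      const fromRegion = position.get('region:' + r.originregion_name);
      const toRegion = position.get('region:' + r.destinationregion_name);
      m[fromCountry][toCountry] += flow; // country to country
      m[fromRegion][toRegion] += flow;   // accumulated region to region
      m[fromRegion][toCountry] += flow;  // assumption: mixed cells are summed too
      m[fromCountry][toRegion] += flow;
    });
    matrix[year] = m;
  });

  return { regions, names, matrix };
}
```

For the four North America rows of the interchange CSV shown earlier, this yields the names "North America", "Canada", "United States" and a regions index of [0].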

To reduce the number of chords displayed at any time, data is accumulated into region flows. Only region flows are initially displayed in the plot. The user can expand a region to see individual country flows by clicking on the region.

At most two regions are expanded to individual countries at any time, again for reasons of performance and focus. When the user expands a third region, the region that was expanded first collapses.
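The two-region limit amounts to a small first-in, first-out list. The sketch below illustrates the idea only; expandRegion and its limit parameter are hypothetical names, not part of the library.

```javascript
// Return the new list of expanded regions after the user clicks a region.
// The oldest entry is dropped once more than `limit` regions are expanded.
function expandRegion(expanded, region, limit = 2) {
  if (expanded.includes(region)) return expanded; // already expanded
  const next = expanded.concat(region);
  return next.length > limit ? next.slice(next.length - limit) : next;
}

// Usage: regions are identified here by arbitrary ids.
let expanded = [];
expanded = expandRegion(expanded, 'North America');
expanded = expandRegion(expanded, 'Africa');
expanded = expandRegion(expanded, 'Europe'); // 'North America' collapses
```

Returning a new array rather than mutating the old one makes it easy to compare the state before and after a click when deciding which chords to animate.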

To achieve this, each region's flows are stored in the flow matrix of the data structure, followed by the flows of the countries that belong to it. A regions index records the position of each region in the matrix. Expanding a region is then done by displaying all flows in the matrix between the current region index and the next region index. To display labels, region and country names are included. More on matrices and the data format can be found here.
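The index lookup can be sketched as follows, assuming the JSON layout shown earlier (a regions array of positions into names). The function names countryIndices and visibleIndices are hypothetical and chosen for this illustration.

```javascript
// Matrix indices of the countries belonging to the region at `regionPosition`
// (a position in data.regions, not in data.names).
function countryIndices(data, regionPosition) {
  const start = data.regions[regionPosition] + 1;
  const end = regionPosition + 1 < data.regions.length
    ? data.regions[regionPosition + 1]
    : data.names.length; // the last region runs to the end of names
  const indices = [];
  for (let i = start; i < end; i++) indices.push(i);
  return indices;
}

// Matrix indices to draw: countries for expanded regions,
// the region's own index for collapsed ones.
function visibleIndices(data, expandedPositions) {
  return data.regions.flatMap((regionIndex, pos) =>
    expandedPositions.includes(pos) ? countryIndices(data, pos) : [regionIndex]
  );
}

// Usage with a shortened names list:
const data = {
  regions: [0, 3],
  names: ['North America', 'Canada', 'United States', 'Africa', 'Angola']
};
const shown = visibleIndices(data, [0]); // Canada, United States, Africa
```

Because the countries of a region sit directly after it, no per-country lookup table is needed; two neighbouring entries of the regions index are enough.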

An implementation of these tasks as well as a description and usage instructions can be found in the Circular Migration Plot Library.

Extending D3

While D3 provides helpful layouts for generating chords, the original implementation had to be extended to fit the requirements of migration flow charts.

One major difference between the chord provided by D3 and the one used in migration flow charts is that migration flow charts display two directed chords, one for each direction (in and out).

TODO: add screenshots of original d3 chord and our chord http://bl.ocks.org/mbostock/4062006

A chord is a shape which displays a single flow. Geometrically, it consists of two arcs connected by two Bézier curves. The other difference from the original D3 chord is that the chord ends at a slightly smaller radius than the main circle, which makes the direction distinguishable.
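The geometry can be illustrated with a small path generator. This is a sketch under assumptions rather than the extended chord shape from the library: chordPath is a hypothetical name, each end carries its own radius, angles are measured clockwise from 12 o'clock in radians, the two connecting curves are quadratic Bézier curves with the circle centre as control point, and giving the target end the smaller radius is an arbitrary choice made for the example.

```javascript
// Point on a circle around the origin; angle 0 points to 12 o'clock.
// Coordinates are rounded to two decimals to keep the path string short.
function point(radius, angle) {
  const a = angle - Math.PI / 2;
  return [radius * Math.cos(a), radius * Math.sin(a)]
    .map(v => Math.round(v * 100) / 100)
    .join(',');
}

// Clockwise SVG arc segment from startAngle to endAngle.
function arcTo(radius, startAngle, endAngle) {
  const largeArc = endAngle - startAngle > Math.PI ? 1 : 0;
  return 'A' + radius + ',' + radius + ' 0 ' + largeArc + ',1 ' +
    point(radius, endAngle);
}

// SVG path data for one chord: arc at the source end, curve to the target end,
// arc at the target end, curve back to the start.
function chordPath(source, target) {
  return 'M' + point(source.radius, source.startAngle) +
    arcTo(source.radius, source.startAngle, source.endAngle) +
    'Q0,0 ' + point(target.radius, target.startAngle) +
    arcTo(target.radius, target.startAngle, target.endAngle) +
    'Q0,0 ' + point(source.radius, source.startAngle) +
    'Z';
}

// Usage: the target end sits on a radius 10 units smaller than the source end.
const d = chordPath(
  { radius: 100, startAngle: 0, endAngle: Math.PI / 2 },
  { radius: 90, startAngle: Math.PI, endAngle: 3 * Math.PI / 2 }
);
```

The resulting string can be assigned to the d attribute of an SVG path element, for example one created through a D3 selection.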

The modified chord layout can be found along with the extended chord shape in the lib/ folder of the Circular Migration Plot Library.

Conclusion / Summary

TODO

  • OpenWeb technologies made interactive charts possible / made them widely used
  • Interactive visualisations sometimes make data explorable at all