Hands-on tutorials for basic visualization techniques and the necessary data processing
Data visualization transforms datasets into visual and interactive representations. As we encounter growing datasets in various sectors, we need to develop effective methods for making sense of data. Data visualization relies on computational means and our perceptual system to help reveal otherwise invisible patterns and gain new insights. Across various fields, there is great hope in the power of visualization to turn complex data into informative, engaging, and maybe even attractive forms. However, it typically takes several steps of data preparation and processing before a given dataset can be meaningfully visualized. While visualizations can indeed provide novel and useful perspectives on data, they can also obscure or misrepresent certain aspects of a phenomenon. Thus, it is essential to develop a critical literacy towards data visualizations. One of the best ways to achieve this is to create them yourself!
The following tutorials require basic familiarity with statistics and programming. They come as Jupyter notebooks containing both human-readable explanations and executable code. The code blocks in the tutorials are written in Python, which you should either already have some experience with or be keenly curious about.
The tutorials make frequent use of the data analysis library Pandas, the visualization library Altair, and a range of other packages that you can find in requirements.txt. If you run these notebooks locally, you might want to run pip install -r requirements.txt first. You can view the tutorials as webpages, open and run them on Deepnote and MyBinder, or download the Jupyter notebook files to edit and run them locally in your own environment.
The first four tutorials lay the groundwork, after which five common data structures are covered:
- Getting started Refresh your Python skills and meet Pandas and Altair
- Visual encoding Learn how to transform data dimensions into visual variables
- Data wrangling Load and parse different data formats and examine their contents
- Interaction techniques Add interactivity to visualizations and support data exploration
- Temporal analysis Analyze temporal data and present time spans, trends, and patterns
- Text processing Extract common words, filter them by type, and view them in context
- Many dimensions Combine datasets and create multidimensional visualizations
- Network analysis Load network data, examine graph metrics, and visualize their structure
- Geovisualization Work with geospatial data and render different kinds of maps
The tutorials were written by Marian Dörk for data visualization courses in information science, interface design, and urban futures at FH Potsdam. Since their initial creation during the special summer of 2020, the tutorials have been gradually updated over time. Many thanks to Fidel Thomet, Jonas Parnow, Viktoria Brüggemann, Ilias Kyriazis et al. of the UCLAB for frequent feedback on the tutorials and to the many students at FH Potsdam who worked through pencil exercises and helped refine the tutorials over the years. Special thanks also to the many generous creators of the various open source software packages used throughout the tutorials.
Cite this resource as:
Dörk, M. (2023). Data Visualization: Hands-on tutorials for basic visualization techniques and the necessary data processing. Retrieved from https://infovis.fh-potsdam.de/tutorials/
The notebooks are released under the Creative Commons Attribution license (CC BY 4.0). Feel free to reuse, adapt, and translate! If you encounter any errors or have any suggestions for improvement, feel free to send an email or fork this repository and send a pull request.