
This repository contains exemplary Jupyter Notebooks for analysing SOEP data with Python and R.


zbw/soep-notebooks


Jupyter Notebooks for SOEP data analysis


Author: Heinz-Alexander Fütterer

Index

  1. Datasets
  2. Notebooks
    1. Python Notebook
    2. R Notebook
  3. Acknowledgements

Datasets

The datasets used in these notebooks are part of the bilingual, Stata-based distribution of the SOEP data, version 34 (v34). Researchers will find them in the archive STATA_DEEN_v34.zip. The notebooks assume that the datasets have been extracted to a directory called data/.
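As a minimal sketch of loading the extracted files from data/, assuming pandas is installed (the helper name load_soep and its parameters are hypothetical, not taken from the notebooks), the three datasets can be read like this:

```python
from pathlib import Path

import pandas as pd


def load_soep(data_dir="data", names=("hgen", "pgen", "ppathl"), columns=None):
    """Load SOEP Stata files from data_dir into a dict of DataFrames.

    columns optionally maps a dataset name to the list of columns to keep,
    which avoids reading every variable of a large file into memory.
    """
    frames = {}
    for name in names:
        path = Path(data_dir) / f"{name}.dta"
        usecols = (columns or {}).get(name)
        # convert_categoricals=False keeps the numeric codes instead of
        # replacing them with the Stata value labels.
        frames[name] = pd.read_stata(path, columns=usecols, convert_categoricals=False)
    return frames
```

For example, `load_soep(columns={"pgen": ["pid", "syear"]})` would read only the person ID and survey year from pgen; these column names are assumptions to be checked against the SOEP v34 codebooks.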

Citation:

Liebig, Stefan; Schupp, Jürgen; Goebel, Jan; Richter, David; Schröder, Carsten et al. (2019): Sozio-oekonomisches Panel (SOEP), Daten der Jahre 1984-2017. Version: v34. SOEP - Sozio-oekonomisches Panel. Dataset. http://doi.org/10.5684/soep.v34

The notebooks use three datasets, hgen, pgen and ppathl, with the following MD5 checksums:

$ md5sum data/*
510427c28ed0d7113d989a3651191af2  data/hgen.dta
096d87642640a076f4514b7163e716b2  data/pgen.dta
33fef93b16c406d2a82350772d6070cc  data/ppathl.dta
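The checksums listed above can also be verified from Python using only the standard library. This is a sketch; md5sum and check_datasets are hypothetical helper names, not part of the repository.

```python
import hashlib
from pathlib import Path

# Expected MD5 checksums of the SOEP v34 files, as listed above.
EXPECTED_MD5 = {
    "hgen.dta": "510427c28ed0d7113d989a3651191af2",
    "pgen.dta": "096d87642640a076f4514b7163e716b2",
    "ppathl.dta": "33fef93b16c406d2a82350772d6070cc",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_datasets(data_dir="data", expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch) or None (file missing)."""
    results = {}
    for name, checksum in expected.items():
        path = Path(data_dir) / name
        results[name] = md5sum(path) == checksum if path.is_file() else None
    return results
```

Calling `check_datasets()` from the repository root should report True for all three files if the extracted data matches the version used in the notebooks.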

see also:

Notebooks

The example notebooks in this repository demonstrate common steps for processing and analysing SOEP data, using:

  • Python
  • R

The steps include:

  • loading data from disk (tabular data in Stata's .dta format)
  • selecting columns of interest
  • plotting histograms
  • creating crosstables
  • grouping
  • merging multiple datasets on key columns
  • setting values to NaN
  • plotting boxplots
  • creating new columns based on the content of existing columns
  • preparing a subset of the dataset for statistical modelling
  • statistical modelling
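Several of these steps (selecting columns, merging on key columns, setting values to NaN, creating a new column, grouping, crosstables and a simple model) can be sketched with pandas and NumPy as below. This is not the notebooks' code. The column names (pid, syear, pglabgro, pgbilzeit, sex, gebjahr), the assumption that pid and syear uniquely identify rows in pgen and ppathl, the treatment of every negative value as a SOEP missing code, and the plain least-squares fit via numpy.linalg.lstsq are all assumptions to check against the SOEP v34 documentation. Plotting is omitted.

```python
import numpy as np
import pandas as pd

# Assumed key columns: person ID and survey year.
KEYS = ["pid", "syear"]


def prepare(pgen, ppathl):
    """Select, merge and clean person-level data (hypothetical column choices)."""
    left = pgen[KEYS + ["pglabgro", "pgbilzeit"]]
    right = ppathl[KEYS + ["sex", "gebjahr"]]
    # Inner join keeps person-years present in both files; validate raises
    # an error if the keys are not unique on either side.
    df = left.merge(right, on=KEYS, how="inner", validate="one_to_one")
    # Assumption: negative values are missing-value codes, so set them to NaN.
    value_cols = ["pglabgro", "pgbilzeit", "sex", "gebjahr"]
    df[value_cols] = df[value_cols].where(df[value_cols] >= 0)
    # New column derived from existing ones: age in the survey year.
    df["age"] = df["syear"] - df["gebjahr"]
    return df


def summarize(df):
    """Mean gross labour income by sex, plus a sex-by-year crosstable of counts."""
    mean_income = df.groupby("sex")["pglabgro"].mean()
    counts = pd.crosstab(df["sex"], df["syear"])
    return mean_income, counts


def fit_ols(df, y="pglabgro", xs=("age", "pgbilzeit")):
    """Ordinary least squares on complete cases; returns the coefficients."""
    data = df[[y, *xs]].dropna()
    X = np.column_stack([np.ones(len(data)), data[list(xs)].to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X, data[y].to_numpy(dtype=float), rcond=None)
    return pd.Series(coef, index=["const", *xs])
```

A dedicated modelling library would add standard errors and diagnostics; the NumPy version only keeps the sketch free of extra dependencies.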

Python Notebook

Installation and start:

git clone https://github.com/zbw/soep-notebooks.git
cd soep-notebooks/
pip install --user --upgrade pipenv
pipenv install
pipenv shell
jupyter notebook

The Python Notebook uses these libraries:

R Notebook

The R Notebook uses these libraries:

Acknowledgements

Thanks to Andreas Franken for the initial R and Stata scripts containing the examples.

This article also proved useful for structuring the notebooks:

Rule, Adam, et al. "Ten simple rules for reproducible research in Jupyter notebooks." arXiv preprint arXiv:1810.08055 (2018).
