If you're looking to make a nice binned scatter plot with a regression line and you don't need to account for any control variables, use `seaborn.regplot` (its `x_bins` argument bins the x variable)! If you're looking for a Python analog to Stata's `binscatter`, read on.
Stata's `binscatter` is described fully by Michael Stepner here. You can use this Python version in essentially the same way you use Matplotlib functions like `plot` and `scatter`.
A more extensive description is here.
- Copy and paste: Binscatter's meaningful code consists of just one file. You can copy `binscatter/binscatter.py` into the directory the rest of your code is in. If you are missing dependencies, you may first need to install them. One way of doing that is with `pip install -r requirements.txt`.
- Install `binscatter` via pip: To make it easier to use `binscatter` in multiple projects and directories, open a terminal and run:
  - `git clone https://github.com/esantorella/binscatter.git`
  - `cd binscatter`
  - `pip install .`
Example:

```python
import binscatter  # importing binscatter makes the .binscatter method available on Matplotlib Axes
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Create fake data: wage depends on both experience and tenure
n_obs = 1000
data = pd.DataFrame({"experience": np.random.poisson(4, n_obs) + 1})
data["Tenure"] = data["experience"] + np.random.normal(0, 1, n_obs)
data["Wage"] = data["experience"] + data["Tenure"] + np.random.normal(0, 1, n_obs)

fig, axes = plt.subplots(2, sharex=True, sharey=True)

# Binned scatter plot of Wage vs Tenure
axes[0].binscatter(data["Tenure"], data["Wage"])

# Binned scatter plot that partials out the effect of experience
axes[1].binscatter(
    data["Tenure"],
    data["Wage"],
    controls=data["experience"],
    recenter_x=True,
    recenter_y=True,
)
axes[1].set_xlabel("Tenure (residualized, recentered)")
axes[1].set_ylabel("Wage (residualized, recentered)")

plt.tight_layout()
plt.show()
```
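The `controls`, `recenter_x`, and `recenter_y` arguments in the example can be understood through the general binned-scatter technique: regress x and y on the controls, keep the residuals (optionally adding each variable's mean back), cut x into equal-count quantile bins, and take the mean of x and y within each bin. Below is a minimal NumPy-only sketch of that technique under those assumptions. It is not this package's implementation; the helper names `residualize` and `binned_means`, the single `recenter` flag, and the default of 20 bins are hypothetical choices made for illustration.

```python
import numpy as np


def residualize(v, controls):
    """Residuals from an OLS regression of v on an intercept plus the controls."""
    # Design matrix: a column of ones, then each control variable as a column.
    design = np.column_stack([np.ones(len(v)), np.asarray(controls, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def binned_means(x, y, controls=None, n_bins=20, recenter=True):
    """Hypothetical helper: the (x, y) points of a binned scatter plot.

    Sketch of the general technique only, not this package's code.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if controls is not None:
        x_res = residualize(x, controls)
        y_res = residualize(y, controls)
        if recenter:
            # Assumption: "recentering" adds the original mean back to the residuals.
            x_res += x.mean()
            y_res += y.mean()
        x, y = x_res, y_res
    # Equal-count bins: cut x at its quantiles.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    # Bin index in 0..n_bins-1; the maximum value is clipped into the last bin.
    bin_ids = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    occupied = np.unique(bin_ids)  # skip any empty bins (possible with tied x values)
    x_means = np.array([x[bin_ids == b].mean() for b in occupied])
    y_means = np.array([y[bin_ids == b].mean() for b in occupied])
    return x_means, y_means


# Same fake-data setup as the example above, with a seeded generator.
rng = np.random.default_rng(0)
n_obs = 1000
experience = rng.poisson(4, n_obs) + 1
tenure = experience + rng.normal(0, 1, n_obs)
wage = experience + tenure + rng.normal(0, 1, n_obs)

bx, by = binned_means(tenure, wage, controls=experience)

# Slope through the residualized data...
x_res = residualize(tenure, experience)
y_res = residualize(wage, experience)
slope_fwl = (x_res @ y_res) / (x_res @ x_res)

# ...matches the tenure coefficient from regressing wage on tenure and experience.
full = np.column_stack([np.ones(n_obs), tenure, experience])
coef_full, *_ = np.linalg.lstsq(full, wage, rcond=None)
```

By the Frisch-Waugh-Lovell theorem, `slope_fwl` equals `coef_full[1]`, the coefficient on tenure in the full regression, so a regression line fit to the residualized data shows the relationship between tenure and wage holding experience fixed. The bin means `bx`, `by` could then be drawn with Matplotlib's `ax.scatter(bx, by)`.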