Start a page for count models, work on #298
1 parent 416d39c, commit b1f3607
Showing 3 changed files with 47 additions and 0 deletions.
@@ -0,0 +1,47 @@
---
title: "Count_Models"
categories:
- Statistical Methods
---

This page explains why count models are necessary in certain applications and
introduces the Poisson, negative binomial, and hurdle models.

## Continuous versus count outcomes
Typical [regression models](Regression_Analysis) predict the response of an
outcome variable $y$ to a set of input variables $X = [x_1, x_2, \ldots, x_p]$.
The result is a linear equation in which a coefficient vector $\beta$ describes
the relationship between each element of $X$ and the outcome $y$.

$$y = X\beta$$
This regression framework assumes that $y$ is a continuous variable, meaning
that it can take any numeric value within a particular range. The plot below
shows the distance between home and workplace on the $x$ axis and the total
miles driven on all vehicle trips on the $y$ axis, for a sample of 5,000
respondents to the 2017 NHTS who reported commuting by car.
Both of these variables are continuous, so a simple $y = X\beta$
regression model is appropriate, though more information might need to be
added to the model below to improve its fit, help explain outlying observations,
or control for heteroskedasticity.

![](count_continuous-1.png)<!-- -->
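
To make the $y = X\beta$ framework concrete, here is a minimal sketch of an ordinary least squares fit with one predictor plus an intercept. The numbers are made up for illustration and are not NHTS records, and `fit_line` is a hypothetical helper name rather than anything from this page's source.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (one predictor plus intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: forces the line through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: commute distance (miles) and total miles driven.
# The outcome is constructed to lie exactly on 5 + 2 * distance.
distance = [2.0, 5.0, 10.0, 15.0, 20.0]
miles_driven = [9.0, 15.0, 25.0, 35.0, 45.0]

b0, b1 = fit_line(distance, miles_driven)

# A continuous outcome can sensibly take any value on the fitted line.
predicted = b0 + b1 * 12.5
print(f"intercept={b0:.3f}, slope={b1:.3f}, prediction at 12.5 miles={predicted:.3f}")
```

Because the outcome here is continuous, a prediction that falls between the observed values is perfectly meaningful.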

Now consider the plot below, which shows the distance between home and work on the $x$ axis and
the number of vehicles owned by the commuter's household on the $y$ axis. Because
the number of vehicles is discrete rather than continuous, the points fall in
horizontal bands at whole-number values. More importantly, we want a model that
predicts a discrete number of vehicles as the outcome variable, but the blue regression
line estimated below predicts between 2.5 and 3.2 vehicles per household,
a fractional quantity that a household cannot actually own.

![](count_discrete-1.png)<!-- -->
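
The mismatch described above can be reproduced with a small sketch: fit a straight line to a whole-number outcome and inspect the fitted values. The data are hypothetical (not NHTS records) and were chosen only to illustrate the point.

```python
# Hypothetical data (not NHTS records): commute distance in miles and
# the whole-number count of vehicles owned by each household.
distance = [1.0, 5.0, 10.0, 20.0, 40.0]
vehicles = [2, 3, 2, 3, 4]

n = len(distance)
mean_x = sum(distance) / n
mean_y = sum(vehicles) / n

# Ordinary least squares slope and intercept for vehicles = b0 + b1 * distance.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distance, vehicles))
sxx = sum((x - mean_x) ** 2 for x in distance)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Fitted values from the straight line; these are generally not whole numbers.
fitted = [b0 + b1 * x for x in distance]
for x, y, f in zip(distance, vehicles, fitted):
    print(f"distance={x:5.1f}  observed={y}  fitted={f:.2f}")
```

In this toy data every observed count is a whole number, yet the fitted values are fractional, which is the same problem the blue regression line shows in the plot.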

## Poisson Model

### Negative Binomial Model

## Hurdle Model