Commit

Merge pull request #14 from rickecon/chaps
Merging
rickecon authored Nov 8, 2023
2 parents 2a63243 + a3b3579 commit 70cf559
Showing 2 changed files with 31 additions and 7 deletions.
6 changes: 6 additions & 0 deletions docs/book/_toc.yml
@@ -5,6 +5,7 @@ parts:
chapters:
- file: contrib/contributing
- caption: Coding in Python
numbered: True
chapters:
- file: python/intro
- file: python/StandardLibrary
@@ -17,18 +18,23 @@ parts:
- file: python/DocStrings
- file: python/UnitTesting
- caption: Git and GitHub
numbered: True
chapters:
- file: git/intro
- caption: Basic Empirical Methods
numbered: True
chapters:
- file: basic_empirics/BasicEmpirMethods
- caption: Basic Machine Learning
numbered: True
chapters:
- file: basic_ml/ml_intro
- caption: Neural Nets and Deep Learning
numbered: True
chapters:
- file: deep_learn/intro
- caption: Structural Estimation
numbered: True
chapters:
- file: struct_est/intro
- file: struct_est/MaxLikelihood
32 changes: 25 additions & 7 deletions docs/book/basic_empirics/BasicEmpirMethods.md
@@ -548,31 +548,49 @@ OLS predicted values for Acemoglu, et al, 2001 data
:label: ExerBasicEmpir_MultLinRegress
:class: green
```
For this problem, you will use the 397 observations from the [`Auto.csv`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/basic_empirics/Auto.csv) dataset in the [`/data/basic_empirics/`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/basic_empirics) folder of the repository for this book.[^Auto] This dataset includes 397 observations on miles per gallon (`mpg`), number of cylinders (`cylinders`), engine displacement (`displacement`), horsepower (`horsepower`), vehicle weight (`weight`), acceleration (`acceleration`), vehicle year (`year`), vehicle origin (`origin`), and vehicle name (`name`).
For this problem, you will use the 397 observations from the [`Auto.csv`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/basic_empirics/Auto.csv) dataset in the [`/data/basic_empirics/`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/basic_empirics) folder of the repository for this book.[^Auto] This dataset includes 397 observations on the following variables:
* `mpg`: miles per gallon
* `cylinders`: number of cylinders
* `displacement`: engine displacement (cubic inches)
* `horsepower`: engine horsepower
* `weight`: vehicle weight (lbs.)
* `acceleration`: time to accelerate from 0 to 60 mph (sec.)
* `year`: vehicle year
* `origin`: origin of car (1=American, 2=European, 3=Japanese)
* `name`: vehicle name
1. Import the data using the [`pandas.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function. Look for characters that seem out of place that might indicate missing values. Replace them with missing values using the `na_values=...` option.
2. Produce a scatterplot matrix which includes all of the quantitative variables `mpg`, `cylinders`, `displacement`, `horsepower`, `weight`, `acceleration`, `year`, `origin`. Call your DataFrame of quantitative variables `df_quant`. [Use the pandas scatterplot function in the code block below.]
2. Create descriptive statistics for each of the numerical variables (count, mean, standard deviation, min, 25%, 50%, 75%, max). How do you interpret the descriptive statistics on the `origin` variable? What might be a better way to report descriptive statistics for this categorical variable?
3. Produce a scatterplot matrix which includes all of the numerical variables `mpg`, `cylinders`, `displacement`, `horsepower`, `weight`, `acceleration`, `year`, `origin`. Call your DataFrame of numerical variables `df_numer`. [Use the pandas scatterplot function in the code block below.]
```python
from pandas.plotting import scatter_matrix

scatter_matrix(df_quant, alpha=0.3, figsize=(6, 6), diagonal='kde')
scatter_matrix(df_numer, alpha=0.3, figsize=(6, 6), diagonal='kde')
```
3. Compute the correlation matrix for the quantitative variables ($8\times 8$) using the [`pandas.DataFrame.corr()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html) method.
4. Estimate the following multiple linear regression model of $mpg_i$ on all other quantitative variables, where $u_i$ is an error term for each observation, using Python's `statsmodels.api.OLS()` function.
4. Compute the correlation matrix for the numerical variables ($8\times 8$) using the [`pandas.DataFrame.corr()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html) method.
5. What is wrong with estimating the following linear regression model? How would you fix this problem? (Hint: There is an issue with one of the variables)
\begin{equation*}
\begin{split}
mpg_i &= \beta_0 + \beta_1 cylinders_i + \beta_2 displacement_i + \beta_3 horsepower_i + ... \\
&\qquad \beta_4 weight_i + \beta_5 acceleration_i + \beta_6 year_i + \beta_7 origin_i + u_i
\end{split}
\end{equation*}
6. Estimate the following multiple linear regression model of $mpg_i$ on all other numerical variables, where $u_i$ is an error term for each observation, using Python's `statsmodels.api.OLS()` function, with indicator variables created for two out of the three `origin` categories (2=European, 3=Japanese).
\begin{equation*}
\begin{split}
mpg_i &= \beta_0 + \beta_1 cylinders_i + \beta_2 displacement_i + \beta_3 horsepower_i + ... \\
&\qquad \beta_4 weight_i + \beta_5 acceleration_i + \beta_6 year_i + ...\\
&\qquad \beta_7 european_i + \beta_8 japanese_i + u_i
\end{split}
\end{equation*}
* Which of the coefficients are statistically significant at the 1\% level?
* Which of the coefficients are NOT statistically significant at the 10\% level?
* Give an interpretation in words of the estimated coefficient $\hat{\beta}_6$ on $year_i$ using the estimated value of $\hat{\beta}_6$.
5. Looking at your scatterplot matrix from part (2), what are the three variables that look most likely to have a nonlinear relationship with $mpg_i$?
7. Looking at your scatterplot matrix from part (3), what are the three variables that look most likely to have a nonlinear relationship with $mpg_i$?
* Estimate a new multiple regression model by OLS in which you include squared terms on the three variables you identified as having a nonlinear relationship to $mpg_i$ as well as a squared term on $acceleration_i$.
* Report your adjusted R-squared statistic. Is it better or worse than the adjusted R-squared from part (6)?
* What happened to the statistical significance of the $displacement_i$ variable coefficient and the coefficient on its squared term?
* What happened to the statistical significance of the cylinders variable?
6. Using the regression model from part (5) and the `.predict()` function, what would be the predicted miles per gallon $mpg$ of a car with 6 cylinders, displacement of 200, horsepower of 100, a weight of 3,100, acceleration of 15.1, model year of 1999, and origin of 1?
8. Using the regression model from part (6) and the `.predict()` function, what would be the predicted miles per gallon $mpg$ of a car with 6 cylinders, displacement of 200, horsepower of 100, a weight of 3,100, acceleration of 15.1, model year of 1999, and origin of 1 (American)?
```{exercise-end}
```
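
The data-preparation steps in the revised exercise (parts 1, 2, and 6) can be sketched roughly as follows. This is a minimal illustration, not the exercise solution: the rows are made up rather than taken from `Auto.csv`, and the `?` placeholder for a missing value is an assumption of the sketch, so the real file should be inspected for its own marker. Only the column names `mpg`, `horsepower`, `origin`, `european`, and `japanese` come from the exercise text.

```python
import io

import pandas as pd

# Made-up rows for illustration only; these are NOT values from Auto.csv.
# The "?" marker for a missing value is an assumption of this sketch.
csv_text = """mpg,horsepower,origin
18.0,130,1
25.0,?,2
31.0,65,3
22.0,95,1
"""

# na_values turns the placeholder into NaN so the column parses as numeric
df = pd.read_csv(io.StringIO(csv_text), na_values="?")

# Frequency counts summarize a categorical code more sensibly than a mean
origin_counts = df["origin"].value_counts().sort_index()

# Indicator columns for categories 2 and 3; category 1 is the omitted baseline
df["european"] = (df["origin"] == 2).astype(int)
df["japanese"] = (df["origin"] == 3).astype(int)

print(df.dtypes)
print(origin_counts)
print(df[["origin", "european", "japanese"]])
```

Leaving out the indicator for category 1 (American) keeps the regressors from being perfectly collinear with the intercept, which is why the model in part (6) includes only $european_i$ and $japanese_i$.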
