[AJR01]
Daron Acemoglu, Simon Johnson, and James A. Robinson. The colonial origins of comparative development: an empirical investigation. American Economic Review, 91(5):1369–1401, December 2001. URL: https://www.aeaweb.org/articles?id=10.1257/aer.91.5.1369.
[AC03]
Jérôme Adda and Russell Cooper. Dynamic Economics: Quantitative Methods and Applications. MIT Press, 2003.
[ACFM+20]
Orazio Attanasio, Sarah Cattan, Emla Fitzsimons, Costas Meghir, and Marta Rubio-Codina. Estimating the production function for human capital: results from a randomized control trial in Colombia. American Economic Review, 110(1):48–85, January 2020. URL: https://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.20150183.
[BMM19]
Martha J. Bailey, Olga Malkova, and Zoë M. McLaren. Does access to family planning increase children's opportunities? Evidence from the War on Poverty and the early years of Title X. Journal of Human Resources, 54(4):825–856, Fall 2019. URL: https://jhr.uwpress.org/content/54/4/825.
[BM72]
William A. Brock and Leonard J. Mirman. Optimal economic growth and uncertainty: the discounted case. Journal of Economic Theory, 4(3):479–513, June 1972.
[DM04]
Russell Davidson and James G. MacKinnon. Econometric Theory and Methods. Oxford University Press, 2004.
[DS93]
Darrell Duffie and Kenneth J. Singleton. Simulated moment estimation of Markov models of asset prices. Econometrica, 61(4):929–952, July 1993.
[EP17]
Richard W. Evans and Kerk L. Phillips. Advantages of an ellipse when modeling leisure utility. Computational Economics, 51(3):513–533, March 2017.
[Git22]
GitHub. Octoverse 2022: the state of open source software. Report, GitHub, Inc., November 17 2022. URL: https://octoverse.github.com/.
[GvanRossum01]
David Goodger and Guido van Rossum. PEP 257 – Docstring Conventions. Python Enhancement Proposal 257, Python Steering Council, May 29 2001. URL: https://peps.python.org/pep-0257.
[GM96]
Christian Gourieroux and Alain Monfort. Simulation-based Econometric Methods. Oxford University Press, 1996.
[HJ20]
Jeffrey Humpherys and Tyler J. Jarvis. Foundations of Applied Mathematics: Algorithms, Approximation, Optimization. Volume 2. SIAM, Society for Industrial and Applied Mathematics, 2020.
[JWHT17]
Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer Texts in Statistics. Springer, 2017.
[Jud98]
Kenneth L. Judd. Numerical Methods in Economics. MIT Press, 1998.
[Kea10]
Michael P. Keane. Structural vs. atheoretic approaches to econometrics. Journal of Econometrics, 156(1):3–20, May 2010.
[LSalanie93]
G. Laroque and B. Salanié. Simulation-based estimation of models with lagged latent variables. Journal of Applied Econometrics, 8(Supplement):119–133, December 1993.
[LI91]
Bong-Soo Lee and Beth Fisher Ingram. Simulation estimation of time series models. Journal of Econometrics, 47(2-3):197–205, February 1991.
[McF89]
Daniel McFadden. A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica, 57(5):995–1026, September 1989.
[NW87]
Whitney K. Newey and Kenneth D. West. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3):703–708, May 1987.
[Rus87]
John Rust. Optimal replacement of GMC bus engines: an empirical model of Harold Zurcher. Econometrica, 55(5):999–1033, September 1987. URL: https://www.jstor.org/stable/1911259.
[Rus10]
John Rust. Comments on: 'Structural vs. atheoretic approaches to econometrics'. Journal of Econometrics, 156(1):21–24, May 2010.
[SS23a]
Thomas J. Sargent and John Stachurski. Simple linear regression. In QuantEcon Python Lectures: A First Course in Quantitative Economics With Python, chapter 36. 2023. URL: https://intro.quantecon.org/simple_linear_regression.html.
[SS23b]
Thomas J. Sargent and John Stachurski. Linear regression in Python. In QuantEcon Python Lectures: Intermediate Quantitative Economics with Python, chapter 79. 2023. URL: https://python.quantecon.org/ols.html.
[Smi20]
Anthony A. Smith Jr. Indirect inference. In Matias Vernengo, Esteban Perez Caldentey, and Barkley J. Rosser Jr., editors, New Palgrave Dictionary of Economics. Palgrave Macmillan, 2020. URL: http://www.econ.yale.edu/smith/palgrave7.pdf.
[BYUACME23]
BYU ACME. Animations and 3D plotting in Matplotlib. In BYU ACME 2023-2024 Incoming Senior Materials, chapter 5. 2023. URL: https://acme.byu.edu/2023-2024-materials.
[vanRossumWC01]
Guido van Rossum, Barry Warsaw, and Nick Coghlan. PEP 8 – Style Guide for Python Code. Python Enhancement Proposal 8, Python Steering Council, July 5 2001. URL: https://peps.python.org/pep-0008.
[WikipediaContributors20a]
Wikipedia Contributors. "Git". Wikipedia, The Free Encyclopedia, 2020. [Online; accessed 19-August-2020]. URL: https://en.wikipedia.org/wiki/Git.
GitHub or GitHub.com is a cloud source code management platform designed to enable scalable, efficient, and secure version controlled collaboration by linking users' local Git version controlled software development. GitHub's main business footprint is hosting a collection of millions of version controlled code repositories. In addition to being a platform for a distributed version control system (DVCS), GitHub's primary features include code review, project management, continuous integration unit testing, GitHub Actions, and associated web page (GitHub Pages) and documentation hosting and deployment.
Integrated development environment or IDE is a software application that consolidates many of the functions of software development under one program. IDEs often include a code editor, object memory and identification, debugger, and build automation tools. (See the IDE Wikipedia entry [Wikipedia Contributors, 2020].)
The focus of this chapter is to give the reader a basic introduction to the standard empirical methods in data science, policy analysis, and economics. I want each reader to come away from this chapter with the following basic skills:
- Difference between correlation and causation
- Standard data description
- Basic understanding of linear regression
  - What do regression coefficients mean?
  - What do standard errors mean?
  - How can I estimate my own linear regression with standard errors?
- Ideas behind bigger extensions of linear regression
  - Instrumental variables (omitted variable bias)
  - Logistic regression
  - Multiple equation models
  - Panel data
  - Time series data
  - Vector autoregression
In the next chapter Basic Machine Learning, I give a more detailed treatment of logistic regression as a bridge to learning the basics of machine learning.
Any paper that uses data needs to spend some ink summarizing and describing the data. This is usually done in tables. But it can also be done in cross tabulation, which is descriptive statistics by category. The most common types of descriptive statistics are the following:
The research question of the paper “The Colonial Origins of Comparative Development: An Empirical Investigation” [Acemoglu et al., 2001] is to determine whether or not differences in institutions can help to explain observed economic outcomes. How do we measure institutional differences and economic outcomes? In this paper:
- economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange rates,
- institutional differences are proxied by an index of protection against expropriation on average over 1985–95, constructed by the Political Risk Services Group.
These variables and other data used in the paper are available for download on Daron Acemoglu’s webpage.
The following cell downloads the data from [Acemoglu et al., 2001] from the file maketable1.dta and displays the first five observations from the data.
The pandas.DataFrame.head method returns the first \(n\) rows of a DataFrame with column headings and index numbers. The default is n=5.
df1.head()

  shortnam   euro1900  excolony    avexpr  logpgp95  cons1  cons90  democ00a  cons00a    extmort4    logem4  loghjypl  baseco
0      AFG   0.000000       1.0       NaN       NaN    1.0     2.0       1.0      1.0   93.699997  4.540098       NaN     NaN
1      AGO   8.000000       1.0  5.363636  7.770645    3.0     3.0       0.0      1.0  280.000000  5.634789 -3.411248     1.0
2      ARE   0.000000       1.0  7.181818  9.804219    NaN     NaN       NaN      NaN         NaN       NaN       NaN     NaN
3      ARG  60.000004       1.0  6.386364  9.133459    1.0     6.0       3.0      3.0   68.900002  4.232656 -0.872274     1.0
4      ARM   0.000000       0.0       NaN  7.682482    NaN     NaN       NaN      NaN         NaN       NaN       NaN     NaN
How many observations are in this dataset? What are the different countries in this dataset?
print("The number of observations (rows) in the dataset is:", df1.shape[0])
print("")
print("A list of all the", len(df1["shortnam"].unique()),
      'unique countries in the "shortnam" variable is:')
print(df1["shortnam"].unique())
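As a toy illustration of standard data description and cross tabulation in pandas (the data below are made up for illustration, not the [Acemoglu et al., 2001] sample):

```python
import pandas as pd

# Hypothetical toy data, not the maketable1.dta sample
df = pd.DataFrame({
    "excolony":  [1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    "avexpr":    [5.4, 7.2, 6.4, 9.8, 7.7, 8.1],
    "continent": ["Africa", "Asia", "America", "Asia", "Europe", "Europe"],
})

# Standard data description: count, mean, std, min, quartiles, max
print(df[["excolony", "avexpr"]].describe())

# Cross tabulation: counts of observations by category
print(pd.crosstab(df["continent"], df["excolony"]))
```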
For this problem, you will use the 397 observations from the Auto.csv dataset in the /data/basic_empirics/ folder of the repository for this book.[2] This dataset includes 397 observations on miles per gallon (mpg), number of cylinders (cylinders), engine displacement (displacement), horsepower (horsepower), vehicle weight (weight), acceleration (acceleration), vehicle year (year), vehicle origin (origin), and vehicle name (name).

1. Import the data using the pandas.read_csv() function. Look for characters that seem out of place that might indicate missing values. Replace them with missing values using the na_values=... option.
2. Produce a scatterplot matrix which includes all of the quantitative variables mpg, cylinders, displacement, horsepower, weight, acceleration, year, origin. Call your DataFrame of quantitative variables df_quant. [Use the pandas scatterplot function in the code block below.]
3. Compute the correlation matrix for the quantitative variables (\(8\times 8\)) using the pandas.DataFrame.corr() method.
4. Estimate the following multiple linear regression model of \(mpg_i\) on all other quantitative variables, where \(u_i\) is an error term for each observation, using Python’s statsmodels.api.OLS() function.
   - Which of the coefficients is statistically significant at the 1% level?
   - Which of the coefficients is NOT statistically significant at the 10% level?
   - Give an interpretation in words of the estimated coefficient \(\hat{\beta}_6\) on \(year_i\) using the estimated value of \(\hat{\beta}_6\).
5. Looking at your scatterplot matrix from part (2), what are the three variables that look most likely to have a nonlinear relationship with \(mpg_i\)?
   - Estimate a new multiple regression model by OLS in which you include squared terms on the three variables you identified as having a nonlinear relationship to \(mpg_i\) as well as a squared term on \(acceleration_i\).
   - Report your adjusted R-squared statistic. Is it better or worse than the adjusted R-squared from part (4)?
   - What happened to the statistical significance of the \(displacement_i\) variable coefficient and the coefficient on its squared term?
   - What happened to the statistical significance of the cylinders variable?
6. Using the regression model from part (5) and the .predict() function, what would be the predicted miles per gallon \(mpg\) of a car with 6 cylinders, displacement of 200, horsepower of 100, a weight of 3,100, acceleration of 15.1, model year of 1999, and origin of 1?
Good documentation is critical to the ability of yourself and others to understand and disseminate your work and to allow others to reproduce it. As Eagleson’s Law of Programming implies in Observation 2 above, one of the biggest benefits of good documentation might be to the core maintainers and original code writers of a project. Despite the aspiration that the Python programming language be easy and intuitive enough to be its own documentation, we have often found that any not-well-documented code written by ourselves that is only a few months old is more likely to require a full rewrite than incremental additions and improvements.
Python scripts allow for two types of comments: inline comments (which are usually a line or two at a time) and docstrings, which are longer blocks set aside to document the source code. We further explore other more extensive types of documentation including README files, Jupyter notebooks, cloud notebooks, Jupyter Books, and published documentation forms.
“PEP 257–Docstring Conventions” differentiates between inline comments, which use the # symbol, and one-line docstrings, which use the """...""" format [Goodger and van Rossum, 2001]. A block of code with inline comments might look like:
# imports
import numpy as np
A few notes on this documentation of the FOC_savings function. First, see that the docstring starts off with a clear description of what the function does. Second, you can see the :math tags that allow you to write LaTeX equations that will be rendered in the documentation. Docstrings written using reStructuredText markup can be compiled through various packages to render equations and other formatting options. Third, the Args and Returns sections are used to document the arguments and return values of the function.
“PEP 257–Docstring Conventions” gives suggested format and usage for docstrings in Python [Goodger and van Rossum, 2001]. There are two main styles for writing docstrings, the Google style (https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) and the NumPy style. While there are other ways to write docstrings (even those that meet PEP 257 standards), these two styles are so commonly used and are compatible with the Sphinx documentation package that we recommend using one of these two styles. OG-Core uses the Google style, so you might adopt that to be consistent.
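A minimal Google-style docstring sketch in the spirit of the FOC_savings example (the function name and equation here are illustrative assumptions, not the actual OG-Core code):

```python
def foc_savings_sketch(c, sigma):
    r"""Compute the marginal utility of consumption (illustrative sketch,
    not the OG-Core FOC_savings function).

    .. math::
        MU_c = c^{-\sigma}

    Args:
        c (float): consumption, must be strictly positive
        sigma (float): coefficient of relative risk aversion

    Returns:
        float: marginal utility of consumption
    """
    return c ** (-sigma)
```

Sphinx (with the napoleon extension) parses the Args and Returns sections and renders the :math: block as a typeset equation.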
This chapter was coauthored by Jason DeBacker and Richard W. Evans.
Python stops the computation process when it encounters an error. Sometimes you want to flag certain errors with descriptive error messages. And sometimes you want your code to move past errors while saving them along with descriptive error messages. Other times, you want to ensure that no errors occur and that your program stops and informs you in the case of an error.
Python’s error handling, assertion functionality, traceback capability, and type hinting are powerful methods to make sure your code does what you expect it to do, breaks when you expect it to break, and moves past issues when you don’t want the computation to stop.
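A minimal sketch combining these ideas, with hypothetical function names: a raised error with a descriptive message, a try/except block that saves error messages and moves on, an assertion, and type hints:

```python
def safe_divide(numerator: float, denominator: float) -> float:
    """Divide two numbers, raising a descriptive error on zero division."""
    if denominator == 0:
        raise ZeroDivisionError("denominator must be nonzero in safe_divide()")
    return numerator / denominator

# Move past errors while saving their descriptive messages
errors = []
for denom in [2.0, 0.0, 4.0]:
    try:
        print(safe_divide(1.0, denom))
    except ZeroDivisionError as err:
        errors.append(str(err))

# Ensure a condition holds, stopping with a message if it does not
assert len(errors) == 1, "expected exactly one captured error"
```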
The iframe below contains a PDF of the BYU ACME open-access lab entitled, “Exceptions and File Input/Output”. You can either scroll through the lab on this page using the iframe window, or you can download the PDF for use on your computer. See [BYU ACME, 2023]. Exercise 8 below has you work through the problems in this BYU ACME lab. The Python file (exceptions_fileIO.py) and associated text files (.txt) associated with this lab are stored in the ./code/Exceptions_FileIO/ directory.
Matplotlib is Python’s most widely used and most basic visualization package.[1] Some of the other most popular Python visualization packages are Bokeh, Plotly, and Seaborn. Of these, Matplotlib is the most general for static images and is what is used in OG-Core. Once you have a general idea of how to create plots in Python, that knowledge will generalize (to varying degrees) to the other plotting packages.
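A minimal sketch of Matplotlib’s standard figure/axes workflow (the Agg backend is selected here only so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Plot one period of the sine function
x = np.linspace(0.0, 2.0 * np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A first Matplotlib figure")
ax.legend()
# In an interactive session, plt.show() would display the figure,
# and fig.savefig("sine.png") would save it to disk.
```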
The iframe below contains a PDF of the BYU ACME open-access lab entitled, “Introduction to Matplotlib”. You can either scroll through the lab on this page using the iframe window, or you can download the PDF for use on your computer. See [BYU ACME, 2023]. Exercise 37 below has you work through the problems in this BYU ACME lab. A Python file template (matplotlib_intro.py) and a data file (FARS.npy) used in the lab are stored in the ./code/Matplotlib1/ directory.
The iframe below contains a PDF of the BYU ACME open-access lab entitled, “Pandas 2: Plotting”. In spite of having “Pandas” in the title, we include this lab here in this Matplotlib chapter because all of the plotting uses the Matplotlib package. You can either scroll through the lab on this page using the iframe window, or you can download the PDF for use on your computer. See [BYU ACME, 2023]. Exercise 38 below has you work through the problems in this BYU ACME lab. A Jupyter notebook file template (matplotlib2.ipynb) used in the lab is stored in the ./code/Matplotlib2/ directory. The budget.csv and crime_data.csv data files are stored in the ./data/Pandas1 directory, which were used in the “Pandas 1: Introduction” lab. And the other data file used in this lab college.csv is stored in the ./data/Pandas3 directory, which was used in the “Pandas 3: Grouping” lab.
Many systems of equations that occur in theoretical models are exactly identified as defined in Definition 2. The easiest way to solve exactly identified systems of linear equations—systems in which each function \(f_i(x)\) is a linear function of the vector of \(x\) variables—is often matrix inversion. When the system of equations \(f(x)\) is nonlinear, finding the solution takes more advanced methods.[2]
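As a concrete sketch of the linear case, the following solves an exactly identified 3-equation, 3-unknown linear system with NumPy (the coefficient values are made up for illustration); numpy.linalg.solve is generally preferred to forming the matrix inverse explicitly:

```python
import numpy as np

# Exactly identified linear system A x = b (illustrative values)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([3.0, 5.0, 5.0])

# Solve the system; here the solution is x = [1., 1., 1.]
x = np.linalg.solve(A, b)
print(x)

# Check: the residuals A x - b should be (numerically) zero
print(A @ x - b)
```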
It is not widely recognized among researchers that even in systems of equations for which existence and uniqueness of the solution can be proven, no root finding algorithm with finite computational power exists that guarantees convergence to the solution of the system in every case. Judd states:
Nonlinear equation solving presents problems not present with linear equations or optimization. In particular, the existence problem is much more difficult for nonlinear systems. Unless one has an existence proof in hand, a programmer must keep in mind that the absence of a solution may explain a program’s failure to converge. Even if there exists a solution, all methods will do poorly if the problem is poorly conditioned near a solution. Transforming the problem will often improve performance. [Judd, 1998] (p. 192)
Because root finding in nonlinear systems can be so difficult, much research into the best methods has accumulated over the years. And the approaches to solving nonlinear systems can be an art as much as a science. This is also true of minimization problems discussed in the next section (Minimization). For this reason, the scipy.optimize.root module has many different solution algorithms you can use to find the solution to a nonlinear system of equations (e.g., hybr, lm, linearmixing).
All of the root finder methods in scipy.optimize.root are iterative. They take an initial guess for the solution for the variable vector \(x_i\), evaluate the functions \(f(x_i)\) in (4) at \(x_i\), and guess a new value for the solution vector \(x_{i+1}\) until the errors on the left-hand-side of the functions in (4) get arbitrarily close to zero.
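A minimal sketch of this iterative workflow with scipy.optimize.root is below; the two equations and starting guess are illustrative assumptions, not the system solved elsewhere in this chapter:

```python
import numpy as np
import scipy.optimize as opt

def eqns(vars):
    # Illustrative nonlinear system (not the system from the text):
    # a circle of radius 2 intersected with y = exp(x)
    x, y = vars
    eq1 = x ** 2 + y ** 2 - 4.0
    eq2 = np.exp(x) - y
    return np.array([eq1, eq2])

guess = np.array([1.0, 1.0])
result = opt.root(eqns, guess, method="hybr")
print("The solution for (x, y) is:", result.x)
print("The error values at the solution are:", eqns(result.x))
```

The solver starts from `guess`, evaluates `eqns` at each iterate, and updates the guess until the function errors are arbitrarily close to zero.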
The solution for (x, y) is: [0.84630378 2.33101497]
The error values for eq1 and eq2 at the solution are: [-1.23456800e-13 -1.01696429e-13]
Minimization problems are a more general type of problem than root finding problems. Any root finding problem can be reformulated as a minimization problem. But it is not the case that any minimization problem can be reformulated as a root finding problem. Furthermore, if a minimization problem can be reformulated as a root finding problem, it is often much faster to compute the root finding problem. But the minimization problem allows for more generality and often more robustness.
Exercise 46 has you compute the solution to a problem using minimization and root finding, respectively, and to compare the corresponding computation times. One of our favorite books and resources on the mathematics behind minimization problems is [Humpherys and Jarvis, 2020] (section IV, pp. 519–760).
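A minimal scipy.optimize.minimize sketch (the objective function below is an illustrative assumption; note that setting its gradient to zero would recast this same problem as a root finding problem):

```python
import numpy as np
import scipy.optimize as opt

def criterion(vars):
    # Simple smooth objective, minimized at (x, y) = (3, -1)
    x, y = vars
    return (x - 3.0) ** 2 + 2.0 * (y + 1.0) ** 2

guess = np.array([0.0, 0.0])
result = opt.minimize(criterion, guess, method="L-BFGS-B")
print("Minimizer:", result.x)
print("Minimum value:", result.fun)
```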
[Brock and Mirman, 1972] is a simple two-period-lived overlapping generations model, the stochastic equilibrium of which is characterized by six dynamic equations (equations in which the variables are changing over time). The deterministic steady-state of the model is characterized by the variables reaching constant values that do not change over time. The deterministic steady state of the [Brock and Mirman, 1972] model is characterized by the following five equations and five unknown variables \((c, k, y, w, r)\),
The standard library of Python is all the built-in functions of the programming language as well as the modules included with the most common Python distributions. The Python online documentation has an excellent page describing the standard library. These functionalities include built-in functions, constants, object types, and data types. We recommend that you read these sections in the Python documentation.
In addition, the iframe below contains a PDF of the BYU ACME open-access lab entitled, “The Standard Library”. You can either scroll through the lab on this page using the iframe window, or you can download the PDF for use on your computer. See [BYU ACME, 2023]. Exercise 3 below has you work through the problems in this BYU ACME lab. The two Python files used in this lab are stored in the ./code/StandardLibrary/ directory.
Testing of your source code is important to ensure that the results of your code are accurate and to cut down on debugging time. Fortunately, Python has a nice suite of tools for unit testing. In this section, we will introduce the pytest package and show how to use it to test your code.
The iframe below contains a PDF of the BYU ACME open-access lab entitled, “Unit Testing”. You can either scroll through the lab on this page using the iframe window, or you can download the PDF for use on your computer. See [BYU ACME, 2023]. Exercise 50 below has you work through the problems in this BYU ACME lab. Two Python scripts (specs.py and test_specs.py) used in the lab are stored in the ./code/UnitTest/ directory.
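As a minimal sketch of the pytest workflow (the function and file names here are hypothetical, not the specs.py and test_specs.py used in the lab), you put the code under test in one module and its tests in a test_*.py file that pytest discovers automatically:

```python
# smallest_factor_sketch.py (hypothetical module under test)
def smallest_factor(n):
    """Return the smallest prime factor of an integer n > 1."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    i = 2
    while i * i <= n:
        if n % i == 0:
            return i
        i += 1
    return n

# test_smallest_factor_sketch.py: pytest collects functions named test_*
def test_smallest_factor():
    assert smallest_factor(2) == 2
    assert smallest_factor(15) == 3
    assert smallest_factor(13) == 13
```

Running `pytest` from the command line in the directory containing the test file executes every `test_*` function and reports passes and failures.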
Many models are written in the Python programming language. Python is the 2nd most widely used language on all GitHub repository projects [GitHub, 2022], and Python is the 1st most used programming language according to the PYPL ranking of September 2023 [Stackscale, 2023].
As these tutorials walk you through the basics of Python, they will leverage some excellent open source materials put together by QuantEcon and the Applied and Computational Mathematics Emphasis at BYU (BYU ACME). And while the tutorials will point you to materials from these other organizations, we have customized all our exercises to be relevant to the work and research of economists.
In addition, GitHub Copilot is an amazing resource and can be added as an extension to VS Code. However, this service is not free of charge and does require an internet connection to work.
In the iframe below is a PDF of the BYU ACME open-access lab entitled, “Python Intro”. You can either scroll through the lab on this page using the iframe window, or you can download the PDF for use on your computer. See [BYU ACME, 2023]. Exercise 1 below has you work through the problems in this BYU ACME lab. The Python code file (python_intro.py) used in the lab is stored in the ./code/PythonIntro/ directory.
Unix is an old operating system that is the basis for the Linux and Mac operating systems. Many Python users with Mac or Linux operating systems follow a workflow that includes working in the terminal and using Unix commands. This section is optional because Windows terminals do not have the same Unix commands. For those interested, feel free to work through the Unix lab below from BYU ACME. This lab features great examples and instruction, and also has seven good exercises for you to practice on.
In the iframe below is a PDF of the BYU ACME open-access lab entitled, “Unix Shell 1: Introduction”. You can either scroll through the lab on this page using the iframe window, or you can download the PDF for use on your computer. See [BYU ACME, 2023]. Exercise 2 below has you work through the problems in this BYU ACME lab. The shell script file (unixshell1.sh) used in the lab, along with the associated zip file (Shell1.zip), are stored in the ./code/UnixShell1/ directory.
This chapter describes the simulated method of moments (SMM) estimation method. All data and images from this chapter can be found in the data directory (./data/smm/) and images directory (./images/smm/) for the GitHub repository for this online book.
Simulated method of moments (SMM) is analogous to the generalized method of moments (GMM) estimator. SMM could really be thought of as a particular type of GMM estimator. The SMM estimator chooses a vector of model parameters \(\theta\) to make simulated model moments match data moments. Seminal papers developing SMM are [McFadden, 1989], [Lee and Ingram, 1991], and [Duffie and Singleton, 1993]. Good textbook treatments of SMM are found in [Adda and Cooper, 2003], (pp. 87-100) and [Davidson and MacKinnon, 2004], (pp. 383-394).
Let the data be represented, in general, by \(x\). This could have many variables, and it could be cross-sectional or time series. We define the estimation problem as one in which we want to model the data \(x\) using some parameterized model \(g(x|\theta)\) in which \(\theta\) is a \(K\times 1\) vector of parameters.
The following difficulties can arise with GMM that make estimation infeasible or very difficult.
The model moment function \(m(x|\theta)\) is not known analytically.
The data moments you are trying to match come from another model (indirect inference, see [Smith, 2020]).
The model moments \(m(x|\theta)\) are derived from latent variables that are not observed by the modeler. You only have moments, not the underlying data. See [Laroque and Salanié, 1993].
The model moments \(m(x|\theta)\) are derived from censored variables that are only partially observed by the modeler.
The model moments \(m(x|\theta)\) are just difficult to derive analytically. Examples include moments that include multiple integrals over nonlinear functions as in [McFadden, 1989].
SMM estimation simply simulates the model data \(S\) times and uses the average values of the moments from the simulated data as the estimator for the model moments. Let \(\tilde{x}\equiv\{\tilde{x}_1,\tilde{x}_2,...\tilde{x}_s,...\tilde{x}_S\}\) be the \(S\) simulations of the model data. And let the minimization problem in (14) be characterized by \(R\) average moments across simulations, where \(\hat{m}_r\) is the average value of the \(r\)th moment across the \(S\) simulations.
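As a minimal sketch of this averaging step, the following Python function assumes a user-supplied simulator `simulate_data(theta, rng)` and a moment function `moments(x)`; both names are hypothetical placeholders, not part of the model above:

```python
import numpy as np

def avg_sim_moments(theta, simulate_data, moments, S=1000, seed=25):
    # Draw S simulated datasets \tilde{x}_s from the model at parameters theta,
    # compute the R-vector of moments for each simulation, and average across
    # the S simulations to get \hat{m}_r for r = 1, ..., R.
    rng = np.random.default_rng(seed)
    sims = np.array([moments(simulate_data(theta, rng)) for _ in range(S)])
    return sims.mean(axis=0)
```

Fixing the seed (or, equivalently, drawing all random shocks once and reusing them) is important so that the simulated moments change only because \(\theta\) changes, not because of resampling noise.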
Newey-West consistent estimator of \(\Omega\) and \(W\)
The Newey-West estimator of the optimal weighting matrix and variance-covariance matrix is consistent in the presence of heteroskedasticity and autocorrelation in the data (see [Newey and West, 1987]). [Adda and Cooper, 2003] (p. 82) have a nice exposition of how to compute the Newey-West weighting matrix \(\hat{W}_{nw}\). The asymptotic representation of the optimal weighting matrix \(\hat{W}^{opt}\) is the following:
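A minimal Python sketch of the Newey-West construction, assuming the moment errors are stacked as the columns of an \(R\times T\) matrix and using the standard Bartlett-kernel weights \(1 - v/(q+1)\) up to a chosen lag \(q\) (the function name and the choice of lag are illustrative, not from the text):

```python
import numpy as np

def newey_west_weight(E, lag=4):
    # E is an (R x T) matrix whose columns e_t are the moment error vectors
    # at each observation t. Returns the Newey-West estimate Omega and the
    # corresponding weighting matrix W = Omega^{-1}.
    R, T = E.shape
    omega = E @ E.T / T  # Gamma_0, the contemporaneous term
    for v in range(1, lag + 1):
        gamma_v = E[:, v:] @ E[:, :-v].T / T  # lag-v autocovariance
        omega += (1 - v / (lag + 1)) * (gamma_v + gamma_v.T)  # Bartlett weight
    return omega, np.linalg.inv(omega)
```

The Bartlett weights guarantee that the estimated \(\Omega\) is positive semidefinite, which a raw truncated sum of autocovariances does not.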
Indirect inference is a particular application of SMM with some specific characteristics. As the moments to match, it uses parameters of an auxiliary model that can be estimated both on the real-world data and on the simulated data. [Smith, 2020] gives a great summary of the topic with some examples. See also [Gourieroux and Monfort, 1996] (ch. 4) for a textbook treatment of the topic.
Restatement of the general SMM estimation problem
Define a model or data generating process (DGP) as a system of equations,
Exercise 57 (Estimating the Brock and Mirman (1972) model by SMM)
You can observe time series data in an economy for the following variables: \((c_t, k_t, w_t, r_t, y_t)\). The data can be loaded from the file NewMacroSeries.txt in the online book repository data folder data/smm/. This file is a comma separated text file with no labels. The variables are ordered as \((c_t, k_t, w_t, r_t, y_t)\). These data have 100 periods, which are quarterly (25 years). Suppose you think that the data are generated by a process similar to the [Brock and Mirman, 1972] paper. A simplified set of characterizing equations of the Brock and Mirman model are the following six equations.
Assume that the first observation in the data file variables is \(t=1\). Let \(k_1\) be the first observation in the data file for the variable \(k_t\). One nice property of the [Brock and Mirman, 1972] model is that the household decision has a known analytical solution in which the optimal savings decision \(k_{t+1}\) is a function of the productivity shock today \(z_t\) and the amount of capital today \(k_t\).
With this solution (44) and equations (39) through (42), it is straightforward to simulate the data of the [Brock and Mirman, 1972] model given parameters \((\alpha, \beta, \rho, \mu, \sigma)\).
First, assume that \(z_0=\mu\) and that \(k_1=\text{mean}(k_t)\) from the data. These are initial values that will not change across simulations. Also assume that \(\beta=0.99\).
Next, draw a matrix of \(S=1,000\) simulations (columns) of \(T=100\) (rows) from a uniform distribution \(u_{s,t}\sim U(0,1)\). These draws will not change across this SMM estimation procedure.
For each guess of the parameter vector \((\alpha,\rho,\mu,\sigma)\) given \(\beta=0.99\), you can use \(u_{s,t}\) to generate normally distributed errors \(\varepsilon_{s,t}\sim N(0,\sigma^2)\) using the inverse cdf of the normal distribution, where \(s\) is the index of the simulation number (columns).
With \(w_{s,t}\), \(r_{s,t}\), and \(k_{s,t}\), you can use (39) to generate simulated values for \(c_{s,t}\).
With \(\alpha\), \(z_{s,t}\), and \(k_{s,t}\), you can use (43) to generate simulated values for \(y_{s,t}\).
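The simulation steps above can be sketched in Python. This is a hedged sketch, not the exercise solution: it assumes the standard AR(1) law of motion \(z_t = \rho z_{t-1} + (1-\rho)\mu + \varepsilon_t\), the closed-form policy \(k_{t+1} = \alpha\beta e^{z_t}k_t^{\alpha}\), and Cobb-Douglas factor prices, which you should verify against equations (38) through (44):

```python
import numpy as np
from scipy import stats

def simulate_brock_mirman(alpha, rho, mu, sigma, k1, u, beta=0.99):
    # u is a (T x S) matrix of U(0,1) draws held fixed across the estimation.
    # Transform the uniforms into N(0, sigma^2) shocks via the inverse
    # normal cdf, then simulate S parallel time series of length T.
    T, S = u.shape
    eps = stats.norm.ppf(u, loc=0.0, scale=sigma)
    z = np.empty((T, S))
    k = np.empty((T + 1, S))
    k[0, :] = k1           # k_1 set to mean(k_t) from the data
    z_lag = mu             # assume z_0 = mu
    for t in range(T):
        z[t, :] = rho * z_lag + (1 - rho) * mu + eps[t, :]
        # assumed closed-form savings policy k_{t+1} = alpha*beta*e^z*k^alpha
        k[t + 1, :] = alpha * beta * np.exp(z[t, :]) * k[t, :] ** alpha
        z_lag = z[t, :]
    kk = k[:T, :]
    w = (1 - alpha) * np.exp(z) * kk ** alpha        # wage
    r = alpha * np.exp(z) * kk ** (alpha - 1)        # rental rate
    c = w + r * kk - k[1:, :]                        # budget constraint
    y = np.exp(z) * kk ** alpha                      # production function
    return c, kk, w, r, y
```

Because \(u\) is drawn once and reused, every call with the same parameter vector returns the same simulated panel, which is what the SMM criterion function requires.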
Estimate four parameters \((\alpha, \rho,\mu,\sigma)\) given \(\beta=0.99\) of the [Brock and Mirman, 1972] model described by equations (38) through (43) and (44) by SMM. Choose the four parameters to match the following six moments from the 100 periods of empirical data \(\{c_t,k_t, w_t, r_t, y_t\}_{t=1}^{100}\) in NewMacroSeries.txt: \(\text{mean}(c_t)\), \(\text{mean}(k_t)\), \(\text{mean}(c_t/y_t)\), \(\text{var}(y_t)\), \(\text{corr}(c_t, c_{t-1})\), \(\text{corr}(c_t, k_t)\). In your simulations of the model, set \(T=100\) and \(S=1,000\). Input the bounds to be \(\alpha\in[0.01,0.99]\), \(\rho\in[-0.99,0.99]\), \(\mu\in[5, 14]\), and \(\sigma\in[0.01, 1.1]\).
Also, use the identity matrix as your weighting matrix \(\textbf{W}=\textbf{I}\) as shown in section The identity matrix (W=I). Report your solution \(\hat{\theta} = \left(\hat{\alpha},\hat{\rho},\hat{\mu},\hat{\sigma}\right)\), the vector of moment differences at the optimum, and the criterion function value. Also report your standard errors for the estimated parameter vector \(\hat{\theta} = \left(\hat{\alpha},\hat{\rho},\hat{\mu},\hat{\sigma}\right)\) based on the identity matrix for the optimal weighting matrix.
Perform the estimation using the two-step estimator for the optimal weighting matrix \(\textbf{W}_{2step}\), as shown in section Two-step variance-covariance estimator of W. Report your solution \(\hat{\theta} = \left(\hat{\alpha},\hat{\rho},\hat{\mu},\hat{\sigma}\right)\), the vector of moment differences at the optimum, and the criterion function value. Also report your standard errors for the estimated parameter vector \(\hat{\theta} = \left(\hat{\alpha},\hat{\rho},\hat{\mu},\hat{\sigma}\right)\) based on the two-step optimal weighting matrix \(\textbf{W}_{2step}\).
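A minimal sketch of the criterion and minimizer call, assuming a hypothetical `sim_moments(params)` helper that returns the vector of simulated model moments and scaling the errors as percent deviations from the data moments (one common normalization, not the only valid one):

```python
import numpy as np
import scipy.optimize as opt

def criterion(params, data_moms, sim_moments, W):
    # Percent-deviation moment error vector and the quadratic-form
    # SMM criterion e(theta)' W e(theta).
    model_moms = sim_moments(params)
    err = (model_moms - data_moms) / data_moms
    return err @ W @ err

# Hypothetical usage with W = I and the bounds from the exercise:
# bounds = [(0.01, 0.99), (-0.99, 0.99), (5.0, 14.0), (0.01, 1.1)]
# res = opt.minimize(criterion, theta0, args=(data_moms, sim_moments, np.eye(6)),
#                    method="L-BFGS-B", bounds=bounds)
```

For the two-step estimator, you would re-run the same minimization replacing `np.eye(6)` with the weighting matrix computed from the moment errors at the first-step estimate.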
A good introduction to structural estimation is to compare it to other types of research designs. Exercise 56 asks you to compare the structural approach to the reduced form approach. The following are some prominent research designs in economics, only some of which are structural estimations.
Write a short persuasive paper of about one page (maximum of 1.5 pages) in which you make your case for either structural estimation or reduced form estimation or both. Note that both Keane and Rust are biased toward structural estimation.
Make sure that you cite arguments that they use as evidence for or against your thesis.
Refute (or temper) at least one of their arguments.