LAB-03 #5
That's a typo! Should have been 0.92, not 1.036. It has been updated.
Thank you. @lecy
Section 2.2 of the lecture chapter.
a. It is the same reason "2" > "100": the levels of a character factor sort alphabetically, not numerically.

```r
x <- sample( 1:10, 100, replace = TRUE )

x.f <- factor( x )        # x is numeric
levels( x.f )
# [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"

x <- as.character( x )
x.f <- factor( x )        # x is character, so levels sort alphabetically
levels( x.f )
# [1] "1" "10" "2" "3" "4" "5" "6" "7" "8" "9"
```

b. The dummy variables and the intercept still operate the same. So in Section 2.1 the intercept for Company 1 is 56,569 (b0) and for Company 2 it is 56,569 + 35,794 (b0 + b1).
@sunaynagoel If you want the companies to sort correctly you would need to use leading zeros :-)
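For example, a quick sketch using sprintf() (one way to add the padding, not necessarily how the lab formats the IDs):

```r
# Hypothetical example: pad numeric IDs with leading zeros so the
# character/factor version sorts in numeric order.
x <- sample( 1:10, 100, replace = TRUE )
x.padded <- sprintf( "%02d", x )      # "01", "02", ..., "10"
levels( factor( x.padded ) )
# [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10"
```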
Thank you.
Is it correct that the difference between …
Yes, that is correct. When you suppress the intercept (case b) each group gets its own intercept. There is no longer a reference point: each coefficient represents a distinct group mean, and they all share the same slopes, similar to the male/female wage example above (note there are no interactions between dummies and slopes). It is fine to do this because we are not interpreting the intercepts directly. They basically act as controls in this model.

You would not want to do this if the groups represented hypotheses of interest. In the wage example above the female dummy tests the difference between male and female wages in the first job. In the diff-in-diff each group dummy represents a different test: a test for pre-treatment differences between the treatment and control groups, a test for trend (C2-C1), and a test for whether the post-treatment mean differs from the counterfactual. If you include distinct dummies for each group (d.treat.pre, d.treat.post, d.control.pre, & d.control.post) the model would report the group means but you would lose all of the tests of your hypotheses. Make sense?
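A small simulated sketch of the two parameterizations described above (the group labels here are illustrative, not the course variables):

```r
# Sketch with made-up data: compare a model with an intercept (reference
# group + differences) to one with the intercept suppressed (group means).
set.seed( 525 )
group <- factor( rep( c("control.pre", "control.post",
                        "treat.pre",   "treat.post"), each = 25 ) )
y <- rnorm( 100, mean = 10, sd = 2 ) + 3 * ( group == "treat.post" )

coef( lm( y ~ group ) )      # (a) intercept + differences from the reference group
coef( lm( y ~ group - 1 ) )  # (b) no intercept: one coefficient per group mean
```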
Yes, it makes sense. Thank you @lecy
@lecy This question is not directly related to CPP 525, but I was curious if the de-meaning process in the OLS model is similar to what we're doing in CPP 528, where we find z-scores and center the census data to make comparisons. Would that be another example of de-meaning? -Courtney
@castower It's related to that, yes. By centering the data in panel models we are shifting all of the distributions over to a common axis. In inferential terms this accounts for different initial conditions. Standardizing the data takes it one step further: we also divide by the standard deviation of each variable so that each variable now has a mean of zero and a standard deviation of one. If we are creating an index we care about the variance because we want one unit of item A to be similar to one unit of item B.

```r
library( ggplot2 )
library( ggpubr )
library( dplyr )

data( iris )

# center and standardize petal length within each species
iris <-
  iris %>%
  group_by( Species ) %>%
  mutate( centered = Petal.Length - mean(Petal.Length),
          standardized = ( Petal.Length - mean(Petal.Length) ) / sd(Petal.Length) )

# raw distributions
p1 <- ggplot( iris ) +
  geom_density( aes(x = Petal.Length, fill = Species),
                alpha = 0.6 )

# centered: distributions shift to a common (zero) mean
p2 <- ggplot( iris ) +
  geom_density( aes(x = centered, fill = Species),
                alpha = 0.6 )

# standardized: common mean and common (unit) variance
p3 <- ggplot( iris ) +
  geom_density( aes(x = standardized, fill = Species),
                alpha = 0.6 )

ggarrange( p1, p2, p3,
           labels = c("actual", "centered", "standardized"),
           ncol = 1, nrow = 3 )
```
@lecy thank you!
These are some examples of linear transformations of variables, which are useful in regression when you get into more advanced models. A linear transformation converts X to a new variable X2 by multiplying by a constant and adding a constant: Y = mX + b.

centering: X2 = (1)(X) - mean(X)

standardizing: X2 = (1/sd(X))(X) - mean(X)/sd(X), i.e. X2 = ( X - mean(X) ) / sd(X)

These transformations impact means and variances, and thus are important to pay attention to in the regression context for understanding how changing the scale of a measure can change inferences. Measurement error, for example, adds a new perturbation variable, similar to the residual term in a regression, that has a mean of zero; thus it increases variance but doesn't shift the mean:

u = measurement error random variable

X2 = X + u

There are some examples starting on slide 9 here: https://github.com/DS4PS/cpp-523-spr-2020/raw/master/lectures/p-09-specification.pdf
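A short simulated illustration of these points (assumed values, not from the slides): centering shifts the mean, standardizing also rescales the variance, and measurement error inflates the variance without moving the mean.

```r
# Simulated sketch of the linear transformations and measurement error above.
set.seed( 523 )
x <- rnorm( 1000, mean = 50, sd = 10 )

x.centered     <- x - mean( x )                  # mean 0, sd unchanged
x.standardized <- ( x - mean( x ) ) / sd( x )    # mean 0, sd 1

u       <- rnorm( 1000, mean = 0, sd = 5 )       # measurement error
x.noisy <- x + u                                 # same mean, larger variance

round( c( mean(x), mean(x.centered), mean(x.standardized), mean(x.noisy) ), 2 )
round( c( sd(x),   sd(x.centered),   sd(x.standardized),   sd(x.noisy) ), 2 )
```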
Section 2.1 of the lecture chapter shows the statistical model of an OLS regression. When I read the interpretation of the results, the coefficient does not make sense to me. I am not sure if I am reading the result wrong.