LAB-03 #5

Open
sunaynagoel opened this issue Apr 2, 2020 · 14 comments

Comments

@sunaynagoel

Section 2.1 of the lecture chapter shows the statistical model of an OLS regression. When I read the interpretation of the results, the coefficient does not make sense to me. I am not sure if I am reading the results wrong.

Screen Shot 2020-04-02 at 12 17 19 PM

Screen Shot 2020-04-02 at 12 17 34 PM

@lecy
Contributor

lecy commented Apr 2, 2020

That's a typo! Should have been 0.92, not 1.036.

It has been updated.

@sunaynagoel
Author

Thank you. @lecy

@sunaynagoel
Author

Section 2.2 of Lecture Chapter.
I have a few questions.
a. When estimating the fixed-effects models (OLS with dummies and panel FE), is there a reason that company 10 appears right after Public R&D?
b. What is the relationship between the coefficients of the OLS-with-dummies model and the intercepts of the panel FE model?
~nina

@lecy
Contributor

lecy commented Apr 3, 2020

a.

It is the same reason "2" > "100":

> x <- sample( 1:10, 100, replace=T )
> x.f <- factor(x)  # x is numeric
> levels( x.f )
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
> x <- as.character(x)
> x.f <- factor(x)  # x is character
> levels( x.f )
 [1] "1"  "10" "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"

b.

The dummy variables and the intercept still operate the same.

image

So in section 2.1 the intercept for Company 1 is 56,569 (b0) and for Company 2 it is 56,569 + 35,794 (b0 + b1).

image

@lecy
Contributor

lecy commented Apr 3, 2020

@sunaynagoel If you want the companies to sort correctly you would need to use leading zeros :-)
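
A minimal sketch of the leading-zeros fix, assuming the company IDs are stored as character strings (the `id` vector here is made up for illustration, not taken from the lab data):

```r
# Hypothetical company IDs stored as character strings
id <- as.character(1:12)

# Character values sort alphabetically, so "10" lands before "2"
levels(factor(id))

# Pad to a common width so alphabetical order matches numeric order
id.padded <- sprintf("%02d", as.integer(id))
levels(factor(id.padded))

# Alternative: keep the original labels and set the level order explicitly
id.f <- factor(id, levels = as.character(sort(as.integer(unique(id)))))
levels(id.f)
```

Either approach should put company 10 after company 9 in the regression table. Padding changes the labels, while setting `levels` keeps them as they are.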

@sunaynagoel
Author

Thank you.

@sunaynagoel
Author

Is it correct that the difference between
a. lm( y ~ x + factor(group) ) and
b. lm( y ~ x + factor(group) - 1 )
is only in terms of the intercept?
In model "a" the constant is also the intercept for group 1, while model "b" generates a coefficient for every group (including group 1).
If that is the case, then what happens to the dummy variables in model "b"? And what is the relationship between the constant and each individual group's coefficient?

@lecy
Contributor

lecy commented Apr 3, 2020

Yes, that is correct.

When you suppress the intercept (case b) then each group gets its own intercept. There is no longer a reference point: each group coefficient is that group's own intercept (its predicted level when the other predictors are zero), and all groups share the same slopes, similar to the male-female wage example above (note there are no interactions between dummies and slopes).

It is fine to do this because we are not interpreting intercepts directly. They basically act as controls in this model.

You would not want to do this if the groups represented hypotheses of interest. In the wage example above the female dummy tests the difference between male and female wages in the first job. In the diff-in-diff each group dummy represents a different test - test for pre-treatment differences between the treatment and control groups, test for trend (C2-C1), and test for whether post-treatment mean differs from the counterfactual.

If you include distinct dummies for each group (d.treat.pre, d.treat.post, d.control.pre, & d.control.post) the model would report the group means but you would lose all of the tests of your hypotheses.
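
Here is a rough sketch of how the two specifications relate, using simulated data. The group labels, intercepts, and slope are made-up values for illustration, not the lecture data:

```r
set.seed(123)

# Simulated data: 3 hypothetical groups with different intercepts, one shared slope
group <- rep(c("A", "B", "C"), each = 50)
x <- rnorm(150)
y <- unname(c(A = 10, B = 20, C = 30)[group]) + 2 * x + rnorm(150)

m.a <- lm(y ~ x + factor(group))      # constant = group A, dummies = differences from A
m.b <- lm(y ~ x + factor(group) - 1)  # no constant, one intercept per group

b.a <- coef(m.a)
b.b <- coef(m.b)

# Rebuild the group intercepts of model b from model a: b0, b0 + b1, b0 + b2
from.a <- c(b.a["(Intercept)"],
            b.a["(Intercept)"] + b.a["factor(group)B"],
            b.a["(Intercept)"] + b.a["factor(group)C"])

# Side-by-side comparison of the group intercepts
cbind(model.a = unname(from.a), model.b = unname(b.b[-1]))

# The slope on x is the same in both models
all.equal(unname(b.a["x"]), unname(b.b["x"]))
```

The fit is identical in both cases. Only the parameterization changes, which is why model "b" gives up the built-in tests of differences from the reference group.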

Make sense?

@lecy
Contributor

lecy commented Apr 3, 2020

image

image

@sunaynagoel
Author

Yes it makes sense. Thank you @lecy

@castower

castower commented Apr 6, 2020

@lecy This question is not directly related to CPP 525, but I was curious if the de-meaning process in the OLS model is similar to what we're doing in CPP 528 with finding the z-scores and centering the data from the census to make comparisons. Would that be another example of de-meaning?

-Courtney

@lecy
Contributor

lecy commented Apr 6, 2020

@castower It's related to that, yes.

By centering the data in panel models we are shifting all of the distributions over to a common axis. In inferential terms this is to account for different initial conditions.

Standardizing the data takes it one step further. We also divide by the standard deviation of each variable so that each variable now has a mean of zero and an sd of one. If we are creating an index we care about the variance because we want one unit of item A to be similar to one unit of item B.

image

library( ggplot2 )
library( ggpubr )
library( dplyr )

data( iris )

iris <-
  iris %>%
  group_by( Species ) %>%
  mutate( centered = Petal.Length - mean(Petal.Length),
          standardized = ( Petal.Length - mean(Petal.Length) ) / sd(Petal.Length) )

p1 <- ggplot( iris ) +
      geom_density( aes(x = Petal.Length, fill = Species),
               alpha = 0.6 )

p2 <- ggplot( iris ) +
      geom_density( aes(x = centered, fill = Species),
               alpha = 0.6 )

p3 <- ggplot( iris ) +
      geom_density( aes(x = standardized, fill = Species),
               alpha = 0.6 )

ggarrange( p1, p2, p3, 
          labels = c("actual", "centered", "standardized"),
          ncol = 1, nrow = 3 )
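
To tie the centering back to panel models, here is a small sketch on the same iris data. Treating Species as the "group" and regressing Sepal.Length on Petal.Length is an arbitrary choice for illustration. De-meaning both variables within each group should give the same slope as the model with group dummies:

```r
d <- datasets::iris   # fresh copy, independent of the objects created above

# De-mean the outcome and the predictor within each species
d$y.c <- d$Sepal.Length - ave(d$Sepal.Length, d$Species)
d$x.c <- d$Petal.Length - ave(d$Petal.Length, d$Species)

m.dummy  <- lm(Sepal.Length ~ Petal.Length + Species, data = d)  # OLS with dummies
m.within <- lm(y.c ~ x.c, data = d)                              # de-meaned data

# The two slope estimates
coef(m.dummy)["Petal.Length"]
coef(m.within)["x.c"]
```

Only the slopes are compared here. The standard errors from the de-meaned lm() are not adjusted for the group means that were estimated along the way, so they will not match the dummy-variable model exactly.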

@castower

castower commented Apr 6, 2020

@lecy thank you!

@lecy
Contributor

lecy commented Apr 6, 2020

These are some examples of linear transformations of variables, which are useful in regression when you get into more advanced models.

A linear transformation converts X to a new variable X2 by multiplying by one constant and adding another:

X2 = mX + b

centering: X2 = (1)(X) - mean(X)

standardizing: X2 = (1/sd)(X) - mean(X)/sd

These transformations impact means and variances, and thus are important to pay attention to in the regression context for understanding how changing the scale of a measure can change inferences.
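
A quick sketch of what this looks like in a regression, using the built-in mtcars data (weight predicting mpg, an arbitrary pair chosen for illustration):

```r
x <- datasets::mtcars$wt
y <- datasets::mtcars$mpg

x.centered     <- x - mean(x)
x.standardized <- (x - mean(x)) / sd(x)

# Centering moves the mean to zero; standardizing also rescales the sd to one
round(c(mean(x), mean(x.centered), mean(x.standardized)), 3)
round(c(sd(x), sd(x.centered), sd(x.standardized)), 3)

b.raw <- coef(lm(y ~ x))
b.cen <- coef(lm(y ~ x.centered))
b.std <- coef(lm(y ~ x.standardized))

# Coefficients side by side: rows are intercept and slope
cbind(raw = unname(b.raw), centered = unname(b.cen), standardized = unname(b.std))
```

Centering leaves the slope alone and turns the intercept into mean(y). Standardizing multiplies the slope by sd(x), so it now reads as the change in y per one standard deviation of x.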

Measurement error, for example, adds a new perturbation variable that has a mean of zero, similar to the residual term in a regression. It therefore increases the variance of X but doesn't shift its mean:

u = measurement error random variable

X2 = X + u
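
A small simulation of this, with made-up parameter values (the means, sds, and true slope of 2 are assumptions for illustration only):

```r
set.seed(42)
n <- 10000

x  <- rnorm(n, mean = 50, sd = 10)   # true variable
u  <- rnorm(n, mean = 0,  sd = 5)    # measurement error with mean zero
x2 <- x + u                          # what we actually observe

y <- 3 + 2 * x + rnorm(n, sd = 10)   # outcome depends on the true x

# Mean is roughly unchanged, variance goes up
c(mean.x = mean(x), mean.x2 = mean(x2))
c(var.x = var(x), var.x2 = var(x2))

# Slope using the true x versus the noisy x2
b.true  <- coef(lm(y ~ x))["x"]
b.noisy <- coef(lm(y ~ x2))["x2"]
c(true = unname(b.true), noisy = unname(b.noisy))
```

With error in the predictor the estimated slope is pulled toward zero (attenuation), even though the mean of X2 is about the same as the mean of X.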

There are some examples starting on slide 9 here:

https://github.com/DS4PS/cpp-523-spr-2020/raw/master/lectures/p-09-specification.pdf
