LAB-03 #5
That's a typo! Should have been 0.92, not 1.036. It has been updated.
Thank you. @lecy
Section 2.2 of the lecture chapter.
a. It is the same reason "2" > "100": the levels of a character factor sort alphabetically, not numerically.

```r
x <- sample( 1:10, 100, replace = TRUE )

x.f <- factor( x )        # x is numeric
levels( x.f )
# [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"

x <- as.character( x )
x.f <- factor( x )        # x is character, so levels sort alphabetically
levels( x.f )
# [1] "1" "10" "2" "3" "4" "5" "6" "7" "8" "9"
```

b. The dummy variables and the intercept still operate the same. So in Section 2.1 the intercept for Company 1 is 56,569 (b0) and for Company 2 it is 56,569 + 35,794 (b0 + b1).
@sunaynagoel If you want the companies to sort correctly you would need to use leading zeros :-)
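For example, a quick sketch using sprintf() (one way to add the padding, not necessarily how the lab formats the IDs):

```r
# Hypothetical example: pad numeric IDs with leading zeros so the
# character/factor version sorts in numeric order.
x <- sample( 1:10, 100, replace = TRUE )
x.padded <- sprintf( "%02d", x )      # "01", "02", ..., "10"
levels( factor( x.padded ) )
# [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10"
```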
Thank you.
Is it correct that the difference between …
Yes, that is correct. When you suppress the intercept (case b) each group gets its own intercept. There is no longer a reference point: each coefficient represents a distinct group mean, and they all share the same slopes, similar to the male/female wage example above (note there are no interactions between dummies and slopes). It is fine to do this because we are not interpreting the intercepts directly. They basically act as controls in this model.

You would not want to do this if the groups represented hypotheses of interest. In the wage example above the female dummy tests the difference between male and female wages in the first job. In the diff-in-diff each group dummy represents a different test: a test for pre-treatment differences between the treatment and control groups, a test for trend (C2-C1), and a test for whether the post-treatment mean differs from the counterfactual. If you include distinct dummies for each group (d.treat.pre, d.treat.post, d.control.pre, & d.control.post) the model would report the group means but you would lose all of the tests of your hypotheses. Make sense?
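A small simulated sketch of the two parameterizations described above (the group labels here are illustrative, not the course variables):

```r
# Sketch with made-up data: compare a model with an intercept (reference
# group + differences) to one with the intercept suppressed (group means).
set.seed( 525 )
group <- factor( rep( c("control.pre", "control.post",
                        "treat.pre",   "treat.post"), each = 25 ) )
y <- rnorm( 100, mean = 10, sd = 2 ) + 3 * ( group == "treat.post" )

coef( lm( y ~ group ) )      # (a) intercept + differences from the reference group
coef( lm( y ~ group - 1 ) )  # (b) no intercept: one coefficient per group mean
```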
Yes, it makes sense. Thank you @lecy
@lecy This question is not directly related to CPP 525, but I was curious if the de-meaning process in the OLS model is similar to what we're doing in CPP 528, where we find z-scores and center the census data to make comparisons. Would that be another example of de-meaning? -Courtney
@castower It's related to that, yes. By centering the data in panel models we are shifting all of the distributions over to a common axis. In inferential terms this accounts for different initial conditions. Standardizing the data takes it one step further: we also divide by the standard deviation of each variable so that each variable now has a mean of zero and a standard deviation of one. If we are creating an index we care about the variance because we want one unit of item A to be similar to one unit of item B.

```r
library( ggplot2 )
library( ggpubr )
library( dplyr )

data( iris )

# center and standardize petal length within each species
iris <-
  iris %>%
  group_by( Species ) %>%
  mutate( centered = Petal.Length - mean(Petal.Length),
          standardized = ( Petal.Length - mean(Petal.Length) ) / sd(Petal.Length) )

# raw distributions
p1 <- ggplot( iris ) +
  geom_density( aes(x = Petal.Length, fill = Species),
                alpha = 0.6 )

# centered: distributions shift to a common (zero) mean
p2 <- ggplot( iris ) +
  geom_density( aes(x = centered, fill = Species),
                alpha = 0.6 )

# standardized: common mean and common (unit) variance
p3 <- ggplot( iris ) +
  geom_density( aes(x = standardized, fill = Species),
                alpha = 0.6 )

ggarrange( p1, p2, p3,
           labels = c("actual", "centered", "standardized"),
           ncol = 1, nrow = 3 )
```
@lecy thank you!
These are some examples of linear transformations of variables, which are useful in regression when you get into more advanced models. A linear transformation converts X to a new variable X2 by multiplying by a constant and adding a constant: Y = mX + b.

centering: X2 = (1)(X) - mean(X)

standardizing: X2 = (1/sd(X))(X) - mean(X)/sd(X), i.e. X2 = ( X - mean(X) ) / sd(X)

These transformations impact means and variances, and thus are important to pay attention to in the regression context for understanding how changing the scale of a measure can change inferences. Measurement error, for example, adds a new perturbation variable, similar to the residual term in a regression, that has a mean of zero; thus it increases variance but doesn't shift the mean:

u = measurement error random variable

X2 = X + u

There are some examples starting on slide 9 here: https://github.com/DS4PS/cpp-523-spr-2020/raw/master/lectures/p-09-specification.pdf
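A short simulated illustration of these points (assumed values, not from the slides): centering shifts the mean, standardizing also rescales the variance, and measurement error inflates the variance without moving the mean.

```r
# Simulated sketch of the linear transformations and measurement error above.
set.seed( 523 )
x <- rnorm( 1000, mean = 50, sd = 10 )

x.centered     <- x - mean( x )                  # mean 0, sd unchanged
x.standardized <- ( x - mean( x ) ) / sd( x )    # mean 0, sd 1

u       <- rnorm( 1000, mean = 0, sd = 5 )       # measurement error
x.noisy <- x + u                                 # same mean, larger variance

round( c( mean(x), mean(x.centered), mean(x.standardized), mean(x.noisy) ), 2 )
round( c( sd(x),   sd(x.centered),   sd(x.standardized),   sd(x.noisy) ), 2 )
```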
Section 2.1 of the lecture chapter shows the statistical model of an OLS regression. When I read the interpretation of the results, the coefficient does not make sense to me. I am not sure if I am reading the result wrong.