
  • Extracting random-effects estimates and ensuring that they are uncorrelated with covariates.

    I have been working with a proprietary data set to calculate school value-added using fixed effects, and I have started running an equivalent model with random effects as a robustness check. I wanted to make sure that the random effects were calculated correctly, and I found that they were still correlated with the other covariates in the regression. I was able to reproduce this behavior using a publicly available data set, and I want to check three things:
    1. That -xtreg, re- followed by -predict, u- are the proper commands to obtain the random effects. Notably, the data set is not a panel in the traditional "unit+time" sense, as the panel variable is "school" (or "division" in the example below) and the observations are "students".
    2. That these commands impose the random effects assumptions rather than requiring the data to fit them.
    3. That I am checking the correlations between the random effects and the covariates properly.
    Here is some sample code that produces a similar result.

    Code:
    /* Load data from web. */
    webuse citytemp
    
    /* Set the data in order to use panel-data commands. */
    xtset division
    
    /* Run random-effects regression. */
    xtreg heatdd tempjan cooldd, re
    
    /* Extract random effects. */
    predict random_effect, u
    
    /* Test correlations between random effects and covariates in the original regression. */
    reg random_effect tempjan
    /* Coefficient on tempjan is 2.814, p < 0.001. */
    
    reg random_effect cooldd
    /* Coefficient on cooldd is -0.089, p < 0.001. */
    My guess is that I'm making a silly mistake and using the wrong command, but it's worth checking. Thank you for any help that you can provide.

  • #2
    That -xtreg, re- followed by -predict, u- are the proper commands to obtain the random effects. Notably, the data set is not a panel in the traditional "unit+time" sense, as the panel variable is "school" (or "division" in the example below) and the observations are "students".
    That depends. If you have multiple observations per student, then you do not have anything like a panel data set; you have a three-level data set, and the use of -xtreg, re- is inappropriate. Use -mixed- instead. If you have a single observation per student nested within schools, then this is a reasonable way to do a random-effects regression and estimate the random effects.
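
    For the three-level case, a minimal sketch of the -mixed- approach might look something like this. The variable names (score, ses, school, student) are hypothetical placeholders, not variables from your data set:

    Code:
    /* Hypothetical three-level model: repeated observations nested in
       students, students nested in schools. All variable names are
       placeholders for whatever is in your data. */
    mixed score ses || school: || student:
    
    /* Obtain the BLUPs of the random intercepts at each level. */
    predict re_school re_student, reffects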

    That these commands impose the random effects assumptions rather than requiring the data to fit them.
    That's false. The estimates are made by a method that is consistent if the random effects are independent of the predictors at the population level; nothing in the estimation forces the predicted random effects to be uncorrelated with the covariates in your sample. This assumption, by the way, is entirely analogous to the assumption that the error terms are independent of the predictors in single-level OLS regression. It's about omitted variable bias.
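
    As an illustration of the contrast, in single-level OLS the in-sample residuals are uncorrelated with the regressors by construction, so checking that correlation tells you nothing about the population-level assumption; the predicted random effects from -xtreg, re- carry no such mechanical guarantee. A quick sketch on the same example data:

    Code:
    /* OLS on the same data: the in-sample residuals are orthogonal to the
       regressors by construction, so these correlations are numerically zero. */
    regress heatdd tempjan cooldd
    predict ols_resid, residuals
    corr ols_resid tempjan cooldd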

    That I am checking the correlations between the random effects and the covariates properly.
    You're not checking the correlations at all; you're estimating a regression coefficient. In particular, the analysis you are doing is sensitive to the measurement scales involved. Change tempjan from Fahrenheit to Celsius, or whatever, and you'll get a different result. Same for the scale of heatdd. If you want a correlation, use the -corr- command. Yes, the p-value will come out the same. But for purposes of whether your estimates from -xtreg, re- are consistent, it doesn't matter what the p-value is. The degree of inconsistency depends on the correlation between the random effects and the predictors. If your sample is very large, then even small correlations that would not materially affect the consistency of the estimates will be statistically significant. Similarly, if your sample is too small, you may have large correlations that really do matter from a consistency perspective but fail to be statistically significant.
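
    Continuing from the code in #1, a scale-free check along those lines could be as simple as:

    Code:
    /* Pairwise correlations between the predicted random effects and the
       covariates from the original regression. */
    corr random_effect tempjan cooldd
    
    /* Or, if you also want p-values for the correlations: */
    pwcorr random_effect tempjan cooldd, sig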

    That said, if you are looking for a test of the assumption, the commonly used test is in the -hausman- command. Sensitivity to sample size is also a worry with this test, but at least you would have the advantage of using an approach that is widely accepted and can be referenced.
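
    With the same example data, the usual pattern would be something like the following (the names fixed and random used with -estimates store- are arbitrary):

    Code:
    /* Fit the fixed-effects model and store the results. */
    xtreg heatdd tempjan cooldd, fe
    estimates store fixed
    
    /* Fit the random-effects model and store the results. */
    xtreg heatdd tempjan cooldd, re
    estimates store random
    
    /* Hausman specification test: compare the consistent (fe) estimates
       with the efficient (re) estimates. */
    hausman fixed random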
