
  • Fixed effects regression Prob > F too high -> random effects?

    Dear community,

    I am currently working with panel data for my master's thesis. I conducted a Hausman test to see whether I should use a random effects or fixed effects model. The Hausman test was in favor of the fixed effects model. The F statistic is not good for this model though and it therefore does not seem appropriate (F(26,317)=1.24; Prob > F=0.1947). For my random effects model I get (Wald chi2(36)=102.65; Prob > chi2=0.00000), so this model seems appropriate.
    Should I argue that I should use the random effects model because of this? I am open to any other tips!

    Thank you in advance
    Victoria

  • #2
    No! The Hausman test is, at bottom, a test of the equality of the coefficients between the two models. The fixed effects model is, by its construction, a consistent estimator, whereas random effects is consistent only when the unit-level effects are uncorrelated with the predictors. The Hausman test is oriented towards consistency: it only says to use the random effects model if the coefficients come out, to within a small amount of noise, the same as you would get from fixed effects. The benefit of using random effects in that situation is that the random effects model is more efficient: you will have narrower confidence intervals, smaller standard errors. But if the models disagree, the random effects model is not consistent and, at least for those who believe in model selection by Hausman test (I am not among them, by the way), it should not be used at all.
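
    To make the "equality of the coefficients" idea concrete, here is a minimal Python sketch (not part of the original reply) for the simplest case of a single coefficient. It assumes the textbook form of the statistic, H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), compared against a chi-squared distribution with 1 degree of freedom. The function name hausman_single and all numbers are hypothetical, chosen only for illustration; in Stata the hausman command performs the test on the full coefficient vector.

```python
import math


def hausman_single(b_fe, se_fe, b_re, se_re):
    """Hausman statistic and p-value for ONE coefficient (chi-squared, 1 df).

    Assumes the random effects estimator is the more efficient one, so
    Var(b_fe) - Var(b_re) should be positive.
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(b_fe) - Var(b_re) is not positive; "
                         "the simple form of the test does not apply.")
    h = (b_fe - b_re) ** 2 / var_diff
    # For 1 degree of freedom, P(chi2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates: the two models nearly agree.
h_close, p_close = hausman_single(b_fe=0.50, se_fe=0.10, b_re=0.48, se_re=0.08)

# Hypothetical estimates: the two models clearly disagree.
h_far, p_far = hausman_single(b_fe=0.50, se_fe=0.10, b_re=0.20, se_re=0.08)

print(f"close: H = {h_close:.3f}, p = {p_close:.3f}")
print(f"far:   H = {h_far:.3f}, p = {p_far:.2e}")
```

    In the first case the coefficients differ by little relative to the noise, so the test gives no reason to doubt random effects. In the second the difference is large, the test rejects, and under the Hausman logic random effects should not be used.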

    The overall F statistic of a model is usually of little or no interest in any case. It is an omnibus test of all the regression coefficients being zero. But usually one is only interested in the coefficients of certain specific predictors, the rest being included only to adjust ("control") for their effects on the outcome. So the omnibus F statistic answers a question that we are seldom interested in asking in the first place. The same applies to the overall model chi2 for a random effects model. Unless you are specifically interested in a test of the hypothesis that all of the coefficients are simultaneously zero, you should just ignore it.

    Finally, even if this is a situation where the overall F (or chi2) statistic really is what you are looking for (that is, your principal aim is to test the null hypothesis that all of the coefficients are zero), it is completely illegitimate, an abuse of statistics, to select your model based on the resulting p-value! That practice goes by various names: p-hacking, data dredging, noise mining. And it is an egregious form of statistical malpractice; some even consider it scientific misconduct.
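
    A small simulation can show why choosing a model by its p-value is a problem. The Python sketch below is an illustration added here, not part of the original reply. It assumes a true null hypothesis, under which a valid p-value is uniformly distributed on (0, 1), and it assumes the two candidate tests give independent p-values. That second assumption is a deliberate simplification: p-values from fixed and random effects models fit to the same data are correlated, so the real inflation would be smaller than shown here, though the direction of the effect is the same. The function name and settings are hypothetical.

```python
import random


def false_positive_rates(n_sims=100_000, alpha=0.05, seed=12345):
    """Compare the false positive rate of one pre-chosen test with the rate
    obtained by reporting whichever of two tests has the smaller p-value.

    Assumption: the null is true and the two p-values are independent
    uniform draws (a simplification, see the text above).
    """
    rng = random.Random(seed)
    single = 0
    picked = 0
    for _ in range(n_sims):
        p_a = rng.random()  # p-value from the model chosen in advance
        p_b = rng.random()  # p-value from the alternative model
        if p_a < alpha:
            single += 1
        if min(p_a, p_b) < alpha:
            picked += 1
    return single / n_sims, picked / n_sims


rate_single, rate_picked = false_positive_rates()
print(f"pre-chosen model:       {rate_single:.3f}")
print(f"pick the smaller p:     {rate_picked:.3f}")
```

    Under these assumptions the pre-chosen test rejects about 5% of the time, while picking the smaller p-value rejects close to 1 - 0.95^2 = 9.75% of the time, even though there is nothing to find.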


    • #3
      Dear Mr. Schechter,

      thank you for your quick and detailed response. It helped me a lot.
      I truly did not mean to cheat or deceive - but it's good that you pointed that out to me!
      Have a great day!


      • #4
        "I truly did not mean to cheat or deceive - but it's good that you pointed that out to me!"
        Just to be clear, I never thought you intended to do anything deceptive. Like most investigators, you are eager to get "significant" results in your analysis, and it wouldn't even surprise me to learn that you have been taught to use this approach. It is not at all uncommon. But it is wrong, and tolerance for it is rapidly decreasing as it is one of the major contributors to the reproducibility crisis in statistical research. So I wanted to make a clear and pointed statement that this is a practice you should not take up, or should abandon if you have been using it.

        Good luck! Have a great day, too. And stay safe and healthy.
