Importance of misspecification test vs. R-sq. and consequences of xtsktest

Helen Hickmann

Join Date: May 2020

Posts: 24
#16

03 Jul 2020, 08:22

While researching Poisson regressions I have also read about negative binomial regressions. As the standard deviation of my DV is much higher than its mean, could this be a more appropriate approach for my data?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2169
#17

03 Jul 2020, 13:39

Under no circumstances should you use negative binomial in an FE context. Much is known about its undesirable properties. And you cannot know whether, after you condition on the unobserved effects, the variance is greater than the mean. There's a reason I said to use FE Poisson.

The xtpoisson command simply is up front about how many observations do not contribute to the estimation. If you just have one time period, those observations are useless in an FE environment. If you have no variation in y(i,t) -- such as all zeros -- that unit is also useless. It happens in the linear case, too, but no one says anything about it. The same observations that get dropped in the Poisson FE case are effectively dropped in the linear case, too.
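To see which units are affected before estimating, something like the following minimal sketch could be used. It runs on made-up data; the panel identifier firm and the helper variables n_years, total_hs, unusable and onefirm are hypothetical names, not taken from the actual dataset. It flags the two cases that the notes in the xtpoisson, fe output report as dropped: groups with only one observation and groups with all-zero outcomes.

```stata
* Minimal sketch on made-up data; "firm" is a hypothetical panel identifier.
clear
input firm year highskill
1 2001 0
1 2002 0
1 2003 0
2 2001 3
2 2002 5
2 2003 4
3 2001 7
4 2001 2
4 2002 0
4 2003 1
end

* Number of observed years and total count of highskill per firm
bysort firm: generate n_years = _N
bysort firm: egen total_hs = total(highskill)

* Flag firms with a single observation or with zeros in every year
generate byte unusable = (n_years == 1) | (total_hs == 0)

* Tag one row per firm and count the flagged firms
egen byte onefirm = tag(firm)
count if onefirm & unusable
```

In this toy panel, firm 1 (zeros in every year) and firm 3 (a single year) are flagged. Firm 4 has a zero in one year only and is not flagged, since the flag is set per firm, not per year.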

I would use ln(total) instead. The coefficients have percentage interpretations when multiplied by 100. What variable(s) do you care about most?
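As a rough illustration of the percentage interpretation, here is a minimal sketch on simulated data. The identifier firm, the data-generating values and the reduced regressor list are assumptions made only for this example. With an exponential mean, the coefficient on lntotal can be read as an elasticity, and for a dummy such as investict, 100 times the coefficient approximates the percentage change, with 100*(exp(b) - 1) as the exact figure.

```stata
* Minimal sketch on simulated data; all numbers below are made up.
clear
set seed 2020
set obs 300                                   // 300 hypothetical firms
generate firm = _n
generate a = rnormal(0, 0.5)                  // unobserved firm effect
expand 6                                      // six years per firm
bysort firm: generate year = 2008 + _n

* Hypothetical firm size (always >= 1, so the log is defined)
generate total = ceil(exp(3 + a + rnormal(0, 0.3)))
generate lntotal = ln(total)
generate byte investict = runiform() < 0.4

* Count outcome with an exponential conditional mean; zeros are allowed
generate highskill = rpoisson(exp(-2 + 0.8*lntotal + 0.1*investict + a))

xtset firm year
xtpoisson highskill investict lntotal i.year, fe vce(robust)

* Exact percentage effect of the dummy: 100*(exp(b) - 1)
nlcom (pct_investict: 100*(exp(_b[investict]) - 1))
```

The dependent variable stays in levels here, so zeros in highskill are not a problem; only the regressor total is logged.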

Jeff
Helen Hickmann

Join Date: May 2020

Posts: 24
#18

04 Jul 2020, 10:49

Dear Jeff,

thanks for the clarification about the dropped observations. So will a firm only be dropped if it has zero highskilled employees over all years, or are single years dropped? And do firms also get dropped if the number does not change over all years? I could try to use an unbalanced panel instead of restricting my panel to firms that participated in all 12 years; that might give me some more observations.

The variables I care most about are my dependent variable (number of highskilled, has zeros) and then as explanatory variables the dummies investict, process_inno and product_inno. The rest are controls but they should stay in the regression for the validity of the results. I have seen in histograms that my DV, the number of total employees, investment and exportshare are all heavily concentrated around lower numbers, and via scatter plots I could tell that the relationship between my DV and these variables is non-linear. That’s why I would have liked to be able to transform them via log, also due to the nice interpretation, but the zeros unfortunately do not allow that.

When I use -xtpoisson, fe vce(robust)- my main explanatory variables are not significant, but as long as I can argue that I chose the correct specification for my data this should not be a problem; there might simply not be a relationship.

Helen

Helen Hickmann

Join Date: May 2020
Posts: 24

#19

06 Jul 2020, 03:35

I have now used the model specification

Code:

 xtreg lnhighskill investict product_inno process_inno lntotal collective lnexportshare lninvestment rnd tech i.industry i.year, fe vce(robust)

and applied it to my original data (which I am not allowed to publish here), the balanced as well as the unbalanced panel. As said before my main IVs are insignificant, but what worries me more is that when testing the model specification via

Code:

* Baseline FE Poisson model
xtpoisson highskill investict product_inno process_inno lntotal collective exportshare investment rnd tech i.industry i.year, fe vce(robust)
* Linear index and its square and cube for a RESET-type test
predict xbhat, xb
generate xbhatsq = xbhat^2
generate xbhatcu = xbhat^3
* Re-estimate with the added powers and test them jointly
xtpoisson highskill investict product_inno process_inno lntotal collective exportshare investment rnd tech xbhatsq xbhatcu i.industry i.year, fe vce(robust)
test xbhatsq xbhatcu

I get Prob > chi2 = 0.0000, indicating my model is misspecified.

I really don't know which other specification to try anymore...
