I'm working on a project that seeks to calculate the impact of the relative allocation of capital and labour within a particular sector on total expenditures, holding output constant. I have panel data on 10 states over 20 years. I want to run either a fixed-effects or random-effects model (whatever the Hausman test indicates is appropriate) to control for time-invariant differences between states.
I've collected data on my explanatory variables and controls and run a few preliminary regressions, but now I want to make sure my model is econometrically sound. I've been going down the list of FE/RE assumptions in Wooldridge's Introductory Econometrics textbook. However, I'm not sure there's a right order in which to check things. If someone could help me verify that a) my approach to verifying each assumption is correct, and b) I am doing them in an order that makes sense, that would be great! Also, I am stuck on the first step due to a technical issue, which I made a separate post about; I would still appreciate advice on my other steps and their order.
1) Perform a Levin-Lin-Chu test with the -xtunitroot llc- command to determine whether my data are stationary in levels or if first-differencing is required.
I am getting conflicting results here when I specify different maximum lag lengths and this is a point where I'm stuck (see: https://www.statalist.org/forums/for...unit-root-test).
Assuming I get this sorted out, I will know whether to run my regression on the level variables or on first-differenced variables (or second differences, etc.). My goal here is to avoid spurious regression.
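For concreteness, here is roughly what I am running for step 1 (state, year, and myvar are placeholders for my actual panel identifiers and series; the lag-selection choice is just one possibility, and choosing it is exactly where I'm stuck):

```
* Declare the panel structure (10 states x 20 years, balanced)
xtset state year

* LLC test on levels; lags(aic 3) selects lags by AIC up to a max of 3
xtunitroot llc myvar, lags(aic 3)

* If levels appear non-stationary, test the first difference
xtunitroot llc D.myvar, lags(aic 3)
```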
2) Conduct a Hausman test with the -hausman- command to determine whether to use a fixed effects or random effects model.
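The procedure I have in mind for step 2 is the standard one (y, x1, x2 are placeholders for my actual variables; note I am deliberately not using robust standard errors here, since -hausman- assumes the conventional VCE):

```
* Fit both models and store the estimates
xtreg y x1 x2, fe
estimates store fe
xtreg y x1 x2, re
estimates store re

* Hausman test: consistent (FE) first, efficient-under-H0 (RE) second
hausman fe re
```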
3) Check and correct for heteroskedasticity and/or serial correlation.
I will use the -xttest3- command to check for heteroskedasticity if the Hausman test indicates I should use an FE model. However, I'm not aware of any checks for heteroskedasticity with an RE model. Does a relevant command exist? In any case, I assume there is likely some heteroskedasticity, so I will use robust standard errors via the -vce(robust)- option in my regressions.
I will use the -xtserial- command to check for serial correlation. If there is serial correlation, the robust standard errors (which -xtreg- computes as cluster-robust, clustering on the panel variable) will account for its impact on my standard errors.
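Putting step 3 together, this is my current plan in code (again with placeholder variable names; -xttest3- and -xtserial- are user-written commands, installable from SSC):

```
* Fit the FE model with conventional SEs, then run the diagnostics
xtreg y x1 x2, fe
xttest3              // modified Wald test for groupwise heteroskedasticity
xtserial y x1 x2     // Wooldridge test for serial correlation in panel data

* Final specification with cluster-robust standard errors
xtreg y x1 x2, fe vce(robust)
```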
4) Check that the errors are normally distributed.
I will obtain the idiosyncratic residuals with -predict res, e- after running the regressions, and then use the skewness-kurtosis test command -sktest- and the Jarque-Bera test command -jb- to test whether they are normally distributed.
If they are not normally distributed, I'm not sure what I would need to do next. Does anyone have any suggestions here? Would this indicate omitted variable bias?
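For step 4, this is what I mean concretely (res is just a name I chose; -jb- is user-written):

```
* Recover the estimated idiosyncratic errors from the FE fit
xtreg y x1 x2, fe
predict res, e

* Normality tests on the residuals
sktest res     // skewness-kurtosis test (built in)
jb res         // Jarque-Bera test (user-written)
```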
One of the most important assumptions for an FE/RE model is strict exogeneity, but as far as I'm aware, this must be argued for qualitatively and cannot be proven empirically. I will therefore build that argument qualitatively, and I will also check coefficient stability by running models with and without controls.
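The with-and-without-controls comparison would look something like this (x1 is my variable of interest, c1 and c2 stand in for my controls):

```
* Baseline: variable of interest only
xtreg y x1, fe vce(robust)
estimates store base

* Full specification with controls
xtreg y x1 c1 c2, fe vce(robust)
estimates store full

* Side-by-side comparison of coefficients and SEs
estimates table base full, b se
```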
What are your overall impressions of this plan and am I executing it properly?