
  • Diagnostic tests and set-up for a panel data empirical project

    I'm working on a project that seeks to calculate the impact of the relative allocation of capital and labour within a particular sector on total expenditures, holding output constant. I have panel data on 10 states over 20 years. I want to run either a fixed-effects or random-effects model (whatever the Hausman test indicates is appropriate) to control for time-invariant differences between states.

    I've collected data on my explanatory variables and controls and run a few preliminary regressions, but now I want to make sure my model is econometrically sound. I've been going down the list of FE/RE assumptions in Wooldridge's Introductory Econometrics textbook, but I'm not sure whether there is a right order in which to check things. If someone could help me verify that a) my approach to verifying each assumption is correct, and b) I am doing the checks in an order that makes sense, that would be great! I am currently stuck on the first step due to a technical issue that I made a separate post about, but I would still appreciate advice on my other steps and their order. The Stata commands I have in mind for each step are sketched at the end of this post.

    1) Perform a Levin-Lin-Chu test with the -xtunitroot llc- command to determine whether my data are stationary in levels or if first-differencing is required.

    I am getting conflicting results here when I specify different maximum lag lengths and this is a point where I'm stuck (see: https://www.statalist.org/forums/for...unit-root-test).

    Assuming I get this sorted out, I will know whether to run my regression using the level variables or first-differenced variables (or second-differenced variables, etc...). My goal here is to avoid issues of spurious regression.

    2) Conduct a Hausman test with the -hausman- command to determine whether to use a fixed effects or random effects model.

    3) Check and correct for heteroskedasticity and/or serial correlation.

    I will use the -xttest3- command to check for heteroskedasticity if the Hausman test indicates I should use an FE model. However, I'm not aware of any checks for heteroskedasticity with an RE model. Does a relevant command exist? In any case, I assume there is likely some heteroskedasticity, so I will use robust standard errors via the -vce(robust)- option in my regressions.

    I will use the -xtserial- command to check for serial correlation. If there is serial correlation, the robust standard errors will account for its impact on my inference.

    4) Check that the errors are normally distributed.

    I will obtain the residuals with -predict res, e- after running my regressions, and then use the skewness-kurtosis test (-sktest-) and the Jarque-Bera test (-jb-) to check whether they are normally distributed.

    If they are not normally distributed, I'm not sure what I would need to do next. Does anyone have any suggestions here? Would this indicate omitted variable bias?


    One of the most important assumptions for an FE/RE model is strict exogeneity, but as far as I'm aware, this must be argued for qualitatively and cannot be proven empirically. I will therefore build that argument qualitatively and also run models with and without controls.


    What are your overall impressions of this plan and am I executing it properly?
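
    For reference, here is a minimal sketch of the Stata commands I have in mind for steps 1) to 4). The variable names (spend, cap_lab_ratio, output) and the panel identifiers (state, year) are placeholders for my actual data, and -xttest3-, -xtserial-, and -jb- are user-written commands that have to be installed separately.

        * declare the panel structure
        xtset state year

        * 1) Levin-Lin-Chu unit-root tests (lag length chosen by AIC here)
        xtunitroot llc spend, lags(aic 4)
        xtunitroot llc cap_lab_ratio, lags(aic 4)

        * 2) Hausman test: FE and RE fitted with conventional (non-robust) standard errors
        xtreg spend cap_lab_ratio output, fe
        estimates store fe
        xtreg spend cap_lab_ratio output, re
        estimates store re
        hausman fe re

        * 3) groupwise heteroskedasticity (after the FE fit) and serial correlation
        quietly xtreg spend cap_lab_ratio output, fe
        xttest3
        xtserial spend cap_lab_ratio output
        * cluster-robust standard errors if either problem shows up
        xtreg spend cap_lab_ratio output, fe vce(robust)

        * 4) normality of the residuals
        quietly xtreg spend cap_lab_ratio output, fe
        predict res, e
        sktest res
        jb res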

  • #2
    Noah:
    just an aside:
    points 3) and 4) should be accomplished before point 2) (i.e., the Hausman test).
    Please also note that, if you invoke non-default standard errors, -hausman- should be replaced by the user-written command -xtoverid- (which, being glorious but a bit old-fashioned, does not allow -fvvarlist- notation; the usual fix is the -xi:- prefix).
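    Something along these lines (variable names are placeholders, and -xtoverid- must be installed first):

        * RE estimation with non-default (clustered) standard errors; -xi:- expands
        * the categorical regressor because -xtoverid- rejects factor-variable notation
        xi: xtreg y x1 i.group, re vce(cluster panelid)
        xtoverid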
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you, Carlo!

      Don't (3) and (4) require knowing whether I am using FE or RE? For (3), it determines which command I can use. For (4), won't I get different residuals with different regression models?

      Thank you for the tip about -xtoverid-, I will use that.

      Otherwise, does my approach look okay to you?



      • #4
        Noah:
        the main issue is that, if you detect heteroskedasticity and/or autocorrelation (regardless of the -fe- or -re- specification), you should account for them before testing which specification fits your data better.
        The normality of the residual distribution does not seem to be an issue of great concern, especially if you can rely on a pretty large sample size.
        In addition, in panel data analysis you have two kinds of residuals: the first (u_i) is the panel-wise effect and the other (epsilon_it) is the idiosyncratic error. As far as normality and heteroskedasticity are concerned, you actually investigate the latter.
        As a global opinion, your approach sounds wise. However, as you will probably deal with a T>N panel dataset, you should consider -xtgls- and/or -xtregar- instead of -xtreg-.
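        On the two residual components, for instance (placeholder names):

            quietly xtreg y x1 x2, fe
            predict u_i, u     // panel-wise effect
            predict e_it, e    // idiosyncratic error (the component your checks refer to)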
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Ok, so my procedure would then be as follows (I've sketched the commands I have in mind for step 2 at the end of this post):

          1. Perform the Levin-Lin-Chu test to determine whether I need to difference my variables to make them stationary.

          2. Test for heteroskedasticity and autocorrelation.

          Test heteroskedasticity by using the GLS-based approach recommended by the Stata FAQ, where the null hypothesis is homoskedasticity (https://www.stata.com/support/faqs/s...tocorrelation/).

          Test autocorrelation using -xtserial-.

          3. Test for Normality of the idiosyncratic error.

          How would I do this without first specifying the regression model? Would I just use the residuals from xtgls?

          If I find non-Normality, what should I do?

          4. If there is no heteroskedasticity or autocorrelation, use -hausman- to choose between FE and RE. If there is heteroskedasticity and/or autocorrelation, use -xtoverid- to choose between FE and RE.

          5. Based on the above results, use one of -xtgls-, -xtregar-, or -xtreg- with a FE or RE model.

          How do I decide which of these three options to use?
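
          For reference, here is how I understand the commands for step 2, with placeholder variable names. My reading of the Stata FAQ is that its likelihood-ratio test compares an -xtgls- fit that allows panel-level heteroskedasticity against a homoskedastic fit:

              * LR test for panel-level heteroskedasticity (H0: homoskedasticity)
              xtgls spend cap_lab_ratio output, igls panels(heteroskedastic)
              estimates store hetero
              xtgls spend cap_lab_ratio output, igls
              local df = e(N_g) - 1
              lrtest hetero . , df(`df')

              * Wooldridge test for first-order autocorrelation
              xtserial spend cap_lab_ratio output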





          • #6
            Noah:
            as far as your point #5 is concerned, -xtreg- is good for N>T panel datasets, whereas the opposite holds (i.e., T>=N) for -xtgls- and -xtregar-.
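            For example (placeholder names; the options are only illustrative):

                * feasible GLS allowing panel-level heteroskedasticity and AR(1) errors
                xtgls y x1 x2, panels(heteroskedastic) corr(ar1)
                * FE estimator with an AR(1) disturbance
                xtregar y x1 x2, fe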
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Ok, thank you! I have first-order autocorrelation when I use my level variables, but I can't reject the null of no first-order autocorrelation when I first-difference. If my regression is going to use first-differenced terms, is it okay to use xtreg? I apologize for being hesitant to use -xtgls- or -xtregar-, it's just that I am unfamiliar with these commands and my supervisor had recommended -xtreg-. Could you point me to a resource that illustrates why xtgls is better than xtreg for T>=N cases?

              I greatly appreciate your help!



              • #8
                Noah:
                - if you are planning to use the first-difference estimator, you can run an OLS regression on the first-differenced variables (instead of using -xtreg-) (see chapter 8.9 in https://www.stata.com/bookstore/micr...trics-stata/);
                - if you detected autocorrelation, you can invoke clustered standard errors with -xtreg- (see chapter 10.4.5 in https://www.stata.com/bookstore/micr...trics-stata/);
                - finally, long panels are covered in chapter 8.10 of https://www.stata.com/bookstore/micr...metrics-stata/
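                In Stata terms, something like this (placeholder names; assumes the panel has been -xtset-):

                    * OLS on the first-differenced variables, clustering on the panel identifier
                    regress D.spend D.cap_lab_ratio D.output, vce(cluster state)
                    * or -xtreg- with clustered standard errors if autocorrelation remains a concern
                    xtreg spend cap_lab_ratio output, fe vce(cluster state)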
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Ok, thank you! I think I will use a first-difference estimator, since differencing seems to get rid of my autocorrelation, heteroskedasticity, and unit-root issues. I'll also run some regressions with robust standard errors just to be safe with regard to autocorrelation and heteroskedasticity, even though the tests do not find significant evidence of them.
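
                  In case it helps later readers, the final specification I have in mind looks roughly like this (placeholder names):

                      xtset state year
                      regress D.(spend cap_lab_ratio output), vce(cluster state)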

