
  • Panel Data Modeling - Please Help

    I'm working with a small unbalanced panel dataset (N=24, T=30, Obs.=590). My goal is a typical one: to test hypothesized relationships between the Xs and Y. Under either an FE or RE framework, I've noticed that my errors are serially correlated. As a result, I'm thinking about making my model dynamic by adding a lag of the dependent variable as a regressor. I'm aware that doing so creates an endogeneity issue, leading to biased estimates. From what I've read, this bias diminishes as T increases.


    Q1) Is T=30 large enough to ignore the endogeneity bias issue? Or do I need to address it via instrumentation?

    Q2) More generally, when are T and N considered "large/small"?

    Q3) Can/should time fixed effects be used in a dynamic model?

    Q4) How can I decide whether a single lag of the dependent variable is enough? My dataset may be too small to support multiple lags, but I'd still like to know.


    Perhaps a dynamic model isn't the way to go. Alternatively, I could use first differencing or a time polynomial to alleviate nonstationarity issues.
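    For concreteness, the specifications I'm weighing would look something like this in Stata (y, x1, x2, and year are placeholders for my actual variables):

    ```stata
    * Placeholder variable names; assumes the panel is already xtset
    xtreg y L.y x1 x2, fe              // dynamic FE: lagged dependent variable
    regress D.(y x1 x2), noconstant    // first-differenced specification
    xtreg y x1 x2 c.year##c.year, fe   // FE with a quadratic time trend
    ```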


    Q5) How do I know which combination of these modeling techniques is most appropriate (dynamic methods, first-differencing, detrending, inclusion/exclusion of year dummies)?

    Q6) At least one message board I found suggested that stationarity isn't a major concern when using panel data. I can't imagine how this could be true, as I think nonstationarity would lead to spurious results. Am I correct or am I missing something?


    I've been reading message boards, online lecture notes, and academic papers for days but can't find practical answers to these questions.


    If you can address ANY of these questions, I would greatly appreciate it. When doing so, please bear in mind that I'm looking for practical approaches and don't have the ability to understand highly technical/theoretical papers. Thank you!

  • #2
    You'll increase your chances of a helpful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex. If you provide more of that, we can be clearer about your problem.

    You have a reasonably sized dataset, but you seem to be moving into relatively complicated territory when simpler approaches may suffice. Dynamic models are trickier to estimate than non-dynamic ones. As you add lags and use deeper lags as instruments, your effective sample size will drop. If you are not ready to read technical papers, you should probably avoid the more complex estimators recommended for dynamic models unless you really need them.

    Serial correlation does not immediately imply a lagged DV. A lagged DV really changes the model in substantive ways that you will need to think about. Your T seems large enough to justify xtgls over xtreg, and xtgls will handle serial correlation. If you don't have heteroskedasticity, you can look at xtregar.
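    A rough sketch of what those commands might look like (y, x1, x2 are placeholder names; the data are assumed to be xtset already):

    ```stata
    * GLS allowing AR(1) errors and panel-level heteroskedasticity
    xtgls y x1 x2, corr(ar1) panels(heteroskedastic)

    * FE estimator with AR(1) disturbances (assumes homoskedastic errors)
    xtregar y x1 x2, fe
    ```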

    Time fixed effects are often helpful in panel data. Sometimes, I have used the mean of the DV for all other observations in a given time period as a control instead of the time dummies.
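    With placeholder variable names, the two approaches might be coded as:

    ```stata
    * Year dummies as time fixed effects
    xtreg y x1 x2 i.year, fe

    * Alternative: leave-one-out mean of the DV in each period as a control
    bysort year: egen ytot = total(y)
    bysort year: gen nper = _N
    gen ybar_others = (ytot - y) / (nper - 1)
    xtreg y x1 x2 ybar_others, fe
    ```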

    A fixed effects regression should give results almost identical to first-differenced estimation when the idiosyncratic errors are not strongly serially correlated (with T=2 the two are numerically identical).
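    One quick check, with placeholder names, is to run both and compare coefficients; a large gap between them is itself a sign of serially correlated errors:

    ```stata
    xtreg y x1 x2, fe                 // within (FE) estimator
    regress D.(y x1 x2), noconstant   // first-difference estimator
    ```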

    Different disciplines place different emphasis on stationarity. You may want to think about what your discipline does.



    • #3
      Hi Phil.

      Thank you for your reply. I see your point about addressing serial correlation in ways other than the inclusion of a lagged DV. I will consider your recommendations, such as xtregar and xtgls.

      I still worry that nonstationarity may be an issue. Panel data unit root tests suggest as much. You mention that "Different disciplines place different emphasis on stationarity." My feeling is that this issue shouldn't depend on my discipline. Nonstationarity can lead to spurious results, which are problematic regardless of field.
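      For reference, the kinds of panel unit-root tests I ran look roughly like this (y is a placeholder for my dependent variable; Fisher-type tests accommodate unbalanced panels):

      ```stata
      xtunitroot fisher y, dfuller lags(1)   // Fisher-type test (unbalanced panels OK)
      xtunitroot ips y                       // Im-Pesaran-Shin test
      ```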

      I know that I can take first differences to try to deal with the nonstationarity issue. When I do, my results are quite different from those under an FE specification. You mention they should be nearly identical. Why might this be happening? And which set of results should I trust?

      One issue with first differencing is that it involves considerable information loss. Are there other methods to make the series stationary? If so, I'd like to explore those options. In particular, does using a dynamic approach resolve the nonstationarity issue? I can't seem to find a clear answer to this question.
