xtpcse vs. xtabond2 in panel data

Alex Lukassen

Join Date: Aug 2015

Posts: 8
#1


26 Aug 2015, 12:03

Dear Statalisters,

I need some help concerning the right choice of commands in my panel data set. It includes 266 cross sections and 18 time periods (strongly balanced). Random effects are rejected in favour of fixed effects. Since the data is heteroskedastic, autocorrelated, contemporaneously correlated and includes a lagged dependent variable, I would take first differences to eliminate autocorrelation, explicit fixed effects and the correlation of the lagged dependent variable with the disturbances.
Usually, I would then use the xtabond2 estimator to account for heteroskedasticity and contemporaneous correlation. However, many of the independent variables are dummies, and Roodman (2009, p. 115) states that this command shouldn't be applied if any dummy is 0 for almost all observations or 1 for almost all observations, which is the case with my data. Therefore, I considered the xtpcse estimator. Unfortunately, this resulted in an error message.

Code:

xi: xtpcse D.(y laggedy indvars i.countryid i.year), correlation(ar1)
no time periods are common to all panels, cannot estimate disturbance covariance matrix using casewise inclusion
r(459);

Questions
1. Can I use the xtpcse estimator in this setting?
2. Can anyone tell me where the presented command goes wrong?

If you need more details, let me know.

Kind regards,
Alex
Joseph L. Staats

Join Date: Aug 2015

Posts: 28
#2

26 Aug 2015, 21:02

Alex,

Try this:

xtpcse D.(y laggedy indvars i.countryid i.year), pairwise correlation(ar1) ... or correlation(psar1)

You might also try using Driscoll-Kraay standard errors (xtscc), which are available as a user-written command.
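A minimal sketch of the pairwise suggestion, with placeholder names (countyid, year, y, x1, x2 are assumptions, not the actual variables). By default xtpcse estimates the disturbance covariance matrix casewise, that is, only from periods observed in every panel; the pairwise option instead uses, for each pair of panels, the periods those two panels share:

```stata
* Placeholder names: countyid (panel id), year (time), y, x1, x2.
xtset countyid year

* Inspect which periods each panel actually contributes
xtdescribe

* First-differenced model; covariances computed pair by pair
xtpcse D.y D.(x1 x2), pairwise
```

The coefficient estimates are the same under casewise and pairwise inclusion; only the estimated covariance matrix, and therefore the standard errors, can differ.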

Best,

Joe

Joseph L. Staats, J.D., Ph.D.
Associate Professor
Department of Political Science
University of Minnesota Duluth
Cina Hall 307
1123 University Drive
Duluth, MN 55812
(218) 726-6641
Alex Lukassen

Join Date: Aug 2015

Posts: 8
#3

26 Aug 2015, 21:49

Thank you Joseph.

I made a mistake in my initial post. I posted this command:

Code:

xi: xtpcse D.(y laggedy indvars i.countryid i.year), correlation(ar1)

However, combining the D. operator with i.countryid and i.year is not allowed.

Running the following commands

Code:

xtpcse D.(y laggedy indvars) i.countryid i.year
xtpcse D.(y laggedy indvars) i.countryid
xtpcse D.(y laggedy indvars) i.year

all lead to the following error message: "Warning: variance matrix is nonsymmetric or highly singular". Standard errors, p-values, and confidence intervals are not reported.

When I use

Code:

xtpcse D.(y laggedy indvars), pairwise

it works. I left out the correlation(ar1) option since I thought autocorrelation would be accounted for through first differencing (please correct me if I am wrong). Also, I am not sure if this command eliminates the other problems of my data set which I mentioned before (heteroskedasticity, contemporaneous correlation).

Running the following command, which was previously recommended to me in case I can do without the lagged dependent variable, led to an R-squared of about 0.16, whereas the xtpcse estimator (including the lagged dependent variable) reduces this value to 0.05.

Code:

xi: reg D.(y depvars i.year), cluster(countryid)

This decrease leaves me a bit puzzled. I am not sure which model is more suitable.

Last edited by Alex Lukassen; 26 Aug 2015, 21:54.
Joseph L. Staats

Join Date: Aug 2015

Posts: 28
#4

27 Aug 2015, 11:13

Alex,

I'm not so sure you need a lagged dependent variable, especially since you are using ar1 or psar1. You have to make that call based on the unique nature of your research project. Also, I assume that you are lagging your independent/control variables. I am familiar with regressions of panel data in international political economy research (my own and others). Based on this, I see something like this working for your project:

Code:

xtpcse y indvars year_dum* cnty_dum*, pairwise corr(psar1)

Again, I recommend you consider Driscoll-Kraay standard errors. You can read about this method here:

http://www.stata-journal.com/sjpdf.h...iclenum=st0128

In Driscoll-Kraay your command line would be:

Code:

xtscc y indvars year_dum*, fe
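The two commands above assume that dummy variables named year_dum* and cnty_dum* already exist. One way to create them is sketched below; the names y, x1, x2, countyid, and year are placeholders, and dropping the first dummy of each set as the base category is a choice made here for illustration, not something stated in the thread:

```stata
* Placeholder names: y, x1, x2, countyid, year.
xtset countyid year

* One indicator per year and per county: year_dum1, year_dum2, ...
tabulate year, generate(year_dum)
tabulate countyid, generate(cnty_dum)

* Use the first category of each set as the base to avoid
* perfect collinearity with the constant
drop year_dum1 cnty_dum1

* xtscc is user-written; install once with: ssc install xtscc
xtpcse y x1 x2 year_dum* cnty_dum*, pairwise corr(psar1)
xtscc  y x1 x2 year_dum*, fe
```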

I hope this helps.

Best,

Joe

Alex Lukassen

Join Date: Aug 2015

Posts: 8
#5

27 Aug 2015, 18:58

Thanks again for your help, Joseph.

I ran different regressions according to your recommendation:

Code:

xi: reg D.(y depvars i.year), cluster(countyid)
xtpcse y laggedy depvars year* countyid*, corr(psar1)
xtpcse y laggedy depvars year* countyid*, pairwise corr(psar1)
xtscc y laggedy depvars year*, fe

The results looked as follows:

Code:

    Variable |     xireg       xtpcse_casewise xtpcse_pairwise      xtscc      
-------------+----------------------------------------------------------------
percentchangepop
         D1. | -65.267144**                                                   
         --. |                 -55.785918      -55.785918      -59.344459     
             |
unemploymentrate
         D1. | -54.985701***                                                  
         --. |                 -28.517336**    -28.517336**    -28.744504     
             |
  stadiumcap
         D1. | -.00084621                                                     
         --. |                 -.00226165**    -.00226165*     -.00326379*    
             |
nrfirstbundteams
         D1. | -15.693539                                                     
         --. |                  10.486173       10.486173       4.5096825     
             |
nrsecondbundteams
         D1. |  4.8190392                                                     
         --. |                  6.0215406       6.0215406       .50076176     
             |
nr1stdivhockeyteams
         D1. |  -9.078215                                                     
         --. |                  70.161801***    70.161801**    -6.6679097     
             |
nr1stdivbasketteams
         D1. | -112.38692                                                     
         --. |                  -9.785193       -9.785193      -42.021028     
             |
nr1sthballteams
         D1. |  34.417208                                                     
         --. |                 -28.416358      -28.416358       42.256221     
             |
nr3rddivteams
         D1. |  70.066827*                                                    
         --. |                  56.963029**     56.963029*      89.682623*    
             |
nr4thdivteams
         D1. |  34.440157                                                     
         --. |                 -3.2123857      -3.2123857       17.203549     
             |
    stadyear
         D1. |  .00484439                                                     
         --. |                  .01618633       .01618633          .01304     
             |
   stadrenov
         D1. | -19.947809                                                     
         --. |                   -69.9948*       -69.9948      -82.564576     
             |
3rddivstadhoney
         D1. | -27.782703                                                     
         --. |                  48.361165       48.361165       41.668089     
             |
4thdivstadhoney
         D1. |  25.622879                                                     
         --. |                  30.625975       30.625975       7.3490174     
             |
nrpromo5thto4th
         D1. |  37.663843                                                     
         --. |                  22.009596       22.009596       20.572219     
             |
nrpromo4thto3rd
         D1. | -.63851746                                                     
         --. |                  9.4968656       9.4968656       23.045861     
             |
nrpromo3rdto2nd
         D1. | -14.014563                                                     
         --. |                 -14.211144      -14.211144      -8.8363123     
             |
nrrele2ndto3rd
         D1. |  48.668203                                                     
         --. |                  49.922576       49.922576        71.70564     
             |
nrrele3rdto4th
         D1. |  -24.57393                                                     
         --. |                 -62.075376      -62.075376      -55.160333     
             |
nrrele4thto5th
         D1. | -15.840651                                                     
         --. |                  15.025695       15.025695      -10.459854     
             |
nrrele4thto6th
         D1. |  120.57864                                                     
         --. |                  260.22644       260.22644       167.47553*  

   R-squared |     0.4138          0.9943          0.9943          0.8293      

Just a bit of context: I am analyzing the effect of 3rd and 4th division soccer teams and stadiums on local income (county-level). As I said, there are 266 counties and 18 time periods.
Similar research yielded R-squared values between approximately 0.3 and 0.65. Therefore, I think that the "xireg" model seems to be the most realistic, although (or because?) it doesn't include a lagged dependent variable. The R-squared values of the two "xtpcse" estimations make me believe that something went wrong (high R-squared and few significant coefficients). Concerning the xtscc estimator, the previously mentioned paper ("Robust standard errors for panel regressions with cross-sectional dependence", Hoechle, 2007) states that results should be considered with caution when N is large and T is short. However, "large" and "short" are not clearly specified.

Some feedback on whether the "xireg" model is an appropriate choice and if there is a mistake in the "xtpcse" estimations would be highly appreciated. Again, if more details are needed to be able to give some feedback, please let me know.

Thanks and kind regards,
Alex


Alex Lukassen

Join Date: Aug 2015

Posts: 8
#6

27 Aug 2015, 21:06

Sorry, there is one more question on my mind. Assuming that I use the "xi: reg D.(y depvars i.year), cluster(countyid)" estimator, does omitting the lagged dependent variable lead to omitted variable bias when analyzing the sports environment's impact on income?
Joseph L. Staats

Join Date: Aug 2015

Posts: 28
#7

28 Aug 2015, 09:28

Alex,

I don't see the ratio between your N and T as being a problem for Driscoll-Kraay standard errors regression. The Hoechle article uses data with an N of 1000 and a T of 40, which is a greater ratio than your N of 266 and T of 18.

I think xtreg is a more useful choice for comparison than xi: reg. See why here:

http://www.statalist.org/forums/foru...-data-analysis

As to whether to include a lagged dependent variable, I suggest you read the following and look up relevant works listed in the bibliography, most especially Achen (2000) and Beck and Katz (2011):

http://web.stanford.edu/~arjunw/LaggedDVs.pdf

Using xi:reg or xtreg doesn't take care of potential heteroskedasticity. See this work by Richard Williams for guidance on ways to detect and correct for heteroskedasticity:

https://www3.nd.edu/~rwilliam/stats2/l25.pdf

Good luck.

Best,

Joe
Alex Lukassen

Join Date: Aug 2015

Posts: 8
#8

28 Aug 2015, 11:23

Thanks for the insights.
I knew about most of the papers dealing with lagged dependent variables that you mentioned. Anyway, it was good to see that I had a look at the "right" ones.

Xi:reg has been recommended to me earlier (see previous post). Since I would use it with first differences, autocorrelation and explicit fixed effects would be eliminated. By introducing the cluster option, any remaining serial correlation or heteroskedasticity would be accounted for. However, including a lagged dependent variable would bias the results.
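A minimal sketch of the first-difference regression with cluster-robust standard errors described in this paragraph, using placeholder names (y, x1, x2, countyid, year are assumptions). In this version only the substantive variables are differenced, and the year indicators enter undifferenced through factor-variable notation, which requires Stata 11 or later:

```stata
* Placeholder names: y, x1, x2, countyid, year.
xtset countyid year

* D. differences within each county; clustering on county allows
* arbitrary heteroskedasticity and serial correlation within counties
regress D.y D.(x1 x2) i.year, vce(cluster countyid)
```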

This leaves me with either xtscc or xtpcse. Yet, using xtscc with a lagged dependent variable generates unreasonably high R-squared values with few significant coefficients. Therefore, I would like to use one of the following commands since xtpcse assumes that the disturbances are, by default, heteroskedastic and contemporaneously correlated across panels:

Code:

xtpcse dY dLaggedY dDepVars
xi: xtpcse dY dLaggedY dDepVars i.countyid
xi: xtpcse dY dLaggedY dDepVars i.year
xi: xtpcse dY dLaggedY dDepVars i.year i.countyid

(where the small "d" stands for first-differenced)

Question
Do I need to include dummies for either county or year or both? From a theoretical point of view, I am not quite sure about the assumptions behind each of these commands. Thus, selecting one of them gets difficult.

Kind regards,
Alex
Joseph L. Staats

Join Date: Aug 2015

Posts: 28
#9

28 Aug 2015, 13:55

I'm not sure what your question is. Dummy variables for unit (counties) or time (year) are used to model for fixed effects of either of these. The Hausman test tells you whether you need to model for unit fixed effects. To determine whether you need to model for time fixed effects, run your regression with your year dummy variable included and then enter the following command:

Code:

testparm year_dummy_variable

If p<.10, you need to model for time fixed effects. Keep in mind that you might try modeling for some group of years rather than each and every year. It's quite common in the work I do to model for decade fixed effects, or some specific group of years that is known by researchers to be problematic (such as an economic crisis that lasted for a number of years).
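A minimal sketch of that check, assuming placeholder names (y, x1, x2, countyid, year) and using factor-variable notation for the year dummies rather than hand-made dummy variables:

```stata
* Placeholder names: y, x1, x2, countyid, year.
xtset countyid year

* Fixed-effects regression with a full set of year indicators
xtreg y x1 x2 i.year, fe

* Joint Wald test that all year coefficients are zero
testparm i.year
```

A small p-value from testparm points toward keeping the time fixed effects in the model.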

If your question only had to do with whether you need to add dummy variables for county or year in the commands you listed in your latest message, my answer is that using a command with xi: and i.year and/or i.countyid accomplishes the same thing as adding dummy variables for either of these. Therefore, in such a case you would not add dummy variables.
Alex Lukassen

Join Date: Aug 2015

Posts: 8
#10

28 Aug 2015, 15:23

I know that the command xi followed by i.year and/or i.countyid creates dummy variables. That's why I presented 4 different commands (3 of them including dummies for year, county, or both) and asked which one would be most suitable. Thanks to your recommendation, I (re)ran the Hausman test and the testparm test, which made me revise a bit of the theory and finally confirmed my assumptions.

The linear reduced form model I specified looks as follows:

Code:

y_it = β₁X_it + β₂Z_it + ϑ_i + μ_t + ε_it

where ϑ_i denotes a county-specific fixed effect (county i) and μ_t a time-specific fixed effect (period t)

When first-differenced and simplified:

Code:

y_it - y_i,t-1 = β₁(X_it - X_i,t-1) + β₂(Z_it - Z_i,t-1) + (μ_t - μ_t-1) + (ε_it - ε_i,t-1)

So, theoretically, the county-specific fixed effects (ϑ_i) are eliminated through first-differencing and I do not need to account for them afterwards. This has been confirmed by the Hausman test. The time-specific fixed effects remain, which has been confirmed by the testparm statistic. (If there is a flaw in my reasoning, please let me know.)

To be able to include a lagged dependent variable, the estimator I will ultimately use looks like this:

Code:

xi: xtpcse dY dLaggedY dDepVars i.year
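For reference, the same first-difference step can be written out with the lagged dependent variable included. This is a sketch under stated assumptions: ρ is a hypothetical symbol for the coefficient on the lag (it does not appear in the equations above), and ε_it is assumed to be serially uncorrelated with variance σ²_ε and uncorrelated with X, Z, and ϑ_i in all periods:

```latex
\begin{align*}
y_{it} &= \rho\, y_{i,t-1} + \beta_1 X_{it} + \beta_2 Z_{it}
          + \vartheta_i + \mu_t + \varepsilon_{it} \\
\Delta y_{it} &= \rho\, \Delta y_{i,t-1} + \beta_1 \Delta X_{it}
          + \beta_2 \Delta Z_{it} + \Delta \mu_t + \Delta \varepsilon_{it} \\
\operatorname{Cov}(\Delta y_{i,t-1},\, \Delta \varepsilon_{it})
  &= \operatorname{Cov}(y_{i,t-1} - y_{i,t-2},\; \varepsilon_{it} - \varepsilon_{i,t-1})
   = -\sigma_\varepsilon^2 \neq 0
\end{align*}
```

Under these assumptions differencing removes ϑ_i, but the differenced lag stays correlated with the differenced disturbance because y_{i,t-1} contains ε_{i,t-1}. This is the usual motivation for instrumenting the differenced lag, as the Anderson-Hsiao and Arellano-Bond (xtabond2) approaches do.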

Thank you very much for your help, Joseph!
