  • xtpcse vs. xtabond2 in panel data

    Dear Statalisters,

    I need some help choosing the right command for my panel data set. It includes 266 cross sections and 18 time periods (strongly balanced). Random effects are rejected in favour of fixed effects. Since the data are heteroskedastic, autocorrelated and contemporaneously correlated, and the model includes a lagged dependent variable, I would take first differences to eliminate autocorrelation, the explicit fixed effects and the correlation of the lagged dependent variable with the disturbances.
    Usually, I would then use the xtabond2 estimator to account for heteroskedasticity and contemporaneous correlation. However, many of the independent variables are dummies, and Roodman (2009, p. 115) states that this command shouldn't be applied if any dummy is 0 for almost all or 1 for almost all observations, which is the case with my data. Therefore, I considered the xtpcse estimator. Unfortunately, this resulted in an error message.
    Code:
    xi: xtpcse D.(y laggedy indvars i.countryid i.year), correlation(ar1)
    
    no time periods are common to all panels, cannot estimate disturbance
    covariance matrix using casewise inclusion
    r(459);
    Questions
    1. Can I use the xtpcse estimator in this setting?
    2. Can anyone tell me where the presented command goes wrong?


    If you need more details, let me know.

    Kind regards,
    Alex

  • #2
    Alex,

    Try this:

    Code:
    xtpcse D.(y laggedy indvars i.countryid i.year), pairwise correlation(ar1)
    or the same command with correlation(psar1) in place of correlation(ar1).

    You might also try Driscoll-Kraay standard errors (xtscc), which is available as a user-written command.
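
A minimal sketch of the two suggestions above, not the poster's actual specification. It assumes Stata's shipped Grunfeld example panel as a stand-in for the county data, so `company`, `invest`, `mvalue` and `kstock` are placeholder names rather than variables from the thread. With casewise inclusion (the default), xtpcse estimates the disturbance covariance matrix only from time periods common to all panels; the pairwise option instead uses, for each pair of panels, all periods the two have in common.

```stata
* Sketch only: the Grunfeld data stand in for the county panel
webuse grunfeld, clear
xtset company year

* First-differenced model with panel-corrected standard errors,
* covariance elements computed pairwise instead of casewise
xtpcse D.invest D.mvalue D.kstock, pairwise

* Same model with a common AR(1) disturbance, as suggested above
xtpcse D.invest D.mvalue D.kstock, pairwise correlation(ar1)

* xtscc is user-written; install it once from SSC before first use
ssc install xtscc, replace
xtscc invest mvalue kstock, fe
```

Time-series operators such as D. require the data to be xtset first, which is why the xtset line is repeated even though the example data come pre-declared.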


    Best,

    Joe

    Joseph L. Staats, J.D., Ph.D.

    Associate Professor
    Department of Political Science
    University of Minnesota Duluth
    Cina Hall 307
    1123 University Drive
    Duluth, MN 55812
    (218) 726-6641





    • #3
      Thank you Joseph.

      I made a mistake in my initial post. I posted this command:
      Code:
      xi: xtpcse D.(y laggedy indvars i.countryid i.year), correlation(ar1)
      However, combining the D. operator with i.countryid and i.year is not allowed.

      Running the following commands
      Code:
      xtpcse D.(y laggedy indvars) i.countryid i.year
      xtpcse D.(y laggedy indvars) i.countryid
      xtpcse D.(y laggedy indvars) i.year
      all produce the following warning: "Warning: variance matrix is nonsymmetric or highly singular". Standard errors, p-values, and confidence intervals are not reported.

      When I use
      Code:
      xtpcse D.(y laggedy indvars), pairwise
      it works. I left out the correlation(ar1) option since I thought autocorrelation would be accounted for through first differencing (please correct me if I am wrong). Also, I am not sure if this command eliminates the other problems of my data set which I mentioned before (heteroskedasticity, contemporaneous correlation).
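
Whether first differencing removes serial correlation depends on the error process in levels. The worked case below is a sketch under one assumption that the thread does not state: that the level disturbances \varepsilon_{it} are serially uncorrelated with constant variance \sigma^2.

```latex
\begin{aligned}
\Delta\varepsilon_{it} &= \varepsilon_{it} - \varepsilon_{i,t-1}, \\
\operatorname{Var}(\Delta\varepsilon_{it}) &= 2\sigma^2, \\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1})
  &= \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\; \varepsilon_{i,t-1} - \varepsilon_{i,t-2})
   = -\sigma^2, \\
\operatorname{Corr}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1}) &= -\tfrac{1}{2}.
\end{aligned}
```

Under that assumption, differencing induces negative first-order serial correlation rather than removing it. If the level disturbances instead followed a random walk, the differenced disturbances would be serially uncorrelated, so which case applies is an empirical matter.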

      Running the following command, which was previously recommended to me in case I can do without the lagged dependent variable, led to an R-squared of about 0.16, whereas the xtpcse estimator (including the lagged dependent variable) reduces this value to 0.05.
      Code:
      xi: reg D.(y depvars i.year), cluster(countryid)
      This decrease leaves me a bit puzzled. I am not sure which model is more suitable.
      Last edited by Alex Lukassen; 26 Aug 2015, 21:54.



      • #4
        Alex,

        I'm not so sure you need a lagged dependent variable, especially since you are using ar1 or psar1. You have to make that call based on the unique nature of your research project. Also, I assume that you are lagging your independent/control variables. I am familiar with regressions of panel data in international political economy research (my own and others). Based on this, I see something like this working for your project:

        Code:
        xtpcse y indvars year_dum* cnty_dum*, pairwise corr(psar1)

        Again, I recommend you consider Driscoll-Kraay standard errors. You can read about this method here:

        http://www.stata-journal.com/sjpdf.h...iclenum=st0128

        In Driscoll-Kraay your command line would be:

        Code:
         xtscc y indvars year_dum*, fe
        I hope this helps.

        Best,

        Joe



        • #5
          Thanks again for your help, Joseph.

          I ran different regressions according to your recommendation:
          Code:
          xi: reg D.(y depvars i.year), cluster(countyid)
          xtpcse y laggedy depvars year* countyid*, corr(psar1)
          xtpcse y laggedy depvars year* countyid*, pairwise corr(psar1)
          xtscc y laggedy depvars year*, fe
          The results looked as follows:
          Code:
              Variable |     xireg     xtpcse_casewise xtpcse_pairwise     xtscc
          -------------+----------------------------------------------------------------
          percentchangepop
                   D1. | -65.267144**                                                   
                   --. |                 -55.785918      -55.785918      -59.344459     
                       |
          unemploymentrate
                   D1. | -54.985701***                                                  
                   --. |                 -28.517336**    -28.517336**    -28.744504     
                       |
            stadiumcap
                   D1. | -.00084621                                                     
                   --. |                 -.00226165**    -.00226165*     -.00326379*    
                       |
          nrfirstbundteams
                   D1. | -15.693539                                                     
                   --. |                  10.486173       10.486173       4.5096825     
                       |
          nrsecondbundteams
                   D1. |  4.8190392                                                     
                   --. |                  6.0215406       6.0215406       .50076176     
                       |
          nr1stdivhockeyteams
                   D1. |  -9.078215                                                     
                   --. |                  70.161801***    70.161801**    -6.6679097     
                       |
          nr1stdivbasketteams
                   D1. | -112.38692                                                     
                   --. |                  -9.785193       -9.785193      -42.021028     
                       |
          nr1sthballteams
                   D1. |  34.417208                                                     
                   --. |                 -28.416358      -28.416358       42.256221     
                       |
          nr3rddivteams
                   D1. |  70.066827*                                                    
                   --. |                  56.963029**     56.963029*      89.682623*    
                       |
          nr4thdivteams
                   D1. |  34.440157                                                     
                   --. |                 -3.2123857      -3.2123857       17.203549     
                       |
              stadyear
                   D1. |  .00484439                                                     
                   --. |                  .01618633       .01618633          .01304     
                       |
             stadrenov
                   D1. | -19.947809                                                     
                   --. |                   -69.9948*       -69.9948      -82.564576     
                       |
          3rddivstadhoney
                   D1. | -27.782703                                                     
                   --. |                  48.361165       48.361165       41.668089     
                       |
          4thdivstadhoney
                   D1. |  25.622879                                                     
                   --. |                  30.625975       30.625975       7.3490174     
                       |
          nrpromo5thto4th
                   D1. |  37.663843                                                     
                   --. |                  22.009596       22.009596       20.572219     
                       |
          nrpromo4thto3rd
                   D1. | -.63851746                                                     
                   --. |                  9.4968656       9.4968656       23.045861     
                       |
          nrpromo3rdto2nd
                   D1. | -14.014563                                                     
                   --. |                 -14.211144      -14.211144      -8.8363123     
                       |
          nrrele2ndto3rd
                   D1. |  48.668203                                                     
                   --. |                  49.922576       49.922576        71.70564     
                       |
          nrrele3rdto4th
                   D1. |  -24.57393                                                     
                   --. |                 -62.075376      -62.075376      -55.160333     
                       |
          nrrele4thto5th
                   D1. | -15.840651                                                     
                   --. |                  15.025695       15.025695      -10.459854     
                       |
          nrrele4thto6th
                   D1. |  120.57864                                                     
                   --. |                  260.22644       260.22644       167.47553*  
          
          -------------+----------------------------------------------------------------
             R-squared |    0.4138          0.9943          0.9943          0.8293
          Just a bit of context: I am analyzing the effect of 3rd and 4th division soccer teams and stadiums on local income (county-level). As I said, there are 266 counties and 18 time periods.
          Similar research yielded R-squared values between approximately 0.3 and 0.65. Therefore, I think the "xireg" model is the most realistic, although (or because?) it doesn't include a lagged dependent variable. The R-squared of the two "xtpcse" estimations makes me believe that something went wrong (high R-squared and few significant coefficients). Concerning the xtscc estimator, the previously mentioned paper ("Robust standard errors for panel regressions with cross-sectional dependence", Hoechle, 2007) states that results should be treated with caution when N is large and T is short. However, "large" and "short" are not clearly specified.

          Some feedback on whether the "xireg" model is an appropriate choice and if there is a mistake in the "xtpcse" estimations would be highly appreciated. Again, if more details are needed to be able to give some feedback, please let me know.

          Thanks and kind regards,
          Alex



          • #6
            Sorry, there is one more question on my mind. Assuming that I will use the "xi: reg D.(y depvars i.year), cluster(countyid)"-estimator, does not including a lagged dependent variable lead to omitted variable bias when analyzing the sports environment's impact on income?



            • #7
              Alex,

              I don't see the ratio between your N and T as being a problem for regression with Driscoll-Kraay standard errors. The Hoechle article uses data with an N of 1000 and a T of 40, which is a greater N-to-T ratio (25) than your N of 266 and T of 18 (about 15).

              I think xtreg is a more useful choice for comparison than xi: reg. See why here:

              http://www.statalist.org/forums/foru...-data-analysis

              As to whether to include a lagged dependent variable, I suggest you read the following and look up relevant works listed in the bibliography, most especially Achen (2000) and Beck and Katz (2011):

              http://web.stanford.edu/~arjunw/LaggedDVs.pdf

              Using xi:reg or xtreg doesn't take care of potential heteroskedasticity. See this work by Richard Williams for guidance on ways to detect and correct for heteroskedasticity:

              https://www3.nd.edu/~rwilliam/stats2/l25.pdf

              Good luck.

              Best,

              Joe



              • #8
                Thanks for the insights.
                I knew about most of the papers dealing with lagged dependent variables that you mentioned. Anyway, it was good to see that I had a look at the "right" ones.

                xi: reg was recommended to me earlier (see previous post). Since I would use it with first differences, autocorrelation and explicit fixed effects would be eliminated. By introducing the cluster option, any remaining serial correlation or heteroskedasticity would be accounted for. However, including a lagged dependent variable would bias the results.

                This leaves me with either xtscc or xtpcse. Yet, using xtscc with a lagged dependent variable generates unreasonably high R-squared values with few significant coefficients. Therefore, I would like to use one of the following commands since xtpcse assumes that the disturbances are, by default, heteroskedastic and contemporaneously correlated across panels:
                Code:
                xtpcse dY dLaggedY dDepVars
                xi: xtpcse dY dLaggedY dDepVars i.countyid
                xi: xtpcse dY dLaggedY dDepVars i.year
                xi: xtpcse dY dLaggedY dDepVars i.year i.countyid
                (where the prefix "d" stands for first-differenced)
                Question
                Do I need to include dummies for either county or year or both? From a theoretical point of view, I am not quite sure about the assumptions behind each of these commands. Thus, selecting one of them gets difficult.
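
The thread does not show how the first-differenced variables were built. One possible construction uses Stata's time-series operators; the sketch below again assumes the shipped Grunfeld panel as a stand-in, so `invest` and `mvalue` are placeholders for y and one regressor, and the generated names simply mirror those used in the commands above.

```stata
* Sketch only: the Grunfeld data stand in for the county panel
webuse grunfeld, clear
xtset company year

* D. takes the first difference within each panel
generate dY = D.invest

* LD. is the one-period lag of the first difference
generate dLaggedY = LD.invest

* First difference of a regressor
generate dDepVars = D.mvalue

* Differenced model; observations with missing differences are dropped
xtpcse dY dLaggedY dDepVars, pairwise
```

Because the operators respect the panel structure declared by xtset, the first period of each panel becomes missing instead of being differenced against the previous panel's last observation.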


                Kind regards,
                Alex



                • #9
                  I'm not sure what your question is. Dummy variables for unit (counties) or time (year) are used to model for fixed effects of either of these. The Hausman test tells you whether you need to model for unit fixed effects. To determine whether you need to model for time fixed effects, run your regression with your year dummy variable included and then enter the following command:

                  Code:
                   testparm year_dummy_variable
                  If p<.10, you need to model for time fixed effects. Keep in mind that you might try modeling for some group of years rather than each and every year. It's quite common in the work I do to model for decade fixed effects, or some specific group of years that is known by researchers to be problematic (such as an economic crisis that lasted for a number of years).
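
The testparm step described above can be sketched as follows. This is an illustration on Stata's shipped Grunfeld panel, not on the poster's data, and it swaps in xtreg with factor-variable notation (i.year) in place of hand-made year dummies; the variable names are placeholders.

```stata
* Sketch only: the Grunfeld data stand in for the county panel
webuse grunfeld, clear
xtset company year

* Unit fixed effects plus a full set of year indicators
xtreg invest mvalue kstock i.year, fe

* Joint test that all year coefficients are zero
testparm i.year
```

With hand-made dummies named, say, year_dum2 onward (hypothetical names), the last line would instead be testparm year_dum*.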

                  If your question only had to do with whether you need to add dummy variables for county or year in the commands you listed in your latest message, my answer is that using a command with xi: and i.year and/or i.countyid accomplishes the same thing as adding dummy variables for either of these. Therefore, in such a case you would not add dummy variables.



                  • #10
                    I know that the command xi followed by i.year and/or i.countyid creates dummy variables. That's why I presented 4 different commands (3 of them including dummies for year, county, or both) and asked which one would be most suitable. Thanks to your recommendation I (re)ran the hausman and testparm tests, which made me revise a bit of the theory and finally confirmed my assumptions.

                    The linear reduced form model I specified looks as follows:
                    Code:
                    y_{it} = \beta_1 X_{it} + \beta_2 Z_{it} + \vartheta_i + \mu_t + \varepsilon_{it}
                    where \vartheta_i denotes a county i specific fixed effect and \mu_t a time t specific fixed effect
                    When first-differenced and simplified:
                    Code:
                    y_{it} - y_{i,t-1} = \beta_1 (X_{it} - X_{i,t-1}) + \beta_2 (Z_{it} - Z_{i,t-1}) + (\mu_t - \mu_{t-1}) + (\varepsilon_{it} - \varepsilon_{i,t-1})
                    So, theoretically, the county specific fixed effects (\vartheta_i) are eliminated through first-differencing and I do not need to account for them afterwards. This has been confirmed by the hausman test. The time specific fixed effects remain which has been confirmed by the testparm statistic. (If there is a flaw in my reasoning, please let me know.)
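
The equations above leave out the lagged dependent variable that the final command includes. As a hedged sketch, suppose the model also contains a term \gamma y_{i,t-1} (the symbol \gamma is introduced here and is not in the original post), that \varepsilon_{it} is serially uncorrelated with variance \sigma^2, and that X and Z are uncorrelated with the disturbances in every period:

```latex
\begin{aligned}
\Delta y_{it} &= \gamma\,\Delta y_{i,t-1} + \beta_1 \Delta X_{it} + \beta_2 \Delta Z_{it}
                 + \Delta\mu_t + \Delta\varepsilon_{it}, \\
\Delta y_{i,t-1} &= y_{i,t-1} - y_{i,t-2}, \qquad
\Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}, \\
\operatorname{Cov}(\Delta y_{i,t-1}, \Delta\varepsilon_{it})
  &= -\operatorname{Cov}(y_{i,t-1}, \varepsilon_{i,t-1}) = -\sigma^2 \neq 0 .
\end{aligned}
```

Under these assumptions the differenced lag remains correlated with the differenced disturbance, because both contain \varepsilon_{i,t-1}. This is the usual motivation for instrumenting the differenced lag, as estimators such as xtabond2 (mentioned in the first post) do.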

                    To be able to include a lagged dependent variable, the estimator I will ultimately use looks like this:
                    Code:
                    xi: xtpcse dY dLaggedY dDepVars i.year
                    Thank you very much for your help, Joseph!

