  • Issues with unbalanced Panel Data

    Hello everyone:

    I have a few questions about handling unbalanced panel data. I know that similar questions have already been asked here, but I am still unsure how to proceed. I am using StataMP 13 on a MacBook Pro.

    I am dealing with a large dataset of 181 cases and 44 time periods (n=7486). When I run the xtset command, I am told that the panel variable is unbalanced. I also know that the data are unbalanced because my independent variables have randomly missing values. I am now faced with several options and do not know how to choose among them.

    1. I have read that panel-corrected standard errors are recommended for panel data because they are more reliable (Beck & Katz 1995)*. The issue, however, is that when I run my model with the xtpcse command I get the following error: "Number of gaps in sample: 70. No time periods are common to all panels, cannot estimate disturbance covariance matrix using casewise inclusion." I know what this means, but I don't know what to do about it. I have tried the pairwise option, which allows the model to run, but I don't know what estimation problems this may cause. I have also repeated the pairwise approach after removing all cases with fewer than 5 observations, but I am still not sure what the problems with this approach may be. If the pairwise approach is acceptable, what is the minimum number of observations necessary, and do these observations need to be consecutive, e.g. 2001, 2002, 2003, 2004 as opposed to 2000, 2005, 2007, 2010?
    *Beck, N., & Katz, J. N. (1995). What to do (and not to do) with time-series cross-section data. American Political Science Review, 89(3), 634-647.

    2. The second option I have tried is the xtreg command. I am familiar with xtreg and the choice between fixed-effects and random-effects models, but I am not sure whether the unbalanced dataset causes problems here as well. My question is: which approach is better, xtpcse or xtreg, and why?

    I am pasting the results in the code box below. I am only running simplified models here to minimize confusion:
    • My dependent variable is the level of democracy: v2x_libdem
    • My independent variables are fragmentation in government and in the opposition: govfrac oppfrac
    • The other two variables in the dataset are country_id and year.
    • I am also providing an example of the dataset generated by -dataex-.
    Any assistance would be greatly appreciated!

    Code:
    . xtset  country_id year
           panel variable:  country_id (unbalanced)
            time variable:  year, 1975 to 2018
                    delta:  1 unit
    
    
    . xtdescribe
    
    country_id:  3, 4, ..., 236                                  n =        181
        year:  1975, 1976, ..., 2018                             T =         44
               Delta(year) = 1 unit
               Span(year)  = 44 periods
               (country_id*year uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             8      28      44        44        44      44      44
    
         Freq.  Percent    Cum. |  Pattern
     ---------------------------+----------------------------------------------
          155     85.64   85.64 |  11111111111111111111111111111111111111111111
           14      7.73   93.37 |  ...............11111111111111111111111111111
            3      1.66   95.03 |  ................1111111111111111111111111111
            2      1.10   96.13 |  1111111111111111............................
            1      0.55   96.69 |  ....................................11111111
            1      0.55   97.24 |  ................................111111111111
            1      0.55   97.79 |  ........................11111111111111111111
            1      0.55   98.34 |  .......................111111111111111111111
            1      0.55   98.90 |  ..................11111111111111111111111111
            2      1.10  100.00 | (other patterns)
     ---------------------------+----------------------------------------------
          181    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(country_id year oppfrac govfrac v2x_libdem)
    3 1975   .   . .109
    3 1976   .   .  .11
    3 1977 .64   0 .118
    3 1978 .64   0 .123
    3 1979 .64   0 .125
    3 1980 .76   0 .127
    3 1981 .76   0 .133
    3 1982 .76   0 .145
    3 1983 .65   0 .147
    3 1984 .65   0 .147
    3 1985 .65   0 .148
    3 1986  .8   0 .152
    3 1987 .81   0 .152
    3 1988 .81   0 .176
    3 1989 .49   0 .186
    3 1990 .49   0 .198
    3 1991 .49   0 .217
    3 1992 .49   0 .231
    3 1993 .49   0 .248
    3 1994 .49   0 .273
    3 1995 .52   0 .293
    3 1996 .52   0 .305
    3 1997 .52   0  .36
    3 1998 .55   0 .394
    3 1999 .55   0  .41
    3 2000 .55   0 .461
    3 2001  .4 .13 .495
    3 2002  .4 .13 .513
    3 2003  .4 .13 .531
    3 2004 .46 .18 .546
    3 2005 .46 .18 .546
    3 2006 .46 .18 .517
    3 2007 .68   0 .485
    3 2008 .68   0 .505
    3 2009 .68   0 .496
    3 2010 .52   0 .489
    3 2011 .52   0 .489
    3 2012 .52   0 .453
    3 2013 .59 .21 .458
    3 2014 .59 .21 .429
    3 2015 .59 .21 .465
    3 2016 .73 .31 .487
    3 2017 .73 .31 .491
    3 2018   .   . .527
    4 1975   .   . .566
    4 1976  .3   0 .671
    4 1977  .3   0 .657
    4 1978   0   0 .686
    4 1979   0   0 .686
    4 1980   0   0  .17
    4 1981   .   . .151
    4 1982   .   . .153
    4 1983   .   .  .15
    4 1984   .   .  .15
    4 1985   .   . .201
    4 1986   .   . .201
    4 1987   .   . .213
    4 1988 .57 .72 .516
    4 1989 .57 .72 .564
    4 1990 .57 .72 .199
    4 1991 .57 .72 .472
    4 1992 .51 .71 .655
    4 1993 .51 .71 .669
    4 1994 .51 .71 .681
    4 1995 .51 .71 .681
    4 1996 .51 .71 .695
    4 1997   . .66 .698
    4 1998   . .66 .698
    4 1999   . .66 .698
    4 2000   . .66 .686
    4 2001 .67   0 .691
    4 2002 .67   0 .686
    4 2003 .67   0 .698
    4 2004 .67   0 .689
    4 2005 .67   0 .674
    4 2006 .53  .3 .696
    4 2007 .53  .3 .667
    4 2008 .53  .3 .688
    4 2009 .53  .3 .688
    4 2010 .53  .3 .671
    4 2011 .13 .54 .699
    4 2012 .13 .54 .694
    4 2013 .13 .54  .67
    4 2014 .13 .54  .67
    4 2015 .13 .54 .669
    4 2016   0 .37 .645
    4 2017   0 .37 .633
    4 2018   .   . .635
    5 1975 .62 .19 .865
    5 1976 .62 .19 .867
    5 1977 .18 .63 .868
    5 1978 .18 .63 .868
    5 1979 .65   0 .864
    5 1980  .2 .65 .858
    5 1981  .2 .65 .858
    5 1982 .52 .47 .861
    5 1983 .66   0 .868
    5 1984 .66   0 .868
    5 1985 .66   0 .865
    5 1986 .65 .19 .867
    end
    
    . xtpcse  v2x_libdem oppfrac govfrac
    
    Number of gaps in sample:  70
    no time periods are common to all panels, cannot estimate disturbance
    covariance matrix using casewise inclusion
    r(459);
    
    
    . xtpcse  v2x_libdem oppfrac govfrac, pairwise
    
    Number of gaps in sample:  70
    (note: at least one disturbance covariance assumed 0, no common time periods
           between panels)
    
    Linear regression, correlated panels corrected standard errors (PCSEs)
    
    Group variable:   country_id                    Number of obs      =      4475
    Time variable:    year                          Number of groups   =       154
    Panels:           correlated (unbalanced)       Obs per group: min =         4
    Autocorrelation:  no autocorrelation                           avg =  29.05844
    Sigma computed by pairwise selection                           max =        43
    Estimated covariances      =     11935          R-squared          =    0.0607
    Estimated autocorrelations =         0          Wald chi2(2)       =    175.55
    Estimated coefficients     =         3          Prob > chi2        =    0.0000
    
    ------------------------------------------------------------------------------
                 |           Panel-corrected
      v2x_libdem |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         oppfrac |  -.0563738   .0149227    -3.78   0.000    -.0856218   -.0271259
         govfrac |   .2400054   .0193359    12.41   0.000     .2021078    .2779031
           _cons |   .4531467   .0091391    49.58   0.000     .4352344    .4710591
    ------------------------------------------------------------------------------
    
    
    . xtreg  v2x_libdem oppfrac govfrac
    
    Random-effects GLS regression                   Number of obs      =      4475
    Group variable: country_id                      Number of groups   =       154
    
    R-sq:  within  = 0.0284                         Obs per group: min =         4
           between = 0.0516                                        avg =      29.1
           overall = 0.0403                                        max =        43
    
                                                    Wald chi2(2)       =    130.66
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
    
    ------------------------------------------------------------------------------
      v2x_libdem |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         oppfrac |   .0309818   .0064597     4.80   0.000      .018321    .0436427
         govfrac |    .075405    .007288    10.35   0.000     .0611207    .0896892
           _cons |   .3987555   .0195239    20.42   0.000     .3604893    .4370216
    -------------+----------------------------------------------------------------
         sigma_u |  .23772996
         sigma_e |  .09036332
             rho |  .87375703   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
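
    A minimal sketch of how the "fewer than 5 observations" restriction from question 1 might be applied before -xtpcse-, and how the remaining observation patterns could be inspected. The variable names insample and n_usable and the threshold of 5 are illustrative assumptions, not part of the original dataset. -hetonly- is shown only as one possible way around the casewise error: it assumes panel-level heteroskedasticity with no contemporaneous correlation across panels, which may or may not suit these data.

    Code:
    * Flag rows with no missing values on the model variables
    * (insample is a hypothetical helper variable)
    generate byte insample = !missing(v2x_libdem, oppfrac, govfrac)

    * Count usable observations per country, keeping the country-year sort order
    bysort country_id (year): egen n_usable = total(insample)

    * Show which years are actually usable in each panel (gaps appear as dots)
    xtdescribe if insample, patterns(20)

    * Pairwise PCSEs restricted to countries with at least 5 usable observations
    * (the cutoff of 5 is an assumption taken from the post, not a known minimum)
    xtpcse v2x_libdem oppfrac govfrac if n_usable >= 5, pairwise

    * Alternative: heteroskedasticity-only PCSEs, which do not estimate the
    * cross-panel covariances that the casewise error message refers to
    xtpcse v2x_libdem oppfrac govfrac, hetonly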

  • #2
    Eltion:
    I can answer your second question first: you are seemingly dealing with a T>N panel dataset; hence, -xtreg- is not the way to go.
    -xtpcse- assumes a balanced panel dataset; you may want to check whether -xtgls- and -xtregar- can help you out.
    Kind regards,
    Carlo
    (Stata 19.0)
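
    A hedged sketch of what the two suggested commands might look like with the variables from #1. The option choices are assumptions made for illustration, not recommendations from the thread. Note that -xtgls- with panels(correlated) requires balanced panels, so panels(heteroskedastic) is used here instead.

    Code:
    * FGLS with panel-level heteroskedasticity and a common AR(1) coefficient.
    * Assumption: if Stata complains about unequal spacing because of gaps,
    * the -force- option described in -help xtgls- may be needed.
    xtgls v2x_libdem oppfrac govfrac, panels(heteroskedastic) corr(ar1)

    * Fixed-effects model with AR(1) disturbances; -xtregar- accepts
    * unbalanced panels with unequally spaced observations
    xtregar v2x_libdem oppfrac govfrac, fe

    * Random-effects counterpart of the same AR(1) model
    xtregar v2x_libdem oppfrac govfrac, re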



    • #3
      Eltion:
      in addition, please note that -xtgls- and -xtregar- differ in how heteroskedasticity and autocorrelation can be dealt with.
      Last edited by Carlo Lazzaro; 27 May 2020, 07:12.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Sorry for the previous multiple replies.
        The Stata website was down for a few minutes.
        Last edited by Carlo Lazzaro; 27 May 2020, 07:11.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hi Carlo!
          Thank you for your input. Excuse my ignorance, but I haven't come across any arguments that xtreg is not ideal for a T>N panel dataset. Is there a particular reason for this?



          • #6
            Eltion:
            basically, it's a matter of modelling serial correlation, which -xtreg-, being conceived for short (i.e., N>T) panel datasets, does not allow.
            See paragraph 8.10 (and related subparagraphs) in the valuable https://www.stata.com/bookstore/micr...metrics-stata/.
            Kind regards,
            Carlo
            (Stata 19.0)
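
            As an illustration of the serial-correlation point, here is a sketch of two ways the model from #1 could account for it. Both specifications are assumptions chosen for the example, not something proposed in the thread.

            Code:
            * Pooled model with PCSEs and a common AR(1) coefficient,
            * estimated by Prais-Winsten; -pairwise- kept as in #1
            xtpcse v2x_libdem oppfrac govfrac, pairwise correlation(ar1)

            * Fixed effects with standard errors clustered by country: the error
            * process is left unmodelled, but the standard errors allow arbitrary
            * correlation within each country
            xtreg v2x_libdem oppfrac govfrac, fe vce(cluster country_id)

            The first command models the autocorrelation explicitly; the second only adjusts the standard errors, so the coefficient estimates are those of plain -xtreg, fe-.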



            • #7
              Eltion:
              for the sake of precision, I should have added in my previous reply that -xtreg, pa- allows autocorrelation modelling.
              However, the fact that -xtreg- was conceived for short (i.e., N>T) panel datasets still holds.
              Kind regards,
              Carlo
              (Stata 19.0)
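
              A minimal sketch of the population-averaged variant mentioned above, using the variables from #1. The AR(1) working correlation and the robust standard errors are assumptions made for the example.

              Code:
              * Population-averaged (GEE) model with an AR(1) working correlation.
              * Assumption: with gaps in the time variable, Stata may refuse to fit
              * this structure unless -force- is added (see -help xtgee-).
              xtreg v2x_libdem oppfrac govfrac, pa corr(ar 1) vce(robust)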
