
  • What is the most suitable analysis for an unbalanced panel data?

    Hello,

    I have an unbalanced panel dataset built from surveys at the subregional level, taken in 8 countries at different points in time. The individual observations are therefore nested within regions, and the regions within the 8 countries listed below (survey years in parentheses):
    India (1999, 2006, 2012, 2016),
    Bangladesh (2004, 2011, 2014),
    Pakistan (2007, 2012, 2018),
    Nepal (1996, 2006, 2011, 2016),
    Cambodia (2000, 2005, 2010, 2014),
    Vietnam (1997, 2002, 2006, 2010, 2014),
    Indonesia (1997, 2003, 2007, 2012),
    Philippines (1998, 2003, 2008, 2013, 2017).
    The dependent variable is growth in a wealth index (g_iwi).
    The main independent variables are the dependency ratio (ln_dep: initial value of dep; g_dep: growth in dep) and ln_iwi (initial value of the wealth index).
    Other independent variables are education (ln_educ), urbanization (ln_urban), population (ln_pop), share of population (share_pop), etc. (at the regional level) and ethnic fractionalization (ln_ethnic) (at the country level), plus a dummy variable for each country and ln_ethnic_m (a dummy to account for some missing values).
    I ran several regressions in Stata, but every time I get insignificant results or coefficients that contradict the underlying theory (derived from the growth literature). My main concern is how to deal with the different survey years for each country, since I want to include all observations in a single model. Which estimator is most suitable?

    I use Stata 16.0. Below is a description of the data and a simple OLS regression.

    Code:
    . xtset region_c year
           panel variable:  region_c (unbalanced)
            time variable:  year, 1996 to 2018, but with gaps
                    delta:  1 unit
    
    . xtdescribe
    
    region_c:  1, 2, ..., 143                                    n =        142
        year:  1996, 1997, ..., 2018                             T =         20
               Delta(year) = 1 unit
               Span(year)  = 23 periods
               (region_c*year uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             1       2       3         4         4       5       5
    
         Freq.  Percent    Cum. |  Pattern
     ---------------------------+-------------------------
           26     18.31   18.31 |  ...1......1.....1...1..
           23     16.20   34.51 |  ........1......1..1....
           23     16.20   50.70 |  .1.....1...1....1......
           17     11.97   62.68 |  ....1....1....1...1....
           16     11.27   73.94 |  ..1....1....1....1...1.
            6      4.23   78.17 |  ................1...1..
            6      4.23   82.39 |  .1....1...1...1...1....
            5      3.52   85.92 |  ...........1....1.....1
            5      3.52   89.44 |  1.........1....1....1..
           15     10.56  100.00 | (other patterns)
     ---------------------------+-------------------------
          142    100.00         |  XXXXX.XXXXXXX.XXXXX.XXX
    
    . reg g_iwi ln_iwi g_dep ln_dep ln_educ ln_urban ln_pop share_pop c.ln_ethnic##c.ln_dep ln_ethnic_m Bangladesh India Pakistan Nepal Indonesia Vietnam Philippines Cambodia
    note: ln_dep omitted because of collinearity
    note: Cambodia omitted because of collinearity
    
          Source |       SS           df       MS      Number of obs   =       379
    -------------+----------------------------------   F(17, 361)      =     23.69
           Model |  5.41297573        17  .318410337   Prob > F        =    0.0000
        Residual |  4.85207147       361  .013440641   R-squared       =    0.5273
    -------------+----------------------------------   Adj R-squared   =    0.5051
           Total |  10.2650472       378   .02715621   Root MSE        =    .11593
    
    --------------------------------------------------------------------------------------
                   g_iwi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
                  ln_iwi |   .1155921   .0485601     2.38   0.018      .020096    .2110883
                   g_dep |  -.4594956   .0515608    -8.91   0.000    -.5608928   -.3580985
                  ln_dep |   .2367487   .0615646     3.85   0.000     .1156784     .357819
                 ln_educ |  -.0506995   .0558058    -0.91   0.364    -.1604448    .0590458
                ln_urban |  -.0303772   .0120283    -2.53   0.012    -.0540315   -.0067229
                  ln_pop |   .0015716   .0081499     0.19   0.847    -.0144556    .0175988
               share_pop |    .000078   .0015892     0.05   0.961    -.0030472    .0032032
               ln_ethnic |   .0931254   .1855702     0.50   0.616    -.2718091    .4580599
                  ln_dep |          0  (omitted)
                         |
    c.ln_ethnic#c.ln_dep |  -.0379962   .0437562    -0.87   0.386    -.1240452    .0480528
                         |
             ln_ethnic_m |   .0170842   .0282287     0.61   0.545    -.0384292    .0725976
              Bangladesh |  -.0248929   .0342641    -0.73   0.468    -.0922752    .0424893
                   India |   .1015927   .0420808     2.41   0.016     .0188385    .1843469
                Pakistan |  -.1086989   .0485216    -2.24   0.026    -.2041195   -.0132783
                   Nepal |   .2189254   .0416671     5.25   0.000     .1369846    .3008661
               Indonesia |  -.0080362   .0459966    -0.17   0.861    -.0984911    .0824188
                 Vietnam |    .053028   .0418545     1.27   0.206    -.0292814    .1353373
             Philippines |  -.0750558   .0487437    -1.54   0.124    -.1709131    .0208014
                Cambodia |          0  (omitted)
                   _cons |  -1.140357   .3646102    -3.13   0.002    -1.857383   -.4233299
    --------------------------------------------------------------------------------------
    I am a novice in Stata and statistics, so I would very much appreciate any guidance.

    Best wishes,
    Last edited by Stefan Bradeanu; 25 Jun 2020, 13:06.

  • #2
    Welcome to Statalist. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex.
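
    As a minimal sketch of the dataex step, assuming the variable names shown in post #1 (the selection of variables and of the first 20 observations is only illustrative):

    Code:
    * dataex is part of official Stata 16 and of fully updated 15.1 and 14.2;
    * on older installations it can be obtained with: ssc install dataex
    dataex region_c year g_iwi ln_iwi g_dep ln_dep ln_educ ln_urban in 1/20

    Pasting the resulting output between code delimiters lets others recreate the example data exactly.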

    You have a kind of panel data, although the survey years vary across countries. So I would be strongly tempted to xtset the data by individual or region and use panel data estimators. Unbalanced panels are not generally a problem, although they can reflect sample selection issues.
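
    A minimal sketch of that approach, assuming the variable names from post #1. The choice of regressors, the year dummies, and clustering at the region level are illustrative assumptions, not a recommendation for the final model:

    Code:
    * declare the panel structure: regions observed in irregular survey years
    xtset region_c year

    * region fixed effects, standard errors clustered by region;
    * i.year adds a dummy for each survey year (collinear ones are dropped automatically)
    xtreg g_iwi ln_iwi g_dep ln_dep ln_educ ln_urban ln_pop share_pop i.year, fe vce(cluster region_c)

    * random-effects counterpart, which can retain regressors that are constant within a region
    xtreg g_iwi ln_iwi g_dep ln_dep ln_educ ln_urban ln_pop share_pop i.year, re vce(cluster region_c)

    With region fixed effects, anything that is constant within a region over time (the country dummies and, if it does not vary over time, ln_ethnic) is absorbed and dropped from the output.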

    The PDF documentation on cross-sectional time-series (panel) analysis, and on xtreg in particular, is very good. I suggest you look at it.

    That said, the country dummies you include mean you are already estimating something close to a fixed-effects panel model, although with effects at the country level rather than the region level. You seem to have strong results on the first three variables (ln_iwi, g_dep, ln_dep) and on ln_urban.
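
    Incidentally, the two "omitted because of collinearity" notes in the output come from listing ln_dep both on its own and inside the ## interaction, and from including all eight country dummies alongside the constant. A minimal sketch of the same OLS specification without the redundant terms, assuming the variable names from post #1 (taking Cambodia as the base category and clustering by region are illustrative choices):

    Code:
    * c.ln_ethnic##c.ln_dep already expands to ln_ethnic, ln_dep and their product,
    * so ln_dep is not listed separately; Cambodia is left out as the base country
    reg g_iwi ln_iwi g_dep c.ln_ethnic##c.ln_dep ln_educ ln_urban ln_pop share_pop ln_ethnic_m ///
        Bangladesh India Pakistan Nepal Indonesia Vietnam Philippines, vce(cluster region_c)

    The coefficients match those of the original command; only the standard errors change, because of the clustering.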



    • #3
      Dear Phil,

      Thank you very much for your suggestion. I will definitely have a look at the PDF documentation.

      Best regards,
