What is the most suitable analysis for an unbalanced panel data?

Stefan Bradeanu

Join Date: Jun 2020
Posts: 11

25 Jun 2020, 13:02

Hello,

I have an unbalanced panel dataset built from surveys at the subregional level in 8 countries, taken at different points in time. The individual observations are therefore nested within regions, and the regions within the 8 countries listed below (survey years in parentheses):
India (1999, 2006, 2012, 2016),
Bangladesh (2004, 2011, 2014),
Pakistan (2007, 2012, 2018),
Nepal (1996, 2006, 2011, 2016),
Cambodia (2000, 2005, 2010, 2014),
Vietnam (1997, 2002, 2006, 2010, 2014),
Indonesia (1997, 2003, 2007, 2012),
Philippines (1998, 2003, 2008, 2013, 2017).
The dependent variable is growth in the wealth index (g_iwi).
The main independent variables are the dependency ratio (ln_dep: initial value of dep; g_dep: growth in dep) and ln_iwi (initial value of the wealth index).
Other independent variables are education (ln_educ), urbanization (ln_urban), population (ln_pop), share of population (share_pop), etc. (at the regional level) and ethnic fractionalization (ln_ethnic) (at the country level), plus a dummy variable for each country and ln_ethnic_m (a dummy to account for some missing values).
I ran several regressions in Stata, but every time I get insignificant results or coefficients that contradict the underlying theory (derived from the growth literature). My main concern is how to deal with the different survey years for each country, since I want to include all observations in a single model. Which estimator is most suitable?

I use Stata 16.0. Below you have a description of the data and a simple OLS regression.

Code:

. xtset region_c year
       panel variable:  region_c (unbalanced)
        time variable:  year, 1996 to 2018, but with gaps
                delta:  1 unit

. xtdescribe

region_c:  1, 2, ..., 143                                    n =        142
    year:  1996, 1997, ..., 2018                             T =         20
           Delta(year) = 1 unit
           Span(year)  = 23 periods
           (region_c*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       2       3         4         4       5       5

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+-------------------------
       26     18.31   18.31 |  ...1......1.....1...1..
       23     16.20   34.51 |  ........1......1..1....
       23     16.20   50.70 |  .1.....1...1....1......
       17     11.97   62.68 |  ....1....1....1...1....
       16     11.27   73.94 |  ..1....1....1....1...1.
        6      4.23   78.17 |  ................1...1..
        6      4.23   82.39 |  .1....1...1...1...1....
        5      3.52   85.92 |  ...........1....1.....1
        5      3.52   89.44 |  1.........1....1....1..
       15     10.56  100.00 | (other patterns)
 ---------------------------+-------------------------
      142    100.00         |  XXXXX.XXXXXXX.XXXXX.XXX

. reg g_iwi ln_iwi g_dep ln_dep ln_educ ln_urban ln_pop share_pop c.ln_ethnic##c.ln_dep ln_ethnic_m Bangladesh India Pakistan Nepal Indonesia Vietnam Philippines Cambodia
note: ln_dep omitted because of collinearity
note: Cambodia omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =       379
-------------+----------------------------------   F(17, 361)      =     23.69
       Model |  5.41297573        17  .318410337   Prob > F        =    0.0000
    Residual |  4.85207147       361  .013440641   R-squared       =    0.5273
-------------+----------------------------------   Adj R-squared   =    0.5051
       Total |  10.2650472       378   .02715621   Root MSE        =    .11593

--------------------------------------------------------------------------------------
               g_iwi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
              ln_iwi |   .1155921   .0485601     2.38   0.018      .020096    .2110883
               g_dep |  -.4594956   .0515608    -8.91   0.000    -.5608928   -.3580985
              ln_dep |   .2367487   .0615646     3.85   0.000     .1156784     .357819
             ln_educ |  -.0506995   .0558058    -0.91   0.364    -.1604448    .0590458
            ln_urban |  -.0303772   .0120283    -2.53   0.012    -.0540315   -.0067229
              ln_pop |   .0015716   .0081499     0.19   0.847    -.0144556    .0175988
           share_pop |    .000078   .0015892     0.05   0.961    -.0030472    .0032032
           ln_ethnic |   .0931254   .1855702     0.50   0.616    -.2718091    .4580599
              ln_dep |          0  (omitted)
                     |
c.ln_ethnic#c.ln_dep |  -.0379962   .0437562    -0.87   0.386    -.1240452    .0480528
                     |
         ln_ethnic_m |   .0170842   .0282287     0.61   0.545    -.0384292    .0725976
          Bangladesh |  -.0248929   .0342641    -0.73   0.468    -.0922752    .0424893
               India |   .1015927   .0420808     2.41   0.016     .0188385    .1843469
            Pakistan |  -.1086989   .0485216    -2.24   0.026    -.2041195   -.0132783
               Nepal |   .2189254   .0416671     5.25   0.000     .1369846    .3008661
           Indonesia |  -.0080362   .0459966    -0.17   0.861    -.0984911    .0824188
             Vietnam |    .053028   .0418545     1.27   0.206    -.0292814    .1353373
         Philippines |  -.0750558   .0487437    -1.54   0.124    -.1709131    .0208014
            Cambodia |          0  (omitted)
               _cons |  -1.140357   .3646102    -3.13   0.002    -1.857383   -.4233299
--------------------------------------------------------------------------------------

I am a novice in Stata and statistics, so I would very much appreciate it if anyone could guide me.

Best wishes,

Last edited by Stefan Bradeanu; 25 Jun 2020, 13:06.


Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

26 Jun 2020, 15:23

Welcome to Statalist. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex.
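For instance, a minimal sketch of a dataex call, assuming the variable names from the first post (the choice of variables and the 20-observation range are only illustrative):

```stata
* dataex is part of official Stata from 15.1 on (and fully updated 14.2);
* on older installations it can be obtained with: ssc install dataex
* The variable list and the observation range below are illustrative assumptions.
dataex region_c year g_iwi ln_iwi g_dep ln_dep in 1/20
```

The output can be pasted between code delimiters so that others can recreate the example data directly.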

You have a sort of panel data, although the survey years vary. I would be strongly tempted to xtset the data by individual or region and use panel data estimators. Unbalanced panels are not generally a problem, although they can reflect sample selection issues.
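As a rough sketch of that suggestion, using the variable names from the first post (the choice of regressors and the clustering by region are assumptions for illustration, not a tested specification for these data):

```stata
* Declare the panel: regions observed in irregular survey years
xtset region_c year

* Fixed-effects (within) estimator. Regressors that are constant within a
* region, such as the country dummies, would be dropped, so they are left out.
xtreg g_iwi ln_iwi g_dep ln_dep ln_educ ln_urban ln_pop share_pop, fe vce(cluster region_c)

* Random-effects estimator, which can keep regressors that do not vary within region
xtreg g_iwi ln_iwi g_dep ln_dep ln_educ ln_urban ln_pop share_pop ln_ethnic ln_ethnic_m, re vce(cluster region_c)
```

xtreg needs no extra options for an unbalanced panel. The gaps in year only matter if time-series operators such as L. are used, because with a delta of 1 the previous year is usually missing.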

The PDF documentation on time-series cross-sectional analysis, and on xtreg in particular, is very good. I suggest you look at it.

That said, the dummies you include mean you are estimating a model that is pretty much equivalent to the panel estimation. You seem to have strong results on the first three variables and on ln_urban.
Stefan Bradeanu

Join Date: Jun 2020

Posts: 11
#3

26 Jun 2020, 15:41

Dear Phil,

Thank you very much for your suggestion. I will definitely have a look at the PDF documentation.

Best regards,