  • Panel data analysis with AR(1)

    Hello everyone!

I have panel data on labor productivity for 74 countries for the period 1992-2016. I am trying to estimate the convergence rate based on Romer's model of convergence.
    I also attach the data.

    I am using xtregar for my model and getting the following results.

    Code:
    . xtregar ln_diff ln_prod_lagged t18-t19 L1.ln_diff L2.ln_diff, fe rhotype(tscorr) twostep
    
    FE (within) regression with AR(1) disturbances  Number of obs     =      1,628
    Group variable: id                              Number of groups  =         74
    
    R-sq:                                           Obs per group:
         within  = 0.0917                                         min =         22
         between = 0.0056                                         avg =       22.0
         overall = 0.0043                                         max =         22
    
                                                    F(5,1549)         =      31.26
    corr(u_i, Xb)  = -0.9112                        Prob > F          =     0.0000
    
    --------------------------------------------------------------------------------
           ln_diff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    ln_prod_lagged |  -.0742815   .0073197   -10.15   0.000     -.088639   -.0599239
               t18 |   -.038016   .0108618    -3.50   0.000    -.0593213   -.0167108
               t19 |   .0602955   .0109165     5.52   0.000     .0388827    .0817082
                   |
           ln_diff |
               L1. |   .0431449   .0222561     1.94   0.053    -.0005104    .0868002
               L2. |  -.0367265   .0210011    -1.75   0.081    -.0779201    .0044671
                   |
             _cons |   .7583084   .0742463    10.21   0.000     .6126746    .9039422
    ---------------+----------------------------------------------------------------
            rho_ar | -.03962128
           sigma_u |  .09476397
           sigma_e |  .09042197
           rho_fov |  .52343382   (fraction of variance because of u_i)
    --------------------------------------------------------------------------------
    F test that all u_i=0: F(73,1549) = 3.58                     Prob > F = 0.0000
    I have several questions regarding these results and would be grateful if someone could answer them.
    1. I am getting 0.52 for rho_fov, and no linear model I try increases it. Does this mean such models are not a good fit for my data?

    2. My second question might sound weird, but after estimating the coefficients, is it possible to go deeper and understand what the coefficient/convergence rate would be for a specific country, for example Armenia? Each country has a different path, and the speed can differ.

    3. Are GMM models more appropriate for such an analysis? I do not understand dynamic panel models very well, or what goes on in the background, so for now I am trying to stay away from them.

    Thank you very much in advance!
    Best,
    Sara Zakaryan

  • #2
    Sara:
    1) as per the FAQ, please do not attach files in non-Stata formats. As far as spreadsheets are concerned, please note that there's a pretty widespread (and wise) reluctance to download them, due to the risk of malware (no blame to the original poster, who is usually unaware of it);
    2) the F-test at the foot of the output table tells you that you're on the right track;
    3) you can include an indicator variable (-i.country-) among your predictors;
    4) dynamic panel data models are really difficult to manage unless you have a strong background in econometrics. The guru on this list for this kind of stuff is Sebastian Kripfganz (http://www.kripfganz.de/index.html). Let's hope he'll chime in.
    Kind regards,
    Carlo
    (Stata 18.0 SE)
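
    On Carlo's point 3), a hedged sketch of what country-specific convergence coefficients could look like. The variable names are taken from the original post; interacting the lagged level with the panel identifier -id- is an assumption about the intent, not Carlo's exact suggestion, and with 74 countries and T = 22 the per-country slopes will be noisy:

    Code:
    * Hypothetical sketch: one convergence coefficient per country via an
    * interaction of the lagged level with the panel identifier.
    * The country intercepts are already absorbed by -fe-.
    xtreg ln_diff c.ln_prod_lagged#i.id t18-t19, fe vce(cluster id)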



    • #3
      Dear Carlo,

      Thank you very much for your reply.
      1) Yes, i completely understand, somehow didnt think of it, sorry.

      2) There is no way to assume what will be the learnt parameter for individual countries right? Panel regression is in the end ignoring the ids and giving me only one coefficient, so the interpretation is in average all those countires are getting close to each other with 7.4% yearly? but what is the speed for seperate countries no way to check right?


      Best,
      Sara
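
      As an aside on the 7.4% figure: under the standard β-convergence algebra (a textbook formula, not something stated in this thread), the coefficient b on ln_prod_lagged maps into an annual convergence speed λ = -ln(1 + b), with half-life ln(2)/λ:

      Code:
      * Hedged sketch: convergence speed and half-life implied by the
      * xtregar point estimate above (b = -.0742815 on ln_prod_lagged).
      scalar b        = -.0742815
      scalar lambda   = -ln(1 + b)        // approx .077 per year
      scalar halflife = ln(2)/lambda      // approx 9 years
      display "lambda = " lambda "   half-life = " halflife " years"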






      • #4
        Sara:
        I fail to get your 2). If you include a categorical variable among your predictors, you should get N-1 coefficients.
        In all likelihood, things would be clearer if you posted what you typed and what Stata gave you back (as you correctly did in your original post).
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          It is hard to provide meaningful help without knowing what your variables are. Is ln_diff the first difference of ln_prod?

          It looks like you are already estimating a dynamic panel model, so I do not understand your comment 3. When you estimate such a dynamic model with lags of the dependent variable, xtregar is not an appropriate command as it does not deal with the dynamic panel bias due to these lags. The AR component just adds autoregressive dynamics for the error term. It is not clear why you would want to do this.

          With the dynamic model as specified, you can hardly avoid the usual GMM estimators. See for example:
          XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

          Before seeking specific help in that direction, I highly recommend that you make yourself familiar with the respective econometric literature.

          Finally, regarding your second question, you are estimating a model with homogeneous slope coefficients. By assumption, all countries have the same slope. To estimate a model with heterogeneous slopes, you might want to have a look at the so-called (pooled) mean-group estimator. That said, your time dimension might be a bit short for that purpose.
          https://twitter.com/Kripfganz
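
          On the (pooled) mean-group route mentioned above, a hedged sketch using the community-contributed xtpmg command (Blackburne and Frank, Stata Journal). The syntax here is from memory and ln_prod is assumed to be the underlying level series, so check -help xtpmg- before relying on it:

          Code:
          * Mean-group estimator: country-specific short-run coefficients,
          * averaged across panels; -ec(ec)- names the error-correction term.
          ssc install xtpmg
          xtpmg d.ln_prod, lr(l.ln_prod) ec(ec) replace mg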



          • #6
            Dear Sebastian and Carlo,

            Thanks for your reply! I was actually trying to use system GMM, as you mentioned. I went through the documentation, and all the assumptions held for my data.
            I am trying to calculate the convergence rate of labor productivity. So ln_diff is the log difference of labor productivity (= the growth rate), and ln_prod_lagged is the log of productivity at time t-1.

            I started to hesitate about system GMM because it is very sensitive to how many lags I include.
            Code:
            . xtdpdsys ln_diff ln_prod_lagged t19-t20, lags(3) vce(robust) artests(2)
            
            System dynamic panel-data estimation            Number of obs     =      1,254
            Group variable: id                              Number of groups  =         57
            Time variable: year
                                                            Obs per group:
                                                                          min =         22
                                                                          avg =         22
                                                                          max =         22
            
            Number of instruments =    299                  Wald chi2(6)      =     144.19
                                                            Prob > chi2       =     0.0000
            One-step results
            --------------------------------------------------------------------------------
                           |               Robust
                   ln_diff |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ---------------+----------------------------------------------------------------
                   ln_diff |
                       L1. |  -.1116288   .0393783    -2.83   0.005    -.1888088   -.0344487
                       L2. |  -.0797623   .0425259    -1.88   0.061    -.1631115    .0035869
                       L3. |  -.0296277   .0302774    -0.98   0.328    -.0889704    .0297149
                           |
            ln_prod_lagged |  -.1075446   .0137441    -7.82   0.000    -.1344826   -.0806066
                       t19 |   .0777054   .0099087     7.84   0.000     .0582846    .0971262
                       t20 |    .034093   .0139105     2.45   0.014     .0068289    .0613572
                     _cons |   1.133677   .1382678     8.20   0.000     .8626767    1.404677
            --------------------------------------------------------------------------------
            Instruments for differenced equation
                    GMM-type: L(2/.).ln_diff
                    Standard: D.ln_prod_lagged D.t19 D.t20
            Instruments for level equation
                    GMM-type: LD.ln_diff
                    Standard: _cons
            And when I reduce the number of lags:

            Code:
            . xtdpdsys ln_diff ln_prod_lagged t19-t20, lags(2) vce(robust) artests(2)
            
            System dynamic panel-data estimation            Number of obs     =      1,311
            Group variable: id                              Number of groups  =         57
            Time variable: year
                                                            Obs per group:
                                                                          min =         23
                                                                          avg =         23
                                                                          max =         23
            
            Number of instruments =    302                  Wald chi2(5)      =     167.05
                                                            Prob > chi2       =     0.0000
            One-step results
            --------------------------------------------------------------------------------
                           |               Robust
                   ln_diff |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ---------------+----------------------------------------------------------------
                   ln_diff |
                       L1. |  -.0554033   .0306257    -1.81   0.070    -.1154286     .004622
                       L2. |   .0357003   .1092617     0.33   0.744    -.1784487    .2498493
                           |
            ln_prod_lagged |  -.1156499   .0187584    -6.17   0.000    -.1524157   -.0788841
                       t19 |   .0870614   .0103397     8.42   0.000      .066796    .1073268
                       t20 |   .0468474   .0133936     3.50   0.000     .0205964    .0730984
                     _cons |   1.204001   .1903986     6.32   0.000     .8308264    1.577175
            --------------------------------------------------------------------------------
            Instruments for differenced equation
                    GMM-type: L(2/.).ln_diff
                    Standard: D.ln_prod_lagged D.t19 D.t20
            Instruments for level equation
                    GMM-type: LD.ln_diff
                    Standard: _cons
            Code:
             xtdpdsys ln_diff ln_prod_lagged t19-t20, lags(1) vce(robust) artests(2)
            
            System dynamic panel-data estimation            Number of obs     =      1,368
            Group variable: id                              Number of groups  =         57
            Time variable: year
                                                            Obs per group:
                                                                          min =         24
                                                                          avg =         24
                                                                          max =         24
            
            Number of instruments =    303                  Wald chi2(4)      =      96.61
                                                            Prob > chi2       =     0.0000
            One-step results
            --------------------------------------------------------------------------------
                           |               Robust
                   ln_diff |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ---------------+----------------------------------------------------------------
                   ln_diff |
                       L1. |   .0294566   .0839557     0.35   0.726    -.1350934    .1940067
                           |
            ln_prod_lagged |  -.1155769   .0266246    -4.34   0.000    -.1677602   -.0633935
                       t19 |   .0959021   .0132818     7.22   0.000     .0698702     .121934
                       t20 |   .0388161   .0165714     2.34   0.019     .0063367    .0712956
                     _cons |   1.197448   .2674311     4.48   0.000     .6732929    1.721604
            --------------------------------------------------------------------------------
            Instruments for differenced equation
                    GMM-type: L(2/.).ln_diff
                    Standard: D.ln_prod_lagged D.t19 D.t20
            Instruments for level equation
                    GMM-type: LD.ln_diff
                    Standard: _cons
            I am also quite worried about the wide confidence intervals.

            In my sample I have upper-middle-income and high-income economies, excluding oil producers and islands.


            I understand that with xtregar I should not include lags of y, as they will be correlated with the errors. Removing them, I get the following results.

            Code:
            . xtregar ln_diff ln_prod_lagged t18-t19, fe rhotype(tscorr) twostep
            
            FE (within) regression with AR(1) disturbances  Number of obs     =      1,368
            Group variable: id                              Number of groups  =         57
            
            R-sq:                                           Obs per group:
                 within  = 0.0905                                         min =         24
                 between = 0.0000                                         avg =       24.0
                 overall = 0.0104                                         max =         24
            
                                                            F(3,1308)         =      43.41
            corr(u_i, Xb)  = -0.8897                        Prob > F          =     0.0000
            
            --------------------------------------------------------------------------------
                   ln_diff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            ---------------+----------------------------------------------------------------
            ln_prod_lagged |  -.0766854   .0084694    -9.05   0.000    -.0933005   -.0600702
                       t18 |  -.0454302   .0132051    -3.44   0.001    -.0713356   -.0195247
                       t19 |    .070636   .0131691     5.36   0.000     .0448012    .0964708
                     _cons |   .8101878   .0762135    10.63   0.000     .6606737    .9597019
            ---------------+----------------------------------------------------------------
                    rho_ar |  .11008698
                   sigma_u |  .08237438
                   sigma_e |  .09676155
                   rho_fov |  .42020046   (fraction of variance because of u_i)
            --------------------------------------------------------------------------------
            F test that all u_i=0: F(56,1308) = 2.45                     Prob > F = 0.0000
            Do you have any idea what could cause the GMM model to be so unstable?

            Thanks a lot in advance!
            Best,
            Sara



            • #7
              I am also posting the results from xtdpdgmm ... I think this is a good result, isn't it?

              Code:
              . xtdpdgmm L(0/1).ln_diff ln_prod_lagged t18-t19, gmmiv(L(0/3).ln_diff ln_prod_lagged, l(1 4) c m(d)) noserial twostep vce(robust)
              
              Generalized method of moments estimation
              
              Step 1
              initial:       f(p) =   .2040731
              alternative:   f(p) =  5.3804756
              rescale:       f(p) =  .17549462
              Iteration 0:   f(p) =  .17549462  
              Iteration 1:   f(p) =  .01882145  
              Iteration 2:   f(p) =   .0098848  
              Iteration 3:   f(p) =  .00987584  
              Iteration 4:   f(p) =  .00987584  
              
              Step 2
              Iteration 0:   f(p) =  .77182291  
              Iteration 1:   f(p) =   .5845599  
              Iteration 2:   f(p) =  .46495408  
              Iteration 3:   f(p) =  .46233416  
              Iteration 4:   f(p) =  .46231849  
              Iteration 5:   f(p) =  .46231783  
              Iteration 6:   f(p) =  .46231726  
              Iteration 7:   f(p) =  .46231694  
              Iteration 8:   f(p) =  .46231669  
              Iteration 9:   f(p) =  .46231654  
              Iteration 10:  f(p) =  .46231642  
              
              Group variable: id                           Number of obs         =      1368
              Time variable: year                          Number of groups      =        57
              
              Moment conditions:     linear =      12      Obs per group:    min =        24
                                  nonlinear =      22                        avg =        24
                                      total =      34                        max =        24
              
                                                    (Std. Err. adjusted for 57 clusters in id)
              --------------------------------------------------------------------------------
                             |              WC-Robust
                     ln_diff |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              ---------------+----------------------------------------------------------------
                     ln_diff |
                         L1. |   .3107452   .1307188     2.38   0.017     .0545411    .5669494
                             |
              ln_prod_lagged |  -.0385669   .0136412    -2.83   0.005    -.0653032   -.0118306
                         t18 |   -.937218   .3313464    -2.83   0.005    -1.586645   -.2877909
                         t19 |   .9299993   .3407103     2.73   0.006     .2622194    1.597779
                       _cons |   .4171995   .1425264     2.93   0.003     .1378529     .696546
              --------------------------------------------------------------------------------
              
              . estat overid
              
              Hansen's J-test                                        chi2(29)    =   26.3520
              H0: overidentifying restrictions are valid             Prob > chi2 =    0.6066
              Your comments will be very very helpful, thank you all very much in advance!

              Sara



              • #8
                I would strongly advise against using xtdpdsys. As you can see, it creates a huge number of instruments which exposes your estimates to a "too-many-instruments problem". While it is possible to restrict the lag depth with xtdpdsys, it does not have a collapse option. With other commands, xtdpdgmm for example, you can do everything that you could do with xtdpdsys plus additional things such as collapsing.

                Even after excluding the lags of ln_diff, your model estimated with xtregar is still dynamic: ln_prod_lagged creates a dynamic panel bias in the same way. In that regard, it is unusual to specify the dynamic regression model with the first difference as the dependent variable and lagged first differences as independent variables. When using the system GMM estimator, this implies that you are specifying instruments for the first differences of these first differences, i.e. second differences. This could lead to weak-instruments problems.

                Your results from xtdpdgmm illustrate nicely that we should not blindly believe in the result of the overidentification test. Here, it indeed suggests that the model might be correctly specified while it is definitely not. You are using the first lag of the dependent variable as an instrument for the first-differenced model. By construction of the model, this first lag is correlated with the first-differenced error term. You need to write gmmiv(L(1/3).ln_diff ln_prod_lagged, l(1 4) c m(d)) instead, or just gmmiv(L.ln_diff ln_prod_lagged, l(1 6) c m(d)).

                Also, to avoid confusion, it is usually better not to create differences and lags as separate variables but to use time-series operators instead, i.e. D.ln_prod instead of ln_diff and L.ln_prod instead of ln_prod_lagged. That way, you immediately notice the dynamic relationship by including L.ln_prod in the set of explanatory variables.
                https://twitter.com/Kripfganz
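
                As a concrete (hedged) rendering of that last point, here is the same specification written entirely with time-series operators, so the dynamics are visible in the command itself. This assumes ln_prod is the underlying level series (ln_diff = D.ln_prod, ln_prod_lagged = L.ln_prod); it is an illustration of the notation, not a vetted specification:

                Code:
                * ln_diff        -> D.ln_prod
                * ln_prod_lagged -> L.ln_prod
                xtdpdgmm L(0/1).D.ln_prod L.ln_prod t18-t19, ///
                    gmmiv(L(1/3).D.ln_prod L.ln_prod, l(1 4) c m(d)) ///
                    noserial twostep vce(robust)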



                • #9
                  Dear Sebastian,

                  Thank you very much for your support! I appreciate it a lot.

                  Yes, I agree about using time-series operators, but right now they somehow confuse me even more. I will try to overcome that and write it correctly.
                  I have one more question, please: with your suggestions the model becomes the following.
                  Code:
                   
                   xtdpdgmm L(0/1).ln_diff ln_prod_lagged t18-t19, gmmiv(L.ln_diff ln_prod_lagged, l(1 6) c m(d)) noserial twostep vce(robust)
                  I will run such models separately for about 10 industry datasets, and for some datasets the following code works better.


                  Code:
                   
                   xtdpdgmm L(0/2).ln_diff ln_prod_lagged t18-t19, gmmiv(L.ln_diff ln_prod_lagged, l(1 6) c m(d)) noserial twostep vce(robust)
                  Will this be correct as well?




                  • #10
                    Adding more lags of the dependent variable is fine, given that you have specified sufficiently many (but not too many) instruments.

                    In any case, you should verify with the estat serial postestimation command that the Arellano-Bond AR(p) test, for p>1, does not reject the null hypothesis of no serial correlation of order p in the first-differenced error term.
                    https://twitter.com/Kripfganz
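
                    For reference, the check described above might look like this after the xtdpdgmm estimation. The ar() option is assumed here from xtdpdgmm's postestimation tools; check -help xtdpdgmm postestimation- for your installed version:

                    Code:
                    estat serial, ar(1/3)   // Arellano-Bond tests for orders 1-3
                    estat overid            // Hansen J-test of overidentifying restrictions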



                    • #11
                      Thanks, Sebastian. I am very grateful.



                      • #12
                        Dear Sebastian,

                        Following all the suggestions and the theory, I constructed my model using lags of y and x as GMM-type instruments (gmmiv). I tested for overidentification and for serial correlation, and the tests indicate that the instruments are valid.
                        But I would like to have some basis for the choice of those instruments. Can you please hint where I can look for the theory of choosing instrumental variables for this type of model?

                        My current model gives me the following results.
                        Code:
                        . xtdpdgmm L(0/1).log_difference log_prod_lagged high_income t4 t15-t17,gmmiv(L(1/3).log_difference log_prod_lagged, l(2 4) c m(d))
                        >  iv( t4 t15-t17 high_income, m(level)) noserial twostep vce(robust)
                        
                        Generalized method of moments estimation
                        
                        Step 1
                        initial:       f(p) =  .03154082
                        alternative:   f(p) =  4.6080451
                        rescale:       f(p) =  .00952851
                        Iteration 0:   f(p) =  .00952851  
                        Iteration 1:   f(p) =  .00087221  
                        Iteration 2:   f(p) =  .00087206  
                        Iteration 3:   f(p) =  .00087206  
                        
                        Step 2
                        Iteration 0:   f(p) =  .62472978  
                        Iteration 1:   f(p) =  .54706659  
                        Iteration 2:   f(p) =  .54578078  
                        Iteration 3:   f(p) =   .5457685  
                        Iteration 4:   f(p) =  .54576828  
                        Iteration 5:   f(p) =  .54576827  
                        
                        Group variable: country                      Number of obs         =      1239
                        Time variable: year                          Number of groups      =        59
                        
                        Moment conditions:     linear =      14      Obs per group:    min =        21
                                            nonlinear =      19                        avg =        21
                                                total =      33                        max =        21
                        
                                                          (Std. Err. adjusted for 59 clusters in country)
                        ---------------------------------------------------------------------------------
                                        |              WC-Robust
                         log_difference |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        ----------------+----------------------------------------------------------------
                         log_difference |
                                    L1. |  -.2737381   .2348985    -1.17   0.244    -.7341308    .1866546
                                        |
                        log_prod_lagged |  -.0717901   .0262071    -2.74   0.006    -.1231551   -.0204251
                            high_income |   .0977761   .0358735     2.73   0.006     .0274653     .168087
                                     t4 |  -.0213408   .0102535    -2.08   0.037    -.0414372   -.0012444
                                    t15 |  -.0545968    .009814    -5.56   0.000    -.0738319   -.0353617
                                    t16 |   .0474716   .0174916     2.71   0.007     .0131886    .0817545
                                    t17 |   .0321065   .0190612     1.68   0.092    -.0052528    .0694657
                                  _cons |   .7185215   .2519104     2.85   0.004     .2247862    1.212257
                        ---------------------------------------------------------------------------------
                        
                        . estat serial
                        
                        Arellano-Bond test for autocorrelation of the first-differenced residuals
                        H0: no autocorrelation of order 1:     z =   -1.2059   Prob > |z|  =    0.2278
                        H0: no autocorrelation of order 2:     z =   -0.8626   Prob > |z|  =    0.3883
                        
                        . estat overid
                        
                        Hansen's J-test                                        chi2(25)    =   32.2003
                        H0: overidentifying restrictions are valid             Prob > chi2 =    0.1523
                        The choice of instruments is being questioned a lot, so I am trying to understand the correct way to approach it in this case.

                        Thanks a lot in advance!
                        best,
                        Sara Zakaryan



                        • #13
                          Sara:
                          isn't the literature in your research field supportive in this respect?
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)



                          • #14
                            Carlo,

                            I find papers that used GMM models to estimate productivity, as I am trying to do. But none of them specify what instruments they used. It is really strange: either they just didn't mention it, or they didn't use any, which would mean they assume their variables are exogenous. In my case I have an endogeneity issue.

                            Sara



                            • #15
                              Sara:
                              if no previous research touched on the endogeneity issue, your instruments are less likely to be questionable!
                              Kind regards,
                              Carlo
                              (Stata 18.0 SE)
