  • Panel data analysis with AR(1)

    Hello everyone!

I have panel data on labor productivity for 74 countries for the period 1992-2016. I am trying to estimate the convergence rate based on Romer's model of convergence.
    I also attach the data.

    I am using xtregar for my model and getting the following results.

    Code:
    . xtregar ln_diff ln_prod_lagged t18-t19 L1.ln_diff L2.ln_diff, fe rhotype(tscorr) twostep
    
    FE (within) regression with AR(1) disturbances  Number of obs     =      1,628
    Group variable: id                              Number of groups  =         74
    
    R-sq:                                           Obs per group:
         within  = 0.0917                                         min =         22
         between = 0.0056                                         avg =       22.0
         overall = 0.0043                                         max =         22
    
                                                    F(5,1549)         =      31.26
    corr(u_i, Xb)  = -0.9112                        Prob > F          =     0.0000
    
    --------------------------------------------------------------------------------
           ln_diff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    ln_prod_lagged |  -.0742815   .0073197   -10.15   0.000     -.088639   -.0599239
               t18 |   -.038016   .0108618    -3.50   0.000    -.0593213   -.0167108
               t19 |   .0602955   .0109165     5.52   0.000     .0388827    .0817082
                   |
           ln_diff |
               L1. |   .0431449   .0222561     1.94   0.053    -.0005104    .0868002
               L2. |  -.0367265   .0210011    -1.75   0.081    -.0779201    .0044671
                   |
             _cons |   .7583084   .0742463    10.21   0.000     .6126746    .9039422
    ---------------+----------------------------------------------------------------
            rho_ar | -.03962128
           sigma_u |  .09476397
           sigma_e |  .09042197
           rho_fov |  .52343382   (fraction of variance because of u_i)
    --------------------------------------------------------------------------------
    F test that all u_i=0: F(73,1549) = 3.58                     Prob > F = 0.0000
    I have several questions regarding these results and would be grateful if someone could answer them.
    1. I am getting 0.52 for rho_fov, and no linear model I try increases it. Does this mean such models are not a good fit for my data?

    2. My second question might sound weird, but after estimating the coefficients, is it possible to go deeper and understand what the coefficient/convergence rate would be for a specific country, for example Armenia? Each country has a different path, and the speed can differ.

    3. Are GMM models more appropriate for such an analysis? I do not understand dynamic panel models very well, or what goes on in the background, so for now I am trying to stay away from them.

    Thank you very much in advance!
    Best,
    Sara Zakaryan

  • #2
    Sara:
    1) as per the FAQ, please do not attach files in non-Stata formats. As far as spreadsheets are concerned, please note that there's a pretty widespread (and wise) reluctance to download them, due to the risk of malware (no blame to the original poster, who is usually unaware of it);
    2) the F-test at the foot of the output table tells you that you're on the right track;
    3) you can include an indicator variable (-i.country-) among your predictors;
    4) dynamic panel data models are really difficult to manage unless you have a strong background in econometrics. The guru on this list for this kind of stuff is Sebastian Kripfganz (http://www.kripfganz.de/index.html). Let's hope he'll chime in.
    Kind regards,
    Carlo
    (Stata 18.0 SE)
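
    On Carlo's point 3), a hedged sketch of what country-specific convergence coefficients could look like. The variable names are taken from the original post; interacting the lagged level with the panel identifier -id- is an assumption about the intent, not Carlo's exact suggestion, and with 74 countries and T = 22 the per-country slopes will be noisy:

    Code:
    * Hypothetical sketch: one convergence coefficient per country via an
    * interaction of the lagged level with the panel identifier.
    * The country intercepts are already absorbed by -fe-.
    xtreg ln_diff c.ln_prod_lagged#i.id t18-t19, fe vce(cluster id)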



    • #3
      Dear Carlo,

      Thank you very much for your reply.
      1) Yes, i completely understand, somehow didnt think of it, sorry.

      2) There is no way to assume what will be the learnt parameter for individual countries right? Panel regression is in the end ignoring the ids and giving me only one coefficient, so the interpretation is in average all those countires are getting close to each other with 7.4% yearly? but what is the speed for seperate countries no way to check right?


      Best,
      Sara
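
      As an aside on the 7.4% figure: under the standard β-convergence algebra (a textbook formula, not something stated in this thread), the coefficient b on ln_prod_lagged maps into an annual convergence speed λ = -ln(1 + b), with half-life ln(2)/λ:

      Code:
      * Hedged sketch: convergence speed and half-life implied by the
      * xtregar point estimate above (b = -.0742815 on ln_prod_lagged).
      scalar b        = -.0742815
      scalar lambda   = -ln(1 + b)        // approx .077 per year
      scalar halflife = ln(2)/lambda      // approx 9 years
      display "lambda = " lambda "   half-life = " halflife " years"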






      • #4
        Sara:
        I fail to get your 2). If you include a categorical variable among your predictors, you should get N-1 coefficients.
        In all likelihood, things would be clearer if you posted what you typed and what Stata gave you back (as you correctly did in your original post).
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          It is hard to provide meaningful help without knowing what your variables are. Is ln_diff the first difference of ln_prod?

          It looks like you are already estimating a dynamic panel model, so I do not understand your comment 3. When you estimate such a dynamic model with lags of the dependent variable, xtregar is not an appropriate command as it does not deal with the dynamic panel bias due to these lags. The AR component just adds autoregressive dynamics for the error term. It is not clear why you would want to do this.

          With the dynamic model as specified, you can hardly avoid the usual GMM estimators. See for example:
          XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

          Before seeking specific help in that direction, I highly recommend that you make yourself familiar with the respective econometric literature.

          Finally, regarding your second question, you are estimating a model with homogeneous slope coefficients. By assumption, all countries have the same slope. To estimate a model with heterogeneous slopes, you might want to have a look at the so-called (pooled) mean-group estimator. That said, your time dimension might be a bit short for that purpose.
          https://twitter.com/Kripfganz
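
          On the (pooled) mean-group route mentioned above, a hedged sketch using the community-contributed xtpmg command (Blackburne and Frank, Stata Journal). The syntax here is from memory and ln_prod is assumed to be the underlying level series, so check -help xtpmg- before relying on it:

          Code:
          * Mean-group estimator: country-specific short-run coefficients,
          * averaged across panels; -ec(ec)- names the error-correction term.
          ssc install xtpmg
          xtpmg d.ln_prod, lr(l.ln_prod) ec(ec) replace mg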



          • #6
            Dear Sebastian and Carlo,

            Thanks for your reply! I was actually trying to use system GMM, as you mentioned. I went through the documentation, and all the assumptions held for my data.
            I am trying to calculate the convergence rate of labor productivity. So ln_diff is the log difference of labor productivity (= the growth rate), and ln_prod_lagged is the log of productivity at time t-1.

            I started to hesitate about system GMM because it is very sensitive to how many lags I include.
            Code:
            . xtdpdsys ln_diff ln_prod_lagged t19-t20, lags(3) vce(robust) artests(2)
            
            System dynamic panel-data estimation            Number of obs     =      1,254
            Group variable: id                              Number of groups  =         57
            Time variable: year
                                                            Obs per group:
                                                                          min =         22
                                                                          avg =         22
                                                                          max =         22
            
            Number of instruments =    299                  Wald chi2(6)      =     144.19
                                                            Prob > chi2       =     0.0000
            One-step results
            --------------------------------------------------------------------------------
                           |               Robust
                   ln_diff |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ---------------+----------------------------------------------------------------
                   ln_diff |
                       L1. |  -.1116288   .0393783    -2.83   0.005    -.1888088   -.0344487
                       L2. |  -.0797623   .0425259    -1.88   0.061    -.1631115    .0035869
                       L3. |  -.0296277   .0302774    -0.98   0.328    -.0889704    .0297149
                           |
            ln_prod_lagged |  -.1075446   .0137441    -7.82   0.000    -.1344826   -.0806066
                       t19 |   .0777054   .0099087     7.84   0.000     .0582846    .0971262
                       t20 |    .034093   .0139105     2.45   0.014     .0068289    .0613572
                     _cons |   1.133677   .1382678     8.20   0.000     .8626767    1.404677
            --------------------------------------------------------------------------------
            Instruments for differenced equation
                    GMM-type: L(2/.).ln_diff
                    Standard: D.ln_prod_lagged D.t19 D.t20
            Instruments for level equation
                    GMM-type: LD.ln_diff
                    Standard: _cons
            And when I reduce the number of lags:

            Code:
            . xtdpdsys ln_diff ln_prod_lagged t19-t20, lags(2) vce(robust) artests(2)
            
            System dynamic panel-data estimation            Number of obs     =      1,311
            Group variable: id                              Number of groups  =         57
            Time variable: year
                                                            Obs per group:
                                                                          min =         23
                                                                          avg =         23
                                                                          max =         23
            
            Number of instruments =    302                  Wald chi2(5)      =     167.05
                                                            Prob > chi2       =     0.0000
            One-step results
            --------------------------------------------------------------------------------
                           |               Robust
                   ln_diff |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ---------------+----------------------------------------------------------------
                   ln_diff |
                       L1. |  -.0554033   .0306257    -1.81   0.070    -.1154286     .004622
                       L2. |   .0357003   .1092617     0.33   0.744    -.1784487    .2498493
                           |
            ln_prod_lagged |  -.1156499   .0187584    -6.17   0.000    -.1524157   -.0788841
                       t19 |   .0870614   .0103397     8.42   0.000      .066796    .1073268
                       t20 |   .0468474   .0133936     3.50   0.000     .0205964    .0730984
                     _cons |   1.204001   .1903986     6.32   0.000     .8308264    1.577175
            --------------------------------------------------------------------------------
            Instruments for differenced equation
                    GMM-type: L(2/.).ln_diff
                    Standard: D.ln_prod_lagged D.t19 D.t20
            Instruments for level equation
                    GMM-type: LD.ln_diff
                    Standard: _cons
            Code:
             xtdpdsys ln_diff ln_prod_lagged t19-t20, lags(1) vce(robust) artests(2)
            
            System dynamic panel-data estimation            Number of obs     =      1,368
            Group variable: id                              Number of groups  =         57
            Time variable: year
                                                            Obs per group:
                                                                          min =         24
                                                                          avg =         24
                                                                          max =         24
            
            Number of instruments =    303                  Wald chi2(4)      =      96.61
                                                            Prob > chi2       =     0.0000
            One-step results
            --------------------------------------------------------------------------------
                           |               Robust
                   ln_diff |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ---------------+----------------------------------------------------------------
                   ln_diff |
                       L1. |   .0294566   .0839557     0.35   0.726    -.1350934    .1940067
                           |
            ln_prod_lagged |  -.1155769   .0266246    -4.34   0.000    -.1677602   -.0633935
                       t19 |   .0959021   .0132818     7.22   0.000     .0698702     .121934
                       t20 |   .0388161   .0165714     2.34   0.019     .0063367    .0712956
                     _cons |   1.197448   .2674311     4.48   0.000     .6732929    1.721604
            --------------------------------------------------------------------------------
            Instruments for differenced equation
                    GMM-type: L(2/.).ln_diff
                    Standard: D.ln_prod_lagged D.t19 D.t20
            Instruments for level equation
                    GMM-type: LD.ln_diff
                    Standard: _cons
            I am also quite worried about the wide confidence intervals.

            In my sample I have upper-middle-income and high-income economies, excluding oil producers and islands.


            I understand that with xtregar I should not include lags of y, as they will be correlated with the errors. Removing them, I get the following results.

            Code:
            . xtregar ln_diff ln_prod_lagged t18-t19, fe rhotype(tscorr) twostep
            
            FE (within) regression with AR(1) disturbances  Number of obs     =      1,368
            Group variable: id                              Number of groups  =         57
            
            R-sq:                                           Obs per group:
                 within  = 0.0905                                         min =         24
                 between = 0.0000                                         avg =       24.0
                 overall = 0.0104                                         max =         24
            
                                                            F(3,1308)         =      43.41
            corr(u_i, Xb)  = -0.8897                        Prob > F          =     0.0000
            
            --------------------------------------------------------------------------------
                   ln_diff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            ---------------+----------------------------------------------------------------
            ln_prod_lagged |  -.0766854   .0084694    -9.05   0.000    -.0933005   -.0600702
                       t18 |  -.0454302   .0132051    -3.44   0.001    -.0713356   -.0195247
                       t19 |    .070636   .0131691     5.36   0.000     .0448012    .0964708
                     _cons |   .8101878   .0762135    10.63   0.000     .6606737    .9597019
            ---------------+----------------------------------------------------------------
                    rho_ar |  .11008698
                   sigma_u |  .08237438
                   sigma_e |  .09676155
                   rho_fov |  .42020046   (fraction of variance because of u_i)
            --------------------------------------------------------------------------------
            F test that all u_i=0: F(56,1308) = 2.45                     Prob > F = 0.0000
            Do you have any idea what could cause the GMM model to be so unstable?

            Thanks a lot in advance!
            Best,
            Sara



            • #7
              I am also posting the results from xtdpdgmm ... I think this is a good result, isn't it?

              Code:
              . xtdpdgmm L(0/1).ln_diff ln_prod_lagged t18-t19, gmmiv(L(0/3).ln_diff ln_prod_lagged, l(1 4) c m(d)) noserial twostep vce(robust)
              
              Generalized method of moments estimation
              
              Step 1
              initial:       f(p) =   .2040731
              alternative:   f(p) =  5.3804756
              rescale:       f(p) =  .17549462
              Iteration 0:   f(p) =  .17549462  
              Iteration 1:   f(p) =  .01882145  
              Iteration 2:   f(p) =   .0098848  
              Iteration 3:   f(p) =  .00987584  
              Iteration 4:   f(p) =  .00987584  
              
              Step 2
              Iteration 0:   f(p) =  .77182291  
              Iteration 1:   f(p) =   .5845599  
              Iteration 2:   f(p) =  .46495408  
              Iteration 3:   f(p) =  .46233416  
              Iteration 4:   f(p) =  .46231849  
              Iteration 5:   f(p) =  .46231783  
              Iteration 6:   f(p) =  .46231726  
              Iteration 7:   f(p) =  .46231694  
              Iteration 8:   f(p) =  .46231669  
              Iteration 9:   f(p) =  .46231654  
              Iteration 10:  f(p) =  .46231642  
              
              Group variable: id                           Number of obs         =      1368
              Time variable: year                          Number of groups      =        57
              
              Moment conditions:     linear =      12      Obs per group:    min =        24
                                  nonlinear =      22                        avg =        24
                                      total =      34                        max =        24
              
                                                    (Std. Err. adjusted for 57 clusters in id)
              --------------------------------------------------------------------------------
                             |              WC-Robust
                     ln_diff |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              ---------------+----------------------------------------------------------------
                     ln_diff |
                         L1. |   .3107452   .1307188     2.38   0.017     .0545411    .5669494
                             |
              ln_prod_lagged |  -.0385669   .0136412    -2.83   0.005    -.0653032   -.0118306
                         t18 |   -.937218   .3313464    -2.83   0.005    -1.586645   -.2877909
                         t19 |   .9299993   .3407103     2.73   0.006     .2622194    1.597779
                       _cons |   .4171995   .1425264     2.93   0.003     .1378529     .696546
              --------------------------------------------------------------------------------
              
              . estat overid
              
              Hansen's J-test                                        chi2(29)    =   26.3520
              H0: overidentifying restrictions are valid             Prob > chi2 =    0.6066
              Your comments will be very very helpful, thank you all very much in advance!

              Sara



              • #8
                I would strongly advise against using xtdpdsys. As you can see, it creates a huge number of instruments which exposes your estimates to a "too-many-instruments problem". While it is possible to restrict the lag depth with xtdpdsys, it does not have a collapse option. With other commands, xtdpdgmm for example, you can do everything that you could do with xtdpdsys plus additional things such as collapsing.

                Even after excluding the lags of ln_diff, your model estimated with xtregar is still dynamic: ln_prod_lagged creates a dynamic panel bias in the same way. In that regard, it is unusual to specify the dynamic regression model with the first difference as the dependent variable and lagged first differences as independent variables. When using the system GMM estimator, this implies that you are specifying instruments for the first differences of these first differences, i.e. second differences. This could lead to weak-instruments problems.

                Your results from xtdpdgmm illustrate nicely that we should not blindly believe in the result of the overidentification test. Here, it indeed suggests that the model might be correctly specified while it is definitely not. You are using the first lag of the dependent variable as an instrument for the first-differenced model. By construction of the model, this first lag is correlated with the first-differenced error term. You need to write gmmiv(L(1/3).ln_diff ln_prod_lagged, l(1 4) c m(d)) instead, or just gmmiv(L.ln_diff ln_prod_lagged, l(1 6) c m(d)).

                Also, to avoid confusion, it is usually better not to create differences and lags as separate variables but to use time-series operators instead, i.e. D.ln_prod instead of ln_diff and L.ln_prod instead of ln_prod_lagged. That way, you immediately notice the dynamic relationship by including L.ln_prod in the set of explanatory variables.
                https://twitter.com/Kripfganz
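
                As a concrete (hedged) rendering of that last point, here is the same specification written entirely with time-series operators, so the dynamics are visible in the command itself. This assumes ln_prod is the underlying level series (ln_diff = D.ln_prod, ln_prod_lagged = L.ln_prod); it is an illustration of the notation, not a vetted specification:

                Code:
                * ln_diff        -> D.ln_prod
                * ln_prod_lagged -> L.ln_prod
                xtdpdgmm L(0/1).D.ln_prod L.ln_prod t18-t19, ///
                    gmmiv(L(1/3).D.ln_prod L.ln_prod, l(1 4) c m(d)) ///
                    noserial twostep vce(robust)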



                • #9
                  Dear Sebastian,

                  Thank you very much for your support! I appreciate it a lot.

                  Yes, I agree about using time-series operators, but right now they somehow confuse me even more. I will try to overcome that and write it correctly.
                  I have one more question, please: with your suggestions the model becomes the following.
                  Code:
                   
                   xtdpdgmm L(0/1).ln_diff ln_prod_lagged t18-t19, gmmiv(L.ln_diff ln_prod_lagged, l(1 6) c m(d)) noserial twostep vce(robust)
                  I will run such models separately for about 10 industry datasets, and for some datasets the following code works better.


                  Code:
                   
                   xtdpdgmm L(0/2).ln_diff ln_prod_lagged t18-t19, gmmiv(L.ln_diff ln_prod_lagged, l(1 6) c m(d)) noserial twostep vce(robust)
                  Will this be correct as well?




                  • #10
                    Adding more lags of the dependent variable is fine, given that you have specified sufficiently many (but not too many) instruments.

                    In any case, you should verify with the estat serial postestimation command that the Arellano-Bond AR(p) test, for p>1, does not reject the null hypothesis of no serial correlation of order p in the first-differenced error term.
                    https://twitter.com/Kripfganz
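
                    For reference, the check described above might look like this after the xtdpdgmm estimation. The ar() option is assumed here from xtdpdgmm's postestimation tools; check -help xtdpdgmm postestimation- for your installed version:

                    Code:
                    estat serial, ar(1/3)   // Arellano-Bond tests for orders 1-3
                    estat overid            // Hansen J-test of overidentifying restrictions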



                    • #11
                      Thanks, Sebastian. I am very grateful.



                      • #12
                        Dear Sebastian,

                        Following all the suggestions and the theory, I constructed my model using lags of y and x as GMM-type instruments (gmmiv). I tested for overidentification and for serial correlation, and the tests indicate that the instruments are valid.
                        But I would like to have some basis for the choice of those instruments. Can you please hint where I can look for the theory of choosing instrumental variables for this type of model?

                        My current model gives me the following results.
                        Code:
                        . xtdpdgmm L(0/1).log_difference log_prod_lagged high_income t4 t15-t17,gmmiv(L(1/3).log_difference log_prod_lagged, l(2 4) c m(d))
                        >  iv( t4 t15-t17 high_income, m(level)) noserial twostep vce(robust)
                        
                        Generalized method of moments estimation
                        
                        Step 1
                        initial:       f(p) =  .03154082
                        alternative:   f(p) =  4.6080451
                        rescale:       f(p) =  .00952851
                        Iteration 0:   f(p) =  .00952851  
                        Iteration 1:   f(p) =  .00087221  
                        Iteration 2:   f(p) =  .00087206  
                        Iteration 3:   f(p) =  .00087206  
                        
                        Step 2
                        Iteration 0:   f(p) =  .62472978  
                        Iteration 1:   f(p) =  .54706659  
                        Iteration 2:   f(p) =  .54578078  
                        Iteration 3:   f(p) =   .5457685  
                        Iteration 4:   f(p) =  .54576828  
                        Iteration 5:   f(p) =  .54576827  
                        
                        Group variable: country                      Number of obs         =      1239
                        Time variable: year                          Number of groups      =        59
                        
                        Moment conditions:     linear =      14      Obs per group:    min =        21
                                            nonlinear =      19                        avg =        21
                                                total =      33                        max =        21
                        
                                                          (Std. Err. adjusted for 59 clusters in country)
                        ---------------------------------------------------------------------------------
                                        |              WC-Robust
                         log_difference |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        ----------------+----------------------------------------------------------------
                         log_difference |
                                    L1. |  -.2737381   .2348985    -1.17   0.244    -.7341308    .1866546
                                        |
                        log_prod_lagged |  -.0717901   .0262071    -2.74   0.006    -.1231551   -.0204251
                            high_income |   .0977761   .0358735     2.73   0.006     .0274653     .168087
                                     t4 |  -.0213408   .0102535    -2.08   0.037    -.0414372   -.0012444
                                    t15 |  -.0545968    .009814    -5.56   0.000    -.0738319   -.0353617
                                    t16 |   .0474716   .0174916     2.71   0.007     .0131886    .0817545
                                    t17 |   .0321065   .0190612     1.68   0.092    -.0052528    .0694657
                                  _cons |   .7185215   .2519104     2.85   0.004     .2247862    1.212257
                        ---------------------------------------------------------------------------------
                        
                        . estat serial
                        
                        Arellano-Bond test for autocorrelation of the first-differenced residuals
                        H0: no autocorrelation of order 1:     z =   -1.2059   Prob > |z|  =    0.2278
                        H0: no autocorrelation of order 2:     z =   -0.8626   Prob > |z|  =    0.3883
                        
                        . estat overid
                        
                        Hansen's J-test                                        chi2(25)    =   32.2003
                        H0: overidentifying restrictions are valid             Prob > chi2 =    0.1523
                        The choice of instruments is being questioned a lot, so I am trying to understand the correct way to approach it in this case.

                        Thanks a lot in advance!
                        best,
                        Sara Zakaryan



                        • #13
                          Sara:
                          isn't the literature in your research field supportive in this respect?
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)



                          • #14
                            Carlo,

                            I find papers that used GMM models to estimate productivity, as I am trying to do. But none of them specify what instruments they used. It is really strange: either they just didn't mention it, or they didn't use any, which would mean they assume their variables are exogenous. In my case I have an endogeneity issue.

                            Sara



                            • #15
                              Sara:
                              if no previous research touched on the endogeneity issue, your instruments are less likely to be questionable!
                              Kind regards,
                              Carlo
                              (Stata 18.0 SE)
