
  • XTSEQREG: error when estimating a sequential 2SLS-First Differences model

    Dear Statalists,

    I am trying to implement the Kripfganz & Schwarz (2019) variance correction for a sequential 2SLS estimation in which the first step involves a first-differenced equation. However, I cannot run the second step successfully, because either (1) the variable names do not match in the second step, or (2) xtseqreg does not estimate the first step correctly when I use the option iv(, model(diff)).

    Let me illustrate the errors I get using the public dataset abdata.dta.

    Code:
    webuse abdata
    xtset id year 
    gen sample = (L.n != . & L.w != . & L.k != . & L2.w != . & L2.k != .)
    This is the first step I would like to replicate with XTSEQREG:
    Code:
    .  ivreg2 D.n (D.w D.k = L2.w L2.k) i.year if sample == 1, robust
    
    IV (2SLS) estimation
    --------------------
    
    Estimates efficient for homoskedasticity only
    Statistics robust to heteroskedasticity
    
                                                          Number of obs =      751
                                                          F(  8,   742) =     8.48
                                                          Prob > F      =   0.0000
    Total (centered) SS     =  13.73614077                Centered R2   =  -0.6556
    Total (uncentered) SS   =  16.06429854                Uncentered R2 =  -0.4156
    Residual SS             =  22.74127488                Root MSE      =     .174
    
    ------------------------------------------------------------------------------
                 |               Robust
             D.n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               w |
             D1. |   1.146626   1.024929     1.12   0.263     -.862198     3.15545
                 |
               k |
             D1. |   .1041816   .8474889     0.12   0.902    -1.556866    1.765229
                 |
            year |
           1979  |  -.0031767   .0369074    -0.09   0.931    -.0755138    .0691605
           1980  |   -.033016   .0842578    -0.39   0.695    -.1981583    .1321262
           1981  |  -.1403377   .1671101    -0.84   0.401    -.4678675    .1871922
           1982  |  -.1298584   .1888818    -0.69   0.492      -.50006    .2403432
           1983  |  -.0918314   .1470443    -0.62   0.532     -.380033    .1963702
           1984  |  -.0454144   .1310702    -0.35   0.729    -.3023074    .2114785
                 |
           _cons |  -.0033702   .0537219    -0.06   0.950    -.1086632    .1019228
    ------------------------------------------------------------------------------
    Underidentification test (Kleibergen-Paap rk LM statistic):              1.387
                                                       Chi-sq(1) P-val =    0.2388
    ------------------------------------------------------------------------------
    Weak identification test (Cragg-Donald Wald F statistic):                0.928
                             (Kleibergen-Paap rk Wald F statistic):          0.690
    Stock-Yogo weak ID test critical values: 10% maximal IV size              7.03
                                             15% maximal IV size              4.58
                                             20% maximal IV size              3.95
                                             25% maximal IV size              3.63
    Source: Stock-Yogo (2005).  Reproduced by permission.
    NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
    ------------------------------------------------------------------------------
    Hansen J statistic (overidentification test of all instruments):         0.000
                                                     (equation exactly identified)
    ------------------------------------------------------------------------------
    Instrumented:         D.w D.k
    Included instruments: 1979.year 1980.year 1981.year 1982.year 1983.year
                          1984.year
    Excluded instruments: L2.w L2.k
    ------------------------------------------------------------------------------
    In a first attempt, with the variables differenced manually and instruments specified with model(level), I can replicate the point estimates obtained with ivreg2. In the second step, however, I would like to discard my first-step estimate of the capital coefficient and re-estimate this parameter using only cross-sectional variation, that is, using the level equation. Stata then throws an error saying that the variable names in the first step do not match:

    Code:
    .  xtseqreg D.n D.w D.k if sample == 1, iv(L2.w L2.k, model(level)) teffects vce(robust)
    1979bn.year 1980.year 1981.year 1982.year 1983.year 1984.year
    
    Group variable: id                           Number of obs         =       751
    Time variable: year                          Number of groups      =       140
    
                                                 Obs per group:    min =         5
                                                                   avg =  5.364286
                                                                   max =         7
    
                                                 Number of instruments =         9
    
                                       (Std. Err. adjusted for 140 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
             D.n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               w |
             D1. |   1.146626   1.112832     1.03   0.303    -1.034484    3.327736
                 |
               k |
             D1. |   .1041816   .6833247     0.15   0.879     -1.23511    1.443473
                 |
            year |
           1979  |  -.0031767   .0314504    -0.10   0.920    -.0648183     .058465
           1980  |   -.033016    .070716    -0.47   0.641    -.1716169    .1055849
           1981  |  -.1403377   .1428372    -0.98   0.326    -.4202935    .1396181
           1982  |  -.1298584   .1657704    -0.78   0.433    -.4547625    .1950456
           1983  |  -.0918314   .1339453    -0.69   0.493    -.3543594    .1706966
           1984  |  -.0454144   .1081419    -0.42   0.675    -.2573686    .1665397
                 |
           _cons |  -.0033702   .0442384    -0.08   0.939    -.0900759    .0833355
    ------------------------------------------------------------------------------
    
    
    .  xtseqreg n (w) k if sample == 1, iv(LD.k, model(level)) teffects vce(robust)
    1979bn.year 1980.year 1981.year 1982.year 1983.year 1984.year
    option first() incorrectly specified -- variable names do not match
    r(322);
    In a second attempt, I use the option iv(, model(diff)). As you can see below, I can now run the second step. However, the first-step results change: they match neither the ivreg2 results nor the previous xtseqreg first step, in which the variables were first-differenced manually and the option iv(, model(level)) was used.

    Code:
    .  xtseqreg n w k if sample == 1, iv(L2.w L2.k, model(diff)) teffects vce(robust)
    1979bn.year 1980.year 1981.year 1982.year 1983.year 1984.year
    
    Group variable: id                           Number of obs         =       751
    Time variable: year                          Number of groups      =       140
    
                                                 Obs per group:    min =         5
                                                                   avg =  5.364286
                                                                   max =         7
    
                                                 Number of instruments =         9
    
                                       (Std. Err. adjusted for 140 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
               n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               w |   .5550703    .529953     1.05   0.295    -.4836185    1.593759
               k |   .7902225   .0450446    17.54   0.000     .7019367    .8785083
                 |
            year |
           1979  |   .0190451    .043757     0.44   0.663     -.066717    .1048073
           1980  |   .0086869    .044635     0.19   0.846    -.0787962    .0961699
           1981  |  -.0482938   .0462004    -1.05   0.296     -.138845    .0422573
           1982  |  -.0722319    .062388    -1.16   0.247    -.1945102    .0500464
           1983  |  -.0696385   .0915159    -0.76   0.447    -.2490063    .1097294
           1984  |  -.1923327   .1193695    -1.61   0.107    -.4262926    .0416273
                 |
           _cons |   -.329578   1.672226    -0.20   0.844     -3.60708    2.947924
    ------------------------------------------------------------------------------
    
    .  xtseqreg n (w) k if sample == 1, iv(LD.k, model(level)) teffects vce(robust)
    1979bn.year 1980.year 1981.year 1982.year 1983.year 1984.year
    
    Group variable: id                           Number of obs         =       751
    Time variable: year                          Number of groups      =       140
    
    ------------------------------------------------------------------------------
    Equation _first                              Equation _second
    Number of obs         =       751            Number of obs         =       751
    Number of groups      =       140            Number of groups      =       140
    
    Obs per group:    min =         5            Obs per group:    min =         5
                      avg =  5.364286                              avg =  5.364286
                      max =         7                              max =         7
    
    Number of instruments =         9            Number of instruments =         8
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |               Robust
               n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    _first       |
               w |   .5550703    .529953     1.05   0.295    -.4836185    1.593759
           _cons |   -.329578   1.672226    -0.20   0.844     -3.60708    2.947924
    -------------+----------------------------------------------------------------
    _second      |
               k |   .5487759   .1122811     4.89   0.000      .328709    .7688428
                 |
            year |
           1979  |  -.0018788   .0517203    -0.04   0.971    -.1032487    .0994911
           1980  |  -.0198237   .0534126    -0.37   0.711    -.1245105    .0848632
           1981  |  -.1003511   .0590729    -1.70   0.089     -.216132    .0154297
           1982  |  -.1546863   .0761061    -2.03   0.042    -.3038516   -.0055211
           1983  |  -.2072187   .1211814    -1.71   0.087    -.4447299    .0302925
           1984  |  -.4274209   .1510207    -2.83   0.005    -.7234161   -.1314258
                 |
           _cons |  -.0528342   .0459619    -1.15   0.250    -.1429178    .0372494
    ------------------------------------------------------------------------------
    .
    Thanks!

    Santiago


  • #2
    Many thanks for raising this issue, Santiago Franco.

    There are two problems:
    1. The sample needs to be restricted differently when the model is specified in levels compared to when it is specified in first differences. Below, I create a new variable sample2 that must be used when the model is specified in levels.
    2. There was unfortunately a bug in xtseqreg with the determination of the estimation sample when the sample was restricted with an if-condition. I have fixed this bug and an update is now available on my personal website:
    Code:
    net install xtseqreg, from(http://www.kripfganz.de/stata/) replace
    With this update, the following specifications should all yield the same results for the first stage:
    Code:
    . ivreg2 D.n (D.w D.k = L2.w L2.k) i.year if sample == 1, cluster(id)
    
    . xtseqreg D.n D.w D.k if sample == 1, iv(L2.w L2.k, model(level)) teffects vce(robust)
    
    . gen sample2 = sample
    . replace sample2 = 1 if F.sample == 1
    . xtseqreg n w k yr1979-yr1984 year if sample2 == 1, iv(L2.w L2.k, model(diff)) iv(yr1979-yr1984 year, diff model(diff)) vce(robust) nocons
    Notice that the time dummies for the final model should not be specified with the teffects option, which always creates time dummies for the level model. To specify them for the first-differenced model, I explicitly put the dummies in the list of regressors and created the appropriate instruments. Also, to replicate the intercept for the first-differenced model, we need to specify a linear time trend for the level model (variable year), again with appropriate instrument, together with the nocons option.
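
    The two points about the sample and the trend can be checked directly. The following is a minimal sketch, assuming abdata is loaded, xtset id year has been run, and sample and sample2 have been generated as above; it uses only variables that already appear in this thread.

    ```stata
    * sample2 additionally flags observations whose next period is in sample,
    * which the level specification needs in order to form the first difference
    count if sample == 1
    count if sample2 == 1

    * Wherever D.year is defined it equals 1, so the coefficient on year in the
    * level model plays the role of _cons in the first-differenced model
    assert D.year == 1 if !missing(D.year)
    ```

    If the assert passes silently, the linear trend differences to a constant, which is why year is combined with nocons in the level specification.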

    Finally, the second-stage results follow:
    Code:
    . xtseqreg n (w) k if sample2 == 1, iv(LD.k, model(level)) teffects vce(robust)
    It would be great if you could let me know if everything works fine for you now.



    • #3
      Thank you very much, Sebastian Kripfganz, for your detailed answer. I was able to implement what you recommended and, indeed, the three specifications yield the same results. However, I could not obtain the same results from xtseqreg and ivreg2 when the estimation combines "internal" and "external" instruments.

      Note that the initial estimation I posted uses only internal instruments (lags) for both w and k. However, if I instead treat w as an external instrument for k (so that w does not enter the model itself), I cannot replicate the ivreg2 results with xtseqreg.

      Code:
      webuse abdata, clear
      xtset id year 
      
      gen sample = (L.n != . & L.w != . & L.k != . & D.w != . & L2.k != .)
      gen sample2 = sample
      replace sample2 = 1 if F.sample == 1


      This is the estimation using ivreg2:
      Code:
      .  ivreg2 D.n (D.k = D.w L2.k) i.year if sample == 1, cluster(id)
      
      IV (2SLS) estimation
      --------------------
      
      Estimates efficient for homoskedasticity only
      Statistics robust to heteroskedasticity and clustering on id
      
      Number of clusters (id) =          140                Number of obs =      751
                                                            F(  7,   139) =    19.92
                                                            Prob > F      =   0.0000
      Total (centered) SS     =  13.73614077                Centered R2   =   0.2272
      Total (uncentered) SS   =  16.06429854                Uncentered R2 =   0.3392
      Residual SS             =  10.61514588                Root MSE      =    .1189
      
      ------------------------------------------------------------------------------
                   |               Robust
               D.n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 k |
               D1. |    .142234   .4398271     0.32   0.746    -.7198114    1.004279
                   |
              year |
             1979  |   .0114146   .0136808     0.83   0.404    -.0153993    .0382285
             1980  |  -.0217357   .0431501    -0.50   0.614    -.1063083    .0628369
             1981  |  -.0895835   .0708498    -1.26   0.206    -.2284466    .0492796
             1982  |  -.0720179   .0788896    -0.91   0.361    -.2266387    .0826029
             1983  |  -.0388727   .0656469    -0.59   0.554    -.1675382    .0897928
             1984  |  -.0148543   .0574581    -0.26   0.796    -.1274701    .0977614
                   |
             _cons |   -.011992   .0260098    -0.46   0.645    -.0629703    .0389863
      ------------------------------------------------------------------------------
      Underidentification test (Kleibergen-Paap rk LM statistic):              3.030
                                                         Chi-sq(2) P-val =    0.2199
      ------------------------------------------------------------------------------
      Weak identification test (Cragg-Donald Wald F statistic):                2.022
                               (Kleibergen-Paap rk Wald F statistic):          1.814
      Stock-Yogo weak ID test critical values: 10% maximal IV size             19.93
                                               15% maximal IV size             11.59
                                               20% maximal IV size              8.75
                                               25% maximal IV size              7.25
      Source: Stock-Yogo (2005).  Reproduced by permission.
      NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
      ------------------------------------------------------------------------------
      Hansen J statistic (overidentification test of all instruments):         2.418
                                                         Chi-sq(1) P-val =    0.1200
      ------------------------------------------------------------------------------
      Instrumented:         D.k
      Included instruments: 1979.year 1980.year 1981.year 1982.year 1983.year
                            1984.year
      Excluded instruments: D.w L2.k
      ------------------------------------------------------------------------------


      And these are the results using xtseqreg:
      Code:
      .  xtseqreg n k yr1979-yr1984 year if sample2 == 1, iv(D.w L2.k, model(diff)) iv(yr1979-yr1984 year, diff model(diff)) vce(cluster id) nocons
      
      Group variable: id                           Number of obs         =       891
      Time variable: year                          Number of groups      =       140
      
                                                   Obs per group:    min =         6
                                                                     avg =  6.364286
                                                                     max =         8
      
                                                   Number of instruments =         9
      
                                         (Std. Err. adjusted for 140 clusters in id)
      ------------------------------------------------------------------------------
                   |               Robust
                 n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 k |   .2909463   .4012652     0.73   0.468    -.4955191    1.077412
            yr1979 |   .0153868   .0122692     1.25   0.210    -.0086604    .0394339
            yr1980 |   .0068197   .0491992     0.14   0.890    -.0896089    .1032484
            yr1981 |  -.0602003   .1120414    -0.54   0.591    -.2797974    .1593968
            yr1982 |  -.1054354   .1828742    -0.58   0.564    -.4638622    .2529913
            yr1983 |  -.1263167   .2389586    -0.53   0.597    -.5946669    .3420336
            yr1984 |  -.1231209   .2896033    -0.43   0.671     -.690733    .4444912
              year |  -.0200525   .0228561    -0.88   0.380    -.0648496    .0247445
      ------------------------------------------------------------------------------
      Thanks again for your help!

      Santiago



      • #4
        The source of the discrepancies is that the model is now overidentified and the results depend on the choice of the weighting matrix.

        When the ivreg2 model is specified in first differences, to replicate it with xtseqreg specified in levels you need to add the option wmatrix(separate).
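
        Applied to the xtseqreg call from #3, this would look as follows. This is only a sketch: it adds wmatrix(separate) to that command unchanged otherwise, and assumes sample2 and the yr1979-yr1984 dummies are defined as in #3.

        ```stata
        * Same level specification as in #3, with the weighting matrix option added.
        * The /// line continuation only works in a do-file; interactively,
        * type the command on a single line.
        xtseqreg n k yr1979-yr1984 year if sample2 == 1, ///
            iv(D.w L2.k, model(diff)) ///
            iv(yr1979-yr1984 year, diff model(diff)) ///
            wmatrix(separate) vce(cluster id) nocons
        ```

        The option matters only because the model is overidentified; in the exactly identified case from #1 the choice of weighting matrix does not affect the point estimates.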



        • #5
          Thank you, Sebastian, you are right. Once I add the option wmatrix(separate) to xtseqreg for the overidentified model, I get the same results with both commands.

          Best,

          Santiago
