XTSEQREG: error when estimating a sequential 2SLS-First Differences model

Santiago Franco

Join Date: Jul 2020
Posts: 4


03 Aug 2020, 17:09

Dear Statalists,

I am trying to implement the Kripfganz & Schwarz (2019) variance correction for a sequential 2SLS estimation in which the first step involves a first-differenced equation. However, I am not able to run the second step successfully, because either 1) the variable names do not match in the second step, or 2) the command xtseqreg does not estimate the first step correctly when I use the option iv(,model(diff)).

Let me illustrate the error I get using the public dataset abdata.dta.

Code:

webuse abdata
xtset id year 
gen sample = (L.n != . & L.w != . & L.k != . & L2.w != . & L2.k != .)

This is the first step I would like to replicate with XTSEQREG:

Code:

.  ivreg2 D.n (D.w D.k = L2.w L2.k) i.year if sample == 1, robust

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =      751
                                                      F(  8,   742) =     8.48
                                                      Prob > F      =   0.0000
Total (centered) SS     =  13.73614077                Centered R2   =  -0.6556
Total (uncentered) SS   =  16.06429854                Uncentered R2 =  -0.4156
Residual SS             =  22.74127488                Root MSE      =     .174

------------------------------------------------------------------------------
             |               Robust
         D.n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |
         D1. |   1.146626   1.024929     1.12   0.263     -.862198     3.15545
             |
           k |
         D1. |   .1041816   .8474889     0.12   0.902    -1.556866    1.765229
             |
        year |
       1979  |  -.0031767   .0369074    -0.09   0.931    -.0755138    .0691605
       1980  |   -.033016   .0842578    -0.39   0.695    -.1981583    .1321262
       1981  |  -.1403377   .1671101    -0.84   0.401    -.4678675    .1871922
       1982  |  -.1298584   .1888818    -0.69   0.492      -.50006    .2403432
       1983  |  -.0918314   .1470443    -0.62   0.532     -.380033    .1963702
       1984  |  -.0454144   .1310702    -0.35   0.729    -.3023074    .2114785
             |
       _cons |  -.0033702   .0537219    -0.06   0.950    -.1086632    .1019228
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              1.387
                                                   Chi-sq(1) P-val =    0.2388
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):                0.928
                         (Kleibergen-Paap rk Wald F statistic):          0.690
Stock-Yogo weak ID test critical values: 10% maximal IV size              7.03
                                         15% maximal IV size              4.58
                                         20% maximal IV size              3.95
                                         25% maximal IV size              3.63
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         D.w D.k
Included instruments: 1979.year 1980.year 1981.year 1982.year 1983.year
                      1984.year
Excluded instruments: L2.w L2.k
------------------------------------------------------------------------------

In a first attempt, I am able to replicate the point estimates obtained with the command IVREG2. In the second step, however, I would like to discard my first-step estimate of the capital coefficient and re-estimate this parameter using only cross-sectional variation, that is, using the levels equation. Stata then throws an error saying that the variable names from the first step do not match:

Code:

.  xtseqreg D.n D.w D.k if sample == 1, iv(L2.w L2.k, model(level)) teffects vce(robust)
1979bn.year 1980.year 1981.year 1982.year 1983.year 1984.year

Group variable: id                           Number of obs         =       751
Time variable: year                          Number of groups      =       140

                                             Obs per group:    min =         5
                                                               avg =  5.364286
                                                               max =         7

                                             Number of instruments =         9

                                   (Std. Err. adjusted for 140 clusters in id)
------------------------------------------------------------------------------
             |               Robust
         D.n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |
         D1. |   1.146626   1.112832     1.03   0.303    -1.034484    3.327736
             |
           k |
         D1. |   .1041816   .6833247     0.15   0.879     -1.23511    1.443473
             |
        year |
       1979  |  -.0031767   .0314504    -0.10   0.920    -.0648183     .058465
       1980  |   -.033016    .070716    -0.47   0.641    -.1716169    .1055849
       1981  |  -.1403377   .1428372    -0.98   0.326    -.4202935    .1396181
       1982  |  -.1298584   .1657704    -0.78   0.433    -.4547625    .1950456
       1983  |  -.0918314   .1339453    -0.69   0.493    -.3543594    .1706966
       1984  |  -.0454144   .1081419    -0.42   0.675    -.2573686    .1665397
             |
       _cons |  -.0033702   .0442384    -0.08   0.939    -.0900759    .0833355
------------------------------------------------------------------------------


.  xtseqreg n (w) k if sample == 1, iv(LD.k, model(level)) teffects vce(robust)
1979bn.year 1980.year 1981.year 1982.year 1983.year 1984.year
option first() incorrectly specified -- variable names do not match
r(322);

In a second attempt, I use the option iv(,model(diff)). As you can see below, I am now able to run the second step. However, the first-step results change: they match neither the results obtained with IVREG2 nor the previous first step using XTSEQREG with the variables manually first-differenced and the option iv(,model(level)).

Code:

.  xtseqreg n w k if sample == 1, iv(L2.w L2.k, model(diff)) teffects vce(robust)
1979bn.year 1980.year 1981.year 1982.year 1983.year 1984.year

Group variable: id                           Number of obs         =       751
Time variable: year                          Number of groups      =       140

                                             Obs per group:    min =         5
                                                               avg =  5.364286
                                                               max =         7

                                             Number of instruments =         9

                                   (Std. Err. adjusted for 140 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   .5550703    .529953     1.05   0.295    -.4836185    1.593759
           k |   .7902225   .0450446    17.54   0.000     .7019367    .8785083
             |
        year |
       1979  |   .0190451    .043757     0.44   0.663     -.066717    .1048073
       1980  |   .0086869    .044635     0.19   0.846    -.0787962    .0961699
       1981  |  -.0482938   .0462004    -1.05   0.296     -.138845    .0422573
       1982  |  -.0722319    .062388    -1.16   0.247    -.1945102    .0500464
       1983  |  -.0696385   .0915159    -0.76   0.447    -.2490063    .1097294
       1984  |  -.1923327   .1193695    -1.61   0.107    -.4262926    .0416273
             |
       _cons |   -.329578   1.672226    -0.20   0.844     -3.60708    2.947924
------------------------------------------------------------------------------

.  xtseqreg n (w) k if sample == 1, iv(LD.k, model(level)) teffects vce(robust)
1979bn.year 1980.year 1981.year 1982.year 1983.year 1984.year

Group variable: id                           Number of obs         =       751
Time variable: year                          Number of groups      =       140

------------------------------------------------------------------------------
Equation _first                              Equation _second
Number of obs         =       751            Number of obs         =       751
Number of groups      =       140            Number of groups      =       140

Obs per group:    min =         5            Obs per group:    min =         5
                  avg =  5.364286                              avg =  5.364286
                  max =         7                              max =         7

Number of instruments =         9            Number of instruments =         8

                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_first       |
           w |   .5550703    .529953     1.05   0.295    -.4836185    1.593759
       _cons |   -.329578   1.672226    -0.20   0.844     -3.60708    2.947924
-------------+----------------------------------------------------------------
_second      |
           k |   .5487759   .1122811     4.89   0.000      .328709    .7688428
             |
        year |
       1979  |  -.0018788   .0517203    -0.04   0.971    -.1032487    .0994911
       1980  |  -.0198237   .0534126    -0.37   0.711    -.1245105    .0848632
       1981  |  -.1003511   .0590729    -1.70   0.089     -.216132    .0154297
       1982  |  -.1546863   .0761061    -2.03   0.042    -.3038516   -.0055211
       1983  |  -.2072187   .1211814    -1.71   0.087    -.4447299    .0302925
       1984  |  -.4274209   .1510207    -2.83   0.005    -.7234161   -.1314258
             |
       _cons |  -.0528342   .0459619    -1.15   0.250    -.1429178    .0372494
------------------------------------------------------------------------------

.
Thanks!

Santiago


Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#2

04 Aug 2020, 12:04

Many thanks for raising this issue, Santiago Franco.

There are two problems:

1) The sample needs to be restricted differently when the model is specified in levels compared to when it is specified in first differences. Below, I create a new variable sample2 that must be used when the model is specified in levels.

2) There was unfortunately a bug in xtseqreg with the determination of the estimation sample when the sample was restricted with an if-condition. I have fixed this bug and an update is now available on my personal website:

Code:

net install xtseqreg, from(http://www.kripfganz.de/stata/) replace

With this update, the following specifications should all yield the same results for the first stage:

Code:

. ivreg2 D.n (D.w D.k = L2.w L2.k) i.year if sample == 1, cluster(id)
. xtseqreg D.n D.w D.k if sample == 1, iv(L2.w L2.k, model(level)) teffects vce(robust)
. gen sample2 = sample
. replace sample2 = 1 if F.sample == 1
. xtseqreg n w k yr1979-yr1984 year if sample2 == 1, iv(L2.w L2.k, model(diff)) iv(yr1979-yr1984 year, diff model(diff)) vce(robust) nocons

Notice that the time dummies for the final model should not be specified with the teffects option, which always creates time dummies for the level model. To specify them for the first-differenced model, I explicitly put the dummies in the list of regressors and created the appropriate instruments. Also, to replicate the intercept of the first-differenced model, we need to specify a linear time trend for the level model (variable year), again with an appropriate instrument, together with the nocons option.
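For illustration only (not part of the original reply), the two sample indicators can be compared with a short do-file sketch. It assumes the abdata example and the sample definition from post #1. The comment on why the extra observations are needed is an interpretation: presumably xtseqreg needs the preceding level observation to form each first difference internally. The count and tabulate lines are added purely for inspection.

```stata
* Sketch only: compares the two sample indicators on the abdata example
webuse abdata, clear
xtset id year

* Sample for the model written in first differences (from post #1)
gen sample = (L.n != . & L.w != . & L.k != . & L2.w != . & L2.k != .)

* Sample for the model written in levels: additionally flag the
* observation that precedes each observation already in sample
* (interpretation: it is needed to build the first difference)
gen sample2 = sample
replace sample2 = 1 if F.sample == 1

* Inspection, added for illustration: sample2 flags at least as many rows
count if sample == 1
count if sample2 == 1
tabulate sample sample2
```

The cross-tabulation shows which observations are flagged only by sample2; these are the extra level observations that the if-condition must keep when the model is passed to xtseqreg in levels.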

Finally, the second-stage results follow:

Code:

. xtseqreg n (w) k if sample2 == 1, iv(LD.k, model(level)) teffects vce(robust)

It would be great if you could let me know if everything works fine for you now.

https://www.kripfganz.de/stata/

Santiago Franco

Join Date: Jul 2020
Posts: 4

06 Aug 2020, 17:14

Thank you very much, Sebastian Kripfganz, for your detailed answer. I was able to implement what you recommended and, indeed, the three specifications yield the same results. However, I was not able to obtain the same results from xtseqreg and ivreg2 when the estimation combines "internal" and "external" instruments.

Note that the initial estimation I posted uses only internal instruments (lags) for both w and k. If I instead treat w as an external instrument for k (so that w does not enter the model itself), I am not able to replicate the ivreg2 results with xtseqreg.

Code:

webuse abdata, clear
xtset id year 

gen sample = (L.n != . & L.w != . & L.k != . & D.w != . & L2.k != .)
gen sample2 = sample
replace sample2 = 1 if F.sample == 1

This is the estimation using ivreg2:

Code:

.  ivreg2 D.n (D.k = D.w L2.k) i.year if sample == 1, cluster(id)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on id

Number of clusters (id) =          140                Number of obs =      751
                                                      F(  7,   139) =    19.92
                                                      Prob > F      =   0.0000
Total (centered) SS     =  13.73614077                Centered R2   =   0.2272
Total (uncentered) SS   =  16.06429854                Uncentered R2 =   0.3392
Residual SS             =  10.61514588                Root MSE      =    .1189

------------------------------------------------------------------------------
             |               Robust
         D.n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           k |
         D1. |    .142234   .4398271     0.32   0.746    -.7198114    1.004279
             |
        year |
       1979  |   .0114146   .0136808     0.83   0.404    -.0153993    .0382285
       1980  |  -.0217357   .0431501    -0.50   0.614    -.1063083    .0628369
       1981  |  -.0895835   .0708498    -1.26   0.206    -.2284466    .0492796
       1982  |  -.0720179   .0788896    -0.91   0.361    -.2266387    .0826029
       1983  |  -.0388727   .0656469    -0.59   0.554    -.1675382    .0897928
       1984  |  -.0148543   .0574581    -0.26   0.796    -.1274701    .0977614
             |
       _cons |   -.011992   .0260098    -0.46   0.645    -.0629703    .0389863
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              3.030
                                                   Chi-sq(2) P-val =    0.2199
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):                2.022
                         (Kleibergen-Paap rk Wald F statistic):          1.814
Stock-Yogo weak ID test critical values: 10% maximal IV size             19.93
                                         15% maximal IV size             11.59
                                         20% maximal IV size              8.75
                                         25% maximal IV size              7.25
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         2.418
                                                   Chi-sq(1) P-val =    0.1200
------------------------------------------------------------------------------
Instrumented:         D.k
Included instruments: 1979.year 1980.year 1981.year 1982.year 1983.year
                      1984.year
Excluded instruments: D.w L2.k
------------------------------------------------------------------------------

And these are the results using xtseqreg:

Code:

.  xtseqreg n k yr1979-yr1984 year if sample2 == 1, iv(D.w L2.k, model(diff)) iv(yr1979-yr1984 year, diff model(diff)) vce(cluster id) nocons

Group variable: id                           Number of obs         =       891
Time variable: year                          Number of groups      =       140

                                             Obs per group:    min =         6
                                                               avg =  6.364286
                                                               max =         8

                                             Number of instruments =         9

                                   (Std. Err. adjusted for 140 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           k |   .2909463   .4012652     0.73   0.468    -.4955191    1.077412
      yr1979 |   .0153868   .0122692     1.25   0.210    -.0086604    .0394339
      yr1980 |   .0068197   .0491992     0.14   0.890    -.0896089    .1032484
      yr1981 |  -.0602003   .1120414    -0.54   0.591    -.2797974    .1593968
      yr1982 |  -.1054354   .1828742    -0.58   0.564    -.4638622    .2529913
      yr1983 |  -.1263167   .2389586    -0.53   0.597    -.5946669    .3420336
      yr1984 |  -.1231209   .2896033    -0.43   0.671     -.690733    .4444912
        year |  -.0200525   .0228561    -0.88   0.380    -.0648496    .0247445
------------------------------------------------------------------------------

Thanks again for your help!

Santiago


Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#4

07 Aug 2020, 03:47

The source of the discrepancies is that the model is now overidentified and the results depend on the choice of the weighting matrix.
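As a textbook sketch of why this matters (standard linear GMM algebra, not taken from the xtseqreg documentation; the symbols y, X, Z and W are notation introduced here for the dependent variable, regressors, instruments and weighting matrix), the estimator can be written as:

```latex
\hat{\beta}(W) = \left( X'Z \, W \, Z'X \right)^{-1} X'Z \, W \, Z'y .

% Exactly identified case: Z'X is square and (assumed) invertible, so
\hat{\beta}(W)
  = (Z'X)^{-1} W^{-1} (X'Z)^{-1} \, X'Z \, W \, Z'y
  = (Z'X)^{-1} Z'y ,
% and W drops out. With more instruments than regressors, Z'X is not
% square, this cancellation is unavailable, and \hat{\beta}(W) depends on W.
```

In this thread, the first model has 9 instruments for 9 coefficients, so any weighting matrix gives the same point estimates. The model in post #3 has 9 instruments for 8 coefficients, so two commands that use different weighting matrices can report different estimates.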

When the ivreg2 model is specified in first differences, to replicate it with xtseqreg specified in levels you need to add the option wmatrix(separate).
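A minimal do-file sketch of the comparison, assuming the sample definitions from post #3 and that wmatrix(separate) is simply appended to the xtseqreg call shown there (the exact placement is an assumption; the reply only names the option). It requires ivreg2 and the updated xtseqreg to be installed.

```stata
* Sketch only: requires ivreg2 (ssc install ivreg2) and the updated xtseqreg
webuse abdata, clear
xtset id year

* Sample indicators as defined in post #3
gen sample = (L.n != . & L.w != . & L.k != . & D.w != . & L2.k != .)
gen sample2 = sample
replace sample2 = 1 if F.sample == 1

* Benchmark: overidentified 2SLS on the first-differenced equation
ivreg2 D.n (D.k = D.w L2.k) i.year if sample == 1, cluster(id)

* Level specification with instruments for the differenced model;
* wmatrix(separate) is the option named in the reply above
xtseqreg n k yr1979-yr1984 year if sample2 == 1, ///
    iv(D.w L2.k, model(diff)) ///
    iv(yr1979-yr1984 year, diff model(diff)) ///
    wmatrix(separate) vce(cluster id) nocons
```

If the option works as described in the reply, the coefficient on k from xtseqreg should agree with the D.k coefficient from ivreg2.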

https://www.kripfganz.de/stata/

Santiago Franco

Join Date: Jul 2020

Posts: 4
#5

07 Aug 2020, 14:46

Thank you Sebastian, you are right. With the overidentified model, I now get the same results from both commands once I add the option wmatrix(separate) to xtseqreg.

Best,

Santiago