
  • Correcting standard errors in IV procedure carried out manually

    Dear All,

    I need to run an IV estimation, but for specific reasons I have to carry it out manually. Of course, the second-stage standard errors then need to be corrected. A post on the Stata website suggests a procedure for doing so (the only difference is that I also included x1, the exogenous regressor from the second stage, in the first stage):

    Code:
    sysuse auto, clear
    
    rename price y1
    rename mpg y2
    rename displacement z1
    rename turn x1
    
    * First stage: regress the endogenous variable on the instrument and x1
    regress y2 z1 x1
    predict double y2hat
    
    * Second stage: OLS of y1 on the first-stage fitted values and x1
    regress y1 y2hat x1
    
    * Swap names so that -predict- computes residuals with the actual y2
    rename y2hat y2hold
    rename y2 y2hat
    
    predict double res, residual
    
    rename y2hat y2
    rename y2hold y2hat
    
    replace res = res^2
    
    summarize res
    
    * Correction of the second-stage standard errors
    scalar realmse = r(mean)*r(N)/e(df_r)
    matrix bmatrix = e(b)
    matrix Vmatrix = e(V) * realmse / e(rmse)^2
    
    ereturn post bmatrix Vmatrix, noclear
    
    ereturn display
    The part of the code from summarize res onwards (shown in red in the original post) computes the standard errors correctly.
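
    In symbols (a sketch of why the rescaling works; Zhat denotes the matrix of second-stage regressors with y2 replaced by its fitted values, and Z the same matrix with the actual y2): the second-stage OLS reports e(V) = e(rmse)^2 * (Zhat'Zhat)^-1, but e(rmse)^2 is based on the wrong residuals y1 - Zhat*b. The correct 2SLS variance uses residuals computed with the actual y2:

    ```latex
    \widehat V_{\mathrm{2SLS}} \;=\; \hat\sigma^{2}\,(\hat Z'\hat Z)^{-1},
    \qquad
    \hat\sigma^{2} \;=\; \frac{1}{n-k}\sum_{i=1}^{n}\bigl(y_{1i}-Z_{i}\hat\beta\bigr)^{2},
    ```

    and \hat\sigma^{2} is exactly the scalar realmse, so multiplying e(V) by realmse/e(rmse)^2 delivers the corrected matrix.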

    As you can see, the result is the same as when I estimate the model using ivregress 2sls:

    Manually

    Code:
    ------------------------------------------------------------------------------
              y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           y2hat |  -882.4067   317.3699    -2.78   0.007    -1515.224   -249.5891
              x1 |    -626.99   315.5781    -1.99   0.051    -1256.235    2.254868
           _cons |   49817.44   19060.57     2.61   0.011     11811.74    87823.15
    ------------------------------------------------------------------------------
    ivregress

    Code:
    ivregress 2sls y1 x1 (y2=z1),  small
    
    ------------------------------------------------------------------------------
              y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              y2 |  -882.4067   317.3699    -2.78   0.007    -1515.224   -249.5891
              x1 |    -626.99   315.5781    -1.99   0.051    -1256.235    2.254868
           _cons |   49817.44   19060.57     2.61   0.011     11811.74    87823.15
    ------------------------------------------------------------------------------
    However, if I use robust standard errors in both the manual procedure and in ivregress, the results are substantially different:

    Manually with robust s.e.

    Code:
    ------------------------------------------------------------------------------
                 |               Robust
              y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           y2hat |  -882.4067   389.2006    -2.27   0.026    -1658.451   -106.3628
              x1 |    -626.99   381.1144    -1.65   0.104    -1386.911    132.9304
           _cons |   49817.44   23418.69     2.13   0.037     3121.906    96512.98
    ------------------------------------------------------------------------------
    ivregress with robust s.e.

    Code:
    ------------------------------------------------------------------------------
                 |               Robust
              y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              y2 |  -882.4067   313.0633    -2.82   0.006    -1506.637   -258.1763
              x1 |    -626.99   281.2112    -2.23   0.029    -1187.709   -66.27079
           _cons |   49817.44   17619.86     2.83   0.006     14684.45    84950.44
    ------------------------------------------------------------------------------
    Theoretically, this is expected, as the variance-covariance matrix is different when robust standard errors are used. However, I cannot figure out how to reformulate the correction above (the red part) to account for the robust variance-covariance matrix. Any suggestions?

    Thanks in advance

    Dario




  • #2
    Nothing in your code relates to robust standard errors or a robust variance.

    So it is not clear what you want feedback on.



    • #3
      Joro Kolev thanks for your reply. The first part of the code refers to a case without robust standard errors. If I apply the same code with the changes in blue (the r options):

      Code:
      sysuse auto, clear
      
      rename price y1
      rename mpg y2
      rename displacement z1
      rename turn x1
      
      regress y2 z1 x1, r
      predict double y2hat
      regress y1 y2hat x1, r
      
      rename y2hat y2hold
      rename y2 y2hat
      
      predict double res, residual
      
      rename y2hat y2
      rename y2hold y2hat
      replace res = res^2
      
      summarize res
      
      scalar realmse = r(mean)*r(N)/e(df_r)
      matrix bmatrix = e(b)
      matrix Vmatrix = e(V) * realmse / e(rmse)^2
      
      ereturn post bmatrix Vmatrix, noclear
      ereturn display
        
      ivregress 2sls y1 (y2=z1) x1, r small
      I obtain different results, as shown in my previous post. So my point is how I can obtain the same results whether I carry out the IV estimation using ivregress or manually. I cannot figure out how to change the lines shown in red in my first post to account for a different robust VCV matrix.

      Dario



      • #4
        Here is how you manually compute the robust variance for 2SLS. (And I did not fiddle with the degrees of freedom.)

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . 
        . rename price y1
        
        . rename mpg y2
        
        . rename displacement z1
        
        . rename turn x1
        
        . 
        . regress y2 z1 x1
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(2, 71)        =     47.39
               Model |  1396.95646         2  698.478229   Prob > F        =    0.0000
            Residual |    1046.503        71  14.7394789   R-squared       =    0.5717
        -------------+----------------------------------   Adj R-squared   =    0.5596
               Total |  2443.45946        73  33.4720474   Root MSE        =    3.8392
        
        ------------------------------------------------------------------------------
                  y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                  z1 |  -.0233485    .007769    -3.01   0.004    -.0388394   -.0078576
                  x1 |  -.5671898   .1621789    -3.50   0.001    -.8905654   -.2438143
               _cons |    48.3922   5.346396     9.05   0.000     37.73179    59.05261
        ------------------------------------------------------------------------------
        
        . predict double y2hat
        (option xb assumed; fitted values)
        
        . regress y1 y2hat x1, mse1
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(2, 74)        >  99999.00
               Model |   164538571         2  82269285.5   Prob > F        =    0.0000
            Residual |   470526825        74  6358470.61   R-squared       =    0.2591
        -------------+----------------------------------   Adj R-squared   =    0.2691
               Total |   635065396        73  8699525.97   Root MSE        =         1
        
        ------------------------------------------------------------------------------
                  y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               y2hat |  -882.4067   .0866692 -1.0e+04   0.000    -882.5794    -882.234
                  x1 |    -626.99   .0861799 -7275.37   0.000    -627.1618   -626.8183
               _cons |   49817.44   5.205169  9570.76   0.000     49807.07    49827.82
        ------------------------------------------------------------------------------
        
        . matrix bmatrix = e(b)
        
        . matrix Vmatrix = e(V)
        
        . 
        . rename y2hat y2hold
        
        . rename y2 y2hat
        
        . 
        . predict double res, residual
        
        . 
        . rename y2hat y2
        
        . rename y2hold y2hat
        
        . replace res = res^2
        (74 real changes made)
        
        . 
        . mat accum Meat = y2hat x1 [iw = res]
        (obs=952051990.4)
        
        . 
        . 
        . matrix Vmatrix = Vmatrix*Meat*Vmatrix
        
        . 
        . 
        . ereturn post bmatrix Vmatrix, noclear
        
        . ereturn display
        ------------------------------------------------------------------------------
                  y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               y2hat |  -882.4067   306.6517    -2.88   0.005    -1493.424   -271.3898
                  x1 |    -626.99    275.452    -2.28   0.026     -1175.84   -78.13991
               _cons |   49817.44      17259     2.89   0.005     15428.13    84206.75
        ------------------------------------------------------------------------------
        
        .   
        . ivregress 2sls y1 (y2=z1) x1, robust
        
        Instrumental variables (2SLS) regression          Number of obs   =         74
                                                          Wald chi2(2)    =       9.39
                                                          Prob > chi2     =     0.0091
                                                          R-squared       =          .
                                                          Root MSE        =     3586.9
        
        ------------------------------------------------------------------------------
                     |               Robust
                  y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                  y2 |  -882.4067   306.6517    -2.88   0.004    -1483.433   -281.3804
                  x1 |    -626.99    275.452    -2.28   0.023    -1166.866     -87.114
               _cons |   49817.44      17259     2.89   0.004     15990.42    83644.46
        ------------------------------------------------------------------------------
        Instrumented:  y2
        Instruments:   x1 z1
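
        To spell out the logic of the code above: the mse1 option forces the second-stage e(V) to be exactly the "bread" (Zhat'Zhat)^-1; mat accum with iweights equal to the squared residuals (which were computed with the actual y2) builds the "meat"; and the final matrix line assembles the heteroskedasticity-robust sandwich:

        ```latex
        \widehat V_{\mathrm{robust}}
        \;=\;
        (\hat Z'\hat Z)^{-1}
        \Bigl(\sum_{i=1}^{n}\hat e_{i}^{\,2}\,\hat Z_{i}\hat Z_{i}'\Bigr)
        (\hat Z'\hat Z)^{-1} .
        ```

        No degrees-of-freedom adjustment is applied here, which is consistent with the exact match to ivregress 2sls, robust shown above.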



        • #5
          Joro Kolev Thanks a lot for your insight. The "Meat" part is the one that I was not able to code. Actually, I was trying to use Mata, but I still had some problems. Your code is extremely helpful.



          • #6
            Dear Joro Kolev, Dear All

            I have a follow-up question. I used the code Joro provided in post #4 with my dataset, making a few small changes to account for missing values. Specifically:

            Code:
            regress y2 z1 z12 z2 x1 x2 x3 if y1!=., r
            predict double y2hat if e(sample)
            regress y1 y2hat x1 x2 x3, mse1
            matrix bmatrix = e(b)
            matrix Vmatrix = e(V)
            rename y2hat y2hold
            rename y2 y2hat
            predict double res, residual
            rename y2hat y2
            rename y2hold y2hat
            replace res = res^2
            mat accum Meat = y2hat x1 x2 x3 [iw = res] if y1!=.
            matrix Vmatrix = Vmatrix*Meat*Vmatrix
            ereturn post bmatrix Vmatrix, noclear
            ereturn display
            ivregress 2sls y1 (y2=z1 z12 z2) x1 x2 x3, robust first
            In the models above, z12 is the square of z1. Both z1 and z2 are weakly time-variant; the exogenous regressors x1, x2 and x3 are time-invariant. If I estimate the IV model manually, I obtain:

            First stage:

            Code:
            Linear regression                               Number of obs     =     76,814
                                                            F(6, 76807)       =      14.89
                                                            Prob > F          =     0.0000
                                                            R-squared         =     0.0012
                                                            Root MSE          =     25.151
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y2 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      z1 |   .0134117    .033855     0.40   0.692     -.052944    .0797673
                     z12 |  -5.53e-06   .0008375    -0.01   0.995    -.0016469    .0016359
                      z2 |  -.0019404   .0014015    -1.38   0.166    -.0046873    .0008066
                      x1 |   .0412252   .0230858     1.79   0.074    -.0040229    .0864732
                      x2 |  -.0008498   .0047419    -0.18   0.858     -.010144    .0084443
                      x3 |   .0346339   .0075961     4.56   0.000     .0197456    .0495222
                   _cons |   .2908111   .3428981     0.85   0.396    -.3812673    .9628896
            ------------------------------------------------------------------------------
            Second Stage:

            Code:
            . ereturn display
            ------------------------------------------------------------------------------
                      y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                   y2hat |   64.07488   30.14269     2.13   0.034     4.995364    123.1544
                      x1 |  -.6878107   2.246888    -0.31   0.760      -5.0917    3.716078
                      x2 |   -.082632   .3035083    -0.27   0.785    -.6775067    .5122427
                      x3 |  -.0457806   1.146236    -0.04   0.968    -2.292398    2.200837
                   _cons |  -20.01899     17.663    -1.13   0.257    -54.63839    14.60041
            ------------------------------------------------------------------------------
            Using ivregress I get:

            Code:
            . ivregress 2sls y1 (y2=z1 z12 z2) x1 x2 x3, robust first
            
            First-stage regressions
            -----------------------
            
                                                                   Number of obs =  76,814
                                                                   F(6, 76807)   =   14.89
                                                                   Prob > F      =  0.0000
                                                                   R-squared     =  0.0012
                                                                   Adj R-squared =  0.0011
                                                                   Root MSE      = 25.1505
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y2 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                     z1  |   .0134117    .033855     0.40   0.692     -.052944    .0797673
                     z12 |  -5.53e-06   .0008375    -0.01   0.995    -.0016469    .0016359
                      z2 |  -.0019404   .0014015    -1.38   0.166    -.0046873    .0008066
                      x1 |   .0412252   .0230858     1.79   0.074    -.0040229    .0864732
                      x2 |  -.0008498   .0047419    -0.18   0.858     -.010144    .0084443
                      x3 |   .0346339   .0075961     4.56   0.000     .0197456    .0495222
                   _cons |   .2908111   .3428981     0.85   0.396    -.3812673    .9628896
            ------------------------------------------------------------------------------
            
            
            Instrumental variables 2SLS regression            Number of obs   =     76,814
                                                              Wald chi2(4)    =      74.12
                                                              Prob > chi2     =     0.0000
                                                              R-squared       =          .
                                                              Root MSE        =     1620.1
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y1 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      y2 |   64.07488   30.14269     2.13   0.034     4.996295    123.1535
                      x1 |  -.6878107   2.246888    -0.31   0.760     -5.09163    3.716009
                      x2 |   -.082632   .3035083    -0.27   0.785    -.6774974    .5122333
                      x3 |  -.0457806   1.146236    -0.04   0.968    -2.292362    2.200801
                   _cons |  -20.01899     17.663    -1.13   0.257    -54.63784    14.59986
            ------------------------------------------------------------------------------
            Results are identical.

            Now suppose that I add some country dummies (new1-new161) and a trend (date) to the model above. The code changes accordingly:

            Code:
            regress y2 z1 z12 z2 x1 x2 x3 new1-new161 c.date if y1!=., r
            predict double y2hat if e(sample)
            regress y1 y2hat x1 x2 x3 new1-new161 c.date, mse1
            matrix bmatrix = e(b)
            matrix Vmatrix = e(V)
            rename y2hat y2hold
            rename y2 y2hat
            predict double res, residual
            rename y2hat y2
            rename y2hold y2hat
            replace res = res^2
            mat accum Meat = y2hat x1 x2 x3 new1-new161 c.date [iw = res] if y1!=.
            matrix Vmatrix = Vmatrix*Meat*Vmatrix
            ereturn post bmatrix Vmatrix, noclear
            ereturn display
            ivregress 2sls y1 (y2=z1 z12 z2) x1 x2 x3 new1-new161 c.date, robust first
            Now something occurs that I cannot explain. Doing the IV manually, I get the following (I removed the country dummies and the trend from the results below, although they are included in the estimation):

            First Stage:

            Code:
            Linear regression                               Number of obs     =     76,814
                                                            F(144, 76668)     =          .
                                                            Prob > F          =          .
                                                            R-squared         =     0.0383
                                                            Root MSE          =     24.701
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y2 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      z1 |  -2.463026   .4370156    -5.64   0.000    -3.319574   -1.606477
                     z12 |   .0364213   .0088151     4.13   0.000     .0191437    .0536989
                      z2 |  -2.024472   .5814382    -3.48   0.000    -3.164088   -.8848566
                      x1 |  -50.99909   14.93742    -3.41   0.001    -80.27635   -21.72183
                      x2 |  -2.136277   .6285056    -3.40   0.001    -3.368145   -.9044091
                      x3 |  -10.63291    2.94176    -3.61   0.000    -16.39875   -4.867078
                   _cons |   2418.978   511.3641     4.73   0.000     1416.707    3421.249
            ------------------------------------------------------------------------------
            Second stage:

            Code:
            . ereturn display
            ------------------------------------------------------------------------------
                      y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                   y2hat |  -6.251991   .9465268    -6.61   0.000    -8.107178   -4.396803
                      x1 |   6.091359     .58326    10.44   0.000     4.948172    7.234545
                      x2 |  -.9006199   .0975376    -9.23   0.000    -1.091793   -.7094467
                      x3 |   1.396808   .1781995     7.84   0.000     1.047538    1.746078
                    date |   .0422543   .0258671     1.63   0.102    -.0084451    .0929537
                   _cons |  -941.2898   580.7457    -1.62   0.105    -2079.548    196.9687
            ------------------------------------------------------------------------------
            If I use ivregress, I obtain:

            Code:
            First-stage regressions
            -----------------------
            
                                                                   Number of obs =  76,814
                                                                   F(145, 76668) =   10.48
                                                                   Prob > F      =  0.0000
                                                                   R-squared     =  0.0383
                                                                   Adj R-squared =  0.0365
                                                                   Root MSE      = 24.7015
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y2 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      z1 |  -2.463026   .4370813    -5.64   0.000    -3.319703   -1.606349
                     z12 |   .0364213   .0088164     4.13   0.000     .0191412    .0537015
                      z2 |  -2.024474   .5815258    -3.48   0.000    -3.164262   -.8846863
                      x1 |  -8.945115   1.514435    -5.91   0.000     -11.9134   -5.976831
                      x2 |    1.23147   .2301185     5.35   0.000     .7804386    1.682501
                      x3 |  -2.863724   .4507995    -6.35   0.000    -3.747289   -1.980159
                   _cons |   707.2298   24.11607    29.33   0.000     659.9624    754.4972
            ------------------------------------------------------------------------------
            Code:
            Instrumental variables 2SLS regression            Number of obs   =     76,814
                                                              Wald chi2(143)  =   13290.41
                                                              Prob > chi2     =     0.0000
                                                              R-squared       =          .
                                                              Root MSE        =     212.29
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y1 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      y2 |  -6.251988   .9465264    -6.61   0.000    -8.107146   -4.396831
                      x1 |  -3.325746   5.355277    -0.62   0.535     -13.8219    7.170405
                      x2 |    1.53259   1.122069     1.37   0.172     -.666625    3.731806
                      x3 |   .3424514   .6935874     0.49   0.621    -1.016955    1.701858
                   _cons |  -923.2611   574.7166    -1.61   0.108    -2049.685    203.1628
            ------------------------------------------------------------------------------
            As you can notice, in the first stage the coefficients (and s.e.) of z1, z12 and z2 are basically identical, whereas the estimates of the parameters on the exogenous regressors are totally different. Moving to the second stage, the estimate of the endogenous parameter (and its s.e.) is identical in the two cases, while the other coefficients are totally different (even the sign differs, not only the magnitude).

            I tried to re-estimate the model manually, dropping the constant in both the first and the second stage, and I removed the robust standard errors, but the difference persists. I should also point out that Stata removes some country dummies, and initially I thought the difference in the estimates could be due to different dummies being removed. That is not the case: the same country dummies are removed in both procedures.
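
            (For anyone who wants to reproduce the check, a sketch with the variable names above, where y2hat holds the first-stage fitted values: _rmcoll lists which columns would be omitted for collinearity.)

            ```stata
            * Which regressors does each specification drop for collinearity?
            _rmcoll y2hat x1 x2 x3 new1-new161 date if y1 != .
            display "`r(varlist)'"
            _rmcoll y2 x1 x2 x3 new1-new161 date if y1 != .
            display "`r(varlist)'"
            ```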

            I noted that if I include continent dummies rather than country dummies (keeping the trend in), the results are identical. So I guess the difference in the results is due to the presence of the country dummies. However, I cannot understand why, or how I can fix this issue.

            Do you have any idea why these differences occur? I have been trying to solve this issue for three days now, but still cannot find a solution.

            I did not include the dataset via dataex, as it is quite large, but it can be downloaded from here.

            Many thanks for your help.
            Last edited by Dario Maimone Ansaldo Patti; 08 Oct 2021, 17:16.

