
  • Correcting standard errors in IV procedure carried out manually

    Dear All,

    I need to run an IV estimation, but for specific reasons I have to carry it out manually. Of course, the second-stage standard errors then need to be corrected. A post on the Stata website suggests a procedure for doing so (the only difference is that I also included x1, the exogenous regressor from the second stage, in the first stage):

    Code:
    sysuse auto, clear
    
    rename price y1
    rename mpg y2
    rename displacement z1
    rename turn x1
    
    * First stage: regress the endogenous variable on the instrument and x1
    regress y2 z1 x1
    predict double y2hat
    
    * Second stage: OLS of y1 on the first-stage fitted values and x1
    regress y1 y2hat x1
    
    * Swap names so that -predict- computes residuals with the actual y2
    rename y2hat y2hold
    rename y2 y2hat
    
    predict double res, residual
    
    rename y2hat y2
    rename y2hold y2hat
    
    replace res = res^2
    
    summarize res
    
    * Correction of the second-stage standard errors
    scalar realmse = r(mean)*r(N)/e(df_r)
    matrix bmatrix = e(b)
    matrix Vmatrix = e(V) * realmse / e(rmse)^2
    
    ereturn post bmatrix Vmatrix, noclear
    
    ereturn display
    The part of the code from summarize res onwards (shown in red in the original post) computes the standard errors correctly.
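
    In symbols (a sketch of why the rescaling works; Zhat denotes the matrix of second-stage regressors with y2 replaced by its fitted values, and Z the same matrix with the actual y2): the second-stage OLS reports e(V) = e(rmse)^2 * (Zhat'Zhat)^-1, but e(rmse)^2 is based on the wrong residuals y1 - Zhat*b. The correct 2SLS variance uses residuals computed with the actual y2:

    ```latex
    \widehat V_{\mathrm{2SLS}} \;=\; \hat\sigma^{2}\,(\hat Z'\hat Z)^{-1},
    \qquad
    \hat\sigma^{2} \;=\; \frac{1}{n-k}\sum_{i=1}^{n}\bigl(y_{1i}-Z_{i}\hat\beta\bigr)^{2},
    ```

    and \hat\sigma^{2} is exactly the scalar realmse, so multiplying e(V) by realmse/e(rmse)^2 delivers the corrected matrix.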

    As you can see, the result is the same as when I estimate the model using ivregress 2sls:

    Manually

    Code:
    ------------------------------------------------------------------------------
              y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           y2hat |  -882.4067   317.3699    -2.78   0.007    -1515.224   -249.5891
              x1 |    -626.99   315.5781    -1.99   0.051    -1256.235    2.254868
           _cons |   49817.44   19060.57     2.61   0.011     11811.74    87823.15
    ------------------------------------------------------------------------------
    ivregress

    Code:
    ivregress 2sls y1 x1 (y2=z1),  small
    
    ------------------------------------------------------------------------------
              y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              y2 |  -882.4067   317.3699    -2.78   0.007    -1515.224   -249.5891
              x1 |    -626.99   315.5781    -1.99   0.051    -1256.235    2.254868
           _cons |   49817.44   19060.57     2.61   0.011     11811.74    87823.15
    ------------------------------------------------------------------------------
    However, if I use robust standard errors in both the manual procedure and in ivregress, the results are substantially different:

    Manually with robust s.e.

    Code:
    ------------------------------------------------------------------------------
                 |               Robust
              y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           y2hat |  -882.4067   389.2006    -2.27   0.026    -1658.451   -106.3628
              x1 |    -626.99   381.1144    -1.65   0.104    -1386.911    132.9304
           _cons |   49817.44   23418.69     2.13   0.037     3121.906    96512.98
    ------------------------------------------------------------------------------
    ivregress with robust s.e.

    Code:
    ------------------------------------------------------------------------------
                 |               Robust
              y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              y2 |  -882.4067   313.0633    -2.82   0.006    -1506.637   -258.1763
              x1 |    -626.99   281.2112    -2.23   0.029    -1187.709   -66.27079
           _cons |   49817.44   17619.86     2.83   0.006     14684.45    84950.44
    ------------------------------------------------------------------------------
    Theoretically, this is expected, as the variance-covariance matrix is different when robust standard errors are used. However, I cannot figure out how to reformulate the correction above (the red part) to account for the robust variance-covariance matrix. Any suggestions?

    Thanks in advance

    Dario




  • #2
    Nothing in your code relates to robust standard errors or a robust variance.

    So it is not clear what you want feedback on.



    • #3
      Joro Kolev thanks for your reply. The first part of the code refers to a case without robust standard errors. If I apply the same code with the changes in blue (the r options):

      Code:
      sysuse auto, clear
      
      rename price y1
      rename mpg y2
      rename displacement z1
      rename turn x1
      
      regress y2 z1 x1, r
      predict double y2hat
      regress y1 y2hat x1, r
      
      rename y2hat y2hold
      rename y2 y2hat
      
      predict double res, residual
      
      rename y2hat y2
      rename y2hold y2hat
      replace res = res^2
      
      summarize res
      
      scalar realmse = r(mean)*r(N)/e(df_r)
      matrix bmatrix = e(b)
      matrix Vmatrix = e(V) * realmse / e(rmse)^2
      
      ereturn post bmatrix Vmatrix, noclear
      ereturn display
        
      ivregress 2sls y1 (y2=z1) x1, r small
      I obtain different results, as shown in my previous post. So my point is how I can obtain the same results whether I carry out the IV estimation using ivregress or manually. I cannot figure out how to change the lines shown in red in my first post to account for a different robust VCV matrix.

      Dario



      • #4
        Here is how you manually compute the robust variance for 2SLS. (And I did not fiddle with the degrees of freedom.)

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . 
        . rename price y1
        
        . rename mpg y2
        
        . rename displacement z1
        
        . rename turn x1
        
        . 
        . regress y2 z1 x1
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(2, 71)        =     47.39
               Model |  1396.95646         2  698.478229   Prob > F        =    0.0000
            Residual |    1046.503        71  14.7394789   R-squared       =    0.5717
        -------------+----------------------------------   Adj R-squared   =    0.5596
               Total |  2443.45946        73  33.4720474   Root MSE        =    3.8392
        
        ------------------------------------------------------------------------------
                  y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                  z1 |  -.0233485    .007769    -3.01   0.004    -.0388394   -.0078576
                  x1 |  -.5671898   .1621789    -3.50   0.001    -.8905654   -.2438143
               _cons |    48.3922   5.346396     9.05   0.000     37.73179    59.05261
        ------------------------------------------------------------------------------
        
        . predict double y2hat
        (option xb assumed; fitted values)
        
        . regress y1 y2hat x1, mse1
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(2, 74)        >  99999.00
               Model |   164538571         2  82269285.5   Prob > F        =    0.0000
            Residual |   470526825        74  6358470.61   R-squared       =    0.2591
        -------------+----------------------------------   Adj R-squared   =    0.2691
               Total |   635065396        73  8699525.97   Root MSE        =         1
        
        ------------------------------------------------------------------------------
                  y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               y2hat |  -882.4067   .0866692 -1.0e+04   0.000    -882.5794    -882.234
                  x1 |    -626.99   .0861799 -7275.37   0.000    -627.1618   -626.8183
               _cons |   49817.44   5.205169  9570.76   0.000     49807.07    49827.82
        ------------------------------------------------------------------------------
        
        . matrix bmatrix = e(b)
        
        . matrix Vmatrix = e(V)
        
        . 
        . rename y2hat y2hold
        
        . rename y2 y2hat
        
        . 
        . predict double res, residual
        
        . 
        . rename y2hat y2
        
        . rename y2hold y2hat
        
        . replace res = res^2
        (74 real changes made)
        
        . 
        . mat accum Meat = y2hat x1 [iw = res]
        (obs=952051990.4)
        
        . 
        . 
        . matrix Vmatrix = Vmatrix*Meat*Vmatrix
        
        . 
        . 
        . ereturn post bmatrix Vmatrix, noclear
        
        . ereturn display
        ------------------------------------------------------------------------------
                  y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               y2hat |  -882.4067   306.6517    -2.88   0.005    -1493.424   -271.3898
                  x1 |    -626.99    275.452    -2.28   0.026     -1175.84   -78.13991
               _cons |   49817.44      17259     2.89   0.005     15428.13    84206.75
        ------------------------------------------------------------------------------
        
        .   
        . ivregress 2sls y1 (y2=z1) x1, robust
        
        Instrumental variables (2SLS) regression          Number of obs   =         74
                                                          Wald chi2(2)    =       9.39
                                                          Prob > chi2     =     0.0091
                                                          R-squared       =          .
                                                          Root MSE        =     3586.9
        
        ------------------------------------------------------------------------------
                     |               Robust
                  y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                  y2 |  -882.4067   306.6517    -2.88   0.004    -1483.433   -281.3804
                  x1 |    -626.99    275.452    -2.28   0.023    -1166.866     -87.114
               _cons |   49817.44      17259     2.89   0.004     15990.42    83644.46
        ------------------------------------------------------------------------------
        Instrumented:  y2
        Instruments:   x1 z1
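
        To spell out the logic of the code above: the mse1 option forces the second-stage e(V) to be exactly the "bread" (Zhat'Zhat)^-1; mat accum with iweights equal to the squared residuals (which were computed with the actual y2) builds the "meat"; and the final matrix line assembles the heteroskedasticity-robust sandwich:

        ```latex
        \widehat V_{\mathrm{robust}}
        \;=\;
        (\hat Z'\hat Z)^{-1}
        \Bigl(\sum_{i=1}^{n}\hat e_{i}^{\,2}\,\hat Z_{i}\hat Z_{i}'\Bigr)
        (\hat Z'\hat Z)^{-1} .
        ```

        No degrees-of-freedom adjustment is applied here, which is consistent with the exact match to ivregress 2sls, robust shown above.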



        • #5
          Joro Kolev Thanks a lot for your insight. The "Meat" part is the one that I was not able to code. Actually, I was trying to use Mata, but I still had some problems. Your code is extremely helpful.



          • #6
            Dear Joro Kolev, Dear All

            I have a follow-up question. I used the code Joro provided in post #4 with my dataset, making a few small changes to account for missing values. Specifically:

            Code:
            regress y2 z1 z12 z2 x1 x2 x3 if y1!=., r
            predict double y2hat if e(sample)
            regress y1 y2hat x1 x2 x3, mse1
            matrix bmatrix = e(b)
            matrix Vmatrix = e(V)
            rename y2hat y2hold
            rename y2 y2hat
            predict double res, residual
            rename y2hat y2
            rename y2hold y2hat
            replace res = res^2
            mat accum Meat = y2hat x1 x2 x3 [iw = res] if y1!=.
            matrix Vmatrix = Vmatrix*Meat*Vmatrix
            ereturn post bmatrix Vmatrix, noclear
            ereturn display
            ivregress 2sls y1 (y2=z1 z12 z2) x1 x2 x3, robust first
            In the models above, z12 is the square of z1. Both z1 and z2 are weakly time-variant; the exogenous regressors x1, x2 and x3 are time-invariant. If I estimate the IV model manually, I obtain:

            First stage:

            Code:
            Linear regression                               Number of obs     =     76,814
                                                            F(6, 76807)       =      14.89
                                                            Prob > F          =     0.0000
                                                            R-squared         =     0.0012
                                                            Root MSE          =     25.151
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y2 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      z1 |   .0134117    .033855     0.40   0.692     -.052944    .0797673
                     z12 |  -5.53e-06   .0008375    -0.01   0.995    -.0016469    .0016359
                      z2 |  -.0019404   .0014015    -1.38   0.166    -.0046873    .0008066
                      x1 |   .0412252   .0230858     1.79   0.074    -.0040229    .0864732
                      x2 |  -.0008498   .0047419    -0.18   0.858     -.010144    .0084443
                      x3 |   .0346339   .0075961     4.56   0.000     .0197456    .0495222
                   _cons |   .2908111   .3428981     0.85   0.396    -.3812673    .9628896
            ------------------------------------------------------------------------------
            Second Stage:

            Code:
            . ereturn display
            ------------------------------------------------------------------------------
                      y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                   y2hat |   64.07488   30.14269     2.13   0.034     4.995364    123.1544
                      x1 |  -.6878107   2.246888    -0.31   0.760      -5.0917    3.716078
                      x2 |   -.082632   .3035083    -0.27   0.785    -.6775067    .5122427
                      x3 |  -.0457806   1.146236    -0.04   0.968    -2.292398    2.200837
                   _cons |  -20.01899     17.663    -1.13   0.257    -54.63839    14.60041
            ------------------------------------------------------------------------------
            Using ivregress I get:

            Code:
            . ivregress 2sls y1 (y2=z1 z12 z2) x1 x2 x3, robust first
            
            First-stage regressions
            -----------------------
            
                                                                   Number of obs =  76,814
                                                                   F(6, 76807)   =   14.89
                                                                   Prob > F      =  0.0000
                                                                   R-squared     =  0.0012
                                                                   Adj R-squared =  0.0011
                                                                   Root MSE      = 25.1505
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y2 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                     z1  |   .0134117    .033855     0.40   0.692     -.052944    .0797673
                     z12 |  -5.53e-06   .0008375    -0.01   0.995    -.0016469    .0016359
                      z2 |  -.0019404   .0014015    -1.38   0.166    -.0046873    .0008066
                      x1 |   .0412252   .0230858     1.79   0.074    -.0040229    .0864732
                      x2 |  -.0008498   .0047419    -0.18   0.858     -.010144    .0084443
                      x3 |   .0346339   .0075961     4.56   0.000     .0197456    .0495222
                   _cons |   .2908111   .3428981     0.85   0.396    -.3812673    .9628896
            ------------------------------------------------------------------------------
            
            
            Instrumental variables 2SLS regression            Number of obs   =     76,814
                                                              Wald chi2(4)    =      74.12
                                                              Prob > chi2     =     0.0000
                                                              R-squared       =          .
                                                              Root MSE        =     1620.1
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y1 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      y2 |   64.07488   30.14269     2.13   0.034     4.996295    123.1535
                      x1 |  -.6878107   2.246888    -0.31   0.760     -5.09163    3.716009
                      x2 |   -.082632   .3035083    -0.27   0.785    -.6774974    .5122333
                      x3 |  -.0457806   1.146236    -0.04   0.968    -2.292362    2.200801
                   _cons |  -20.01899     17.663    -1.13   0.257    -54.63784    14.59986
            ------------------------------------------------------------------------------
            Results are identical.

            Now suppose that I add some country dummies (new1-new161) and a trend (date) to the model above. The code changes accordingly:

            Code:
            regress y2 z1 z12 z2 x1 x2 x3 new1-new161 c.date if y1!=., r
            predict double y2hat if e(sample)
            regress y1 y2hat x1 x2 x3 new1-new161 c.date, mse1
            matrix bmatrix = e(b)
            matrix Vmatrix = e(V)
            rename y2hat y2hold
            rename y2 y2hat
            predict double res, residual
            rename y2hat y2
            rename y2hold y2hat
            replace res = res^2
            mat accum Meat = y2hat x1 x2 x3 new1-new161 c.date [iw = res] if y1!=.
            matrix Vmatrix = Vmatrix*Meat*Vmatrix
            ereturn post bmatrix Vmatrix, noclear
            ereturn display
            ivregress 2sls y1 (y2=z1 z12 z2) x1 x2 x3 new1-new161 c.date, robust first
            Now something occurs that I cannot explain. Doing the IV manually, I get the following (I removed the country dummies and the trend from the results below, although they are included in the estimation):

            First Stage:

            Code:
            Linear regression                               Number of obs     =     76,814
                                                            F(144, 76668)     =          .
                                                            Prob > F          =          .
                                                            R-squared         =     0.0383
                                                            Root MSE          =     24.701
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y2 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      z1 |  -2.463026   .4370156    -5.64   0.000    -3.319574   -1.606477
                     z12 |   .0364213   .0088151     4.13   0.000     .0191437    .0536989
                      z2 |  -2.024472   .5814382    -3.48   0.000    -3.164088   -.8848566
                      x1 |  -50.99909   14.93742    -3.41   0.001    -80.27635   -21.72183
                      x2 |  -2.136277   .6285056    -3.40   0.001    -3.368145   -.9044091
                      x3 |  -10.63291    2.94176    -3.61   0.000    -16.39875   -4.867078
                   _cons |   2418.978   511.3641     4.73   0.000     1416.707    3421.249
            ------------------------------------------------------------------------------
            Second stage:

            Code:
            . ereturn display
            ------------------------------------------------------------------------------
                      y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                   y2hat |  -6.251991   .9465268    -6.61   0.000    -8.107178   -4.396803
                      x1 |   6.091359     .58326    10.44   0.000     4.948172    7.234545
                      x2 |  -.9006199   .0975376    -9.23   0.000    -1.091793   -.7094467
                      x3 |   1.396808   .1781995     7.84   0.000     1.047538    1.746078
                    date |   .0422543   .0258671     1.63   0.102    -.0084451    .0929537
                   _cons |  -941.2898   580.7457    -1.62   0.105    -2079.548    196.9687
            ------------------------------------------------------------------------------
            If I use ivregress, I obtain:

            Code:
            First-stage regressions
            -----------------------
            
                                                                   Number of obs =  76,814
                                                                   F(145, 76668) =   10.48
                                                                   Prob > F      =  0.0000
                                                                   R-squared     =  0.0383
                                                                   Adj R-squared =  0.0365
                                                                   Root MSE      = 24.7015
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y2 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      z1 |  -2.463026   .4370813    -5.64   0.000    -3.319703   -1.606349
                     z12 |   .0364213   .0088164     4.13   0.000     .0191412    .0537015
                      z2 |  -2.024474   .5815258    -3.48   0.000    -3.164262   -.8846863
                      x1 |  -8.945115   1.514435    -5.91   0.000     -11.9134   -5.976831
                      x2 |    1.23147   .2301185     5.35   0.000     .7804386    1.682501
                      x3 |  -2.863724   .4507995    -6.35   0.000    -3.747289   -1.980159
                   _cons |   707.2298   24.11607    29.33   0.000     659.9624    754.4972
            ------------------------------------------------------------------------------
            Code:
            Instrumental variables 2SLS regression            Number of obs   =     76,814
                                                              Wald chi2(143)  =   13290.41
                                                              Prob > chi2     =     0.0000
                                                              R-squared       =          .
                                                              Root MSE        =     212.29
            
            ------------------------------------------------------------------------------
                         |               Robust
                      y1 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      y2 |  -6.251988   .9465264    -6.61   0.000    -8.107146   -4.396831
                      x1 |  -3.325746   5.355277    -0.62   0.535     -13.8219    7.170405
                      x2 |    1.53259   1.122069     1.37   0.172     -.666625    3.731806
                      x3 |   .3424514   .6935874     0.49   0.621    -1.016955    1.701858
                   _cons |  -923.2611   574.7166    -1.61   0.108    -2049.685    203.1628
            ------------------------------------------------------------------------------
            As you can notice, in the first stage the coefficients (and s.e.) of z1, z12 and z2 are basically identical, whereas the estimates of the parameters on the exogenous regressors are totally different. Moving to the second stage, the estimate of the endogenous parameter (and its s.e.) is identical in the two cases, while the other coefficients are totally different (even the sign differs, not only the magnitude).

            I tried to re-estimate the model manually, dropping the constant in both the first and the second stage, and I removed the robust standard errors, but the difference persists. I should also point out that Stata removes some country dummies, and initially I thought the difference in the estimates could be due to different dummies being removed. That is not the case: the same country dummies are removed in both procedures.
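
            (For anyone who wants to reproduce the check, a sketch with the variable names above, where y2hat holds the first-stage fitted values: _rmcoll lists which columns would be omitted for collinearity.)

            ```stata
            * Which regressors does each specification drop for collinearity?
            _rmcoll y2hat x1 x2 x3 new1-new161 date if y1 != .
            display "`r(varlist)'"
            _rmcoll y2 x1 x2 x3 new1-new161 date if y1 != .
            display "`r(varlist)'"
            ```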

            I noted that if I include continent dummies rather than country dummies (keeping the trend in), the results are identical. So I guess the difference in the results is due to the presence of the country dummies. However, I cannot understand why, or how I can fix this issue.

            Do you have any idea why these differences occur? I have been trying to solve this issue for three days now, but still cannot find a solution.

            I did not include the dataset via dataex, as it is quite large, but it can be downloaded from here.

            Many thanks for your help.
            Last edited by Dario Maimone Ansaldo Patti; 08 Oct 2021, 17:16.

