Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to manually calculate standard errors for instrumental variables?

    Hi

    I am working on statistical inference with instrumental variables (IV) following Wooldridge (2016) Introductory Econometrics, Ch. 15. I am using the Card data set (like the book), with wages as outcome (y), education as a endogenous continuous treatment (x) and distance to college as a binary IV (z).

    I want to calculate the standard errors manually, and preferably additionally in matrix form using Mata. So far, I am able to calculate coefficients but I can't seem to obtain the correct standard errors and would be happy for input on this.

    I obtain the point estimate for βiv with the Wald-estimator:

    βiv = E[y|z=1] - E[y|z=1] / E[x|z=1] - E[x|z=1]

    Code:
    . use http://pped.org/card.dta, clear // Load Card data set
    
    . keep nearc4 educ lwage id 
    
    . rename nearc4 z
    
    . rename educ x
    
    . rename lwage y
    
    . bysort z: sum y x
    
    -----------------------------------------------------------------------------------------------------------------
    -> z = 0
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               y |        957    6.155494    .4328417    4.60517   7.474772
               x |        957    12.69801    2.791523          1         18
    
    -----------------------------------------------------------------------------------------------------------------
    -> z = 1
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               y |      2,053    6.311401    .4402214    4.60517   7.784889
               x |      2,053    13.52703    2.580455          2         18
    
    
    . di (6.311401-6.155494)/(13.52703-12.69801)
    .18806181
    The point estimates can be cross checked by -ivregress-
    Code:
    . ivregress 2sls y (x=z), nohe
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .1880626   .0262826     7.16   0.000     .1365496    .2395756
           _cons |   3.767472   .3487458    10.80   0.000     3.083943    4.451001
    ------------------------------------------------------------------------------
    I now want to proceed by calculating the standard errors. Wooldridge (2016, p. 466) writes that standard errors for βiv is obtained by using the square root of the estimated asymptotic variance, where the latter is obtained by

    Var(βiv) = sigma^2/(SSTx*R^2x,z)

    First, the total sum of squares for x (SSTx), is obtained by
    Code:
    . egen x_bar = mean(x)
    
    . gen SSTx = (x-x_bar)^2
    
    . quiet sum SSTx
    
    . di r(sum)
    21562.08
    Second, R^2x,z is the squared population correlation between x and z, obtained from the regression output
    Code:
    . reg x z, nohe
    ------------------------------------------------------------------------------
               x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               z |    .829019   .1036988     7.99   0.000     .6256912    1.032347
           _cons |   12.69801   .0856416   148.27   0.000     12.53009    12.86594
    ------------------------------------------------------------------------------
    
    . di .829^2
    .687241
    Finally, sigma^2 is the error variance. Wooldridge says to use the IV residuals û in calculating the error variance

    sigma^2 = 1/(n-2)*sum(û^2)

    Code:
    . quiet reg x z
    
    . predict x_hat
    (option xb assumed; fitted values)
    
    . quiet reg y x_hat, nohe
    
    . predict iv_resid
    (option xb assumed; fitted values)
    
    . quiet sum iv_resid
    
    . di r(sum)
    18848.115
    
    . di (18848.114)^2
    3.553e+08
    
    . gen sigma_squared = 3.553e+08
    
    . tabstat sigma_squared, format(%20.2f)
    
        variable |      mean
    -------------+----------
    sigma_squa~d |        355300000.00
    ------------------------
    
    . di (1/(3010-2))*355300000
    118118.35
    Thus, when finally substituting my values into the formula for Var(βiv) and standard error I get
    Code:
    . di 118118.35/(21562.08*.687241)
    7.971089
    
    . * sigma 
    . di sqrt(7.971089)
    2.8233117
    
    . * se(βiv)
    . di 2.8233117/sqrt(21562.08)
    .01922709
    My main question is: Does someone see where I have miscalculated and can anyone provide the correct calculation?

    I have also worked out IV in Mata, although I am stuck at variance-estimation here as well:
    Code:
    . mata
    : 
    : y=st_data(.,"y")
    
    : 
    : X=st_data(.,"x")
    
    : 
    : Z=st_data(.,"z")
    
    : 
    : X = X, J(rows(X),1,1) // Add constant
    
    : 
    : Z = Z, J(rows(Z),1,1) // Add constant
    
    : 
    : b_iv = luinv(Z'*X)*Z'*y
    
    : 
    : b_iv
                     1
        +---------------+
      1 |  .1880626042  |
      2 |  3.767472015  |
        +---------------+
    
    : end
    (I have posted at stats.stackexchange as well)








  • #2
    Couple of clues here
    1. errors or residuals from the 2sls is: Res=y-x*B_IV. In your code, you were not predicting residuals, but the predicted value x*BIV
    2. ivregress as you use it does not adjust for degrees of freedom for the estimation of standard errors. If you want that to be done you need to include the option, "small"

    HTH

    Comment


    • #3
      Thank you! Glad you discovered the error in the predict command. sigma^2 now becomes 0:

      Code:
      . reg x z, nohe 
      ------------------------------------------------------------------------------
                 x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 z |    .829019   .1036988     7.99   0.000     .6256912    1.032347
             _cons |   12.69801   .0856416   148.27   0.000     12.53009    12.86594
      ------------------------------------------------------------------------------
      
      . predict x_hat
      (option xb assumed; fitted values)
      
      . quiet reg y x_hat, nohe
      
      . predict iv_resid, residuals
      
      . quiet sum iv_resid
      
      . di r(sum)
      -8.027e-06
      I have seen Jeff Wooldridge frequently reply to questions on this forum, so I am tagging in case I can get a reply from him. I will keep at it until I find out where the calculation is wrong.

      Comment


      • #4
        Hi Tarjei
        ok, so you are one step closer
        your regression ". quiet reg y x_hat,"
        will get the right betas, but residuals will need to be calculated differently

        The right residuals: res=y-x*b_iv
        the residuals you are creating: res= y-x_hat*b_iv


        Comment


        • #5
          Thanks for your reply. I have tried several ways to create the correct residuals, among them
          Code:
          . gen xh=.
          (3,010 missing values generated)
          
          . gen biv=.
          (3,010 missing values generated)
          
          . mata xh=z*invsym(z'*z)*z'*x
          
          . mata biv=invsym(xh'*xh)*xh'*y
          
          . mata biv
                           1
              +---------------+
            1 |  .1880626042  |
            2 |  3.767472015  |
              +---------------+
          
          . mata st_store(.,"biv",biv)
                        st_store():  3200  conformability error
                           <istmt>:     -  function returned error
          intended to use for res=y-x*b_iv. Do you have any suggestions?

          I do suspect that R^2x,z may be off too. I will add that I have gotten a helpful reply on how to solve this using Mata at stackexchange, but it would be interesting to additionally solve this using the step-by-step way outlined by Wooldridge.

          Comment

          Working...
          X