  • SUR instead of IV?

    Dear All,

    One general question: If the model contains an endogenous independent variable (x), we usually apply a (2SLS) IV estimation.

    Let's assume, we run the two-stage least squares model:

    First stage: x = a_0 + a_1 z + u_1, giving fitted values \hat{x}
    Second stage: y = b_0 + b_1 \hat{x} + u_2

    x is endogenous because of omitted variable bias (o.v.b.) in the main equation. However, the omitted variables are available but irrelevant for y (they are left out of the main model to keep it parsimonious). In this case, could we use a seemingly unrelated regression (SUR) and estimate the model as two simultaneous regressions, instead of (2SLS) IV?

    It would then be written as:
    y = c_0 + c_1 x + u_3 and x = d_0 + d_1 X + u_4?

    Thanks!

  • #2
    Originally posted by Kerstin Schmidt
    In this case, could we use a seemingly unrelated regression (SUR) and estimate the model in two simultaneous regressions, instead of (2SLS) IV?
    Yes. Consider:

    Code:
    webuse hsng2, clear
    ivregress 2sls rent (hsngval = faminc), first
    sureg (rent=hsngval) (hsngval = faminc), isure nolog
    Results:

    Code:
    . ivregress 2sls rent (hsngval = faminc), first
    
    First-stage regressions
    -----------------------
    
                                                    Number of obs     =         50
                                                    F(   1,     48)   =      40.69
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.4588
                                                    Adj R-squared     =     0.4475
                                                    Root MSE          = 11722.1199
    
    ------------------------------------------------------------------------------
         hsngval |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          faminc |   4.081283   .6398354     6.38   0.000     2.794808    5.367758
           _cons |  -31100.69   12586.39    -2.47   0.017    -56407.32    -5794.06
    ------------------------------------------------------------------------------
    
    
    Instrumental variables (2SLS) regression          Number of obs   =         50
                                                      Wald chi2(1)    =      63.36
                                                      Prob > chi2     =     0.0000
                                                      R-squared       =     0.4359
                                                      Root MSE        =     26.287
    
    ------------------------------------------------------------------------------
            rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         hsngval |   .0027983   .0003516     7.96   0.000     .0021093    .0034874
           _cons |   99.08714   17.44581     5.68   0.000     64.89399    133.2803
    ------------------------------------------------------------------------------
    Instrumented:  hsngval
    Instruments:   faminc
    
    . 
    . sureg (rent=hsngval) (hsngval = faminc), isure nolog
    
    Seemingly unrelated regression, iterated 
    --------------------------------------------------------------------------
    Equation             Obs   Parms        RMSE    "R-sq"       chi2        P
    --------------------------------------------------------------------------
    rent                  50       1    26.28669    0.4359     284.40   0.0000
    hsngval               50       1    11485.28    0.4588      87.28   0.0000
    --------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    rent         |
         hsngval |   .0027983   .0001659    16.86   0.000     .0024731    .0031235
           _cons |   99.08728   8.862468    11.18   0.000     81.71716    116.4574
    -------------+----------------------------------------------------------------
    hsngval      |
          faminc |   4.081287    .436855     9.34   0.000     3.225067    4.937507
           _cons |  -31100.78   8672.106    -3.59   0.000    -48097.79   -14103.76
    ------------------------------------------------------------------------------
    
    .
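    The point estimates in the two outputs above can also be compared programmatically. A minimal sketch, assuming the same hsng2 data; the scalar names b_iv and b_sur are illustrative only and not part of the original example:

    Code:
    webuse hsng2, clear
    * 2SLS point estimate on hsngval
    quietly ivregress 2sls rent (hsngval = faminc)
    scalar b_iv = _b[hsngval]
    * iterated SUR point estimate on hsngval in the rent equation
    quietly sureg (rent = hsngval) (hsngval = faminc), isure
    scalar b_sur = [rent]_b[hsngval]
    display "2SLS: " b_iv "   iterated SUR: " b_sur

    Note that only the coefficients agree in the output above; the reported standard errors differ (.0003516 from -ivregress- versus .0001659 from -sureg-).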



    • #3
      Nice - that is what I suspected! Thank you!!



      • #4
        Yes, but it is imperative that you use the -isure- option of -sureg-, as Andrew did above. In such so-called triangular systems, -sureg- has to be iterated to give you the correct maximum likelihood estimates.

        Also, the exact match with 2SLS that Andrew showed above arises only because he fit an exactly identified model.

        In overidentified models, such iterated -sureg- models are equivalent to the Limited Information Maximum Likelihood (LIML) estimator.

        E.g.,

        Code:
        . webuse hsng2, clear
        (1980 Census housing data)
        
        . ivregress 2sls rent (hsngval = faminc reg1 reg2 reg3 reg4)
        note: reg4 omitted because of collinearity.
        
        Instrumental variables 2SLS regression            Number of obs   =         50
                                                          Wald chi2(1)    =      86.61
                                                          Prob > chi2     =     0.0000
                                                          R-squared       =     0.5798
                                                          Root MSE        =     22.687
        
        ------------------------------------------------------------------------------
                rent | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
             hsngval |   .0023311   .0002505     9.31   0.000     .0018402     .002822
               _cons |   121.7403   12.56056     9.69   0.000     97.12202    146.3585
        ------------------------------------------------------------------------------
        Instrumented: hsngval
         Instruments: faminc reg1 reg2 reg3
        
        . ivregress liml rent (hsngval = faminc reg1 reg2 reg3 reg4)
        note: reg4 omitted because of collinearity.
        
        Instrumental variables LIML regression            Number of obs   =         50
                                                          Wald chi2(1)    =      80.65
                                                          Prob > chi2     =     0.0000
                                                          R-squared       =     0.5230
                                                          Root MSE        =     24.171
        
        ------------------------------------------------------------------------------
                rent | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
             hsngval |   .0025505    .000284     8.98   0.000     .0019938    .0031071
               _cons |   111.1019   14.18782     7.83   0.000     83.29428    138.9095
        ------------------------------------------------------------------------------
        Instrumented: hsngval
         Instruments: faminc reg1 reg2 reg3
        
        . sureg (rent=hsngval) (hsngval = faminc reg1 reg2 reg3 reg4 ), isure nolog
        note: reg4 omitted because of collinearity.
        
        Seemingly unrelated regression, iterated 
        ------------------------------------------------------------------------------
        Equation             Obs   Params         RMSE  "R-squared"      chi2   P>chi2
        ------------------------------------------------------------------------------
        rent                  50        1     24.17075      0.5230     216.20   0.0000
        hsngval               50        4     9542.744      0.6264     140.68   0.0000
        ------------------------------------------------------------------------------
        
        ------------------------------------------------------------------------------
                     | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
        rent         |
             hsngval |   .0025505   .0001735    14.70   0.000     .0022105    .0028905
               _cons |    111.102   9.078152    12.24   0.000     93.30914    128.8948
        -------------+----------------------------------------------------------------
        hsngval      |
              faminc |   4.187762    .399246    10.49   0.000     3.405254     4.97027
                reg1 |  -6293.307   2597.125    -2.42   0.015    -11383.58   -1203.037
                reg2 |  -13543.18   2540.064    -5.33   0.000    -18521.61   -8564.743
                reg3 |  -6432.218   2547.666    -2.52   0.012    -11425.55   -1438.884
                reg4 |          0  (omitted)
               _cons |  -26735.56   8182.961    -3.27   0.001    -42773.87   -10697.25
        ------------------------------------------------------------------------------
        
        .
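        Why iteration matters can be sketched as follows, assuming jointly normal errors and the notation of the original post (the symbols \Sigma and n are introduced here for illustration only):

        \[
        x_i = a_0 + a_1 z_i + u_{1i}, \qquad y_i = b_0 + b_1 x_i + u_{2i}, \qquad (u_{1i}, u_{2i})' \sim N(0, \Sigma)
        \]
        \[
        \left| \frac{\partial (u_{1i}, u_{2i})}{\partial (x_i, y_i)} \right| = \det \begin{pmatrix} 1 & 0 \\ -b_1 & 1 \end{pmatrix} = 1
        \]
        \[
        \ln L = -n \ln(2\pi) - \frac{n}{2} \ln |\Sigma| - \frac{1}{2} \sum_{i=1}^{n} u_i' \Sigma^{-1} u_i, \qquad u_i = (u_{1i}, u_{2i})'
        \]

        Because the Jacobian equals 1, the likelihood has the same form as that of a SUR system, so iterating feasible GLS to convergence should reproduce the ML estimates, whereas a single two-step pass need not.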



        • #5
          Furthermore, this is not just a lucky guess on the part of the OP, Andrew, or me.

          There is literature behind this:

          Lahiri, Kajal, and Peter Schmidt. "On the estimation of triangular structural systems." Econometrica: Journal of the Econometric Society (1978): 1217-1221.

          Pagan, Adrian. "Some consequences of viewing LIML as an iterated Aitken estimator." Economics Letters 3, no. 4 (1979): 369-372.

          Revankar, N. S. "On the LIML estimation of a structural equation by an iterative generalized least squares method." Economics Letters 7, no. 1 (1981): 63-68.





          • #6
            Thank you Joro! What would be the equivalent command for panel data? Does the -xtsur- command already include the -isure- option of the -sureg- command?



            • #7
              Another question:
              When running
              Code:
               
               sureg (rent=hsngval) (hsngval = faminc), isure nolog
              can R2 be interpreted as the variance explained or is it similar to the 2SLS/IV context, where it no longer has a statistical meaning?



              • #8
                The IV2SLS estimator applies OLS in two stages, and the R2 statistic in OLS is unambiguous. It is the percentage of variation in the outcome explained by variation in the right-hand side variables.



                • #9
                  What about SUR? Can we also interpret R2 there as the % of variation in the outcome explained by the model?



                  • #10
                    Yes. Consider that these correspond to the R2 statistics in the first and second stages of IV2SLS (see the results in #2).
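                    The match can be checked from the stored results. A minimal sketch, assuming the hsng2 example from #2 and that -sureg- stores its per-equation R2 as e(r2_1) and e(r2_2) in the order the equations are listed:

                    Code:
                    webuse hsng2, clear
                    * first-stage R2 from a plain OLS regression
                    quietly regress hsngval faminc
                    display "first stage R2: " e(r2)
                    * R2 reported by the 2SLS fit
                    quietly ivregress 2sls rent (hsngval = faminc)
                    display "2SLS R2: " e(r2)
                    * per-equation R2 from the iterated SUR fit
                    quietly sureg (rent = hsngval) (hsngval = faminc), isure
                    display "SUR R2 (rent): " e(r2_1) "   SUR R2 (hsngval): " e(r2_2)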



                    • #11
                      Another related question: If using the iterated SUR model instead of IV, do the variables in both equations need to be identical or can they vary?



                      • #12
                        I'm not sure why you want to use SUR in place of 2SLS. As implemented above, the standard errors are not robust to heteroskedasticity (and they wouldn't be robust to serial correlation if you implemented this in a panel setting). I think Joro might have a package that does SUR with robust standard errors. In the just-identified case, what the SUR approach mimics is the control function version of 2SLS.
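                        The control function version can be sketched on the hsng2 example from #2. This is only an illustration; the variable name v_hat is made up here, and the standard errors from the second -regress- ignore that v_hat is a generated regressor:

                        Code:
                        webuse hsng2, clear
                        * first stage: regress the endogenous variable on the instrument
                        regress hsngval faminc
                        * keep the first-stage residuals as the control function
                        predict double v_hat, residuals
                        * second stage: add the residuals as an extra regressor
                        regress rent hsngval v_hat
                        * 2SLS with heteroskedasticity-robust standard errors, for comparison
                        ivregress 2sls rent (hsngval = faminc), vce(robust)

                        The coefficient on hsngval in the second -regress- should match the 2SLS point estimate; for inference, the robust -ivregress- standard errors are the safer choice.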



                        • #13
                          Thank you Wooldridge!
                          My problem is that I have an endogenous variable in my main equation of interest. My attempts to find a valid and strong IV failed. Yet I do have information on which variables determine my endogenous variable (although they fail the IV requirements), and I thought the best I could do was to estimate these two equations simultaneously in an iterated SUR model. Do you have a better idea?

