
  • Instrumental Variable (2SLS): first stage on a different sample than the second stage?

    Dear all,

    I have a question with regard to Instrumental variable regression.

    My sample is identified at the firm-year level (the combination of the variables firm_id and year uniquely identifies my observations). My dependent variable (Y) also varies at that level (firm-year). Now, I am interested in the effect of a variable (X) that varies only by country. Also, each firm_id is associated with a single country, so the firm-country combination does not vary.

    In an attempt to establish a causal link, I used the Stata command ivreg2 and included a new variable (Z) as an instrument for X. This variable Z also varies only at the country level (same as X).

    My commands look like this:

    Code:
    ivreg2 Y (X=Z), savefirst
    This leads to approximately 13000 observations in the first stage and the second stage. First stage and second stage coefficients seem reasonable.

    Now, I received a comment that this is not the correct way to approach the Instrumental variable estimation. The comment says that I should run the First Stage regression on the country level only because X and Z only vary at this level. This would yield the correct coefficient that can be included in the second stage. I would end up with a first stage with approx. 30 observations and a second stage with 13000 observations. I did this "by hand" by merging the predicted values of X from a first regression on the country level to my firm-year sample. Then I included the predicted values in a simple OLS (reg). I am fairly sure that this is not correct because standard errors are incorrect when using predicted values in a simple OLS regression.

    My question: Is there a way to approach this problem in Stata with ivreg or ivreg2? Or else, is there a way to do this "by hand" but handle the problem of using predicted values?

    Thank you all very much in advance and best wishes,
    Simon



  • #2
    If you are interested in my view on how you should handle the situation: you simply need to cluster at the level at which your X and Z vary, which gives you about 30 clusters. You might "lose significance", but I think this is the appropriate thing to do in your situation.

    If you are asking how to manually compute correct standard errors/variances for TSLS, look at this FAQ: https://www.stata.com/support/faqs/s...es-regression/
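
    A minimal sketch of the clustering suggestion above. It assumes the country identifier is stored in a variable named country (a hypothetical name, not given in the original post); Y, X, and Z are as described in the question. Note that ivreg2 is a user-written command (ssc install ivreg2).

    ```stata
    * Assumption: country is a hypothetical variable identifying the ~30 countries
    * Keep the full firm-year sample and cluster at the level where X and Z vary
    ivreg2 Y (X = Z), cluster(country) first

    * A comparable specification with the official command
    ivregress 2sls Y (X = Z), vce(cluster country)
    ```

    Both stages then use the roughly 13000 firm-year observations, while the standard errors reflect that there are only about 30 independent country-level values of X and Z.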



    • #3
      Thank you very much! That is helpful.



      • #4
        Joro, I am trying to replicate the results from the link you sent, but it is not working for me. Am I missing something?



        Code:
        sysuse auto , clear
        
        rename price y1
        rename mpg y2
        rename displacement z1
        rename turn x1 
        
        
        regress y2 z1
        
        predict double y2hat 
        regress y1 y2hat x1 
        
        rename y2hat y2hold
        rename y2 y2hat
        predict double res, residual
        rename y2hat y2                       /* put back real y2 */
        rename y2hold y2hat  
        replace res = res^2  
        
        sum res
        
        
        scalar realmse = r(mean)*r(N)/e(df_r) 
                                          /* much ado about small sample */
        matrix bmatrix = e(b)
        matrix Vmatrix = e(V)
        matrix Vmatrix = e(V) * realmse / e(rmse)^2
        ereturn post bmatrix Vmatrix, noclear
        ereturn display



        My results look like this and the last regression is quite different from the output displayed in the FAQ:


        [Image: Screenshot 2021-09-27 at 11.35.19.png (Stata output, not reproduced here)]



        • #5
          Do not benchmark yourself against Vince Wiggins, because he is doing something slightly nonstandard. Benchmark yourself against the results of -ivregress 2sls-.

          The only error that I see in your code is in the first stage: you must include all exogenous variables there. Therefore your first stage

          Code:
           
           regress y2 z1
          should be

          Code:
           
           regress y2 z1 x1
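
          For reference, a sketch of how the code from #4 might look with that correction applied. It keeps the variable names from #4 and benchmarks against -ivregress 2sls- with the small option, since the variance rescaling here divides by the residual degrees of freedom. The names res, res2, s2, b, and V are my own choices for illustration.

          ```stata
          sysuse auto, clear
          rename price y1
          rename mpg y2
          rename displacement z1
          rename turn x1

          * Benchmark: built-in 2SLS with small-sample (N-k) adjustment
          ivregress 2sls y1 (y2 = z1) x1, small

          * First stage: instrument plus ALL exogenous variables
          regress y2 z1 x1
          predict double y2hat, xb

          * Second stage on the fitted values (coefficients right, SEs wrong)
          regress y1 y2hat x1

          * Residuals must use the actual y2, not y2hat
          generate double res  = y1 - _b[_cons] - _b[y2hat]*y2 - _b[x1]*x1
          generate double res2 = res^2
          summarize res2, meanonly
          scalar s2 = r(sum) / e(df_r)

          * Rescale the OLS variance by the correct residual variance
          matrix b = e(b)
          matrix V = e(V) * s2 / e(rmse)^2
          ereturn post b V
          ereturn display
          ```

          The final table should be compared with the -ivregress 2sls ..., small- output at the top; without the small option, -ivregress- divides the residual sum of squares by N instead.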



          • #6
            It is easier for me to just do it myself, rather than follow what Vince Wiggins is doing...

            So here is a replication, my way:

            Code:
            sysuse auto, clear
            
            * Lets say I want to replicate this regression:
            
            ivregress 2sls price  (headroom = length weight turn) mpg
            
            * First stage, note I am including ALL exogenous variables
            
            reg headroom mpg  length weight turn
            
            predict headhat
            
            * Second stage
            
            reg price headhat mpg, mse1
            
            gen double mpgres = price - _b[_cons] - _b[mpg]*mpg - _b[headhat]*headroom
            
            summ c.mpgres#c.mpgres
            
            mat V = r(mean)* e(V)
            mat b = e(b)
            
            ereturn post b V 
            
            ereturn display
            which results in :

            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            . 
            . * Lets say I want to replicate this regression:
            . 
            . ivregress 2sls price  (headroom = length weight turn) mpg
            
            Instrumental variables (2SLS) regression          Number of obs   =         74
                                                              Wald chi2(2)    =      18.32
                                                              Prob > chi2     =     0.0001
                                                              R-squared       =     0.0666
                                                              Root MSE        =     2830.2
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                headroom |   1200.537   1254.985     0.96   0.339    -1259.188    3660.263
                     mpg |   -166.251     95.104    -1.75   0.080    -352.6514     20.1494
                   _cons |   6112.454   5520.174     1.11   0.268    -4706.888     16931.8
            ------------------------------------------------------------------------------
            Instrumented:  headroom
            Instruments:   mpg length weight turn
            
            . 
            . * First stage, note I am including ALL exogenous variables
            . 
            . reg headroom mpg  length weight turn
            
                  Source |       SS           df       MS      Number of obs   =        74
            -------------+----------------------------------   F(4, 69)        =      6.33
                   Model |  14.0321944         4  3.50804861   Prob > F        =    0.0002
                Residual |  38.2144272        69  .553832278   R-squared       =    0.2686
            -------------+----------------------------------   Adj R-squared   =    0.2262
                   Total |  52.2466216        73  .715707146   Root MSE        =     .7442
            
            ------------------------------------------------------------------------------
                headroom |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |   -.002775   .0258912    -0.11   0.915    -.0544265    .0488765
                  length |   .0227138   .0128729     1.76   0.082    -.0029668    .0483945
                  weight |  -.0000273   .0003694    -0.07   0.941    -.0007642    .0007096
                    turn |  -.0162218   .0406179    -0.40   0.691    -.0972524    .0648088
                   _cons |   -.490733   1.922416    -0.26   0.799    -4.325848    3.344382
            ------------------------------------------------------------------------------
            
            . 
            . predict headhat
            (option xb assumed; fitted values)
            
            . 
            . * Second stage
            . 
            . reg price headhat mpg, mse1
            
                  Source |       SS           df       MS      Number of obs   =        74
            -------------+----------------------------------   F(2, 74)        >  99999.00
                   Model |   146779665         2  73389832.6   Prob > F        =    0.0000
                Residual |   488285731        74  6598455.82   R-squared       =    0.2311
            -------------+----------------------------------   Adj R-squared   =    0.2415
                   Total |   635065396        73  8699525.97   Root MSE        =         1
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                 headhat |   1200.538   .4434229  2707.43   0.000     1199.654    1201.421
                     mpg |   -166.251    .033603 -4947.50   0.000    -166.3179    -166.184
                   _cons |   6112.453   1.950439  3133.89   0.000     6108.566    6116.339
            ------------------------------------------------------------------------------
            
            . 
            . gen double mpgres = price - _b[_cons] - _b[mpg]*mpg - _b[headhat]*headroom
            
            . 
            . summ c.mpgres#c.mpgres
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                c.mpgres#|
                c.mpgres |         74     8010155    1.37e+07   .2421612   9.38e+07
            
            . 
            . mat V = r(mean)* e(V)
            
            . mat b = e(b)
            
            . 
            . ereturn post b V 
            
            . 
            . ereturn display 
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                 headhat |   1200.538   1254.985     0.96   0.339    -1259.188    3660.263
                     mpg |   -166.251     95.104    -1.75   0.080    -352.6514    20.14943
                   _cons |   6112.453   5520.174     1.11   0.268     -4706.89     16931.8
            ------------------------------------------------------------------------------
            
            .
            The results from Stata's built-in 2SLS (-ivregress 2sls-) and from my manual calculation are the same: the standard errors match, and the coefficients agree up to rounding.
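
            The rescaling step in the code above can be written out as follows. This is a sketch in notation of my own choosing: $X$ holds the actual regressors (headroom, mpg, constant), $\hat X$ is the same matrix with headroom replaced by its first-stage fitted values, and $N = 74$.

            ```latex
            \begin{align*}
            \hat\beta_{2SLS} &= (\hat X'\hat X)^{-1}\hat X' y \\
            \hat u &= y - X\hat\beta_{2SLS}
              \qquad \text{(actual } X \text{, not } \hat X\text{)} \\
            \hat\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N}\hat u_i^2 \\
            \widehat{\mathrm{Var}}(\hat\beta_{2SLS}) &= \hat\sigma^2\,(\hat X'\hat X)^{-1}
            \end{align*}
            ```

            With the mse1 option, e(V) from the second-stage regression is $(\hat X'\hat X)^{-1}$, so multiplying it by r(mean) = 8010155 gives the 2SLS variance. The square root of that mean, about 2830.2, is the Root MSE reported by -ivregress 2sls-.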



            • #7
              Here is a related thread where I show how to calculate manually the robust variance of TSLS: https://www.statalist.org/forums/for...d-out-manually
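
              For readers who cannot follow the link, here is a rough sketch of one way to do this. It is not copied from the linked thread; it extends the example from #6 with the usual sandwich formula, and the names u, u2, b, B, M, and V are my own. It assumes no small-sample adjustment, which I believe is the default for -ivregress 2sls, vce(robust)-.

              ```stata
              sysuse auto, clear

              * Benchmark: built-in 2SLS with robust standard errors
              ivregress 2sls price (headroom = length weight turn) mpg, vce(robust)

              * First stage with all exogenous variables
              regress headroom mpg length weight turn
              predict double headhat, xb

              * Second stage; with mse1, e(V) is the inverse of Xhat'Xhat (the "bread")
              regress price headhat mpg, mse1
              matrix b = e(b)
              matrix B = e(V)

              * Residuals use the actual headroom, not headhat
              generate double u  = price - _b[_cons] - _b[mpg]*mpg - _b[headhat]*headroom
              generate double u2 = u^2

              * "Meat": sum over i of u_i^2 * xhat_i * xhat_i' (constant added by default)
              matrix accum M = headhat mpg [iweight = u2]

              * Sandwich, symmetrized to guard against floating-point asymmetry
              matrix V = B * M * B
              matrix V = (V + V') / 2
              ereturn post b V
              ereturn display
              ```

              The last table is meant to be compared with the -ivregress- output at the top of the block.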



              • #8
                Perfect! Thank you very much :-)
