Instrumental Variable (2SLS) First stage different sample compared to Second Stage?

Simon Lesmeister

Join Date: Jan 2020

Posts: 5
#1

27 Sep 2021, 02:23

Dear all,

I have a question with regard to Instrumental variable regression.

My sample is identified at the firm-year level (the combination of firm_id and year uniquely identifies my observations). My dependent variable (Y) also varies at that level. I am interested in the effect of a variable (X) that varies only by country. Each firm belongs to exactly one country, so country does not vary within firm_id.

In an attempt to establish a causal link, I used the Stata command ivreg2 and included a new variable (Z) as an instrument for X. Z also varies only at the country level (same as X).

My commands look like this:

Code:

ivreg2 Y (X=Z), savefirst

This leads to approximately 13000 observations in both the first stage and the second stage. The first-stage and second-stage coefficients seem reasonable.

I received a comment that this is not the correct way to approach the instrumental variable estimation. According to the comment, I should run the first-stage regression at the country level only, because X and Z vary only at this level, and this would yield the correct coefficient to carry into the second stage. I would end up with a first stage with approx. 30 observations and a second stage with 13000 observations. I did this "by hand": I ran the first regression at the country level, merged the predicted values of X into my firm-year sample, and then included the predicted values in a simple OLS (reg). I am fairly sure that this is not correct, because the standard errors reported by a simple OLS regression are wrong when predicted values are used as a regressor.

My question: Is there a way to approach this problem in Stata with ivreg or ivreg2? Or else, is there a way to do this "by hand" but handle the problem of using predicted values?

Thank you all very much in advance and best wishes,
Simon
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

27 Sep 2021, 02:34

If you are interested in my view on how you should handle the situation: you simply need to cluster at the level at which your X and Z vary, which gives you about 30 clusters. You might "lose significance", but I think this is the appropriate thing to do in your situation.
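
As an illustration only, here is a minimal sketch of country-level clustering on simulated data. The data-generating process, the variable name country, and all coefficients are assumptions made up for this example and are not taken from the original sample. It uses the built-in -ivregress 2sls- so that it runs without extra packages; with the user-written ivreg2, the analogous call would presumably be ivreg2 Y (X=Z), cluster(country) first.

Code:

```stata
clear
set seed 12345

* 30 hypothetical countries; X and Z vary only at the country level
set obs 30
generate country = _n
generate double Z   = rnormal()
* country-level confounder that makes X endogenous
generate double u_c = rnormal()
generate double X   = 0.8*Z + u_c + rnormal()

* 15 firms per country, 10 years per firm (firm-year panel)
expand 15
bysort country: generate firm_id = (country - 1)*15 + _n
expand 10
bysort firm_id: generate year = 2009 + _n

generate double Y = 1 + 0.5*X + u_c + rnormal()

* Both stages use the full firm-year sample; the standard errors
* are clustered at the level at which X and Z vary
ivregress 2sls Y (X = Z), vce(cluster country) first

display "Observations: " e(N) "   Clusters: " e(N_clust)
```

The point of the sketch is that the first stage does not need to be collapsed to 30 observations: the clustering already tells Stata that there are only 30 independent pieces of information about X and Z.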

If you are asking how to manually compute correct standard errors/variances for TSLS, look at this FAQ: https://www.stata.com/support/faqs/s...es-regression/
Simon Lesmeister

Join Date: Jan 2020

Posts: 5
#3

27 Sep 2021, 03:06

Thank you very much! That is helpful.

Simon Lesmeister

Join Date: Jan 2020
Posts: 5
#4

27 Sep 2021, 03:37

Joro, I am trying to replicate the results from the link you sent, but it is not working for me. Am I missing something?

Code:

sysuse auto , clear

rename price y1
rename mpg y2
rename displacement z1
rename turn x1 


regress y2 z1

predict double y2hat 
regress y1 y2hat x1 

rename y2hat y2hold
rename y2 y2hat
predict double res, residual
rename y2hat y2                       /* put back real y2 */
rename y2hold y2hat  
replace res = res^2  

sum res


scalar realmse = r(mean)*r(N)/e(df_r) 
                                  /* much ado about small sample */
matrix bmatrix = e(b)
matrix Vmatrix = e(V)
matrix Vmatrix = e(V) * realmse / e(rmse)^2
ereturn post bmatrix Vmatrix, noclear
ereturn display

My results look like this and the last regression is quite different from the output displayed in the FAQ:

[Screenshot of the Stata output: Screenshot 2021-09-27 at 11.35.19.png]


Joro Kolev

Join Date: Aug 2018

Posts: 3050
#5

27 Sep 2021, 10:37

Do not benchmark yourself against Vince Wiggins, because he is doing something slightly nonstandard; benchmark yourself against the results of -ivregress 2sls-.

The only error that I see in your code is that in the first stage you must include all exogenous variables, therefore your first stage

Code:

regress y2 z1

should be

Code:

regress y2 z1 x1
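
A quick way to check this is sketched below, using the original auto variable names instead of the renamed ones (price = y1, mpg = y2, displacement = z1, turn = x1). The names b_iv and mpghat are made up for this sketch. Once the first stage includes the exogenous regressor, the second-stage coefficient on the fitted values should agree with -ivregress 2sls-; the second-stage standard errors are still not the 2SLS ones.

Code:

```stata
sysuse auto, clear

* Benchmark: built-in 2SLS
quietly ivregress 2sls price (mpg = displacement) turn
scalar b_iv = _b[mpg]

* First stage with ALL exogenous variables (instrument plus turn)
quietly regress mpg displacement turn
predict double mpghat, xb

* Second stage on the fitted values: the point estimates match 2SLS,
* but the reported standard errors are NOT the 2SLS standard errors
quietly regress price mpghat turn

display "ivregress: " scalar(b_iv) "   manual: " _b[mpghat]
display "relative difference: " reldif(_b[mpghat], scalar(b_iv))
```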

Joro Kolev

Join Date: Aug 2018
Posts: 3050
#6

27 Sep 2021, 11:07

It is easier for me to just do it myself, rather than follow what Vince Wiggins is doing...

So here is a replication, my way:

Code:

sysuse auto, clear

* Lets say I want to replicate this regression:

ivregress 2sls price  (headroom = length weight turn) mpg

* First stage, note I am including ALL exogenous variables

reg headroom mpg  length weight turn

predict headhat

* Second stage

reg price headhat mpg, mse1

gen double mpgres = price - _b[_cons] - _b[mpg]*mpg - _b[headhat]*headroom

summ c.mpgres#c.mpgres

mat V = r(mean)* e(V)
mat b = e(b)

ereturn post b V 

ereturn display

which results in:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. 
. * Lets say I want to replicate this regression:
. 
. ivregress 2sls price  (headroom = length weight turn) mpg

Instrumental variables (2SLS) regression          Number of obs   =         74
                                                  Wald chi2(2)    =      18.32
                                                  Prob > chi2     =     0.0001
                                                  R-squared       =     0.0666
                                                  Root MSE        =     2830.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    headroom |   1200.537   1254.985     0.96   0.339    -1259.188    3660.263
         mpg |   -166.251     95.104    -1.75   0.080    -352.6514     20.1494
       _cons |   6112.454   5520.174     1.11   0.268    -4706.888     16931.8
------------------------------------------------------------------------------
Instrumented:  headroom
Instruments:   mpg length weight turn

. 
. * First stage, note I am including ALL exogenous variables
. 
. reg headroom mpg  length weight turn

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(4, 69)        =      6.33
       Model |  14.0321944         4  3.50804861   Prob > F        =    0.0002
    Residual |  38.2144272        69  .553832278   R-squared       =    0.2686
-------------+----------------------------------   Adj R-squared   =    0.2262
       Total |  52.2466216        73  .715707146   Root MSE        =     .7442

------------------------------------------------------------------------------
    headroom |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   -.002775   .0258912    -0.11   0.915    -.0544265    .0488765
      length |   .0227138   .0128729     1.76   0.082    -.0029668    .0483945
      weight |  -.0000273   .0003694    -0.07   0.941    -.0007642    .0007096
        turn |  -.0162218   .0406179    -0.40   0.691    -.0972524    .0648088
       _cons |   -.490733   1.922416    -0.26   0.799    -4.325848    3.344382
------------------------------------------------------------------------------

. 
. predict headhat
(option xb assumed; fitted values)

. 
. * Second stage
. 
. reg price headhat mpg, mse1

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 74)        >  99999.00
       Model |   146779665         2  73389832.6   Prob > F        =    0.0000
    Residual |   488285731        74  6598455.82   R-squared       =    0.2311
-------------+----------------------------------   Adj R-squared   =    0.2415
       Total |   635065396        73  8699525.97   Root MSE        =         1

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     headhat |   1200.538   .4434229  2707.43   0.000     1199.654    1201.421
         mpg |   -166.251    .033603 -4947.50   0.000    -166.3179    -166.184
       _cons |   6112.453   1.950439  3133.89   0.000     6108.566    6116.339
------------------------------------------------------------------------------

. 
. gen double mpgres = price - _b[_cons] - _b[mpg]*mpg - _b[headhat]*headroom

. 
. summ c.mpgres#c.mpgres

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    c.mpgres#|
    c.mpgres |         74     8010155    1.37e+07   .2421612   9.38e+07

. 
. mat V = r(mean)* e(V)

. mat b = e(b)

. 
. ereturn post b V 

. 
. ereturn display 
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     headhat |   1200.538   1254.985     0.96   0.339    -1259.188    3660.263
         mpg |   -166.251     95.104    -1.75   0.080    -352.6514    20.14943
       _cons |   6112.453   5520.174     1.11   0.268     -4706.89     16931.8
------------------------------------------------------------------------------

.

The results from the TSLS programmed in Stata and from my manual calculation are the same, up to rounding in the last displayed digit of the coefficients.
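
For anyone who prefers a numerical check over comparing tables by eye, the same comparison can be sketched as below. The matrix and variable names (V_iv, V_unit, V_manual, u, u2) are made up for this sketch, and -predict double- is used so that float rounding does not blur the comparison. The key step is that the residuals are built from the actual headroom, not from the fitted values.

Code:

```stata
sysuse auto, clear

* Benchmark variance matrix from the built-in estimator
quietly ivregress 2sls price (headroom = length weight turn) mpg
matrix V_iv = e(V)

* First stage with all exogenous variables
quietly regress headroom mpg length weight turn
predict double headhat, xb

* Second stage; with mse1, e(V) is inv(Xhat'Xhat)
quietly regress price headhat mpg, mse1
matrix V_unit = e(V)

* 2SLS residuals: second-stage coefficients applied to the ACTUAL headroom
generate double u  = price - _b[_cons] - _b[mpg]*mpg - _b[headhat]*headroom
generate double u2 = u^2
quietly summarize u2

* Scale by the mean squared residual (divisor N, as in the replication above)
matrix V_manual = r(mean) * V_unit

display "max relative difference: " mreldif(V_manual, V_iv)
```

If the residuals were instead taken from the second-stage regression itself (that is, computed with headhat), the scale factor would be wrong, which is the problem with plugging predicted values into a plain OLS.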


Joro Kolev

Join Date: Aug 2018

Posts: 3050
#7

27 Sep 2021, 19:34

Here is a related thread where I show how to calculate manually the robust variance of TSLS: https://www.statalist.org/forums/for...d-out-manually
Simon Lesmeister

Join Date: Jan 2020

Posts: 5
#8

28 Sep 2021, 00:20

Perfect! Thank you very much :-)