Two-sample two-stage least squares (TS2SLS) method - Help needed

Jun Hong Tan

Join Date: Sep 2019

Posts: 2
#1

25 Sep 2019, 03:47

Hi, I am preparing my Honours thesis and would like some guidance on:
(1) The appropriateness of my model
(2) The right syntax to use

I am working on intergenerational income mobility, for which the ideal model is:

$$\log Y^{s} = \alpha + \beta_1 \log Y^{f} + \beta_2\,(\text{multiple characteristics of child}) + \beta_3\,\text{age} + \beta_4\,\text{age}^2 + u_i$$

where Y denotes earnings at age 40, with superscript s for the son and f for the father. Unfortunately, the datasets I have contain only annual data, so I used the TS2SLS method, as proposed by several researchers.

To achieve this, I modelled

$$\log Y^{f} = \gamma + \delta_1\,\text{age} + \delta_2\,\text{age}^2 + \delta_3\,(\text{multiple characteristics of father}) + v_i$$

to estimate $\gamma$, $\delta_1$, $\delta_2$ and $\delta_3$, and then to generate $\log Y^{f}$ at age 40 using the father's characteristics found in another dataset, where these characteristics are reported by the child.

Then I will substitute the predicted $\widehat{\log Y^{f}}$ into the main equation, normalise the son's age to 40, and estimate $\beta_1$.

However, I do not know how to carry the first-stage standard errors over into the second-stage equation if I run the regressions step by step.

Which commands or syntax in Stata could I use for this? I have seen bootstrapping suggested, but I am unsure how bootstrapping would map the standard errors from the first stage into the second stage.

Thank you!

Last edited by Jun Hong Tan; 25 Sep 2019, 03:49.
Tags: None
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#2

25 Sep 2019, 05:18

You should understand a few things. First, what you're proposing is not really "two stage least squares." You are putting exclusion restrictions on the reduced form for log Y^f. You are assuming that there is no partial correlation between father's income and the child's characteristics. (And that father's characteristics do not directly affect child's income.) If you want to do real 2SLS then you should use ivregress 2sls with the father's characteristics as the listed instruments.
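
As a rough illustration of the ivregress 2sls call mentioned above, here is a minimal sketch. It assumes a single dataset in which both the son's and the father's log incomes are observed, which may not hold in the two-sample setting described in the question. The names sage, sage2, feduc and foccupation are hypothetical placeholders, not variables from the original files.

```stata
* Sketch only: hypothetical variable names; assumes the son's and the father's
* log incomes are observed in the same dataset.
ivregress 2sls lnincomeson sage sage2 i.soncitystring ///
    (lnincomefather = i.feduc i.foccupation), vce(robust)

* First-stage diagnostics for the father-characteristic instruments
estat firststage
```

The parentheses hold the endogenous regressor on the left of the equals sign and the excluded instruments on the right; the exogenous controls listed outside the parentheses are used as instruments automatically.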

If you want to do your procedure, you can bootstrap the two steps. Or, you can use -gmm- to set this up as a two equation system and estimate the parameters at the same time. You can use a robust weighting matrix, too.
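
One way the "bootstrap the two steps" suggestion could be set up is sketched below. It is an assumption-laden outline, not a tested implementation: it presumes the father and son files have been appended into one dataset with a hypothetical indicator variable sample (1 = father data, 2 = son data), that fage and fage2 hold the father's age terms (set to 40 and 1600 in the son rows), and that feduc is coded identically in both samples. The program name ts2sls_once and all of these variable names are made up for illustration.

```stata
* Sketch only: the stacked layout and the names sample, fage, fage2, feduc,
* sage and sage2 are assumptions, not variables from the original files.
capture program drop ts2sls_once
program define ts2sls_once, rclass
    tempvar fhat
    * Step 1: father's earnings equation, estimated on the father rows only
    regress lnincomefather fage fage2 i.feduc if sample == 1
    * Predicted log father income for the son rows (father's age fixed at 40)
    predict double `fhat' if sample == 2, xb
    * Step 2: son's equation with the prediction as a regressor
    regress lnincomeson `fhat' sage sage2 if sample == 2
    return scalar beta1 = _b[`fhat']
end

* Resample within each sample so that both steps are re-estimated every time
bootstrap beta1 = r(beta1), reps(500) strata(sample) seed(12345): ts2sls_once
```

Because step 1 is re-run inside every replication, the spread of beta1 across replications reflects the sampling error in the first-step coefficients as well as in the second step, which is how the bootstrap carries first-stage uncertainty into the reported standard error.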

Jun Hong Tan

Join Date: Sep 2019
Posts: 2

26 Sep 2019, 11:15

Hi,

Thank you so much! I have tried to set this up as follows. However, I am unsure how to go about constructing the C matrix. Should I convert each categorical variable into multiple dummies? There are almost 75 of them. I am stuck on the C hat portion.

Code:

use "/Users/junhongtan/Desktop/MUS Cleaned Data 1988 - 1998.dta",clear
gen const = 1
qui gmm (lnincomefather - {xb1: i.citystring i.industrystring i.occupationstring i.highestedustring age age2 const}) ///
    instruments(1 : i.citystring i.industrystring i.occupationstring i.highestedustring age age2) ///
    winit(unadjusted,independent) onestep  ///
    deriv(1/xb1 = -1)
mat Vx2het = e(V) /*1st round Robust variance estimate of pix2*/
use "/Users/junhongtan/Desktop/SFIECleanedtouse.dta", clear
gen ageoriginal = age
gen age2 = (age*age)
gen ageoriginal2 = age2
replace age = 40
replace age2 = 1600
/*Generating predicted X*/
qui predict lnincomefatherh /*lnincomefatherh is predicted from father's characteristics in SFIE, along with age normalised to 40*/
mat Vx2het = e(V) /*2nd round replace the variance after prediction, robust*/
scalar kx = 1
scalar ke = 67
gen lnincomeson = log(annualsalary2016adj)
encode city, gen(soncitystring)
encode industry, gen(sonindustrystring)
encode occupation, gen(sonoccupationstring)
encode highesteduc, gen(sonhighesteducstring)
replace ageoriginal = (ageoriginal - 40) /*Normalise to 40 for son*/
replace ageoriginal2 = (ageoriginal*ageoriginal) /*Square of the already-normalised age*/
bootstrap  _b, rep(1000): qui reg lnincomeson ageoriginal ageoriginal2 i.soncitystring i.sonindustrystring i.sonoccupationstring i.sonhighesteducstring i.citystring i.industrystring i.occupationstring i.highestedustring age age2,r
mat Vy1het = e(V)*e(df_r)/_N /*Robust variance estimate of piy1,*/ /*without degrees of freedom correction*/

/*TS2SLS estimator*/
bootstrap  _b, rep(1000): qui reg lnincomeson lnincomefatherh ageoriginal ageoriginal2 i.soncitystring i.sonindustrystring i.sonoccupationstring i.sonhighesteducstring
mat b2s = e(b)
mat b2sx = b2s[1,1..kx]'

/*Constructing C hat*/
bootstrap  _b, rep(50): qui reg citystring lnincomefatherh
mat ch = e(b)'
bootstrap  _b, rep(50): qui reg industrystring lnincomefatherh
mat ch = ch,e(b)'
bootstrap  _b, rep(50): qui reg occupationstring lnincomefatherh
mat ch = ch,e(b)'
bootstrap  _b, rep(50): qui reg highestedustring lnincomefatherh
mat ch = ch,e(b)'
bootstrap  _b, rep(50): qui reg age lnincomefatherh
mat ch = ch,e(b)'
bootstrap  _b, rep(50): qui reg age2 lnincomefatherh
mat ch = ch,e(b)'
mat ch = ch,(J(kx,ke,0)\I(ke))

/*Calculating robust standard errors*/
mat var1het =  ch*Vy1het*ch' + (b2sx' # ch)*Vx2het*(b2sx # ch')
mat seb2shet = vecdiag(cholesky(diag(vecdiag(var1het))))'

/*Displaying the results*/
mat res = b2s',seb2shet
mat colnames res = b_ts2sls se "rob se"
mat rownames res = x1 _cons
matlist res


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#4

27 Sep 2019, 15:16

I think you misunderstand what I was saying. If you use gmm, the equation you've listed above is the second equation. The first equation is the one with lnincomeson as the dependent variable and lnincomefather on the right hand side. If you write down those two equations using GMM, there is no need to bootstrap. You can essentially do it all with one command. I don't know what "C hat" you mean.