  • Two-sample two-stage least squares method - Help needed

    Hi, I am preparing an Honours thesis and need guidance on
    (1) the appropriateness of my model
    (2) the right syntax to use



    I am working on intergenerational income mobility, for which the ideal model is as follows:

    log Ys = α + β1 log Yf + β2 (characteristics of the child) + β3 age + β4 age^2 + ui

    where Y refers to earnings at age 40, with s denoting the son and f denoting the father. The datasets that I have are unfortunately annual data, so I used the TS2SLS method, as proposed by multiple researchers.

    To achieve this I modelled

    log Yf = γ + δ1 age + δ2 age^2 + δ3 (characteristics of the father) + vi

    to estimate γ, δ1, δ2 and δ3. I then generate log Yf at age 40 using the father's characteristics found in another dataset, where they are reported by the child.

    Then I will substitute the predicted log Yf (hat) into the main equation, normalise the son's age to 40, and estimate β1.

    However, I do not know how to carry the standard errors over into the reduced form equation if I run the regressions step by step.

    What commands or syntax in Stata could I use? I saw bootstrapping, but I am unsure how bootstrapping would carry the standard errors from the first stage into the second stage.

    Thank you!
    Last edited by Jun Hong Tan; 25 Sep 2019, 03:49.

  • #2
    You should understand a few things. First, what you're proposing is not really "two stage least squares." You are putting exclusion restrictions on the reduced form for log Yf. You are assuming that there is no partial correlation between father's income and the child's characteristics. (And that father's characteristics do not directly affect child's income.) If you want to do real 2SLS, then you should use ivregress 2sls with the father's characteristics as the listed instruments.
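
    As a rough illustration of the ivregress 2sls route, here is a minimal sketch. It assumes a single dataset in which both incomes are observed, and every variable name (lnincomeson, lnincomefather, sonage, sonage2, fathereduc, fatherocc) is hypothetical:

    ```stata
    * Sketch only: assumes son's and father's log income sit in the same dataset.
    * All variable names are placeholders, not taken from the poster's files.
    * Exogenous regressors: son's age and age squared.
    * Endogenous regressor: father's log income.
    * Excluded instruments: father's education and occupation categories.
    ivregress 2sls lnincomeson sonage sonage2 ///
        (lnincomefather = i.fathereduc i.fatherocc), vce(robust)
    ```

    Because both stages are handled inside one command, the reported standard errors already account for the first stage.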

    If you want to do your procedure, you can bootstrap the two steps. Or, you can use -gmm- to set this up as a two equation system and estimate the parameters at the same time. You can use a robust weighting matrix, too.
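
    To show what bootstrapping the two steps could look like, here is a minimal sketch. It wraps both regressions in one program so that each bootstrap replication re-estimates the first step as well. The program name ts_twostep and all variable names are hypothetical, and it assumes both steps use one dataset in memory; with two separate samples the resampling scheme would need more care.

    ```stata
    * Sketch only: both steps live inside one rclass program (hypothetical names).
    capture program drop ts_twostep
    program define ts_twostep, rclass
        // Step 1: father's earnings equation
        regress lnincomefather fage fage2 i.fathereduc
        tempvar fhat
        predict double `fhat', xb
        // Step 2: son's equation using the step-1 prediction
        regress lnincomeson `fhat' sonage sonage2
        return scalar b1 = _b[`fhat']
    end

    * Each replication reruns step 1 and step 2 on the resampled data,
    * so the bootstrap standard error of b1 reflects first-step sampling error.
    bootstrap b1 = r(b1), reps(500) seed(12345): ts_twostep
    ```

    The point of the wrapper is that bootstrapping only the second regression would treat the predicted father income as if it were data.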



    • #3
      Hi,

      Thank you so much! I tried to set this up as suggested. However, I am unsure how to go about constructing the C matrix. Should I convert each categorical variable into multiple dummies? There are almost 75 of them. I am stuck on the C hat portion.

      Code:
      use "/Users/junhongtan/Desktop/MUS Cleaned Data 1988 - 1998.dta",clear
      gen const = 1
      qui gmm (lnincomefather - {xb1: i.citystring i.industrystring i.occupationstring i.highestedustring age age2 const}) ///
          instruments(1 : i.citystring i.industrystring i.occupationstring i.highestedustring age age2) ///
          winit(unadjusted,independent) onestep  ///
          deriv(1/xb1 = -1)
      mat Vx2het = e(V) /*1st round Robust variance estimate of pix2*/
      use "/Users/junhongtan/Desktop/SFIECleanedtouse.dta", clear
      gen ageoriginal = age
      gen age2 = (age*age)
      gen ageoriginal2 = age2
      replace age = 40
      replace age2 = 1600
      /*Generating predicted X*/
      qui predict lnincomefatherh /*lnincomefatherh is predicted from father's characteristics in SFIE, along with age normalised to 40*/
      mat Vx2het = e(V) /*2nd round replace the variance after prediction, robust*/
      scalar kx = 1
      scalar ke = 67
      gen lnincomeson = log(annualsalary2016adj)
      encode city, gen(soncitystring)
      encode industry, gen(sonindustrystring)
      encode occupation, gen(sonoccupationstring)
      encode highesteduc, gen(sonhighesteducstring)
      replace ageoriginal = (ageoriginal - 40) /*Normalise to 40 for son*/
      replace ageoriginal2 = (ageoriginal*ageoriginal) /*Square of the normalised age; ageoriginal already had 40 subtracted above*/
      bootstrap  _b, rep(1000): qui reg lnincomeson ageoriginal ageoriginal2 i.soncitystring i.sonindustrystring i.sonoccupationstring i.sonhighesteducstring i.citystring i.industrystring i.occupationstring i.highestedustring age age2,r
      mat Vy1het = e(V)*e(df_r)/_N /*Robust variance estimate of piy1,*/ /*without degrees of freedom correction*/
      
      /*TS2SLS estimator*/
      bootstrap  _b, rep(1000): qui reg lnincomeson lnincomefatherh ageoriginal ageoriginal2 i.soncitystring i.sonindustrystring i.sonoccupationstring i.sonhighesteducstring
      mat b2s = e(b)
      mat b2sx = b2s[1,1..kx]'
      
      /*Constructing C hat*/
      bootstrap  _b, rep(50): qui reg citystring lnincomefatherh
      mat ch = e(b)'
      bootstrap  _b, rep(50): qui reg industrystring lnincomefatherh
      mat ch = ch,e(b)'
      bootstrap  _b, rep(50): qui reg occupationstring lnincomefatherh
      mat ch = ch,e(b)'
      bootstrap  _b, rep(50): qui reg highestedustring lnincomefatherh
      mat ch = ch,e(b)'
      bootstrap  _b, rep(50): qui reg age lnincomefatherh
      mat ch = ch,e(b)'
      bootstrap  _b, rep(50): qui reg age2 lnincomefatherh
      mat ch = ch,e(b)'
      mat ch = ch,(J(kx,ke,0)\I(ke))
      
      /*Calculating robust standard errors*/
      mat var1het =  ch*Vy1het*ch' + (b2sx' # ch)*Vx2het*(b2sx # ch')
      mat seb2shet = vecdiag(cholesky(diag(vecdiag(var1het))))'
      
      /*Displaying the results*/
      mat res = b2s',seb2shet
      mat colnames res = b_ts2sls se "rob se"
      mat rownames res = x1 _cons
      matlist res



      • #4
        I think you misunderstand what I was saying. If you use gmm, the equation you've listed above is the second equation. The first equation is the one with lnincomeson as the dependent variable and lnincomefather on the right-hand side. If you write down those two equations using gmm, there is no need to bootstrap. You can essentially do it all with one command. I don't know what "C hat" you mean.
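
        To make the two-equation idea concrete, here is a rough sketch of how it might be written. It assumes one dataset holding both incomes, a Stata version that accepts _cons inside a linear combination, and hypothetical variable names throughout (fathereduyrs, fage, fage2, sonage, sonage2):

        ```stata
        * Sketch only: hypothetical variable names, one dataset with both incomes.
        * Equation "son":    son's log income on the fitted father index {fx:}
        * Equation "father": father's log income on father characteristics.
        * {fx:} is defined once in the first equation and reused in the second,
        * so both equations share the same father-equation parameters.
        gmm (son:    lnincomeson - {b1}*({fx: fathereduyrs fage fage2 _cons}) ///
                                 - {sx: sonage sonage2 _cons})                ///
            (father: lnincomefather - {fx:}),                                 ///
            instruments(son:    fathereduyrs fage fage2 sonage sonage2)       ///
            instruments(father: fathereduyrs fage fage2)                      ///
            winitial(unadjusted, independent) vce(robust)
        ```

        Since all parameters are estimated jointly, the standard error reported for b1 already reflects the estimation of the father equation.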
