
  • Dealing with endogeneity and sample selection biases in the same model

    Dear Statalist,

    I am trying to deal with two forms of bias in my analysis of the probability of employment in different sectors and of sector-specific earnings functions: sample selection and endogeneity of the education variable.

    Sample selection: I use a participation model estimated as a multinomial logit (base category: unemployed/not employed) with the following sectors: public, formal private, informal private, self-employment, agriculture. I use this model to address sample selection: from the predicted probabilities of employment in each sector I can construct selection-correction terms (inverse Mills ratios) and include them in each sector's OLS wage function.
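    With a multinomial logit first stage, the sector probabilities are usually transformed into a selection term rather than entered directly. One common choice (an assumption here, since the post does not name the correction it uses) is Lee's (1983) transformation, which coincides with the usual inverse Mills ratio when the first stage is a binary probit. With \hat P_{ij} the predicted probability that person i works in sector j, and \phi and \Phi the standard normal density and distribution function, a sketch of the term and of the sector-j wage equation is:

```latex
\hat\lambda_{ij} = \frac{\phi\!\left(\Phi^{-1}(\hat P_{ij})\right)}{\hat P_{ij}},
\qquad
\ln w_{ij} = x_i'\beta_j + \gamma_j\,\mathrm{educ}_i + \theta_j\,\hat\lambda_{ij} + e_{ij},
\quad \text{observed only if } \mathrm{employment}_i = j .
```

    A nonzero \theta_j indicates selection bias in sector j; the sign of \theta_j is left free.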

    Endogeneity: The problem with the above procedure, however, is that education, which is treated as exogenous in the OLS wage functions, may in fact be endogenous. It is important to address this source of bias along with sample selection. I have the following instruments for education: parents'/spouse's education and a change in compulsory schooling laws.

    I am faced with two options and would appreciate any advice on which is more appropriate. Unfortunately, I have not come across any economics literature that deals with both forms of bias at once, despite the vast literature on each of them separately (for example, the returns-to-education literature using instrumental variables, and the occupational-choice literature using Heckman-type selection corrections).

    1. The first method follows Wooldridge (2010), Chapter 19.

    The first stage is a participation model estimated as a multinomial logit (sectors: public, formal private, informal private, self-employment, agriculture; base category: unemployed/not employed). I then use the predicted probabilities of each outcome from this model to obtain the inverse Mills ratios and include them as additional regressors in the respective wage functions, which are estimated by IV (2SLS) with education treated as an endogenous regressor.

    (a) mlogit employment age age2 age3 married female urban policy mother_educ father_educ spouse_educ mother_public_sector mother_private_sector father_public_sector father_private_sector spouse_public_sector spouse_private_sector proportion_of_kids_below_6 proportion_of_kids_between_6_and_18 proportion_of_elders_over_65, baseoutcome(0)

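    where employment = 1 (public sector), 2 (formal private), 3 (informal private), 4 (self-employed), 5 (agriculture), 0 (unemployed/not employed); the omitted reference categories for the dummies are male, not married, rural, and parents/spouse not employed. The baseoutcome(0) option makes "unemployed/not employed" the base outcome; without it, mlogit uses the most frequent outcome as the base.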

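    (b) obtain the predicted probabilities of each outcome (predict p0 p1 p2 p3 p4 p5, pr) and from them the selection-correction term for each sector (lambda1 for the public sector, and so on)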

    (c) ivregress 2sls log_hourly_wage (years_in_education = policy mother_educ father_educ spouse_educ mother_public_sector mother_private_sector father_public_sector father_private_sector spouse_public_sector spouse_private_sector proportion_of_kids_below_6 proportion_of_kids_between_6_and_18 proportion_of_elders_over_65) age age2 age3 married female urban lambda1 if employment==1, vce(robust) first

    (done similarly for each sector)

    Wooldridge uses this technique for female labour force participation, and therefore uses a probit rather than a multinomial logit (i.e. the first stage is binary: 0 = not employed/unemployed, 1 = employed). My concern is whether this approach still works with a multinomial logit. Logically, it does not seem to make sense with a multinomial logit, because education is completed before the sector of employment is chosen, so it seems incorrect to obtain a reduced form for education within each sector.
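    The steps of option 1 can be sketched end to end on simulated data. This is a minimal sketch, not the specification above: the three sectors, the variable kids_u6, the parameter values and the data-generating process are all hypothetical, and the selection term uses Lee's (1983) transformation lambda_j = normalden(invnormal(p_j))/p_j, which is one common choice for a multinomial logit first stage rather than necessarily the one intended in the post.

```stata
* Sketch of option 1 on simulated data: reduced-form MNL for sector choice,
* Lee (1983) selection terms, then 2SLS wage equations by sector.
* All variable names, parameter values and the data-generating process are hypothetical.
clear
set seed 12345
set obs 5000

* Exogenous covariates; mother_educ instruments education,
* kids_u6 affects sector choice but (by assumption) not wages
generate female      = runiform() < 0.5
generate urban       = runiform() < 0.6
generate age         = 20 + floor(40*runiform())
generate mother_educ = floor(12*runiform())
generate kids_u6     = runiform()

* Education is endogenous: it shares the unobserved ability term a with wages
generate a = rnormal()
generate years_in_education = 6 + 0.5*mother_educ + a + rnormal()

* Sector choice (0 = not employed, 1-3 = sectors) with type I extreme value errors
forvalues j = 0/3 {
    generate u`j' = -ln(-ln(runiform()))
}
generate v1 = -1.0 + 0.15*years_in_education - kids_u6 + u1
generate v2 = -0.5 + 0.05*years_in_education + 0.3*urban + u2
generate v3 = -0.5 + 0.5*female + u3
generate employment = 0
replace employment = 1 if v1 > max(u0, v2, v3)
replace employment = 2 if v2 > max(u0, v1, v3)
replace employment = 3 if v3 > max(u0, v1, v2)

* Wages are observed only for the employed
generate log_hourly_wage = 1 + 0.08*years_in_education + 0.01*age ///
    + 0.3*a + rnormal(0, 0.3) if employment > 0

* (a) Reduced-form MNL on all exogenous variables (education excluded)
mlogit employment age female urban mother_educ kids_u6, baseoutcome(0)

* (b) Predicted probabilities, one variable per outcome in the order 0, 1, 2, 3
predict p0 p1 p2 p3, pr

* (c) Selection term and 2SLS for each sector; exogenous regressors,
*     including the selection term, also serve as their own instruments
forvalues j = 1/3 {
    generate lambda`j' = normalden(invnormal(p`j')) / p`j'
    ivregress 2sls log_hourly_wage age female urban lambda`j' ///
        (years_in_education = mother_educ kids_u6) if employment == `j', vce(robust)
}
```

    Because each lambda is itself estimated, the vce(robust) standard errors above are only valid under the null of no selection bias (a zero coefficient on lambda); bootstrapping the mlogit and ivregress steps together is one common way to obtain standard errors otherwise.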

    2. My second option is to use the predicted values of education from a reduced form:

    (a) regress years_in_education policy mother_educ father_educ spouse_educ mother_public_sector mother_private_sector father_public_sector father_private_sector spouse_public_sector spouse_private_sector proportion_of_kids_below_6 proportion_of_kids_between_6_and_18 proportion_of_elders_over_65 age age2 age3 married female urban

    (b) then carry out the MNL and the OLS wage estimations using predicted education in place of years_in_education.

    However, Wooldridge states that this method makes strong assumptions about the error term of the reduced form.
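    Option 2 can be sketched the same way. Again this is a minimal sketch on hypothetical simulated data (two sectors, made-up names and parameters), with the same Lee-type selection term as an assumed choice; it only shows the mechanics of substituting predicted education, not a recommendation.

```stata
* Sketch of option 2 on simulated data: predicted education from a reduced form
* replaces actual education in both the MNL and the sector wage equations.
* All variable names, parameter values and the data-generating process are hypothetical.
clear
set seed 2010
set obs 5000

generate female      = runiform() < 0.5
generate mother_educ = floor(12*runiform())
generate kids_u6     = runiform()
generate a           = rnormal()   // unobserved ability, shared by education and wages
generate years_in_education = 6 + 0.5*mother_educ + a + rnormal()

* Sector choice (0 = not employed, 1-2 = sectors) with type I extreme value errors
forvalues j = 0/2 {
    generate u`j' = -ln(-ln(runiform()))
}
generate v1 = -1.0 + 0.15*years_in_education - kids_u6 + u1
generate v2 = -0.5 + 0.3*female + u2
generate employment = cond(v1 > max(u0, v2), 1, cond(v2 > max(u0, v1), 2, 0))
generate log_hourly_wage = 1 + 0.08*years_in_education + 0.3*a ///
    + rnormal(0, 0.3) if employment > 0

* (a) Reduced form for education on all exogenous variables
regress years_in_education mother_educ kids_u6 female
predict educ_hat, xb

* (b) MNL with predicted education, then selection-corrected OLS by sector
mlogit employment educ_hat kids_u6 female, baseoutcome(0)
predict q0 q1 q2, pr
forvalues j = 1/2 {
    generate lambda`j' = normalden(invnormal(q`j')) / q`j'
    regress log_hourly_wage educ_hat female lambda`j' if employment == `j', vce(robust)
}
```

    Since educ_hat and the lambda terms are both estimated, the standard errors reported by the final regress do not account for the earlier steps; bootstrapping all steps together is one way to obtain usable ones. This is separate from Wooldridge's point about the assumptions needed on the reduced-form error.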

    I would be very grateful to receive any advice on this matter.

    Thank you

    Reference:
    Wooldridge JM (2010) Econometric analysis of cross section and panel data, 2nd edn. The MIT Press, Cambridge

  • #2
    You should look at cmp and the SEM/GSEM estimators; they can handle these problems.



    • #3
      Thanks for the help, but this does not apply to my work, and I cannot find any literature on it. I am hoping to follow the Wooldridge method, so a solution based on that would be helpful.

