
  • Two-stage Heckman correction using two samples - Data Frames

    Dear all,

    I want to carry out a two-stage Heckman correction using two samples. I have one dataset that records selection into higher education, and a second one in which I observe the outcome for observation i (admission to an elite university department) only if y*_i > 0 (i.e., if the person is admitted to higher education). I observe the same variables in both datasets, with two exceptions: the second dataset does not record selection (admission to higher education), and the first does not record the outcome of interest (admission to an elite university department). I want to estimate the selection equation with the first dataset and use the parameter estimates to construct the inverse Mills ratio in the second dataset. I am using data frames to store both datasets in memory and I am running the following code:


    **Step 1: linear predictions from selection equation:

    probit HIGHEREDU i.yeard i.edu_f#i.yeard i.occup_f i.sex_stud i.nationality unemp_1, baselevels
    frame change admissions
    predict p1_hat, xb // Calculate predicted value from regression

    ** Compute Inverse Mills ratio
    generate phi_1=normalden(p1_hat)
    generate PHI_1=normal(p1_hat)
    generate lambda_1=phi_1/PHI_1
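
    For reference, the quantity those three generate lines are meant to build is the standard inverse Mills ratio. The notation here is an assumption, not from the post: x_i is the covariate vector of the selection equation and \hat{\gamma} is the probit coefficient vector, so that p1_hat corresponds to x_i'\hat{\gamma}.

```latex
\hat{\lambda}_i
  = \frac{\phi\left(x_i'\hat{\gamma}\right)}{\Phi\left(x_i'\hat{\gamma}\right)},
\qquad
\phi = \text{standard normal density (normalden)},\quad
\Phi = \text{standard normal CDF (normal)}
```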


    The "predict p1_hat, xb" command generates only missing values, which I think happens because the estimated parameters are lost when I change frames. Is there any way to store the estimated parameters from the first frame and then use them to construct the inverse Mills ratio in the second dataset/frame?

    Any help would be very valuable.

    Best regards,
    Konstantina


  • #2
    What stops you from merging the data and using the maximum likelihood estimator? With the two-step approach, you have to go through the trouble of correcting the standard errors yourself. In any case, the estimates are available and can be used across frames, so you need to provide a data example that reproduces what you observe. Also note that you need one or more exclusion restrictions when implementing the Heckman procedure, i.e., variables that predict selection but do not predict the outcome. For example, in an empirical model of a woman's labor supply, such variables include the number of kids the woman has and whether she is married. These predict participation in the labor force but do not predict the outcome, wage. Therefore, you cannot have the same variables in both equations. If you do, the model is not identified.
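
    To illustrate the point about estimates carrying over between frames, here is a minimal sketch on simulated data. Everything in it is hypothetical (the frame names, the variables x1, z1 and selected, the sample sizes); it is not the poster's data or model. It only shows that predict, run after frame change, uses the probit fitted in the other frame, provided the covariates exist under the same names.

```stata
* Minimal sketch with simulated data; all names are hypothetical.
clear all
set seed 12345

* Frame 1: sample in which selection is observed
frame create selection
frame change selection
set obs 500
generate x1 = rnormal()
generate z1 = rnormal()          // stands in for an exclusion restriction
generate byte selected = (0.5*x1 + z1 + rnormal() > 0)
probit selected x1 z1

* Frame 2: outcome sample with the same covariate names
frame create admissions
frame change admissions
set obs 300
generate x1 = rnormal()
generate z1 = rnormal()

* e() results are not tied to a frame, so predict uses the probit above
predict double xb_hat, xb

* Inverse Mills ratio from the linear prediction
generate double imr = normalden(xb_hat) / normal(xb_hat)

* A row with a missing covariate gets a missing prediction, so count them
count if missing(xb_hat)
```

    If the count on real data is not zero, comparing the covariates between the two frames (names, storage types, missing values) would be a natural first check. A second stage that adds imr as a regressor still needs corrected standard errors, as noted above.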
    Last edited by Andrew Musau; 12 Dec 2022, 19:29.



    • #3
      Dear Andrew,

      Thank you very much for all your suggestions! I will try the merging now.
      Best,
      Konstantina



      • #4
        Dear Andrew Musau and Konstantina Boutsioukou,
        I am facing a similar problem and don't know how to solve it.
        I have two datasets:
        i) one with administrative data (from firms), in which log(wages) is the dependent variable (the goal is to estimate the gender pay gap)
        ii) another (a labour force survey), in which I would like to estimate the probability that a woman works

        My main issue is that the datasets are not homogeneous. For example, I am really interested in the number of children and marital status, but those are not in the administrative data in which I want to estimate the gender gap.
        So far, I have harmonized the common variables, such as age, education and seniority, but I don't know what I could use as selection variables. Is there a way to exploit family characteristics even if they are not in my main dataset?

        Many thanks in advance for your time.

