
  • Heckman selection model with a binary dependent variable using ML rather than heckprob

    Hi guys,

    I'm trying to replicate the results from the heckprob command with an ml model. Following the blog post here (https://blog.stata.com/2015/10/22/pr...tion-by-mlexp/) it was quite easy to imitate the results with the mlexp command, but I'm not getting them with the usual ml model. The help file for heckprob is no help either: it states which likelihood function is behind the command, but not how to code it yourself.

    So let's assume we have a binary selection equation:

    y1 = a0 + a1*z + u

    and a binary outcome equation of interest (a probit model):

    y2 = b0 + b1*x + v

    If y1==0, we do not observe y2. u and v are correlated (correlation rho).
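    Written out, the log-likelihood contributions behind heckprob are (using Stata's normal() and binormal() functions, and writing xb = b0 + b1*x and za = a0 + a1*z as shorthand):

    ln L = ln binormal(xb, za, rho)      if y1==1 and y2==1
    ln L = ln binormal(-xb, za, -rho)    if y1==1 and y2==0
    ln L = ln(1 - normal(za))            if y1==0

    Both the mlexp expression below and any ml evaluator just have to reproduce these three pieces.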

    We can estimate a0, a1, b0 and b1 easily with:

    Code:
    heckprob y2 x , sel(y1=z)
    or (like in the blog post) with the mlexp command using the following expression:

    Code:
    mlexp (ln(cond(y1,cond(y2,binormal({y2:x _cons},{y1:z _cons}, {rho}),binormal(-{y2:},{y1:}, -{rho})),1-normal({y1:}))))
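
    (For anyone who wants to reproduce this: here is a quick way to simulate data from the model above, so that heckprob, mlexp and the ml attempt below can all be checked against each other on the same data. The coefficient values are made up purely for illustration.)

    Code:
    * simulate data from the selection model above (illustrative values only)
    clear
    set seed 12345
    set obs 5000
    matrix C = (1, .5 \ .5, 1)
    drawnorm u v, corr(C)
    gen z = rnormal()
    gen x = rnormal()
    gen y1 = (0.2 + 0.8*z + u > 0)
    gen y2 = (0.1 + 1.0*x + v > 0) if y1 == 1
    heckprob y2 x, sel(y1 = z)
    * the mlexp expression above can then be run on the same data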
    However, I'm trying to get the same result by writing the likelihood evaluator myself for the ml model command, i.e. translating the expression above into an ml program.
    This is what I have come up with:
    Code:
    program define mlheck
              args lnf Xb Za rho
              quietly replace `lnf' = ln(binormal(`Xb',`Za',`rho')) if $ML_y1==1 & $ML_y2==1
              quietly replace `lnf' = ln(binormal(-`Xb',`Za', -`rho')) if $ML_y1==1 & $ML_y2==0
              quietly replace `lnf' = ln(1-normal(`Za')) if $ML_y1==0
    end
    ml model lf mlheck (Y2: y2 = x)(Y1: y1 =z)(Rho:)
    ml search, repeat(1000)
    ml maximize
    But it doesn't give the correct values. Something is probably wrong in the model, but I'm having a hard time pinning down what exactly. All tips are welcome.
    Thanks!

  • #2
    Hi Merijn,
    You are almost there. Just two more things you need to do:
    1. Add the "missing" option to your ml model command (not necessary for the Stata heckprobit example).
    2. Use a more stable transformation for rho.

    Code:
    capture program drop  mlheck
    program define mlheck
              args lnf Xb Za arho
              tempvar rho
              qui: gen double `rho' = tanh(`arho')
              quietly replace `lnf' = ln(binormal(`Xb',`Za',`rho')) if $ML_y2==1 & $ML_y1==1
              quietly replace `lnf' = ln(binormal(-`Xb',`Za', -`rho')) if $ML_y2==1 & $ML_y1==0
              quietly replace `lnf' = ln(1-normal(`Za')) if $ML_y2==0
              
    end
    
    webuse school, clear
    heckprobit private years logptax, sel(vote=years loginc logptax)
     
    ml model lf mlheck (private: private = years logptax) ///
                        (vote:vote=years loginc logptax) /athrho, maximize missing technique(bhhh nr)
    ml display
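
    One small follow-up on this parameterization: because the fitted parameter is athrho rather than rho itself, you can back out rho (with a delta-method standard error) afterwards, for example:

    Code:
    * recover rho = tanh(athrho); assumes the free parameter is named athrho as above
    nlcom (rho: tanh([athrho]_cons))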



    • #3
      Hi Fernando,

      Many thanks for your comments. However, I couldn't get it to work at first. I took your exact code and substituted my own variables, and the estimates were still different. Then I used my own code with your suggestions, and they were still different as well.
      I have now found that the solution was simply to swap y1 and y2 in both the program and the ml model statement. It now looks like this:

      Code:
      capture program drop mlheck
      program define mlheck
                args lnf Za Xb rho   // first equation (Za) is now the selection equation, so $ML_y1 is y1
                quietly replace `lnf' = ln(binormal(`Xb',`Za',`rho')) if $ML_y1==1 & $ML_y2==1
                quietly replace `lnf' = ln(binormal(-`Xb',`Za', -`rho')) if $ML_y1==1 & $ML_y2==0
                quietly replace `lnf' = ln(1-normal(`Za')) if $ML_y1==0
       end
      ml model lf mlheck (Y1: y1 = z)(Y2: y2 = x)(Rho:), missing
      ml maximize
      Notice how Za and Xb have swapped in the program and Y1 and Y2 have swapped in the ml model statement, so the selection equation now comes first. I notice that isn't the case in your own example, though.
      I did add your "missing" suggestion. The tanh(rho) transformation, however, did not seem to be needed.



      • #4
        Hi,
        so there are two things to keep in mind.
        1. Yes: $ML_y1 and $ML_y2 stand for the first and second dependent variables, in the order the equations appear in the ml model statement.
        2. Regarding rho, there is technically no need for the transformation, but it helps the estimation: otherwise you can run into trouble finding a path to the optimal solution. Remember that rho must lie between -1 and 1, whereas -ml- searches over values between minus and plus infinity, so for "hard" problems the unconstrained search can cause problems. This is also standard practice: it is how Stata estimates the parameter internally (see the sketch below).
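
        Concretely, the transformation would slot into your corrected program from #3 like this (only the handling of rho changes; everything else is your code):

        Code:
        capture program drop mlheck
        program define mlheck
                  args lnf Za Xb arho
                  tempvar rho
                  quietly gen double `rho' = tanh(`arho')   // maps any real athrho into (-1, 1)
                  quietly replace `lnf' = ln(binormal(`Xb',`Za',`rho')) if $ML_y1==1 & $ML_y2==1
                  quietly replace `lnf' = ln(binormal(-`Xb',`Za', -`rho')) if $ML_y1==1 & $ML_y2==0
                  quietly replace `lnf' = ln(1-normal(`Za')) if $ML_y1==0
        end
        ml model lf mlheck (Y1: y1 = z)(Y2: y2 = x) /athrho, missing
        ml maximize

        ml display then reports athrho; rho itself is just tanh of that estimate (or use nlcom to get a standard error, as noted in #2).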

        Best regards



        • #5
          Right, I hadn't noticed that your $ML_y1 and $ML_y2 were swapped as well. That explains it.
          Thanks for all the tips!
