  • CMP with two recursive Heckman selection models and correlated error terms

    My target model uses two Heckman selection models to model four equations (2 x binary selection variables: Y1 and Y2; 2 x continuous outcome variables: Y3 and Y4). I want to model Y3 conditional on Y1 (Heckman model 1) and Y4 conditional on Y2 (Heckman model 2). The model has a recursive structure, such that Y3 affects both Y2 and Y4 (the selection and outcome equations in the second Heckman model). Each dependent variable has its own set of predictor variables (X1, X2, X3, X4). In the end, I want to correlate the error terms of all four equations.
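Written out as equations, the structure described above could look like the following sketch. The symbols β, γ, ε and Σ are notation introduced here for illustration, and the probit form of the selection equations and the joint normality of the errors are assumptions (they match what cmp fits with $cmp_probit, but the post does not state them explicitly):

```latex
\begin{aligned}
Y_1 &= \mathbf{1}\left[X_1\beta_1 + \varepsilon_1 > 0\right] \\
Y_3 &= X_3\beta_3 + \varepsilon_3
      \quad \text{(observed only if } Y_1 = 1\text{)} \\
Y_2 &= \mathbf{1}\left[X_2\beta_2 + \gamma_2 Y_3 + \varepsilon_2 > 0\right] \\
Y_4 &= X_4\beta_4 + \gamma_4 Y_3 + \varepsilon_4
      \quad \text{(observed only if } Y_2 = 1\text{)} \\
(\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4)' &\sim N(0, \Sigma)
\end{aligned}
```

With four equations, Σ carries six pairwise correlations, which are the quantities to be estimated jointly.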

    Here is a depiction (please note that we model Y3 conditional on Y1 and Y4 conditional on Y2, but the binary dependent variable Y1/Y2 is not a predictor of Y3/Y4):

    [Figure: Model Structure.png]


    Recently, I came across the cmp package and documentation provided by David Roodman and I hope that I can use the package to implement this model.

    Drawing from the documentation, the code for Heckman model 1 would be:

    cmp (Y3 = X3) (Y1 = X1),
    indicators(Y1 $cmp_probit) nolrtest quietly

    And the code for Heckman model 2 would be:

    cmp (Y4 = X4 + Y3) (Y2 = X2 + Y3),
    indicators(Y2 $cmp_probit) nolrtest quietly

    Is there any chance to combine these two Heckman models and correlate the error terms using cmp?

    I'm looking forward to all ideas. Many thanks in advance.


  • #2
    Meanwhile, I've combined the models using cmp. Is the following correct?
    cmp (Y3 = X3) (Y1 = X1) (Y4 = X4 + Y3) (Y2 = X2 + Y3), indicators(Y1 $cmp_probit Y2 $cmp_probit) quietly
    Last edited by Jochen Pohlmann; 07 Mar 2023, 05:14.



    • #3
      Yes, that looks right to me, except in Stata you don't use a "+" in the equations.
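For reference, a sketch of the combined command with the "+" signs dropped, so that regressors are simply separated by spaces. Y1-Y4 and X1-X4 are the placeholders from the thread, not real variable names. It assumes that Y1 and Y2 are 0/1 dummies equal to 1 exactly where Y3 and Y4 are observed, so that they can serve as the indicator expressions for the two outcome equations, following the pattern of the single-Heckman commands in the first post. The `///` line continuation only works in a do-file:

```stata
* Define the cmp indicator globals (once per Stata session)
cmp setup

* Four equations in the order: outcome 1, selection 1, outcome 2, selection 2.
* indicators() takes one expression per equation, in the same order.
cmp (Y3 = X3) (Y1 = X1) (Y4 = X4 Y3) (Y2 = X2 Y3), ///
    indicators(Y1 $cmp_probit Y2 $cmp_probit) quietly
```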



      • #4
        Thanks, your package works great.

        I just have the problem that when combining the two Heckman models, Stata/cmp tells me "convergence not achieved". When running the two Heckman models separately, Stata does not throw the non-convergence warning. I have already optimized the scaling of the independent variables, such that they are on very similar scales, and I am using logged dependent variables in the outcome equations.

        Any guess what could be going on? Could the six error-term correlations (among the four equations) be too hard to estimate? My total N = 702,000.

        Thanks in advance.



        • #5
          Here is the output. I have adapted the variable labels so that they align with the modeling framework I depicted in my first post.
          Any ideas what's going on?


          • #6
            It's hard to say. Stata provides diagnostic tools for the ml command, on which cmp is built. You could try adding the "interactive" option to the cmp command line. Then, when it seems close to converging, you can interrupt it by clicking on the red X in the Stata toolbar or hitting Ctrl-C, or whatever the equivalent is on your computer. Then you can use the ml plot command to graph the log likelihood as a function of one parameter at a time at the current best fit. (The command is not well documented except in the Stata book about ml.) For example, "ml plot atanhrho_23:_cons" should give you a plot with respect to that parameter.

            Separately--this does not require the "interactive" option--you can add the "trace" and "gradient" options, which cmp passes to ml. (See "help maximize" and "help ml".) These will make it dump the current parameter values and gradient estimate as it searches, which might give insight.

            And you can try the "difficult" and the "technique()" options, which actually affect the search algorithm.
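Put together, the diagnostic steps above might look roughly like this in a do-file, run piece by piece rather than in one go. This is only a sketch: the globals heck_eqs and heck_ind are hypothetical helper names introduced here, Y1-Y4 and X1-X4 are the placeholders from the thread, and the indicator list assumes Y1 and Y2 mark where Y3 and Y4 are observed:

```stata
* Define the cmp indicator globals, then store the model pieces for reuse
cmp setup
global heck_eqs (Y3 = X3) (Y1 = X1) (Y4 = X4 Y3) (Y2 = X2 Y3)
global heck_ind indicators(Y1 $cmp_probit Y2 $cmp_probit)

* 1. Run in interactive mode; press Break once the search seems to stall
cmp $heck_eqs, $heck_ind interactive

* 2. After interrupting, graph the log likelihood against one parameter
ml plot atanhrho_23:_cons

* 3. Separately: print parameter values and the gradient at each iteration
cmp $heck_eqs, $heck_ind trace gradient

* 4. Separately: use a different stepping rule in nonconcave regions
cmp $heck_eqs, $heck_ind difficult
```

A log likelihood that is flat, or still rising at the edge of the plotted range, in step 2 would point to that parameter as the one the search is struggling with.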

            --David



            • #7
              Thank you very much, David. I will certainly follow these convergence diagnostic steps and report back on whether I was able to identify the sources of non-convergence.

              Looking at the output, I just saw that there are no standard errors, confidence intervals, etc. for three parameters (_cons in Y4, and atanhrho23/rho23). Does this already suggest that these could be the problematic parameters in the estimation?

              Thanks.



              • #8
                Yes, I would focus on those parameters, as in my ml plot example.



                • #9
                  Changing the search algorithm to "tech(dfp)" has solved the issue, thank you very much.
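For anyone landing here later, the full command with that option would look something like the sketch below. Y1-Y4 and X1-X4 are the placeholders from the first post, and the indicator list assumes Y1 and Y2 mark where Y3 and Y4 are observed. tech() is the abbreviated form of the technique() maximize option, and dfp selects the Davidon-Fletcher-Powell algorithm instead of the default Newton-Raphson:

```stata
* Define the cmp indicator globals (once per Stata session)
cmp setup

* Same four-equation system, maximized with DFP instead of the default
cmp (Y3 = X3) (Y1 = X1) (Y4 = X4 Y3) (Y2 = X2 Y3), ///
    indicators(Y1 $cmp_probit Y2 $cmp_probit) technique(dfp)
```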
