  • Comparison and interpretation of OLS and Heckman models results

    Hello, guys!

    I'm a complete beginner in econometrics as well as in Stata. I'm actually a student in a biology department, but circumstances have made me go deeper into this field. For the past week I have been trying hard to work out the interpretation of the models myself, but I realize that I don't have enough knowledge of the subject. To be brief and precise: I have a sample of individuals that I use to estimate a regression in which the logarithm of wages is the dependent variable, in other words, a traditional Mincer equation. Besides the standard set of exogenous variables such as education, work experience and so on, I also use a variable called health, which is the one I am mostly interested in. First, I estimate the coefficients by OLS, excluding unemployed individuals from the sample because we can't observe wages for them. As far as I understand, this is the source of the so-called sample selection problem, and the OLS coefficient estimates are invalid (by the way, is that the correct term, or would "biased" be better?). The Heckman model introduces an additional equation that models the individual's decision whether to work or not. I then run the Heckman model in Stata with the same regressors, now also including the previously excluded unemployed individuals in the sample. For the additional (selection) equation I use age, male, marriage and education, as Heckman originally did, except that I use marriage instead of children and include the variable male because I have both males and females in the sample. The attached tables present the results for the OLS and Heckman models.
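
For readers following along, the setup described above is usually written as two linked equations. This is only a generic textbook sketch: the symbols (x_i, z_i, u_i, v_i, beta, gamma, rho, sigma) are notation introduced here for illustration and do not come from the poster's output.

```latex
\begin{aligned}
\ln w_i &= x_i'\beta + u_i
  && \text{(wage equation, observed only if } s_i = 1\text{)} \\
s_i &= \mathbf{1}\{\, z_i'\gamma + v_i > 0 \,\}
  && \text{(selection equation: works or not)} \\
(u_i, v_i) &\sim N\!\left(0,\ \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix}\right) \\
E[\ln w_i \mid x_i, z_i, s_i = 1] &= x_i'\beta + \rho\sigma\,\lambda(z_i'\gamma),
  \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
```

OLS on workers only leaves out the last term, rho * sigma * lambda. Under these assumptions that term vanishes when rho = 0, which is why rho is the quantity the questions below revolve around.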

    [Attached images: image_6817.png and image_6818.png (tables of OLS and Heckman results)]

    Given these results, there are some things I don't understand:
    1) Do the F-statistic and R-squared for the OLS regression imply that the regression as a whole is significant and has relatively good explanatory power, and that the variable health is significant?
    2) How should the Wald statistic for the Heckman model be interpreted?
    3) What does it mean that the estimate of rho is positive and equal to 0.166?
    4) What is the LR test of independent equations, and what exactly does it show?
    5) As far as I can see, the coefficients in the OLS and Heckman models are almost the same. Is that bad? Does it mean there's no use in running the Heckman model, or should a conclusion about its usefulness come from some other indicators?

    I would be very grateful if you could "explain it to me like I'm a four-year-old", as I'm afraid I just can't handle the professional language of econometrics.
    Cheers, Guest.
    Last edited by sladmin; 17 Jul 2017, 10:43. Reason: anonymize poster

  • #2
    Guest:
    welcome to the list.
    All the clarifications you need are reported under the -heckman- entry in the Stata .pdf manual (Example 1 seems particularly enlightening).
    For the future, please read the FAQ on how to post more effectively (screenshots are deprecated; using CODE delimiters for posting what you typed and what Stata gave you back is highly welcome). Thanks.
    Last edited by sladmin; 17 Jul 2017, 10:43. Reason: anonymize poster
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Mr. Lazzaro, thank you for your welcome and reply.

      The file that you mentioned made things a little clearer for me. Having examined it, I suppose that:
      1) The null hypothesis of the F-statistic is that our model is bad, explains none of the variation and, therefore, has no explanatory power at all. The p-value for the F-test here equals 0.000, which is less than 0.01, so we can say that our model is statistically significant. The only thing I can't determine is whether R-squared is high enough, so I don't know if we can say that the model is good. Given what I've seen in other empirical studies, where R-squared frequently doesn't even reach 0.2, I suppose that the model is quite good.
      2) The Wald statistic also speaks to the significance of the whole regression. As its value is also quite high and the p-value equals 0.000, which is less than 0.01, we can say that the whole regression is statistically significant and provides a good level of explanation.
      3) The estimate of rho in the Heckman model is 0.166, which implies that OLS applied to the basic equation yields biased results; therefore, the Heckman model provides more consistent and more efficient estimates for the given set of parameters. I have also found some evidence that if rho is not equal to zero, that is enough to infer that there is selection bias. But in that case I don't understand why the coefficients are the same in both models. Does that imply that I should provide a better specification for the additional equation? Maybe add some more variables?
      4) The likelihood-ratio test reported at the bottom of the output is an equivalent test, with the null hypothesis that rho equals 0. Because the chi-square statistic here is 4.83 and the p-value equals 0.028, we reject the null hypothesis at the 5% and 10% significance levels, which implies that the Heckman selection model is useful with these data and better than standard OLS regression.
      5) As the statistical tests justify that the Heckman model is better in this case for the given set of parameters, I still don't get the trick about the identical coefficients.
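
As a quick sanity check on point 4, the quoted p-value can be reproduced from the quoted chi-square statistic. The sketch below assumes the LR test has 1 degree of freedom (a single restriction, rho = 0); the helper name `chi2_sf_1df` is made up for this illustration and is not a Stata or library function.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    With 1 degree of freedom, X = Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


lr_stat = 4.83  # LR chi-square statistic quoted in the post (assumed 1 df)
p_value = chi2_sf_1df(lr_stat)
print(f"p-value = {p_value:.3f}")
```

The result falls between 0.01 and 0.05, which is why the null of rho = 0 is rejected at the 5% and 10% levels but not at the 1% level.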

      Looking forward to finding out whether my insights are right or not.
      Cheers, Guest.
      Last edited by sladmin; 17 Jul 2017, 10:43. Reason: anonymize poster

      • #4
        Guest:
        your insights are correct, except for the meaning of the F-test in OLS: the null is that all the coefficients (apart from the constant) are zero (whether a model is bad or good is a qualitative opinion).
        As you implied, what counts as a high R-sq is often research field-dependent.
        I cannot follow your concern about the similarity of the coefficients reported in the OLS and -heckman- outcome tables: why do you think that it can be a problem?
        What would you have expected instead?
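
To make the F-test null concrete, here is the usual textbook form for OLS with a constant. The symbols k (number of slope coefficients) and n (number of observations) are introduced here for illustration, and the stated distribution relies on the classical assumptions (homoskedastic, normally distributed errors).

```latex
H_0:\ \beta_1 = \beta_2 = \dots = \beta_k = 0,
\qquad
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}
\ \sim\ F_{k,\; n-k-1} \quad \text{under } H_0
```

Because n enters the denominator, a modest R-sq can still come with a very large F in a big sample, so a significant F-test does not say by itself that R-sq is "high".
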
        Last edited by sladmin; 17 Jul 2017, 10:44. Reason: anonymize poster
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Well, that's a joy to hear, thanks a lot, sir.

          My concern originates from the contradiction between, on the one hand, the similarity of the coefficients and, on the other hand, the fact that the OLS estimates are biased and, therefore, wrong. So if the Heckman model provides more consistent and more efficient estimates compared with OLS, I would have expected at least slightly different results. Maybe it's just in this specific case that they are equivalent, because I have definitely come across papers where the Heckman model outcomes differed from the standard OLS outcomes. By the way, that's another reason for concern: I just don't know how to explain in simple words, to the people to whom these results are going to be presented, why applying the Heckman model has not changed a thing.
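
A small simulation can help build intuition for how much OLS on the selected sample actually moves. Everything in the sketch below is hypothetical: the data-generating process, the selection rule, the true slope of 1 and the function name `ols_slope_on_selected` are assumptions made for illustration, and it says nothing about the poster's own data.

```python
import math
import random


def ols_slope_on_selected(rho: float, n: int = 100_000, seed: int = 12345) -> float:
    """Simulate y = 1 + 1*x + u and return the OLS slope using selected rows only.

    Hypothetical selection rule: a row is observed if x + v > 0,
    where corr(u, v) = rho and x, u, v are standard normal.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        u = rho * v + math.sqrt(1.0 - rho * rho) * e  # gives corr(u, v) = rho
        if x + v > 0:  # outcome is observed only for "workers"
            xs.append(x)
            ys.append(1.0 + 1.0 * x + u)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx  # simple-regression slope estimate


# True slope is 1.0 in every run; only the error correlation rho changes.
results = {rho: ols_slope_on_selected(rho) for rho in (0.0, 0.166, 0.8)}
for rho, slope in results.items():
    print(f"rho = {rho:5.3f}  OLS slope on selected sample = {slope:.3f}")
```

In this design the slope stays close to the true value when rho = 0 and drifts further from it as rho grows, so a small rho goes with a small gap. How large the gap is in practice also depends on how strongly the regressors drive selection, which this toy setup fixes arbitrarily.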

          Cheers, Guest.
          Last edited by sladmin; 17 Jul 2017, 10:44. Reason: anonymize poster

          • #6
            Guest (just in case: please note the strong preference for real given and family names on this list. How to re-register accordingly is covered in the FAQ):
            You may want to see what happens when you robustify the standard errors in -heckman-.
            I confirm I cannot see any reason for concern.
            As a closing-out remark, please call me Carlo (my given name) as all on (and most off) the list do.
            Last edited by sladmin; 17 Jul 2017, 10:44. Reason: anonymize poster
            Kind regards,
            Carlo
            (Stata 19.0)

            • #7
              Okay, Carlo, roger that.

              It's really good that there is no reason for concern; however, I still can't work out the economic interpretation of this.
              Another finding, which I discussed with my friend, is that the sample has too many people who not only don't work but are not going to, for example, elderly pensioners. Therefore, the additional equation in heckit can't adequately model their decision about labor market participation. So next I'm going to exclude these people from the sample and see what Stata gives me back.

              Cheers, Guest.
              Last edited by sladmin; 17 Jul 2017, 10:44. Reason: anonymize poster

              • #8
                After robustifying the standard errors and excluding the people I mentioned before from the sample, the results still don't differ from the OLS estimates. It would certainly be very interesting to know what causes this lack of difference in terms of labor market participation. Is it possible that the additional equation just doesn't explain the individual's decision whether to participate in the labor market, or is there some other reason?

                Cheers, Guest.
                Last edited by sladmin; 17 Jul 2017, 10:45. Reason: anonymize poster

                • #9
                  Guest:
                  does the literature in your research field support those results?
                  Last edited by sladmin; 17 Jul 2017, 10:45. Reason: anonymize poster
                  Kind regards,
                  Carlo
                  (Stata 19.0)

                  • #10
                    Yeah, completely: the signs of the coefficients are the same as the theory predicts. Also, the type of selection is exactly the same as the one Heckman used in his research.

                    Cheers, Guest.
                    Last edited by sladmin; 17 Jul 2017, 10:45. Reason: anonymize poster

                    • #11
                      I've finally found the reason for the situation described above. For those who run into the same problem in the future, I can highly recommend the book by Jeffrey Wooldridge. In the chapter on limited dependent variable models he presents an example where OLS and heckit result in the same coefficient estimates. He explains that this means that the selection equation used in the Heckman model can barely describe the binary variable that indicates an individual's decision. In fact there is almost no modelling of this indicator, and therefore you get estimates that do not differ. However, he says that this does not necessarily mean that you are doing something wrong: often the sample at your disposal just cannot provide anything that would be a better explanatory variable for the selection equation.

                      For further work in that direction, I would like to ask what the best solution is now: continue using the standard OLS model or try to apply a tobit model?

                      Reference: Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data, 2nd ed. The MIT Press.
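
A rough way to see this mechanism (an illustration added here, not taken from the book): the two-step Heckman correction adds the inverse Mills ratio of the fitted selection index as an extra regressor. If the selection equation explains little, that index hardly varies across individuals, the extra regressor is nearly constant, and it is mostly absorbed by the intercept, leaving the slopes close to OLS. The index values and the names `inverse_mills` and `imr_spread` below are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def inverse_mills(c: float) -> float:
    """Inverse Mills ratio lambda(c) = phi(c) / Phi(c)."""
    return _STD_NORMAL.pdf(c) / _STD_NORMAL.cdf(c)


def imr_spread(index_values) -> float:
    """Range (max - min) of the inverse Mills ratio over a set of index values."""
    lambdas = [inverse_mills(c) for c in index_values]
    return max(lambdas) - min(lambdas)


# Hypothetical fitted selection indices for five individuals.
weak_index = [0.48, 0.49, 0.50, 0.51, 0.52]    # selection equation explains little
strong_index = [-1.0, -0.25, 0.50, 1.25, 2.0]  # selection equation explains a lot

print(f"weak selection equation:   IMR range = {imr_spread(weak_index):.3f}")
print(f"strong selection equation: IMR range = {imr_spread(strong_index):.3f}")
```

With the weak index the correction term is almost the same number for everyone, so it has little scope to change the slope estimates; with the strong index it varies a lot and can.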
                      Last edited by sladmin; 17 Jul 2017, 10:46. Reason: anonymize poster
