Heckman ML and 2-Step Questions

Julian Zuñiga

Join Date: Aug 2015

Posts: 11
#1

17 Sep 2015, 15:55

Hi Statalist users,

I'm estimating a Heckman model for willingness to give up time for conservation activities, and I have two questions:

1. How do I interpret the coefficients of the Heckman ML and two-step models, and which commands give the marginal effects?

2. How do I compare the two models, and which commands do I use to do so?

Please help,

Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#2

18 Sep 2015, 16:38

Dear Julian,

The Heckman estimator basically fits a linear model, so the interpretation is trivial. Also, the two estimators estimate the same model, so there is just one model.
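For readers looking for the commands, here is a minimal sketch of fitting the same selection model by full ML and by the two-step estimator and then requesting marginal effects. The data are simulated and every variable name (lnhours, wtt, income, age, member) is hypothetical, as is the choice of member as the variable excluded from the outcome equation; none of this comes from the original poster's data.

```stata
* Hypothetical setup: simulated data stand in for the survey variables
clear
set seed 12345
set obs 2000
generate income = rnormal()
generate age    = rnormal()
generate member = rnormal()                  // assumed exclusion restriction
generate u = rnormal()                       // outcome-equation error
generate v = 0.5*u + sqrt(0.75)*rnormal()    // selection error, corr(u, v) = 0.5
generate wtt = (0.3 + 0.5*income + 0.8*member + v > 0)
generate lnhours = 1 + 0.4*income + 0.2*age + u if wtt == 1

* Full maximum likelihood
heckman lnhours income age, select(wtt = income age member)
estimates store heck_ml

* Marginal effects on E(lnhours | hours observed) and on Pr(wtt = 1)
margins, dydx(*) predict(ycond)
margins, dydx(*) predict(psel)

* Heckman's two-step estimator of the same model
heckman lnhours income age, select(wtt = income age member) twostep
estimates store heck_2s

* Coefficients and standard errors side by side
estimates table heck_ml heck_2s, b(%9.3f) se(%9.3f)
```

The outcome-equation coefficients are read like those of a linear regression for the latent outcome. Because income and age also enter the selection equation, the margins results for predict(ycond) differ from the raw coefficients.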

Last but not least, are you sure that this is the right approach? From your brief description of the problem, it looks as if you have corner-solution data, not a sample selection problem. If that is the case, you should consider Poisson regression instead.

Joao

Julian Zuñiga

Join Date: Aug 2015

Posts: 11
#3

18 Sep 2015, 17:34

Dear Joao,

First of all, thanks for the response.

I think I should describe it better: I'm trying to estimate the willingness to give up time (yes or no, WTT) with a probit, and the positive amount of that time in hours per week (the outcome regression), for conservation of ecosystem services activities, using behavioral and socio-economic variables. The dependent variable is in logs.

The Heckman model results indicate that a sample selection problem does exist in the data. However, I am unsure how to calculate a pseudo-R2 and/or an information criterion, if that is appropriate for this kind of model.
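On the information-criterion question, a minimal sketch under stated assumptions: the data are simulated and all variable names are hypothetical. After the ML fit, estat ic reports AIC and BIC computed from the log likelihood. As far as I know, heckman does not report a pseudo-R2, and the two-step fit stores no log likelihood, so the sketch uses the ML estimator only.

```stata
* Hypothetical simulated data; variable names are illustrative only
clear
set seed 777
set obs 2000
generate income = rnormal()
generate member = rnormal()
generate u = rnormal()
generate v = 0.5*u + sqrt(0.75)*rnormal()
generate wtt = (0.3 + 0.5*income + 0.8*member + v > 0)
generate lnhours = 1 + 0.4*income + u if wtt == 1

* ML fit; the output footer also reports the test of independent equations (rho = 0)
heckman lnhours income, select(wtt = income member)

* AIC and BIC based on the ML log likelihood
estat ic
```

Information criteria are comparable only across models fitted by likelihood to the same dependent variable and the same observations.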

Julian

Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#4

19 Sep 2015, 00:30

Dear Julian,

Thanks for the additional information.

What you describe does not sound like a case where the Heckman estimators would have the standard interpretation because your zeros are real zeros and not unobservable contributions. That is, you have corner-solutions rather than sample selection. The Heckman estimators can be used in this context but with a different interpretation (see Wooldridge's book for a discussion). Personally, I think it is unlikely that this will be a good approach.
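The distinction can be written out as follows. This is the textbook formulation in generic notation (x, z, beta, gamma, rho, sigma are not taken from the thread), given only as a sketch of the two data structures being contrasted.

```latex
% Sample selection: the outcome is unobserved whenever s_i = 0
\begin{align*}
y_i^{*} &= x_i\beta + u_i, \qquad s_i = \mathbf{1}[z_i\gamma + v_i > 0], \qquad
y_i = y_i^{*} \text{ is observed only if } s_i = 1, \\
% (u_i, v_i) bivariate normal, Var(u_i) = sigma^2, Var(v_i) = 1, Corr(u_i, v_i) = rho
E[y_i \mid x_i, z_i, s_i = 1] &= x_i\beta + \rho\sigma\,\lambda(z_i\gamma), \qquad
\lambda(c) = \frac{\phi(c)}{\Phi(c)}.
\end{align*}

% Corner solution: y_i >= 0 is observed for everyone and zeros occur with positive probability
\begin{align*}
E[y_i \mid x_i] &= P(y_i > 0 \mid x_i)\, E[y_i \mid x_i,\, y_i > 0].
\end{align*}
```

In the first case the zeros are missing outcomes. In the second they are observed outcomes, and the quantities of interest are usually the two factors on the right-hand side or their product.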

If you believe that the decision process has two steps, that is, we first decide whether or not to contribute and then decide on the contribution, then you may be better off using a so-called two-part model, which is popular in health economics. If you believe that we decide in just one step, then Poisson regression would be a good starting point.
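A minimal sketch of the two-part model mentioned above, assuming simulated data and hypothetical variable names (wtt, hours, lnhours, income, age). The first part models whether any time is given, and the second models the amount among those who give.

```stata
* Hypothetical simulated data; names are illustrative only
clear
set seed 2015
set obs 2000
generate income = rnormal()
generate age    = rnormal()
generate wtt    = (0.2 + 0.6*income + rnormal() > 0)
generate hours  = exp(1 + 0.3*income + 0.1*age + 0.5*rnormal()) if wtt == 1
replace  hours  = 0 if wtt == 0
generate lnhours = ln(hours) if hours > 0

* Part 1: probability of giving any time
probit wtt income age

* Part 2: log hours among those who give, fitted separately from part 1
regress lnhours income age if hours > 0, vce(robust)
```

Unlike the Heckman model, the two parts are estimated independently, so no exclusion restriction is needed.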

You may actually test all of these models against each other using the -hpc- command that I have co-written, and which is available from SSC (just type "ssc install hpc"). Have a look at the help file and at the references in there. Please email me if you have trouble downloading the papers.

All the best,

Joao

Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#5

19 Sep 2015, 07:15

Hi Julian,

I would like to comment here because I don't agree totally with Joao's suggestions and/or comments. I will explain my reasoning.

From microeconomic theory there are two types of corner-solutions:
A true zero: a solution on the corner where the indifference curve is tangent to the budget line and thus the marginal rate of substitution equals the relative price.

A constrained zero: the solution is a zero because the variable is constrained to be non-negative and the marginal rate of substitution differs from the relative price.

The fact that it's impossible to have a negative donation, so that donations are constrained to be non-negative, should not be taken to mean that all zeroes are true zeroes. When the non-negativity constraint is binding, and we thus have a constrained zero, we have a censored observation. If it were possible, the giver would have wanted to give a negative amount to maximize his/her value, but since it's impossible he/she doesn't give. The true zero is a non-constrained zero, and it is actually the giver's optimal choice even if he/she were unconstrained.

The seminal paper on estimating the supply of donations of time and money to charity is Brown and Lankford (1992). They use a bivariate Tobit model, where both variables are censored on the left at zero, to estimate both supplies simultaneously. This clearly shows that their understanding is that not all zeroes are true zeroes, and that estimations need to be adjusted for that fact since otherwise you would be putting too much weight on the zero value. Now, in these cases you can also use a two-step estimation and a Heckman sample selection model to estimate the coefficients of the supply equation. The difference here is that the Tobit assumes that the probability of a negative gift is explained by the same variables as the gift supply equation, whereas the two-step and Heckman sample selection estimators allow you to have a different specification for the probability of giving (and thus not giving) than for the supply of the gift equation. There are other differences but that is the main one. I don't think that the Poisson estimation that Joao suggests is valid in this case for two reasons: first, it is for count data and the supply of time is a continuous variable, and second, not all zeroes in your observations are true zeroes, so even with a Poisson you would be putting excessive weight on the zero value in estimation.
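As a rough illustration of the censoring view, here is a sketch of a univariate Tobit with left-censoring at zero (not the bivariate Tobit of Brown and Lankford, which has no built-in Stata command that I am aware of). The data are simulated and the variable names are hypothetical.

```stata
* Hypothetical simulated data: ystar is the latent desired donation
clear
set seed 1992
set obs 2000
generate income = rnormal()
generate ystar  = 0.5 + 1.0*income + rnormal()
generate hours  = max(ystar, 0)              // observed hours, censored at zero

* Tobit with a lower limit of zero
tobit hours income, ll(0)

* Three different marginal effects of income
margins, dydx(income) predict(ystar(0,.))    // on E(hours), zeros included
margins, dydx(income) predict(e(0,.))        // on E(hours | hours > 0)
margins, dydx(income) predict(pr(0,.))       // on Pr(hours > 0)
```

The coefficient reported by tobit is the effect on the latent variable. The three margins calls give the effects on the observed quantities, which are smaller in magnitude.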

To compare the different types of estimation, you can check chapter 16 of Cameron and Trivedi (2010) (a must-have book, honestly), where they go through all three estimators (Tobit, two-step, and sample selection) and compare them using a log-transformed dependent variable like the one you mention. I wasn't aware of Joao's hpc command, but I will definitely download it and check it out.

References:
Brown, Eleanor and Hamilton Lankford (1992), "Gifts of Money and Gifts of Time: Estimating the Effect of Tax Prices and Available Time," Journal of Public Economics, 47(3), pp. 321-341.

Cameron, A. Colin and Pravin K. Trivedi (2010), Microeconometrics Using Stata, Revised ed., College Station, TX USA: Stata Press.

Last edited by Alfonso Sánchez-Peñalver; 19 Sep 2015, 07:18.

Alfonso Sanchez-Penalver

Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#6

19 Sep 2015, 14:38

Hello again,

Poisson regression can be used both with count data and with a continuous dependent variable; it is just a generalized linear model with an exponential conditional mean. Using Poisson regression with continuous data is actually a popular approach in areas such as health economics and international trade, where data can have many zeros.
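A minimal sketch of this point, assuming simulated data and hypothetical variable names: the outcome below is continuous, non-negative, and has many zeros, with conditional mean exp(0.5 + 0.3*income) by construction. Only the conditional mean has to be correctly specified, which is why robust standard errors are requested.

```stata
* Hypothetical simulated data: continuous outcome with a mass of zeros
clear
set seed 42
set obs 5000
generate income = rnormal()
generate donor  = runiform() < 0.6           // 40% zeros, unrelated to income here
* rgamma(2, 0.5) has mean 1; dividing by 0.6 keeps E(hours | income) = exp(0.5 + 0.3*income)
generate hours  = donor * exp(0.5 + 0.3*income) * rgamma(2, 0.5) / 0.6

* Poisson pseudo-maximum likelihood with robust standard errors
poisson hours income, vce(robust)

* The same estimator written as a generalized linear model
glm hours income, family(poisson) link(log) vce(robust)
```

Stata prints a note that the dependent variable is not a count, but it estimates the model anyway. The coefficient on income is interpreted as a semi-elasticity of E(hours | income).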

The discussion about whether the zeros are true or constrained as defined by Alfonso is a more philosophical one, and the answer depends on what we want to do with the model. If we care about the effect of the regressors on observed time donations, then all the zeros are "real". If we care about some theoretical notion of donation that can be either positive or negative, then it makes sense to see the data as censored. If you decide that this is the interpretation that is relevant to you, keep in mind that inference with Tobit or Heckman's estimator depends critically on the assumptions of normality and homoskedasticity, which are unlikely to be valid.

All the best,

Joao

Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#7

19 Sep 2015, 15:03

Sorry Joao, but I continue to disagree. First, to describe economic theory as something philosophical is interesting. Second, the wrong assumption leads to biased estimates of the coefficients. The marginal effect is larger if you assume that all zeroes are true zeroes, as in your definition. Why? Because for many of those cases that are constrained zeroes, a change in the independent variable would not increase their donation, since it would still be negative and stay at zero. If you estimate a model assuming that all zeroes are true zeroes, then it assumes that a change in the independent variable would increase their donation above zero, something that is not true, and thus gets you an average change that is larger than it should actually be. Notice that a Tobit or a Heckman selection model takes into account the effect on the probability of giving first, thus adjusting the marginal effect of a change in income, for example, to the true one. Otherwise it would be overstated and you would predict a larger effect than the real one.
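For the Tobit case, the adjustment by the probability of giving can be made explicit. This is the standard textbook result in generic notation (the symbols are not from the thread) and is offered only as a sketch of the argument.

```latex
\begin{align*}
y_i^{*} &= x_i\beta + u_i, \qquad u_i \mid x_i \sim N(0, \sigma^{2}), \qquad y_i = \max(0,\, y_i^{*}), \\
E[y_i \mid x_i] &= \Phi\!\left(\frac{x_i\beta}{\sigma}\right) x_i\beta
                 + \sigma\,\phi\!\left(\frac{x_i\beta}{\sigma}\right), \\
\frac{\partial E[y_i \mid x_i]}{\partial x_{ij}} &= \beta_j\,\Phi\!\left(\frac{x_i\beta}{\sigma}\right),
\qquad
\frac{\partial E[y_i^{*} \mid x_i]}{\partial x_{ij}} = \beta_j .
\end{align*}
```

The effect on the observed outcome is the latent coefficient scaled by the probability of a positive outcome, P(y_i > 0 | x_i) = Phi(x_i beta / sigma), so it is always smaller in magnitude than beta_j.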

The log-transformation normally does a good job in taking care of the non-normality of the data. The issue of homoskedasticity is a tough one, but it is one that is common in microeconometrics. You can always use robust and clustered errors to do inference.

Alfonso Sanchez-Penalver

Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#8

20 Sep 2015, 03:22

Hello All,

First of all, apologies to Julian if this discussion with Alfonso deviates too much from the topic of the original post; hopefully the discussion will be helpful to someone.

Going back to Alfonso's comments, I start with the easy parts and leave the more "philosophical" discussion for later.

a) Under heteroskedasticity, both Tobit and Heckman's estimator will be inconsistent, and that is something that cannot be fixed by using robust or clustered standard errors. A key reference in the area is:

Arabmazar, A. and Schmidt, P. (1981). Further evidence on the robustness of the Tobit estimator to heteroskedasticity, Journal of Econometrics, 17(2), 253-258.

b) Yes, the log transformation does a good job in taking care of non-normality if the data are log-normal; I do not believe the log-transformation has the magical power of making normal data that are not log-normally distributed. Anyway, even if the transformed data are approximately normal, the transformation has potentially large costs because a model for ln(y) is not necessarily informative about y (this is called the "retransformation problem" in Heath Economics).

c) About the "philosophical" part, I do not think I said or suggested that "economic theory is something philosophical", but I may not have been as clear as I should have been. Let me try again, and apologies if I state some obvious things. A given variable can be modeled in many different ways, and many of these models may be valid in the sense that they provide answers to interesting questions. For example, structural and forecasting models can be very different, but both may be useful to answer interesting questions.

My point was that the model and estimator to use for the kind of data Julian has depend on the question he wants to answer. If the interest is in the effects of the regressors on the observed donations, then the zeros are true zeros and Poisson regression would be a useful starting point. Tobit and sample-selection models can also be used in this context, but they do not have the usual interpretation. If the interest is in some latent desired donation that can be negative, then the Tobit and sample-selection models may be useful if the assumptions they require are valid. Jeff Wooldridge's book (another "must have" book) has a nice discussion about this.

So, the philosophical point is that it is necessary to think hard about the questions we want to answer before choosing the model to use and the corresponding estimator. Unfortunately, the way we teach econometrics rarely encourages this practice and many textbooks give the impression that a model is useful only if it is an accurate description of the data generating process. Anyway, enough of philosophy!
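The retransformation problem raised in point b) can be stated in one line. The notation is generic and the derivation is the standard one, given here only as a sketch.

```latex
\begin{align*}
\ln y_i &= x_i\beta + u_i
\quad\Longrightarrow\quad
E[y_i \mid x_i] = \exp(x_i\beta)\, E\!\left[e^{u_i} \mid x_i\right], \\
u_i \mid x_i \sim N(0, \sigma^{2})
&\quad\Longrightarrow\quad
E[y_i \mid x_i] = \exp\!\left(x_i\beta + \tfrac{\sigma^{2}}{2}\right).
\end{align*}
```

If the distribution of u_i depends on x_i, for example through heteroskedasticity, the factor E[exp(u_i) | x_i] varies with x_i, and beta alone no longer describes how E[y_i | x_i] responds to the regressors.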

All the best,

Joao

Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#9

20 Sep 2015, 04:37

Sorry, the "retransformation problem" is discussed in Health Economics, not Heath Economics!

Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#10

20 Sep 2015, 08:30

I understood your point, Joao, and I've already discussed why it's wrong. You have dismissed it as a philosophical argument, but it's not, so I'll try one last time. Consider the effect a change in income will have on observed donations, let's call it the marginal effect, which should be captured by the coefficient on income in the estimation, and assume that giving is a normal good, i.e. the marginal income effect is positive for those where it actually has an effect.

In real life the people who don't give will have a reservation income, that is, a level of income below which they will never give. There are some people whose income is below their reservation income (the constrained zeroes). For those people, if the change in income doesn't put them over their reservation income, the observed effect on their donations would be 0, and if it does indeed put them over their reservation income, the effect on their observed donations will be different than it would be for people who actually give (because there's a range of income where it's still ineffective). The average marginal effect for these people is then different from that for the people who currently give. I hope so far you agree, and that there is nothing philosophical about this.

Now there will be some zeroes that are true zeroes because it's a possibility. These are those people whose income is exactly equal to their reservation income, and thus they don't give. For these people, however, a change in income will have the same effect as for the people who currently give, because they no longer have a range of income to cover until they reach their reservation income. These people and givers are all from the same population, so the average marginal effect will be the same for all of them.

Joao's point is that "If the interest is on the effects of the regressors on the observed donations, then the zeros are true zeros." No, they're not, precisely because you want to know what the effect is on observed donations. For many people you will not observe a change in donations, and for others the observed change will be different, on average, from the one for givers. Assuming that the average marginal effect on observed donations is the same for all of them will produce biased and inconsistent estimates of the coefficients on observed donations no matter what estimation method you choose, and no matter whether the errors are homoskedastic, normal, log-normal, or whatever.

As I said, this is the last time I'll discuss this because it's not productive for Julian. Have a good day.

Alfonso Sanchez-Penalver

Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#11

20 Sep 2015, 09:46

Dear Alfonso,

I agree that it is not productive to continue this discussion here and so maybe we can agree to disagree.

Anyway, it looks as if we are actually converging: the mechanism you describe looks like a hurdle or two-part model for the observed data, and this may be fully appropriate in this case, as I mentioned in #4 above. This, however, is not a standard case of censoring or sample selection, and therefore if Tobit or the sample-selection estimator is used here, the usual interpretation is not valid. This is also what I said in #4 and #8 above. The suitability of all these models can be tested with -hpc-, and that may actually be useful for Julian.

Enjoy your Sunday,

Joao

Julian Zuñiga

Join Date: Aug 2015

Posts: 11
#12

21 Sep 2015, 10:02

Hello, Joao and Alfonso

Thanks for all your comments on this subject. The discussion has been really productive for my article.

Joao, I will study Poisson regression as you suggested, and I will email you if I have any doubts about it.

Best for all,

Julián

Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#13

21 Sep 2015, 11:56

Sure, you are most welcome. I am glad you found it useful.

All the best,

Joao

Nesre Kedir

Join Date: Sep 2023

Posts: 1
#14

12 Nov 2023, 16:57

How can I analyze data on factors affecting rosemary production and marketing?