Heckman: testing normality and homoskedasticity

Caspar Aumueller

Join Date: Oct 2015

Posts: 31
#1


10 Feb 2016, 04:32

Hello Statalisters,

I'm estimating the effect of wealth on the amount donated to charities, using data from the German Socio-Economic Panel.

Because data on donations are censored at zero (negative donations are not possible), I'd like to use the heckman ML estimator (one could also use tobit or the two-step method as laid out in "Microeconometrics Using Stata" by Cameron & Trivedi).

While running my regressions, I realized that I don't know how to check the assumptions of normality and homoskedasticity of the error terms. First, I simply used the

Code:

* se_p and se_f are illustrative variable names
predict double se_p, stdp

and

Code:

predict double se_f, stdf

commands together with

Code:

qnorm se_p

to check (for normality in this case). But I now believe this is wrong, because the standard errors of the prediction and of the forecast are not residuals (nor estimates of them), at least as I understand it.

The aforementioned book suggests computing generalized residuals to test the assumptions of the tobit estimation, but I don't believe I can use the same formulas to compute generalized residuals for heckman, because heckman assumes two separate processes for selection and for the amount donated, whereas tobit assumes a single one.

Finally, I looked for formal tests, like estat hettest for regress, but found none (if I had something like residuals for heckman, I could use e.g. sktest, though).

Many thanks in advance!

Caspar
Joao Santos Silva

Join Date: Apr 2014

Posts: 3005
#2

10 Feb 2016, 15:44

Dear Caspar,

I do not know of any simple method to test these assumptions but you can do some other checks. For example, you can do a simple RESET using squares and cubes on the first stage probit (which is a test for normality and homoskedasticity in the probit). Alternatively, you can do a RESET on the sample selection model directly.
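A minimal sketch of the first check, a RESET on the selection probit, might look as follows. The variable names (donated as a 0/1 indicator, plus wealth, age, educ and married) are hypothetical and not taken from this thread.

```stata
* Hypothetical variables: donated (0/1), wealth, age, educ, married
probit donated wealth age educ married

* Linear index from the probit and its square and cube
predict double xb_p, xb
generate double xb_p2 = xb_p^2
generate double xb_p3 = xb_p^3

* Re-estimate with the powers added and test them jointly
probit donated wealth age educ married xb_p2 xb_p3
test xb_p2 xb_p3
```

A small p-value from the joint test would suggest that the probit specification (functional form, normality or homoskedasticity) is questionable.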

Both the sample selection model and Tobit are very sensitive to departures from the normality and (especially) homoskedasticity assumptions. An alternative is to use a Poisson regression, which is much more robust and also deals with the limited dependent variable. Actually, you can test the sample selection model against the Poisson regression using the -hpc- command, which is available from SSC.
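As a rough illustration of the Poisson alternative, assuming hypothetical variables amount (the donation, with zeros kept in the data), wealth, age and educ, the regression could be run like this. Robust standard errors are requested because the outcome is not a count.

```stata
* Hypothetical variables: amount (>= 0, zeros included), wealth, age, educ
poisson amount wealth age educ, vce(robust)

* Average marginal effect of wealth on the expected donation
margins, dydx(wealth)
```

Stata notes that the dependent variable has non-integer values but still estimates the model.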

All the best,

Joao
Caspar Aumueller

Join Date: Oct 2015

Posts: 31
#3

11 Feb 2016, 03:44

Hello Joao,

Thank you very much. I will try the RESET test and look into Poisson regression (although I have a feeling that we could find ourselves in this discussion, which, by the way, I found very illuminating and is in my bookmarks).

Regarding the RESET test and its implementation: I'm unsure which predicted values I should use. I suppose it's the latent predicted value from predict, xb, because this is where everything else (predicted probabilities, censored predictions, etc.) stems from. Am I right about that?

Kind regards,

Caspar
Joao Santos Silva

Join Date: Apr 2014

Posts: 3005
#4

11 Feb 2016, 10:52

Dear Caspar,

Yes, use the -xb- predictions for that. I am glad you found the discussion useful.

Best wishes,

Joao
Caspar Aumueller

Join Date: Oct 2015

Posts: 31
#5

12 Feb 2016, 08:31

Thank you! Your advice was very helpful!
Cappuccia Leo

Join Date: Jun 2023

Posts: 6
#6

14 Jun 2023, 06:08

Dear Mr. Santos Silva, I am working on a Heckman model and I would like to know the justification for using the RESET test for normality or homoskedasticity here. Could you tell me whether this code implements it correctly?

Code:

heckman .... , select(...)
predict pred, xb
gen pred2=pred^2
gen pred3=pred^3
heckman ... pred2 pred3, select(...)
test pred2 pred3

Is this the right code?

Moreover, is the normality assumption still needed in large samples with heckman?

Thank you very much and have a good day!

Last edited by Cappuccia Leo; 14 Jun 2023, 06:22.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3005
#7

14 Jun 2023, 06:59

Dear Cappuccia Leo,

The RESET, with squares and cubes, is a test for normality in the probit, which is the first stage of the Heckit. What you are doing is a RESET on the main equation; to test the normality in the selection equation you need to get its fitted values (with the option xbs instead of xb) and include the powers in the selection part of the model. Of course, you can do the test in both parts.
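A sketch of the RESET on the selection part could look like the following. All variable names are hypothetical, and amount is assumed to be missing (not zero) for non-participants so that heckman can infer who is selected. Because no dependent variable is named inside select(), the selection equation is labelled select in the results.

```stata
* Hypothetical variables: amount (missing if not observed), wealth, age, educ, married
heckman amount wealth age educ, select(wealth age educ married)

* Fitted index of the selection equation (xbsel, which xbs abbreviates)
predict double xb_s, xbsel
generate double xb_s2 = xb_s^2
generate double xb_s3 = xb_s^3

* Add the powers to the selection equation only and test them jointly
heckman amount wealth age educ, select(wealth age educ married xb_s2 xb_s3)
test [select]: xb_s2 xb_s3
```

The same steps with predict's xb option, and the powers added to the main equation instead, give the RESET on the outcome equation.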

The need for normality and homoskedasticity does not depend on the sample size, so the estimator is not very robust at all.

Best wishes,

Joao
Cappuccia Leo

Join Date: Jun 2023

Posts: 6
#8

14 Jun 2023, 09:12

Dear Joao, thank you for your answer. I managed to implement the test thanks to you!

However, since it is often difficult to be sure that the assumptions of bivariate normality and homoskedasticity hold in the heckman model, do you know of another model for handling sample selection bias with continuous data that does not need the normality assumption? In my case the bias is due to self-selection: some individuals choose not to consume depending on their characteristics, so the missing values in the amount consumed are non-random. I saw that there are some nonparametric Heckman-type models with weaker assumptions, but they are not available in Stata.
Thank you for your help!

Have a nice day.

Kind regards,
L.C
Joao Santos Silva

Join Date: Apr 2014

Posts: 3005
#9

14 Jun 2023, 11:37

Dear Cappuccia Leo,

From what you describe, you do not have sample selection but corner solutions data (the missing values are actually zeros). It may be better to use Poisson regression or a hurdle model.
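One way to set up a hurdle model in Stata is the built-in churdle command (available from Stata 14 onward). The sketch below assumes hypothetical variables, with amount recorded as 0 for individuals who do not consume, and a lower limit of zero.

```stata
* Hypothetical variables: amount (0 for non-consumers), wealth, age, educ, married
* select() lists the variables driving the participation decision
churdle linear amount wealth age educ, select(wealth age educ married) ll(0)

* Average marginal effects based on the default prediction
margins, dydx(*)
```

Whether the linear or the exponential variant (churdle exponential) fits the data better is a modelling choice that this sketch does not settle.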

Best wishes,

Joao
Cappuccia Leo

Join Date: Jun 2023

Posts: 6
#10

15 Jun 2023, 03:49

Dear Joao Santos Silva, thank you for your answer.
After reading many discussions of this kind, I found that there seems to be some confusion between corner solutions and self-selection, and so I am confused too. For example, we can consider that the choice of not consuming a good leads to a corner solution problem, but we can also consider that, because we observe the consumption level only for people who consume, we have self-selection, as the data are missing (even if we can replace them with 0). To illustrate this, I found an example used to illustrate sample selection: wages are only observed for workers. Thus, the wages that non-workers would receive (if they decided to work) are unknown. If I remember correctly, it is more or less the example used by Heckman in his article. So, in this kind of situation, I find it interesting that, in a sense, both arguments can be convincing. Do you agree, or is this reasoning too simple?
Moreover, running the Heckman model, I find a significant rho parameter, so there is correlation between the errors, and the model is valid (on this point only).

Thank you for your time, and for this discussion.

Kind regards,
Leo.C

Last edited by Cappuccia Leo; 15 Jun 2023, 03:59.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3005
#11

15 Jun 2023, 05:17

Dear Cappuccia Leo,

It all depends on what you want to estimate; on what is the population of interest. Heckman's sample selection procedure estimates the parameters for the conditional mean for an artificial population where everybody participates in the labour market. In your case, this would be a population in which all individuals would consume the particular good/service you are considering. Alternatively, you can consider modelling what people actually did and model the observed variable (rather than a latent one); in this case you have corner solutions. I guess Jeff Wooldridge's books are the best references for this, but it is up to you to decide what you want to model.

Best wishes,

Joao
Cappuccia Leo

Join Date: Jun 2023

Posts: 6
#12

15 Jun 2023, 05:39

Dear Joao Santos Silva, thank you for the clarification; I can clearly see the difference now. I will take a look at Jeff Wooldridge's books as you suggested.
Thank you for your help, and have a nice day.

Kind regards,
Leo.C