heckman adjustment

Natalie G

Join Date: Feb 2015

Posts: 3
#1


16 Feb 2015, 20:12

I am trying to apply a Heckman adjustment because there is sample selection bias in my data. My sample includes all convicted defendants. My dependent variable is binary: defendants who were sentenced to prison (coded 1) versus defendants who were not (coded 0). I have tried to estimate the Heckman model several times. When I include most of my independent variables, I receive the error message "Dependent variable never censored because of selection: model would simplify to OLS regression". I was warned that some of the measures in my data set may be multicollinear, so I tried to build the most basic Heckman model, with just a couple of independent variables, to get the model to run at all. Below is the syntax I used:
heckman prissent SEX_R, select (prissent= SEX_R FELONY1) twostep

I receive the following error message: prissent collinear with _cons

My sample is all convicted offenders, so it makes sense that prison sentence and conviction would be related, but I feel like something else is going on. I do not have any missing values. I also checked my binary outcome (prison sentence), and it does vary. I have tried suppressing the constant term, but I receive the same message.

Does anyone have a suggestion about what I might be doing wrong? Granted, the syntax above includes only two independent variables (defendant sex and felony 1 offense), but I receive the same error message regardless of which combination of IVs I include.

Thank you!
Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#2

17 Feb 2015, 02:13

How do you define selection into the sample? You mention that your sample includes only convicted defendants (line 1) and convicted offenders (line 10), and that you don't have any missing values. So what is the observed selection mechanism? Note also that your dependent variable is binary (prison vs. non-prison), so you might consider heckprobit as more appropriate.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

17 Feb 2015, 15:24

Heckman selection models are far from my field and from my personal experience.

That said, it seems to me that you should use - heckprobit - (instead of - heckman -) since your dependent variable is binary. I now see that Oded already pointed this out. Also, within the select() option, the dependent variable should be a binary selection indicator: the variable that captures the selection process you believe causes the bias, not the same dependent variable you used in the first part of the command. Please check some examples at this link: http://www.stata.com/manuals13/rheckprobit.pdf

Hopefully it helps.

Best,

Marcos

Last edited by Marcos Almeida; 17 Feb 2015, 15:26.

Natalie G

Join Date: Feb 2015

Posts: 3
#4

18 Feb 2015, 12:02

Thank you both for the suggestion of using the heckprobit. Marcos, thank you for the link!

Oded, my understanding of the select() option (which is very new, very brief, and could be completely wrong) is this. I should include in it all of my independent variables, plus any factors that might affect the likelihood of receiving a prison sentence but are not in the multivariate model. I was thinking that bond amount might increase the odds of imprisonment if a defendant was unable to pay a particularly high bond. I do not control for bond amount in my multivariate model, but I thought it might belong in the select() option. Is there something else I should consider regarding the selection mechanism you asked about?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#5

18 Feb 2015, 13:12

Natalie: Oded touched on this, but for emphasis: You do not have a setup where the Heckman method can be used. You would have to have data on a group of people who were on trial but not convicted. As a mechanical point, you cannot have the selection variable the same as your dependent variable. That makes no sense.

The command would have to be something like this:

heckprob prissent SEX_R, select (convicted = SEX_R FELONY1) twostep

where convicted is a dummy variable indicating conviction -- which assumes you have both convicted and not convicted people in the sample. And for this point, it does not matter whether you use heckman or heckprob, although the latter is more appropriate.
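The mechanical point can be illustrated with a short sketch. It is written in Python rather than Stata and uses made-up toy records: the variable names mirror the thread, but the data do not. It assumes the usual convention for a 0/1 selection indicator, where only observations coded 1 enter the outcome equation. When prissent is its own selection indicator, every retained outcome equals 1. The outcome then has no variation, which is consistent with the "prissent collinear with _cons" message in post #1. A separate convicted dummy keeps both outcome values in the selected sample.

```python
# Toy records invented for illustration; only the variable names come from the thread.
defendants = [
    {"convicted": 1, "prissent": 1},
    {"convicted": 1, "prissent": 0},
    {"convicted": 1, "prissent": 1},
    {"convicted": 0, "prissent": None},  # not convicted: no sentencing outcome
    {"convicted": 1, "prissent": 0},
    {"convicted": 0, "prissent": None},
]


def outcome_in_selected_sample(rows, selector):
    """Outcome values kept for the outcome equation, assuming only rows whose
    selection indicator equals 1 are treated as selected."""
    return [row["prissent"] for row in rows if row[selector] == 1]


def is_constant(values):
    """True if the outcome has no variation in the selected sample."""
    return len(set(values)) <= 1


# Post #1: prissent used as its own selection indicator.
same_var = outcome_in_selected_sample(defendants, "prissent")
# Post #5: a separate conviction dummy used as the selection indicator.
separate_var = outcome_in_selected_sample(defendants, "convicted")

print(same_var, is_constant(same_var))          # [1, 1] True
print(separate_var, is_constant(separate_var))  # [1, 0, 1, 0] False
```

In data where every defendant is convicted, as in post #1, the convicted dummy would equal 1 for everyone. No observation would be unselected, so there would be nothing for a selection model to correct.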

I actually think you can justify just going ahead by explaining that you are not able to say anything about the entire population of people who have gone on trial. Rather, you are conditioning on being convicted. You have no choice unless you can expand your data set to account for unconvicted people. That means your thought experiment is a bit weird: "If this unconvicted person had been convicted, would he/she go to prison." I'm not sure this makes a lot of sense.
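The conditioning point can be made concrete with a hypothetical simulation. It is written in Python, not Stata, and is not based on the thread's data. The assumptions are:

- conviction and prison are each driven by a standard normal latent error;
- the two errors are correlated (rho);
- there are no covariates.

Convicted-only data identify the share imprisoned among the convicted. That share differs from the share that would be imprisoned if everyone were convicted, which is the thought experiment described above. Under these assumptions the two population values are 1/2 and 1/2 + arcsin(rho)/pi, respectively.

```python
import math
import random


def simulate(n=20000, rho=0.8, seed=1):
    """Hypothetical two-stage process: conviction (selection), then prison (outcome).

    Latent errors (u1, u2) are standard bivariate normal with correlation rho.
    Returns (share imprisoned if everyone were convicted,
             share imprisoned among those actually convicted).
    """
    rng = random.Random(seed)
    prison_all = 0
    n_convicted = 0
    prison_convicted = 0
    for _ in range(n):
        u1 = rng.gauss(0.0, 1.0)
        # Construct u2 so that corr(u1, u2) = rho.
        u2 = rho * u1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        convicted = u1 > 0.0  # selection equation
        prison = u2 > 0.0     # outcome equation (latent for the unconvicted)
        prison_all += prison
        if convicted:
            n_convicted += 1
            prison_convicted += prison
    return prison_all / n, prison_convicted / n_convicted


everyone, among_convicted = simulate()
print(round(everyone, 3), round(among_convicted, 3))
# Theoretical conditional share under these assumptions.
print(round(0.5 + math.asin(0.8) / math.pi, 3))  # 0.795
```

With rho = 0 the two shares coincide up to simulation noise. The gap appears only when the unobservables behind conviction and sentencing are correlated.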
Natalie G

Join Date: Feb 2015

Posts: 3
#6

18 Feb 2015, 13:22

Jeff, thank you for clarifying. I understand the point on both accounts now. As I mentioned, I have been struggling with this concept for some time and was basically flying blind. I appreciate your help.
Guest
#7

01 Mar 2015, 08:11

Hi everyone,

I have two important questions about Heckman adjustment for categorical outcomes:

1. I am searching for a solution to control for selection bias in a multinomial model. I know that Stata provides a solution for binary (Binary probit model with selection: heckprob) and ordered (Ordered probit model with selection: heckoprob) outcomes. Any ideas for multinomial dependent variables?

2. I am interested in adding two selection equations to a model based on a sample that is selected in two steps. I know there is a solution for linear regression models (Selmlog), but I am looking for a solution for categorical outcomes. Can anybody help?

I would be grateful for any suggestions!

Best,

Chris
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#8

01 Mar 2015, 15:55

On Question 2 [and if I understand you correctly]: see Cappellari and Jenkins, Stata Journal 6(2), 2006 [free download from the SJ site] for examples of multivariate probit models with multiple selections. In short, as long as you know the likelihood in theory, you should be able to adapt our code to suit your purposes. That is: tricky, but doable.
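A starting point for the "know the likelihood" step is the log-likelihood of the single-selection binary model that heckprobit fits, assuming jointly normal errors. The notation is:

- s_j: the selection indicator;
- y_j: the binary outcome, observed only when s_j = 1;
- x_j beta and z_j gamma: the outcome and selection indices;
- Phi: the standard normal CDF;
- Phi_2(., ., rho): the bivariate standard normal CDF with correlation rho.

```latex
\ln L = \sum_{j:\, s_j = 1,\, y_j = 1} \ln \Phi_2\!\left(x_j\beta,\; z_j\gamma,\; \rho\right)
      + \sum_{j:\, s_j = 1,\, y_j = 0} \ln \Phi_2\!\left(-x_j\beta,\; z_j\gamma,\; -\rho\right)
      + \sum_{j:\, s_j = 0} \ln \Phi\!\left(-z_j\gamma\right)
```

Under a trivariate normal assumption, a second selection stage would turn the bivariate terms into trivariate normal probabilities. That is presumably where simulation-based methods such as those in the article cited above come in.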
Guest
#9

02 Mar 2015, 00:22

Thank you very much! This might help a lot. I have just started to use Stata, but I will have a closer look at it.

Can anyone help me with respect to my first question?
Embarika Farouk

Join Date: Dec 2016

Posts: 15
#10

31 Jul 2018, 06:36

Hello all

@Natalie G, this is a really interesting finding; I got the same error in my analysis. Surprisingly, when I redefined my independent variable in the selection equation, I was able to get rid of this error message.

To make it clearer, my independent variable was:

Code:

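 not weighed |
    at birth |      Freq.     Percent        Cum.
-------------+-----------------------------------
           0 |      9,566       68.48       68.48
           1 |      4,403       31.52      100.00
-------------+-----------------------------------
       Total |     13,969      100.00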

Then I reversed its coding:

Code:

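 not weighed |
    at birth |      Freq.     Percent        Cum.
-------------+-----------------------------------
           0 |      4,403       31.52       31.52
           1 |      9,566       68.48      100.00
-------------+-----------------------------------
       Total |     13,969      100.00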

In other words, I changed the default group (coded 0) to the 4,403 children who were not weighed.
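One way to see what the recoding does is sketched below in Python. The counts come from the two tabulations above; the individual records are synthetic. The effect depends on the role of the recoded variable:

- If it is the selection indicator (the variable on the left of the = inside select()), reversing it swaps which children enter the outcome equation. Under the usual 0/1 convention, the 4,403 children coded 1 are selected before the recode and the 9,566 coded 1 are selected afterwards.
- If it is an ordinary regressor, reversing a 0/1 dummy in a model with a constant only flips the sign of its coefficient and shifts the intercept. It should not by itself remove a collinearity problem.

```python
# Counts from the two tabulations in post #10; the records themselves are synthetic.
original = [0] * 9566 + [1] * 4403   # 1 = not weighed at birth
recoded = [1 - v for v in original]  # coding reversed: former 0s become 1s


def n_selected(indicator):
    """Count observations treated as selected under a 0/1 selection
    indicator, assuming (as the usual convention) that 1 means selected."""
    return sum(1 for v in indicator if v == 1)


print(n_selected(original))  # 4403
print(n_selected(recoded))   # 9566
```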

I am not sure whether this is correct, and it made me think more about how the Heckman procedure works.

I would much appreciate any explanation.

Thank you in advance
Embarika Farouk

Join Date: Dec 2016

Posts: 15
#11

31 Jul 2018, 06:40

Originally posted by Jeff Wooldridge (#5)

Hello Jeff,

I would appreciate it if you could read my finding above (#10) and help me understand the difference.

Thanks in advance
ALKEBSEE RADWAN

Join Date: Mar 2019

Posts: 240
#12

25 Aug 2019, 08:15

Originally posted by Jeff Wooldridge (#5)

According to your explanation, I think the Heckman model is not suitable for models with a binary dependent variable.
I have faced the same problem: my dependent variable is corporate fraud (1 if the firm committed fraud, 0 otherwise), so there are no other categories.
Could you kindly tell me whether this is a fixable issue? I am tired of searching.

Best regards
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#13

25 Aug 2019, 09:07

There is a Heckman selection when the response is binary. But do you have a case where your y variable is not always observed? Or is a binary explanatory variable endogenous? You need to say more about your setup.
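Jeff's first question can be checked mechanically. In a sample-selection setting the outcome is missing for part of the sample, for example a prison outcome recorded only for convicted defendants. An outcome recorded for every observation gives a selection model nothing to correct. The minimal Python sketch below uses made-up lists and counts 0s, 1s, and missing values (None), the way a tabulation that shows missing values would.

```python
def describe_outcome(values):
    """Tabulate a 0/1 outcome, counting missing entries (None) separately."""
    counts = {0: 0, 1: 0, None: 0}
    for v in values:
        counts[v] += 1
    return counts


# Hypothetical lists: the first mimics an outcome observed only for a selected
# subsample; the second mimics a dummy recorded for every observation.
selection_case = [1, 0, None, 1, None, 0]
complete_case = [1, 0, 0, 1, 0, 0]

print(describe_outcome(selection_case))  # {0: 2, 1: 2, None: 2}
print(describe_outcome(complete_case))   # {0: 4, 1: 2, None: 0}
```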
ALKEBSEE RADWAN

Join Date: Mar 2019

Posts: 240
#14

25 Aug 2019, 23:46

Originally posted by Jeff Wooldridge (#13)

Thank you for replying.
I will try to give more details about my paper.
The hypothesis is: CEO pay is negatively associated with corporate fraud.
My model is as follows:
Fraud (dummy) = CEO pay + CEO characteristics + firm-specific variables
Study period: 2010-2017
Number of observations: 18,212

Honestly, I do not understand the meaning of "y variable is not always observed".
What I understand is this:
I have a dataset containing only the firms that committed fraud, and I merged it with data on firms that did not commit fraud. After deleting observations with missing values, I got those observations.
So how can I know whether my dependent variable is unobserved for some observations?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#15

26 Aug 2019, 12:47

When you tab your fraud dummy, is it always there, or is there missing data? My guess is that no data are systematically missing, so there is no need to use a Heckman model. Your dependent variable is binary, and you have some firms that committed fraud and others that did not. So you use a correlated random effects probit model, estimated by pooled probit, because you probably want to allow CEO pay to be correlated with unobserved firm heterogeneity. I'm not sure where a Heckman model would come into play.
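The sketch below illustrates the Mundlak device behind the correlated random effects probit Jeff mentions. It is in Python and uses a hypothetical firm-year panel: firm names, years, and pay figures are invented. The time average of each time-varying regressor is computed within firm and added as an extra regressor. A pooled probit on the original regressors plus these averages, with standard errors clustered by firm, then allows CEO pay to be correlated with unobserved firm heterogeneity through its firm mean.

```python
from collections import defaultdict


def add_firm_means(rows, var):
    """Mundlak device: attach the within-firm time average of `var` to each
    firm-year row as `<var>_bar`, without modifying the input rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r["firm"]] += r[var]
        counts[r["firm"]] += 1
    out = []
    for r in rows:
        new_row = dict(r)  # copy so the original panel is left unchanged
        new_row[var + "_bar"] = totals[r["firm"]] / counts[r["firm"]]
        out.append(new_row)
    return out


# Hypothetical firm-year panel; all values invented for illustration.
panel = [
    {"firm": "A", "year": 2010, "ceo_pay": 2.0, "fraud": 0},
    {"firm": "A", "year": 2011, "ceo_pay": 4.0, "fraud": 1},
    {"firm": "B", "year": 2010, "ceo_pay": 1.0, "fraud": 0},
    {"firm": "B", "year": 2011, "ceo_pay": 2.0, "fraud": 0},
    {"firm": "B", "year": 2012, "ceo_pay": 3.0, "fraud": 0},
]

for row in add_firm_means(panel, "ceo_pay"):
    print(row["firm"], row["year"], row["ceo_pay"], row["ceo_pay_bar"])
```

In Stata, the averaging step would typically be bysort firm: egen ceo_pay_bar = mean(ceo_pay). That would be followed by probit fraud ceo_pay ceo_pay_bar (plus the other time-varying controls and their firm means) with vce(cluster firm). The variable names here are hypothetical.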