Should I use a Heckman selection model when extracting a subset of my data for analysis?

Raahil Madhok

Join Date: Jan 2019

Posts: 9
#1

01 Nov 2019, 19:24

This is a new topic for me, so pardon my basic understanding of the Heckman model. It's possible I should not be using a selection model at all, so I wanted to check first.

My dependent variable (individual-level) is species diversity captured by a birdwatcher in a district-time period. The independent variable is deforestation in the district-time period. Many individuals go looking for specific birds, so their data points are less useful for eliciting the general impact of deforestation on species diversity. The goal is to identify the individuals who capture everything in sight and focus on them for the analysis.

I plan to define a "veteran" birdwatcher as someone capturing a reasonably representative measure of diversity based on some predefined criteria. This could include the total trips they take, the number of months per year they go out, and whether they report all species during the trip. These predict veteran status but are not part of the outcome equation.

Initially, I dropped all observations that didn't meet the selection criteria. My understanding is that doing this biases my coefficients by truncating the distribution of error terms. Can I treat my issue as a selection model?

My idea is to generate a dummy=1 for veterans and 0 for non-veterans, based on the above criteria. Then I would estimate coefficients and standard errors with the heckman command. Is this the right approach? Is it a problem that I am "incidentally truncating" the data myself, and then correcting for it? Thanks.
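
A minimal sketch of the approach described above. All variable names (diversity, deforest, veteran, trips, months_active, all_species, district) are hypothetical placeholders, not names from the actual dataset, and clustering by district is an assumption:

```stata
* Hypothetical variable names throughout; adjust to the real dataset.
*   diversity      species diversity recorded by the birdwatcher
*   deforest       deforestation in the district-time period
*   veteran        1 = meets the veteran criteria, 0 = otherwise
*   trips, months_active, all_species
*                  criteria that enter only the selection equation

* Outcome equation: diversity on deforestation.
* Selection equation: veteran status on deforestation plus the excluded criteria.
heckman diversity deforest, ///
    select(veteran = deforest trips months_active all_species) ///
    vce(cluster district)
```

One caveat on this sketch: because veteran would be constructed directly from the same criteria that appear in the selection equation, the selection probit may predict it perfectly or nearly so, in which case the model may not estimate as written.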
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17703
#2

02 Nov 2019, 04:43

Raahil:
Instead of focusing your analysis on expert birdwatchers only, why not introduce a 0/1 categorical variable for amateur/veteran birdwatchers among the predictors of your regression model and see whether, when adjusted for the remaining regressors, it causes variation in the regressand?
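
A minimal sketch of this suggestion, assuming hypothetical variable names (diversity, deforest, veteran, district) and, as a further assumption, standard errors clustered by district:

```stata
* Hypothetical variable names; veteran is a 0/1 indicator (1 = veteran).
* All birdwatchers stay in the sample; veteran status enters as a predictor.
regress diversity deforest i.veteran, vce(cluster district)

* Test whether veteran status shifts diversity, holding deforestation fixed.
test 1.veteran
```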

Kind regards,
Carlo
(Stata 19.0)
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

04 Nov 2019, 07:05

To add to Carlo's helpful suggestion, you can interact your experience variable with a set of other variables using i.experience#(c.x i.z), with the c. and i. prefixes depending on whether you want each variable treated as continuous or categorical. That way you can easily choose which variables are allowed to vary between amateurs and veterans.
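
A minimal sketch of the interaction syntax, assuming hypothetical variable names: veteran plays the role of the experience variable, deforest is continuous, season is a made-up categorical control, and district is an assumed cluster variable. It uses ## rather than #, a variant that also includes the main effects:

```stata
* Hypothetical variable names; veteran is 0/1, season is categorical.
* ## expands to main effects plus interactions, so the deforestation slope
* and the season effects are allowed to differ between amateurs and veterans.
regress diversity i.veteran##(c.deforest i.season), vce(cluster district)

* Slope of diversity on deforestation, separately for amateurs and veterans.
margins veteran, dydx(deforest)

* Test whether the deforestation slope differs between the two groups.
test 1.veteran#c.deforest
```

Variables left outside the parentheses would be constrained to have the same coefficient in both groups.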

While I'm not sure this is necessarily a problem in your study, many think converting a continuous variable into a discrete variable is not a good idea - it throws away information and adds measurement error.
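
A minimal sketch of keeping experience continuous instead of dichotomising it, assuming hypothetical variable names (trips as a continuous experience measure, district as the cluster variable) and purely illustrative evaluation points:

```stata
* Hypothetical variable names; trips is a continuous measure of experience.
* The deforestation slope is allowed to vary smoothly with experience.
regress diversity c.deforest##c.trips, vce(cluster district)

* Deforestation slope at a few illustrative experience levels
* (the values 5, 20 and 50 are arbitrary, chosen only for illustration).
margins, dydx(deforest) at(trips = (5 20 50))
```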