I'm trying to understand the postestimation prediction quantities that -margins- offers for -heckprobit-. I've realized that the default predicted values and derivatives for the "outcome" variable do not reflect variables that appear only in the selection equation. This stems from the "pmargin" default that -margins- uses for -heckprobit-. (See here for the full documentation.)
I find this surprising because I thought a key substantive point of a selection model was to recognize that in the wider population, the outcome of interest depends partly on variables that affect the selection process, insofar as being selected is necessary to experience the ultimate outcome.
To fix ideas, let's imagine a study of persons in which the ultimate outcome of interest is alcohol "abuse," for which one must first be selected into "any use" of alcohol. Let's suppose that "religious conservatism" (RC) affects selection into "any use," but not "abuse," as opposed to "reasonable use." If we take the default postestimation prediction from -margins-, we'll implicitly ignore that RC affects "abuse" by preventing one from entering a state without which "abuse" can't occur. From this perspective, using one of the non-default predictions that -margins- offers would make more sense, in particular the one documented as:
p11: Φ2(x_j b, z_j g, ρ), predicted probability Pr(y_j^probit = 1, y_j^select = 1)
where Φ2 is the bivariate normal probability function, y_j^probit is the outcome of interest, and y_j^select is the outcome in the selection process.
I'd interpret this to mean the probability that an individual is selected and goes on to experience the outcome of interest.
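To make the contrast concrete, here is a rough numerical sketch in Python (not Stata output). It assumes that "pmargin" is the marginal probability Φ(x_j b) and that "psel" is Φ(z_j g), and it uses made-up index values: the outcome index XB, the correlation RHO, and the two selection-index values standing in for low and high RC are all hypothetical. The helper names are mine, not anything from Stata.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binorm_cdf(a, b, rho, steps=4000, lower=-10.0):
    """Phi2(a, b, rho) for |rho| < 1, computed as the integral from
    -infinity to a of phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) dx.
    The lower limit is truncated at `lower` and Simpson's rule is used
    (steps must be even)."""
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps

    def f(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / s)

    total = f(lower) + f(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lower + i * h)
    return total * h / 3.0


def predictions(xb, zg, rho):
    """Three prediction quantities for given index values."""
    return {
        "pmargin": norm_cdf(xb),         # ignores the selection index entirely
        "psel": norm_cdf(zg),            # Pr(selected)
        "p11": binorm_cdf(xb, zg, rho),  # Pr(outcome = 1 and selected)
    }


# Hypothetical values: RC enters only the selection index zg.
XB, RHO = 0.3, 0.5
low_rc = predictions(XB, 1.0, RHO)         # zg when RC is low (assumed)
high_rc = predictions(XB, 1.0 - 0.8, RHO)  # zg when RC is high (assumed)

for name in ("pmargin", "psel", "p11"):
    print(f"{name:8s} low RC: {low_rc[name]:.4f}   high RC: {high_rc[name]:.4f}")
```

Under these assumptions, "pmargin" is identical in the two scenarios while "p11" falls as RC lowers the selection index, which is the difference I'm asking about. In Stata, the analogous comparison would presumably be -margins, dydx(rc) predict(pmargin)- versus -margins, dydx(rc) predict(p11)-, where rc is a hypothetical variable name.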
I'd think that this "p11" prediction would clearly be preferred here, and by extension in other similar and common situations. Because the defaults in Stata usually reflect the most typical and sensible choice, I'm questioning my thinking about the preferred prediction and would appreciate some reaction. I haven't found any discussion of this point in the more abstract treatments in the literature. Can anyone offer some clarification or a countering perspective?
