Variable Choice in hetprobit (heteroskedasticity in probit)

Andrew Granato

Join Date: Jun 2023

Posts: 1
#1

21 Jun 2023, 16:29

Hi all,
I have a probit regression that goes something like:
probit manager experience experienced_sqd gender school, vce(robust)
So I am evaluating the probability that someone becomes a manager based on their years of experience, gender, school they attended, etc. Looking at the hetprobit command, should the way I structure the test for heteroskedasticity be:
hetprobit manager experience experienced_sqd gender school, het(experience experienced_sqd gender school) vce(robust)?
I am not sure if I am supposed to put exactly the same independent variables into the het() portion. Looking back to previous debates, I see there has been discussion that putting the entire same set of independent variables may cause multicollinearity issues (https://www.statalist.org/forums/for...probit-command). Putting in the same independent variables also causes some of my coefficients to be wildly implausible. I think I am likely misunderstanding how this command is meant to be used. If someone could help, I would be very grateful, thank you!
Andrew Musau

Join Date: Oct 2014

Posts: 10178
#2

21 Jun 2023, 20:55

You are explicitly modeling the heteroskedasticity, so the variables should come from economic theory or common sense. The canonical example illustrating heteroskedasticity relates distance traveled for family vacations and household characteristics such as household size and household income. It is observed that households with low incomes travel zero to a few miles whereas there is large variance in distance traveled for households with high incomes, i.e., some travel many miles and some do not. This results in heteroskedasticity. Here, it is clear that family income is a key variable explaining the heteroskedasticity. Families with low incomes cannot afford to travel far, if at all, whereas families with high incomes have more flexibility and some choose to travel far (because they can afford to do so).
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2148
#3

22 Jun 2023, 22:15

Heteroskedasticity in probit models is not the same as in a linear model with a quantitative outcome. In the latter, E(y|x) and Var(y|x) can be completely separate. With a binary outcome, there is only P(y = 1|x) = E(y|x), and then Var(y|x) = p(x)[1 - p(x)]. So when you allow heteroskedasticity in a probit you're changing the functional form of p(x). That's fine, but hetprobit is not necessarily the best way to do it.
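To make the functional-form point concrete, here is the response probability that a heteroskedastic probit fits, written in notation that is assumed here rather than taken from the thread: x holds the index variables, z holds the het() variables, and the latent error has standard deviation exp(zγ).

```latex
P(y = 1 \mid x, z) = \Phi\!\left( \frac{x\beta}{\exp(z\gamma)} \right),
\qquad
\operatorname{Var}(y \mid x, z) = p(x, z)\,\bigl[ 1 - p(x, z) \bigr],
\quad
p(x, z) \equiv P(y = 1 \mid x, z)
```

Setting γ = 0 gives back the usual probit, Φ(xβ). With γ ≠ 0 the only thing that changes is the shape of p(x, z); the variance of y is still pinned down by p.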

As mentioned by Andrew, parameters are often imprecisely estimated with hetprobit. There is a kind of collinearity. But the parameters themselves don't mean that much. You should focus on the average partial effects. These are often estimated much more precisely, and can be compared across models.

I use an application to mortgage approval rates, and the hetprobit coefficient estimates are much less precise. But the APEs are almost identical to usual probit and logit. My view is hetprobit doesn't do much for us. Try squares and interactions directly in probit.
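A rough sketch of that comparison, assuming loanapp.dta (the data set used in the code in this post) is in the working directory. The particular square and interaction added to the probit are illustrative choices only, not a recommended specification, and vce(robust) is carried over from the original question.

```stata
* Sketch only: assumes loanapp.dta is in the current working directory.
use loanapp, clear

* Plain probit with a square and an interaction entered directly.
* Which terms to add is an illustrative assumption, not a recommendation.
probit approve i.white##c.obrat c.loanprc##c.loanprc hrat unem male married ///
    dep sch cosign chist pubrec mortlat1 mortlat2, vce(robust)
margins, dydx(white loanprc chist) post
estimates store ape_probit

* Heteroskedastic probit with the same variables in the index and in het().
hetprobit approve i.white hrat obrat loanprc unem male married dep sch cosign ///
    chist pubrec mortlat1 mortlat2, het(i.white hrat obrat loanprc unem male ///
    married dep sch cosign chist pubrec mortlat1 mortlat2) vce(robust)
margins, dydx(white loanprc chist) post
estimates store ape_hetprobit

* Average partial effects and their standard errors, side by side.
estimates table ape_probit ape_hetprobit, se
```

Because the squares and interactions use factor-variable notation, margins differentiates through them, so the reported effects for white and loanprc are average partial effects of the underlying variables and can be compared directly across the two models.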

I think it only makes sense to include the same variables in both places if you use hetprobit. But there can be perfect collinearity if you aren't careful. For example, with a single binary X you lose identification.
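The single-binary-X case can be written out as follows. The notation is assumed for illustration: an intercept β0 and slope β1 in the index, and a single coefficient γ with no constant in the variance function.

```latex
P(y = 1 \mid x = 0) = \Phi(\beta_0),
\qquad
P(y = 1 \mid x = 1) = \Phi\!\left( \frac{\beta_0 + \beta_1}{\exp(\gamma)} \right)
```

The data identify only these two cell probabilities, but there are three parameters. β0 is pinned down by the first, while any pair (β1, γ) that gives the same ratio (β0 + β1)/exp(γ) fits the second equally well, so β1 and γ are not separately identified.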

Code:

use loanapp, clear
hetprob approve i.white hrat obrat loanprc unem male married dep sch cosign chist ///
    pubrec mortlat1 mortlat2, het(i.white hrat obrat loanprc unem male ///
    married dep sch cosign chist pubrec mortlat1 mortlat2)
margins, dydx(white loanprc chist)

Data set here: http://qcpages.qc.cuny.edu/~rvesselinov/statafiles.html