Hello everyone,
I'm working on my first empirical project and have gotten stuck running heckprobit.
My dependent variable is House, a binary indicator of home ownership (1 if the person owns a house, 0 if not).
Unfortunately, I have this information for only 129 individuals, while my whole dataset consists of 523 persons. As I doubt that the sample of 129 is representative, I want to test for (and possibly correct) selection bias via heckprobit.
Now I have the following two questions/problems:
First, I had a selection model with only three variables. All three had p-values below 0.1, but the likelihood-ratio test reported at the bottom of the output gave a p-value of 0.7197. If I understand correctly, this would imply that a Heckman selection model is not necessary for my data.
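For illustration, a call of this form could look like the sketch below. All variable names other than House (x1, x2, z1, z2, z3, observed) are hypothetical placeholders, not the actual variables in the dataset.

```stata
* Sketch only: x1, x2, z1-z3 and observed are hypothetical placeholder names.
* observed = 1 for the 129 persons with House recorded, 0 for the rest.
generate byte observed = !missing(House)

* Outcome equation for House; selection equation with three variables.
heckprobit House x1 x2, select(observed = z1 z2 z3)

* With the default (non-robust) standard errors, the last line of the
* output reports the LR test of independent equations (H0: rho = 0).
```

A large p-value on that test means the data give no evidence that the errors of the two equations are correlated, which is the sense in which a plain probit on the 129 observations would then be considered adequate.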
Then I ran some t-tests to detect differences in variables between the large (523) and the small (129) sample, in order to find further variables for the selection equation. As a result, I added six variables to the selection equation, so I am now working with a selection model of nine variables. All but one of them are dummies.
The result was mixed. On the one hand, the likelihood-ratio test reported at the bottom of the output now has a p-value of 0.0183, which would imply that the Heckman selection model is useful for my data and preferable to a standard probit. On the other hand, all variables in the selection model except one have p-values between 0.12 and 0.88.
The only significant variable is part of a set of dummies, so I cannot simply keep this (significant) one and drop the other dummies of the set (which have quite high p-values), right?
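One way to look at a set of dummies as a whole, rather than one coefficient at a time, is a joint Wald test after estimation. The sketch below assumes the dummy set comes from a single categorical variable, here given the hypothetical name region; all other names except House are placeholders as well.

```stata
* Sketch only: region, x1, x2, z1, z2 and observed are hypothetical names.
* i.region enters the full set of dummies into the selection equation.
heckprobit House x1 x2, select(observed = z1 z2 i.region)

* Joint Wald test that all region dummies in the selection equation
* (named after its dependent variable, observed) are zero.
testparm i.region, equation(observed)
```

If the dummies were created by hand instead, the same test can be requested by listing them explicitly in testparm.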
Trying to find a more parsimonious selection model, I then excluded some of the variables with high p-values again. But I came across the following problem:
For some specifications of the selection model, Stata gets stuck at "Fitting full model" and runs through thousands of iterations in which the log likelihood no longer changes. When I then stop the process via "Break", Stata tells me "parameter named athrho not found". So far I have not been able to find out what is wrong with these particular selection models. To me it seems random whether a given specification is estimated within a second or two or leads to endless iterations.
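For context, heckprobit estimates the correlation rho through the transformed parameter athrho = atanh(rho), so a run in which rho drifts toward 1 or -1 can keep iterating on an essentially flat likelihood. Whether that is what happens here is an assumption, not something the output above confirms. A minimal sketch of standard maximization options that at least make such a run stop on its own (variable names other than House are hypothetical placeholders):

```stata
* Sketch only: x1, x2, z1-z3 and observed are hypothetical placeholder names.
* difficult: use a different stepping algorithm in nonconcave regions.
* iterate(100): stop after at most 100 iterations instead of running on.
heckprobit House x1 x2, select(observed = z1 z2 z3) difficult iterate(100)
```

Results from a run that stops at the iteration limit have not converged and should not be interpreted as final estimates.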
Does anyone know what to do?
Your help is greatly appreciated!
Thanks a lot!
Simon Kuhn