Hi,
I am trying to construct a Heckman selection model, but I am struggling to figure out which variables belong in the selection equation and which in the outcome equation.
My data are a panel of countries with n = 100 and t = 20.
Here is my approach. I would be grateful if someone could troubleshoot my code (shortened and generalised to make it easier to follow).
Generalised model specification: y1_it = b1*x1_it + b2*x2_it + b3*x3_it + u_i + v_t + e_it, where u_i and v_t are country and time effects and e_it is the error term. Let's say y1 is government education spending, x1 is the number of children as a percentage of the total population, x2 is a macro-level financial stability indicator, and x3 is the number of IMF conditionalities in place. Let's assume that the share of children in a country (x1) does not predict IMF programme participation, whereas financial instability (x2) does.
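Read as a two-way fixed-effects panel model (country effects u_i, time effects v_t), the specification above could be estimated, before any selection correction, roughly as in this sketch. It uses simulated data for 100 countries over 20 years; the identifier names `country` and `year` and all coefficient values are hypothetical, not taken from the question.

```stata
* Hypothetical balanced panel: 100 countries x 20 years of simulated data
clear
set seed 777
set obs 100
generate country = _n
generate u_i = rnormal()                  // country effect u_i
expand 20
bysort country: generate year = 2000 + _n
generate v_t = 0.1*(year - 2010)          // arbitrary common time effect v_t
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rpoisson(2)
generate y1 = 1 + 0.5*x1 - 0.3*x2 + 0.2*x3 + u_i + v_t + rnormal()

* Declare the panel, absorb u_i with fe and model v_t with year dummies
xtset country year
xtreg y1 x1 x2 x3 i.year, fe
```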
I generate the selection variable selection_var (IMF programme participation) from x3 (the number of IMF conditionalities in place), such that selection_var = 1 if x3 != 0 (i.e. not equal to zero) and 0 otherwise.
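A minimal sketch of that step in Stata, on simulated values of x3 (hypothetical data, for illustration only). The `if !missing(x3)` qualifier matters because Stata treats a missing value as larger than any number, so `(x3 != 0)` alone would evaluate to 1 for a missing x3.

```stata
* Simulated stand-in for x3 (number of IMF conditionalities), illustration only
clear
set seed 12345
set obs 2000
generate x3 = cond(runiform() < 0.6, 0, rpoisson(5))
replace x3 = . in 1/10                    // a few missing values on purpose

* 1 if x3 != 0, 0 if x3 == 0, missing if x3 is missing
generate byte selection_var = (x3 != 0) if !missing(x3)
tabulate selection_var, missing
```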
I then estimate a probit model. Note that I do not include x3 here, because selection_var is generated from x3.
After the probit, I generate the inverse Mills ratio and run the outcome regression including it. In that regression I again leave out x3 because selection_var is generated from it, but maybe I should include it? I am also unsure whether I should include x2 there, because financial instability partly determines whether a country enters an IMF programme. For the alternative method (Stata's heckman command), I am again unsure which of the x's should go in which equation.
Code:
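* Step 1: probit for IMF programme participation (x3 left out, since selection_var is built from it)
probit selection_var x1 x2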
Code:
* linear index from the probit (xb), not a predicted probability
predict yhat, xb
Code:
* inverse Mills ratio phi(xb)/Phi(xb) for the selected observations
gen mills = normalden(yhat)/normal(yhat)
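For reference, here is the textbook two-step (Heckit) setup that the `normalden()/normal()` line implements. The symbols are generic and mine, not from the question (z for selection-equation regressors, x for outcome regressors, s for selection_var), and it assumes the two errors are jointly normal with correlation rho and outcome-error standard deviation sigma_e:

```latex
\begin{aligned}
s_{it} &= \mathbf{1}\left[ z_{it}'\gamma + w_{it} > 0 \right], \qquad w_{it} \sim N(0,1) \\
y_{1,it} &= x_{it}'\beta + e_{it}, \qquad \text{observed only if } s_{it} = 1 \\
E\left[ y_{1,it} \mid x_{it}, z_{it}, s_{it} = 1 \right] &= x_{it}'\beta + \rho\,\sigma_e\,\lambda(z_{it}'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
```

Under these assumptions, mills is lambda evaluated at the probit index from `predict ..., xb`, and the second step regresses y1 on x and mills using only observations with s = 1.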
Code:
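* Step 2: OLS on the selected sample only (standard errors not corrected for the generated regressor)
reg y1 x1 x2 mills if selection_var == 1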
*Alternative method:
Code:
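* select() is an option, so it needs a comma; write it as selection_var = regressors, as in the probit
heckman y1 x1 x2 x3, select(selection_var = x1 x2) twostep mills(imr)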
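One way to check whether the two methods produce the same inverse Mills ratio is the sketch below, on simulated cross-sectional data (not the 100 x 20 country panel from the question). The variable `z` is a hypothetical regressor that enters only the selection equation, and all data-generating numbers are arbitrary. It specifies the selection equation identically in `probit` and in heckman's `select()`, and uses heckman's `mills()` option (two-step only) to store its ratio.

```stata
* Simulated data with correlated errors, so that selection matters (illustration only)
clear
set seed 2024
set obs 5000
generate x1 = rnormal()
generate x2 = rnormal()
generate z = rnormal()                    // hypothetical, selection equation only
generate w_sel = rnormal()                // selection-equation error
generate e_out = 0.5*w_sel + rnormal()    // outcome error, correlated with w_sel
generate byte selection_var = (0.5 + x2 + z + w_sel > 0)
generate y1 = 1 + x1 + 0.5*x2 + e_out if selection_var == 1

* Manual two-step: probit, inverse Mills ratio, OLS on the selected sample
probit selection_var x2 z
predict double xb_sel, xb
generate double mills = normalden(xb_sel)/normal(xb_sel)
regress y1 x1 x2 mills if selection_var == 1
scalar b_x1_manual = _b[x1]

* Built-in two-step with the same selection equation; mills() saves its ratio as imr
heckman y1 x1 x2, select(selection_var = x2 z) twostep mills(imr)
scalar b_x1_heckman = _b[y1:x1]

* Absolute gap between the two ratios on the selected sample
generate double gap = abs(mills - imr) if selection_var == 1
summarize gap
```

When the two selection equations match, the ratios and the outcome point estimates should coincide; only the standard errors should differ, because `regress` does not account for mills being estimated.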
I also do not get the same inverse Mills ratios from the two methods, no matter how I specify them.
Any help is sincerely appreciated!
Kindest regards,
Freddy