Dear All,
I have a panel dataset, and I would like to use dynamic inverse probability weighting (IPW) in my estimation.
In other words, I would like the probability of being observed to vary across units and over time.
My concern is that in my main dataset there are a number of observations for which the probit model "predicts success/failure perfectly". Should I impute the probability of being observed as equal to 1 in these cases, or should I keep them as missing values?
In the example below (using the nlswork data), my probit model drops a number of observations because they predict success or failure perfectly. Because I do not have weights for these observations, the estimation sample of the IPW regression is over 400 observations smaller than that of the baseline regression.
I calculate my IPW like so:
Code:
* reghdfe, eststo and esttab are community-contributed (ssc install ftools, ssc install reghdfe, ssc install estout)
use http://www.stata-press.com/data/r17/nlswork, clear
* (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
* Generate group-time variable, used for estimation
egen occ_year = group(occ_code year)
* (121 missing values generated)
* Baseline model to estimate: impact of union on wages, controlling for individual FEs and occupation-time FEs
reghdfe ln_w union, a(idcode occ_year) cluster(occ_code)
eststo base
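* Probit for union membership on occupation-year dummies, age and tenure
probit union ///
    i.occ_year c.age c.tenure, ///
    vce(cluster occ_code)
* Predicted Pr(union==1); by default -predict- returns missing for observations
* that the probit excluded because of perfect prediction
predict p_remain, pr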
* generate weights
gen w=.
replace w=1/p_remain if union==1
replace w=1/(1-p_remain) if union==0
summarize w
* estimate IPW model
reghdfe ln_w union [pweight=w], a(idcode occ_year) cluster(occ_code)
eststo ipw
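To make the weighting step explicit, here is a sketch in my own notation (not taken from any package). Write D_{it} for union, g(i,t) for the occ_year cell (with \hat\delta = 0 for the base cell), and \Phi for the standard normal CDF. The probit and the two -replace- lines above then amount to:

```latex
\begin{aligned}
\hat p_{it} &= \Phi\!\left(\hat\gamma_0 + \hat\delta_{g(i,t)} + \hat\beta_1\,\mathrm{age}_{it} + \hat\beta_2\,\mathrm{tenure}_{it}\right),\\
w_{it} &= \frac{D_{it}}{\hat p_{it}} + \frac{1 - D_{it}}{1 - \hat p_{it}}.
\end{aligned}
```

Each observation is therefore weighted by the inverse of the estimated probability of the union status it actually has. If \hat p_{it} is missing, w_{it} is missing too, and reghdfe drops that observation.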
note: 93.occ_year != 0 predicts failure perfectly;
93.occ_year omitted and 21 obs not used.
note: 95.occ_year != 0 predicts failure perfectly;
95.occ_year omitted and 28 obs not used.
.....
note: 145.occ_year != 0 predicts success perfectly;
145.occ_year omitted and 2 obs not used.
. esttab base ipw, se
--------------------------------------------
                      (1)             (2)
                  ln_wage         ln_wage
--------------------------------------------
union              0.0965***       0.0788**
                 (0.0185)        (0.0172)
_cons               1.737***        1.773***
                (0.00440)       (0.00867)
--------------------------------------------
N                   18495           18011
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
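With the weights defined in the code above (1/p_remain if union==1, 1/(1-p_remain) if union==0), the arithmetic behind the question is short. This sketch assumes the imputed probability is the value implied by the cell's outcome: 1 where the probit "predicts success perfectly" (every observation has union==1) and 0 where it "predicts failure perfectly" (every observation has union==0). The sample sizes in the table differ by 484, which is consistent with "over 400". Part of that gap could also come from other sources, such as missing values of age or tenure in the probit.

```latex
\begin{aligned}
D_{it} = 1,\ \hat p_{it} := 1 &\;\Rightarrow\; w_{it} = \frac{1}{1} = 1,\\
D_{it} = 0,\ \hat p_{it} := 0 &\;\Rightarrow\; w_{it} = \frac{1}{1 - 0} = 1,\\
N_{\text{base}} - N_{\text{ipw}} &= 18495 - 18011 = 484.
\end{aligned}
```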
The question I have is whether this is correct, or whether I should impute weights for the observations where the probit model predicts success or failure perfectly. In these cases, should the weight be 1?
Any broader suggestions on whether I am doing this correctly would be extremely helpful.
Best