The situation I have in mind is a treatment effects model in which I allow for heterogeneous effects across subgroups. After using a regression to compute the treatment effect in each subgroup, I want to take the weighted average of these effects, where the weights are the (population) proportions of people in each subgroup, estimated by the sample proportions. My question: how do I do this in Stata?
To fix ideas, I have translated my question to a readily available Stata data set and sketched the solution I came up with. In the example below, I am interested in whether being Black is associated with a lower log wage. I want to compute this difference within three subgroups defined by education. Having computed the three within-subgroup differences, I compute the sample proportions, take the weighted sum [in the disp command], and compute the relevant standard error, taking into account that the proportion of people in each category is estimated from the sample [using sureg + testnl].
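To make the target quantity explicit, here is a sketch of what the weighted sum and its standard error amount to. The notation is my own shorthand and not anything Stata produces: d, h, and c index the dropout, high school, and college groups, beta_g is the Black coefficient in group g, p_g is the share of group g, and V is the joint covariance matrix of all five estimates as reported by sureg.

```latex
% Target: share-weighted average of the subgroup differences
\[
\theta = p_d \beta_d + p_h \beta_h + p_c \beta_c ,
\qquad p_c = 1 - p_d - p_h .
\]

% Plug-in estimator (this is what the disp line evaluates)
\[
\hat\theta = \hat p_d \hat\beta_d + \hat p_h \hat\beta_h
           + (1 - \hat p_d - \hat p_h)\,\hat\beta_c .
\]

% Delta-method variance: gradient taken with respect to
% (\beta_d, \beta_h, \beta_c, p_d, p_h), evaluated at the estimates
\[
G = \bigl( p_d,\; p_h,\; 1 - p_d - p_h,\; \beta_d - \beta_c,\; \beta_h - \beta_c \bigr)^{\top},
\qquad
\widehat{\operatorname{Var}}(\hat\theta) \approx \hat G^{\top} \hat V \hat G .
\]
```

The last two entries of G are what carry the sampling variation in the estimated shares; setting them to zero would correspond to treating the proportions as known.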
The solution I came up with is to use sureg and then testnl. Is there a better way to do this? I've got to imagine so!
My proposed solution below -- and thank you in advance. I've been using Stata for several decades. I rarely post here, but have found the forum very useful when at an impasse.
webuse nlswork.dta, clear
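*define three education subgroups (left missing when grade is missing, since Stata treats missing as larger than any number)
gen hsdropout=(grade<12) if !missing(grade)
gen hs=(grade>=12&grade<16) if !missing(grade)
gen college=(grade>=16) if !missing(grade)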
*define indicator for Black
gen black=(race==2)
*generate interactions between education group and Black
gen hsdropout_black=hsdropout*black
gen hs_black=hs*black
gen college_black=college*black
sureg (ln_wage hsdropout hs college hsdropout_black hs_black college_black, nocons) (hsdropout) (hs)
*The first regression is the one I'm interested in - the second and third are used to estimate sample proportions of two of three education groups
disp [ln_wage]hsdropout_black*[hsdropout]_cons + [ln_wage]hs_black*[hs]_cons + [ln_wage]college_black*(1-[hsdropout]_cons-[hs]_cons)
testnl [ln_wage]hsdropout_black*[hsdropout]_cons + [ln_wage]hs_black*[hs]_cons + [ln_wage]college_black*(1-[hsdropout]_cons-[hs]_cons)=0