I am using Stata 14.1 and running regressions on CPS data that I have collapsed into weighted state-year averages. My question concerns the correct weighting for calculating average partial effects: I want to account not only for within-state over- or undersampling of certain groups, but also for the different population counts across states.
For simplification, say I want to run a regression of log wages on the minimum wage. The minimum wage variable comes from an external source and it is not weighted. It is a state-by-year variable that I have merged into my dataset.
I want to run the regression on state-year grouped data and therefore create weighted means with the following command:
collapse (mean) lwages [aw=earnwt], by(state year)
where earnwt is the individual weight variable to be used in analyses of wages, calculated as 1/(probability of being in the sample), and lwages is the individual logged wage.
Then I merge in my minimum wage:
merge 1:1 state year using "${data}\mw.dta", keepusing(logmw)
where logmw is the log minimum wage, which varies by state and year. I then run the following regression:
reg lwages logmw i.state i.year, cluster(state) robust
My question: the logged wages are weighted averages that take into account within-state sampling differences. It seems, though, that I would still need to take into account the different population numbers across states in order to obtain consistent average partial effects. Would it suffice to create a second weight for the number of observations per state (called "population" below) and then run the regression with frequency weights, as in the example below? Or would I again need to account for the within-state weights, for instance by using the average earnwt by state-year cell multiplied by the number of observations per cell?
reg lwages logmw i.state i.year [fw=population], cluster(state) robust
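To make the second alternative concrete, here is a minimal sketch of what I have in mind. The average earnwt per cell multiplied by the number of observations per cell is simply the sum of earnwt within the cell, so it can be built in the same collapse step. The names cellpop, nobs and one are hypothetical and not CPS variables, and I am assuming the individual-level data are in memory. I use aweights in the regression only because frequency weights must be integers, not because I am sure this is the right choice:

```stata
* Sketch only: cellpop, nobs and one are hypothetical names
* Assumes the individual-level CPS extract is in memory
gen byte one = 1

* Weighted cell mean of lwages, plus the raw (unweighted) sum of earnwt
* and the raw number of observations in each state-year cell
collapse (mean) lwages (rawsum) cellpop=earnwt nobs=one ///
    [aw=earnwt] if !missing(lwages), by(state year)

* Merge in the state-year minimum wage as before
merge 1:1 state year using "${data}\mw.dta", keepusing(logmw)

* Weight each cell by its estimated population (sum of earnwt)
regress lwages logmw i.state i.year [aw=cellpop], vce(cluster state)
```

Here cellpop would play the role of the estimated population behind each state-year mean, while nobs is the plain sample count per cell, which is what I called "population" above.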
Many thanks in advance if someone could give some guidance on this.