  • Correct weighting with grouped data in the CPS

    I am using Stata 14.1 and running regressions on CPS data that I have collapsed into weighted state-year averages. My question concerns the correct weighting to account not only for within-state over- or undersampling of certain groups, but also for different population sizes across states, so that I can estimate average partial effects.

    To simplify, say I want to run a regression of log wages on the minimum wage. The minimum wage variable comes from an external source and is not weighted. It is a state-by-year variable that I have merged into my dataset.

    I want to run the regression on state-year grouped data and therefore create weighted means with the following command:

    collapse (mean) lwages [aw=earnwt], by(state year)

    where earnwt is the individual weight to be used in analyses of wages, calculated as 1/(probability of being in the sample), and lwages is the individual log wage.

    Then I merge in my minimum wage:

    merge 1:1 state year using "${data}\mw.dta", keepusing(logmw)

    where logmw is the log minimum wage, which varies only by state and year. I then run the following regression:

    reg lwages logmw i.state i.year, cluster(state) robust

    My question: the logged wages are weighted averages that take within-state sampling differences into account. It seems, though, that I would still need to account for the different population sizes across states in order to obtain consistent average partial effects. Would it suffice to create a second weight equal to the number of observations per state (“population” below) and then run the regression with frequency weights, as in the example below? Or would I again need to account for the within-state weights, for instance by using the average earnwt in each state-year cell multiplied by the number of observations in that cell?

    reg lwages logmw i.state i.year [fw=population], cluster(state) robust
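
    To make the two options concrete, here is a minimal sketch of how both candidate cell weights could be built during the collapse. The names one, nobs, and sumwt are hypothetical, not CPS variables. Note that the average earnwt in a cell multiplied by the number of observations in the cell is simply the sum of earnwt in that cell, which is what sumwt holds. The sketch uses analytic weights rather than frequency weights because Stata requires fweights to be integers and treats each cell as that many identical observations; that swap is an assumption of this sketch, not a settled answer to the question.

```stata
* Sketch only: one, nobs, and sumwt are hypothetical names.
* Run on the individual-level data, before the collapse.
* Assumes lwages and earnwt are nonmissing for every observation kept.
gen byte one = 1

* (rawsum) ignores the aweight, so within each state-year cell
*   sumwt = sum of earnwt (mean earnwt times the number of observations)
*   nobs  = number of sampled individuals with a nonzero weight
collapse (mean) lwages (rawsum) sumwt = earnwt nobs = one [aw=earnwt], by(state year)

merge 1:1 state year using "${data}\mw.dta", keepusing(logmw)

* Candidate 1: weight each cell by its raw sample count.
reg lwages logmw i.state i.year [aw=nobs], vce(cluster state)

* Candidate 2: weight each cell by the sum of the individual weights.
reg lwages logmw i.state i.year [aw=sumwt], vce(cluster state)
```

    If earnwt is an inverse-probability weight as described above, sumwt estimates the population that each cell represents, whereas nobs reflects only how many people happened to be sampled. vce(cluster state) already produces cluster-robust standard errors, so no separate robust option is needed.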

    Many thanks in advance if someone could give some guidance on this.