Hi Forum,

This is a bit of a Hail Mary, but Statistics Canada literally recommended that I ask you guys/gals.

I'm using the public-use microdata files (PUMF) from the Canadian Community Health Survey (CCHS). The CCHS provides nationally representative data, collected by Statistics Canada, on a variety of health topics. One of the variables Statistics Canada provides is a weighting variable that accounts for the unequal probability of respondents being selected. Whereas the American GSS provides strata and PSU variables, Statistics Canada collapses much of this information into a single variable (wts_m).

Alright. My question is whether this final weight should be treated as an analytic weight or a probability weight. I've pointed out to Statistics Canada that in several situations Stata produces identical estimates regardless of whether you specify aw or pw.

For the example below, I pulled a few variables more or less at random from the 2012 CCHS-MH PUMF and put them in a regression model. The model uses about 24,000 observations of Canadians (24,231) and suggests that age is a significant negative predictor of total social support.

Code:

```
***1. Weighted regression model using an analytic weight with robust standard error***
regress spsdcon dhhgage ccc_131 [aw=wts_m], vce(robust)
(sum of wgt is 27,491,783.48)

Linear regression                               Number of obs     =     24,231
                                                F(2, 24228)       =      70.90
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0113
                                                Root MSE          =     4.3116

------------------------------------------------------------------------------
             |               Robust
     spsdcon |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dhhgage |  -.1253243   .0109749   -11.42   0.000    -.1468358   -.1038129
     ccc_131 |    .341673   .4335655     0.79   0.431    -.5081423    1.191488
       _cons |   36.15842   .8822762    40.98   0.000     34.42911    37.88774
------------------------------------------------------------------------------

***2. Weighted regression model using a probability weight***
regress spsdcon dhhgage ccc_131 [pw=wts_m]
(sum of wgt is 27,491,783.48)

Linear regression                               Number of obs     =     24,231
                                                F(2, 24228)       =      70.90
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0113
                                                Root MSE          =     4.3116

------------------------------------------------------------------------------
             |               Robust
     spsdcon |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dhhgage |  -.1253243   .0109749   -11.42   0.000    -.1468358   -.1038129
     ccc_131 |    .341673   .4335655     0.79   0.431    -.5081423    1.191488
       _cons |   36.15842   .8822762    40.98   0.000     34.42911    37.88774
------------------------------------------------------------------------------
```

As you can see, Model 1 uses an analytic weight (with robust standard errors) and Model 2 uses a probability weight (which automatically implies robust standard errors). The two models are identical across the board, so I'm unsure which weighting approach I should use (in the present example it doesn't appear to matter). Given that some commands disallow aw or pw, I'd rather avoid mistakes if I can. Does anyone have experience analyzing the CCHS PUMFs? If so, how did you deal with the weights? Alternatively, does anyone have any conceptual insight into how aw differ from pw?
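One way to see why the two runs can coincide: in a linear regression both specifications are weighted least squares, and both the coefficients and the robust (sandwich) standard errors are unchanged when every weight is multiplied by the same constant. Stata's documentation describes analytic weights as being rescaled to sum to the number of observations, which is only a change of scale. The Python/NumPy sketch below checks this invariance on simulated data. The function name `wls_sandwich`, the simulated data and weight range, and the HC1-style n/(n-k) factor are all assumptions for illustration; this is not Stata's implementation and it ignores strata and clustering.

```python
import numpy as np


def wls_sandwich(X, y, w):
    """Weighted least squares with an HC1-style sandwich (robust) variance.

    Sketch under stated assumptions: one stratum, no clustering, and an
    n/(n-k) small-sample factor. Not Stata's actual implementation.
    """
    n, k = X.shape
    XtWX = X.T @ (w[:, None] * X)                # X'WX
    beta = np.linalg.solve(XtWX, X.T @ (w * y))  # WLS coefficients
    resid = y - X @ beta
    bread = np.linalg.inv(XtWX)
    scores = X * (w * resid)[:, None]            # row i is w_i * e_i * x_i
    meat = scores.T @ scores
    V = bread @ meat @ bread * n / (n - k)
    return beta, np.sqrt(np.diag(V))


# Simulated data loosely shaped like the example (hypothetical, not CCHS).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(15, 80, n)
X = np.column_stack([np.ones(n), age])
y = 36 - 0.12 * age + rng.normal(0, 4, n)
w = rng.uniform(100, 3000, n)  # hypothetical expansion-style weights

# Rescale the weights to sum to n (analytic-weight style normalization).
w_norm = w * n / w.sum()

b_raw, se_raw = wls_sandwich(X, y, w)
b_norm, se_norm = wls_sandwich(X, y, w_norm)
print(np.allclose(b_raw, b_norm), np.allclose(se_raw, se_norm))  # True True
```

Scaling every weight by c multiplies X'WX by c (so each "bread" factor by 1/c) and the "meat" by c², so the product is unchanged. That is consistent with the identical `regress` output above; commands whose variance formulas are not scale-invariant, or that use the weights to estimate totals, need not behave this way.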

Employees of Statistics Canada that I've spoken with seem to indicate that wts_m is an analytic weight, but this doesn't seem to jibe with the description of an analytic weight in the Stata help pages.
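For comparison, the two weight types encode different models. An analytic weight treats observation i as having variance σ²/w_i (for example, a mean of w_i underlying observations), while a probability weight w_i = 1/π_i says the respondent stands in for w_i members of the population. The Python sketch below contrasts the standard error of a weighted mean under each reading: the point estimate is the same, but the variance formulas diverge once the weights are unequal. The formulas are textbook simplifications (one stratum, no clustering, a with-replacement approximation for the probability-weight case, weights rescaled to sum to n for the analytic case). The function names and data are hypothetical, and nothing here reproduces a specific Stata command or the CCHS design.

```python
import numpy as np


def weighted_mean(y, w):
    """Weighted mean; identical under either weight interpretation."""
    return np.sum(w * y) / np.sum(w)


def se_probability_weight(y, w):
    """Linearized SE treating w as inverse selection probabilities
    (single stratum, no clustering, with-replacement approximation)."""
    n = len(y)
    ybar = weighted_mean(y, w)
    return np.sqrt(n / (n - 1) * np.sum((w * (y - ybar)) ** 2) / np.sum(w) ** 2)


def se_analytic_weight(y, w):
    """SE under an analytic-weight model: Var(y_i) = sigma^2 / w_i,
    with the weights rescaled to sum to n."""
    n = len(y)
    w_tilde = w * n / np.sum(w)
    ybar = weighted_mean(y, w_tilde)
    sigma2 = np.sum(w_tilde * (y - ybar) ** 2) / (n - 1)
    return np.sqrt(sigma2 / n)


y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0, 1.0, 5.0])  # hypothetical weights
print(weighted_mean(y, w))          # 3.25
print(se_probability_weight(y, w))  # about 0.66
print(se_analytic_weight(y, w))     # about 0.63
```

With equal weights both functions reduce to the familiar s/√n, so the distinction only shows up when the weights vary, as survey weights usually do.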

Cheers,

David.
