  • Survey weights in Canadian Community Health Survey Public-Use Microfiles.

    Hi Forum,

    This is a bit of a Hail Mary, but Statistics Canada literally recommended that I ask you guys/gals.

    I'm using public-use microfiles (PUMF) from the Canadian Community Health Survey (CCHS). The CCHS is nationally-representative data collected by Statistics Canada on a variety of health topics. One of the variables provided by Statistics Canada is a weighting variable that takes into account the unequal probability of respondents being selected. Whereas the American GSS will provide strata and PSU variables, Statistics Canada collapses a lot of this information into a single variable (wts_m).

    Alright. So my question is whether the final weight is an analytic weight or a probability weight. I've pointed out to Statistics Canada that in several circumstances Stata produces identical estimates regardless of whether you specify the aw or pw designation.

    For the example below, I pulled a few random variables from the PUMF for the 2012-CCHS-MH, and put them in the regression model. The output contains 24,000ish observations of Canadians and would suggest that age is a significant negative predictor of total social support.

    Code:
    ***1. Weighted regression model using an analytic weight with robust standard error***
     
    regress spsdcon dhhgage ccc_131 [aw=wts_m], vce(robust)
    (sum of wgt is 27,491,783.48)
     
    Linear regression                               Number of obs     =     24,231
                                                    F(2, 24228)       =      70.90
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.0113
                                                    Root MSE          =     4.3116
     
    ------------------------------------------------------------------------------
                 |               Robust
         spsdcon |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         dhhgage |  -.1253243   .0109749   -11.42   0.000    -.1468358   -.1038129
         ccc_131 |    .341673   .4335655     0.79   0.431    -.5081423    1.191488
           _cons |   36.15842   .8822762    40.98   0.000     34.42911    37.88774
    ------------------------------------------------------------------------------
     
    ***2. Weighted regression model using a probability weight***
     
    regress spsdcon dhhgage ccc_131 [pw=wts_m]
    (sum of wgt is 27,491,783.48)
     
    Linear regression                               Number of obs     =     24,231
                                                    F(2, 24228)       =      70.90
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.0113
                                                    Root MSE          =     4.3116
     
    ------------------------------------------------------------------------------
                 |               Robust
         spsdcon |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         dhhgage |  -.1253243   .0109749   -11.42   0.000    -.1468358   -.1038129
         ccc_131 |    .341673   .4335655     0.79   0.431    -.5081423    1.191488
           _cons |   36.15842   .8822762    40.98   0.000     34.42911    37.88774
    ------------------------------------------------------------------------------


    As you can see, Model 1 used an analytic weight (with robust standard errors) and Model 2 used a probability weight (which automatically uses robust standard errors). These models are identical across the board, so I'm unsure which weighting approach should be used (in the present example it wouldn't appear to matter). Given that aw and pw are disallowed in some analyses, I'd rather avoid mistakes if I can. Does anyone have analysis experience with the PUMFs from CCHS? If so, how did you deal with it? Alternatively, does anyone have any conceptual insight into why aw are distinct from pw?

    Employees of Statistics Canada whom I've spoken with seem to indicate that wts_m is an analytic weight, but this doesn't seem to jibe with the description of what an analytic weight is in the Stata help pages.

    Cheers,

    David.

  • #2
    It's been a while since I've used a CCHS product, so my memory is not precise. I do recall there being a detailed description of how the weights are created, as well as general instructions for how to use the weights for analysis, in the PUMF User Guide (or perhaps a similarly named document). Either way, technical details are part of the documentation that supports the PUMF file, and should be followed.

    Next, let's define the three types of weights that are relevant here. Frequency weights (fweights) are a shorthand to represent that there are n identical copies of a cell or observation. For the purpose of survey analysis, this is certainly not the one you want.

    Analytic weights (aweights) are used by Stata to mean that a value represents the mean of n observations. Probability weights (pweights) are sometimes called sampling weights; they are the inverse of the probability that an observation was selected into the sample. My hunch is that the CCHS weights are pweights, but you will need to refer to the documentation to figure that out. Here is a FAQ from StataCorp regarding aweights and pweights. Note that there are some scenarios where either weighting method results in the same set of estimates, but in general, this is neither guaranteed nor required.
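
    To illustrate when the two weight types agree and when they do not, here is a minimal sketch. It uses Stata's bundled auto dataset rather than the CCHS PUMF, and the variable weight (vehicle weight) is only a stand-in for a survey weight, so the numbers mean nothing substantively. The scalar names are arbitrary.

    ```stata
    * Illustrative only: auto dataset, with "weight" as a stand-in weight variable
    sysuse auto, clear

    * aweight with conventional (model-based) standard errors
    regress price mpg [aw=weight]
    scalar b_aw  = _b[mpg]
    scalar se_aw = _se[mpg]

    * aweight with robust standard errors requested explicitly
    regress price mpg [aw=weight], vce(robust)
    scalar se_aw_rob = _se[mpg]

    * pweight: regress switches to the robust variance estimator on its own
    regress price mpg [pw=weight]
    scalar b_pw  = _b[mpg]
    scalar se_pw = _se[mpg]

    * Compare coefficients and standard errors across the three fits
    display "b:  aw = " b_aw "   pw = " b_pw
    display "se: aw = " se_aw "   aw robust = " se_aw_rob "   pw = " se_pw
    ```

    The coefficients should agree in all three fits. The standard errors should agree between the pw fit and the aw fit with vce(robust), which is the situation in the original post, and should generally differ for the plain aw fit.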

    Lastly, I would also suggest reviewing how Stata handles survey data using the svyset command. You may read the output of -help svyset- and documentation links therein. You should be able to specify the weighting and pertinent design details once, using svyset, and then carry out regression using the svy prefix. As long as your survey data are correctly set up, your results should be correct.
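
    A minimal sketch of that workflow follows, again on the bundled auto dataset so that it runs anywhere. It assumes the weight is the only design information available (no strata or PSU variables), which appears to be the PUMF situation described above. With the CCHS file loaded, the analogous calls would presumably be svyset [pweight=wts_m] followed by svy: regress spsdcon dhhgage ccc_131.

    ```stata
    * Illustrative only: auto dataset, with "weight" as a stand-in weight variable
    sysuse auto, clear

    * Declare the design once; with no PSU given, each observation is its own PSU
    svyset [pweight=weight]

    * Fit the model with the svy prefix (no weight expression needed here)
    svy: regress price mpg
    scalar b_svy = _b[mpg]

    * For comparison: the same model with the pweight given directly
    regress price mpg [pw=weight]
    scalar b_pw = _b[mpg]

    display "b: svy = " b_svy "   pw = " b_pw
    ```

    The point estimates should match between the two fits. The standard errors should be close but need not be identical, since the svy estimator and regress may apply different finite-sample adjustments.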


    • #3
      Hi Leonardo,

      Thanks a lot for your fast reply. I'll see if I can hunt down this document.

      Cheers,

      David.


      • #4
        Just an update - I contacted a different branch of Stats Can, and they assured me that the CCHS master weight for its PUMF is indeed a pweight.


        • #5
          Glad you got that bit sorted out. Thanks for the update, David.
