  • Survey analysis of couple data

    I have data on health outcomes among 340 couples and would like to look at how these are associated between couple members using svy commands in Stata. However, I am not sure that what I am currently doing is correct.

    To set up my data, I used svyset coupleid [pweight=weight] so that weighted cross-tabs take into account clustering by couple, as per the Stata survey manual (I have no other design information to input, such as strata). However, this gives me the same output as the non-clustered setup, svyset [pweight=weight]. As I understand it, the clustered svyset should account for non-independence, so I should get slightly different output.

    Is there something I have missed or that I am doing completely wrong?


  • #2
    Absent missing values in your coupleid variable, primary sampling units (PSUs) will not affect
    point estimates. The PSU information is used in variance estimation.

    If you are using svy: tabulate for computing the weighted cross-tabs, you can request that
    standard errors be reported by specifying the se option. Here is an example from the help
    file for svy: tabulate twoway.

    Code:
    . webuse nhanes2b
    
    . svy: tabulate race diabetes, se
    (running tabulate on estimation sample)
    
    Number of strata   =        31                Number of obs     =       10,349
    Number of PSUs     =        62                Population size   =  117,131,111
                                                  Design df         =           31
    
    -------------------------------------------
    1=white,  |
    2=black,  |      diabetes, 1=yes, 0=no     
    3=other   |         0          1      Total
    ----------+--------------------------------
        White |      .851      .0281      .8791
              |   (.0158)    (.0019)    (.0167)
              | 
        Black |     .0899      .0056      .0955
              |   (.0121)  (8.5e-04)    (.0128)
              | 
        Other |     .0248    5.2e-04      .0253
              |   (.0102)  (3.9e-04)    (.0105)
              | 
        Total |     .9658      .0342          1
              |   (.0018)    (.0018)           
    -------------------------------------------
      Key:  cell proportion
            (linearized standard error of cell proportion)
    
      Pearson:
        Uncorrected   chi2(2)         =   21.3483
        Design-based  F(1.52, 47.26)  =   15.0056     P = 0.0000
    Here is the same tabulation, with sampling weights only.

    Code:
    . svyset [pw=finalwgt]
    
          pweight: finalwgt
              VCE: linearized
      Single unit: missing
         Strata 1: <one>
             SU 1: <observations>
            FPC 1: <zero>
    
    . svy: tabulate race diabetes, se
    (running tabulate on estimation sample)
    
    Number of strata   =         1                Number of obs     =       10,349
    Number of PSUs     =    10,349                Population size   =  117,131,111
                                                  Design df         =       10,348
    
    -------------------------------------------
    1=white,  |
    2=black,  |      diabetes, 1=yes, 0=no     
    3=other   |         0          1      Total
    ----------+--------------------------------
        White |      .851      .0281      .8791
              |   (.0041)    (.0017)    (.0039)
              | 
        Black |     .0899      .0056      .0955
              |   (.0033)  (8.0e-04)    (.0034)
              | 
        Other |     .0248    5.2e-04      .0253
              |    (.002)  (2.4e-04)     (.002)
              | 
        Total |     .9658      .0342          1
              |   (.0019)    (.0019)           
    -------------------------------------------
      Key:  cell proportion
            (linearized standard error of cell proportion)
    
      Pearson:
        Uncorrected   chi2(2)         =   21.3483
        Design-based  F(2.00, 20692.32)=    9.0393    P = 0.0001
    The estimated cell proportions are the same, but the standard error estimates are different.
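
    Applied to the couple data from the question, the same comparison could be sketched as follows. This is only a sketch: coupleid and weight are the names given in the question, while outcome_a and outcome_b are hypothetical placeholders for the two variables being cross-tabulated.

    ```stata
    * Sketch only: coupleid and weight come from the question;
    * outcome_a and outcome_b are hypothetical variable names.

    * Design 1: couples declared as primary sampling units
    svyset coupleid [pweight=weight]
    svy: tabulate outcome_a outcome_b, se

    * Design 2: sampling weights only, no clustering
    svyset [pweight=weight]
    svy: tabulate outcome_a outcome_b, se
    ```

    With the se option, the quantities to compare across the two runs are the standard errors in parentheses, the design df, and the design-based F statistic, not the cell proportions.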



    • #3
      Thanks Jeff, that's really helpful.

      With my data, I am slightly concerned that the chi2 values do not change between the two types of svyset. A change would have indicated that Stata is doing something different for each and taking non-independence into account. I suspect that because I only have coupleid (and partner1id/partner2id) but no other information such as strata, Stata is treating the 340 couples as 340 individuals and analysing what should be couple data as individual data.

      I'm not sure whether I can set out my data in another form that would get around this.
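
      One way to check that suspicion is to look at how many observations fall in each PSU. The sketch below assumes the coupleid and weight variables from the first post; n_in_couple is a hypothetical helper variable created only for this check.

      ```stata
      * Sketch only: coupleid and weight come from the first post;
      * n_in_couple is a hypothetical helper variable.

      * Declare couples as PSUs and summarise the design,
      * including the min/mean/max observations per unit
      svyset coupleid [pweight=weight]
      svydescribe

      * Count the rows that share each coupleid
      bysort coupleid: generate n_in_couple = _N
      tabulate n_in_couple
      ```

      If every couple turns out to contribute a single row, each PSU holds one observation, and clustering on coupleid would be expected to give the same standard errors as the weights-only setup.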
