Survey analysis of couple data

Rachael G

Join Date: Apr 2015


27 Apr 2015, 04:53

I have data on health outcomes for 340 couples and would like to examine how these outcomes are associated between the two members of each couple, using the svy commands in Stata. However, I am not sure that what I am currently doing is correct.

To run weighted cross-tabs that account for clustering within couples, I set up my data with svyset coupleid [pweight=weight], following the Stata survey manual (I have no other design information to add, such as strata). However, this gives output identical to the non-clustered specification, svyset [pweight=weight]. As I understand it, the clustered svyset should account for non-independence within couples, so I expected slightly different output.

Is there something I have missed or that I am doing completely wrong?

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

27 Apr 2015, 09:10

Absent missing values in your coupleid variable, primary sampling units (PSUs) will not affect
point estimates. The PSU information is used in variance estimation.
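A sketch of why, using the textbook Taylor-linearization variance for a weighted proportion in a design with one stratum, PSUs treated as sampled with replacement, and no finite population correction. The notation is introduced here for illustration and is not taken from the post: w_i is the sampling weight of observation i, y_i is 1 if the observation falls in a given table cell and 0 otherwise, c indexes PSUs, and n is the number of PSUs.

```latex
\hat{p} = \frac{\sum_i w_i y_i}{\sum_i w_i}
\qquad
z_i = \frac{w_i\,(y_i - \hat{p})}{\sum_j w_j}
\qquad
z_c = \sum_{i \in c} z_i
\qquad
\widehat{V}(\hat{p}) = \frac{n}{n-1} \sum_{c=1}^{n} z_c^{2}
```

The PSU index c appears only in the variance, so the proportion itself cannot change. Because the z_i sum to zero, no centring of the z_c is needed. For a two-member PSU, z_c^2 = z_{c1}^2 + z_{c2}^2 + 2 z_{c1} z_{c2}, so the cross-product term raises the variance when partners tend to deviate from \hat{p} in the same direction. If every PSU contains a single observation, the clustered and unclustered formulas are identical.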

If you are using svy: tabulate for computing the weighted cross-tabs, you can request that
standard errors be reported by specifying the se option. Here is an example from the help
file for svy: tabulate twoway.

Code:

. webuse nhanes2b

. svy: tabulate race diabetes, se
(running tabulate on estimation sample)

Number of strata   =        31                Number of obs     =       10,349
Number of PSUs     =        62                Population size   =  117,131,111
                                              Design df         =           31

-------------------------------------------
1=white,  |
2=black,  |      diabetes, 1=yes, 0=no     
3=other   |         0          1      Total
----------+--------------------------------
    White |      .851      .0281      .8791
          |   (.0158)    (.0019)    (.0167)
          | 
    Black |     .0899      .0056      .0955
          |   (.0121)  (8.5e-04)    (.0128)
          | 
    Other |     .0248    5.2e-04      .0253
          |   (.0102)  (3.9e-04)    (.0105)
          | 
    Total |     .9658      .0342          1
          |   (.0018)    (.0018)           
-------------------------------------------
  Key:  cell proportion
        (linearized standard error of cell proportion)

  Pearson:
    Uncorrected   chi2(2)         =   21.3483
    Design-based  F(1.52, 47.26)  =   15.0056     P = 0.0000

Here is the same tabulation, with sampling weights only.

Code:

. svyset [pw=finalwgt]

      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

. svy: tabulate race diabetes, se
(running tabulate on estimation sample)

Number of strata   =         1                Number of obs     =       10,349
Number of PSUs     =    10,349                Population size   =  117,131,111
                                              Design df         =       10,348

-------------------------------------------
1=white,  |
2=black,  |      diabetes, 1=yes, 0=no     
3=other   |         0          1      Total
----------+--------------------------------
    White |      .851      .0281      .8791
          |   (.0041)    (.0017)    (.0039)
          | 
    Black |     .0899      .0056      .0955
          |   (.0033)  (8.0e-04)    (.0034)
          | 
    Other |     .0248    5.2e-04      .0253
          |    (.002)  (2.4e-04)     (.002)
          | 
    Total |     .9658      .0342          1
          |   (.0019)    (.0019)           
-------------------------------------------
  Key:  cell proportion
        (linearized standard error of cell proportion)

  Pearson:
    Uncorrected   chi2(2)         =   21.3483
    Design-based  F(2.00, 20692.32)=    9.0393    P = 0.0001

The estimated cell proportions are the same, but the standard error estimates are different.
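The design degrees of freedom reported in the two headers follow the usual rule for linearized variance estimation, shown here as worked arithmetic on the numbers printed above:

```latex
\text{design df} = (\text{number of PSUs}) - (\text{number of strata})
\qquad
62 - 31 = 31
\qquad
10{,}349 - 1 = 10{,}348
```

Note also that the uncorrected Pearson chi2(2) is 21.3483 in both runs. That statistic is built from the estimated proportions, which the design does not change; the design enters through the design-based F (15.0056 on 1.52 and 47.26 df versus 9.0393 on 2.00 and 20692.32 df).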

Rachael G

Join Date: Apr 2015


27 Apr 2015, 10:22

Thanks Jeff, that's really helpful.

With my data, I am slightly concerned that the chi2 values do not change between the two svyset specifications; a change would have indicated that Stata is doing something different in each case and taking non-independence into account. I suspect that, because I have only coupleid (and partner1id/partner2id) and no other design information such as strata, Stata is treating the 340 couples as 340 individuals and analysing what should be couple data as individual data.

Is there another way I could set out my data to get around this?
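One thing worth checking is how many observations each couple contributes. If the file has one row per couple (340 rows), every PSU holds exactly one observation, and svyset coupleid then gives the same standard errors as svyset with no PSU variable. In that layout a cross-tab of partner 1's outcome by partner 2's outcome already uses each couple as a single unit. The do-file below is a sketch on SIMULATED data; the variables health1 and health2 and the constant weight are hypothetical placeholders, not Rachael's actual variables. It uses svydescribe to show observations per PSU in the one-row-per-couple layout and again after reshaping to one row per partner.

```stata
* Sketch on SIMULATED data: health1, health2 and weight are hypothetical.
* Run as a do-file (// comments are not accepted in the Command window).
clear
set seed 12345
set obs 340
generate coupleid = _n
generate weight  = 1                      // placeholder sampling weight
generate health1 = runiform() < 0.3       // partner 1's outcome (simulated)
generate health2 = runiform() < 0.3       // partner 2's outcome (simulated)

* One row per couple: each PSU contains a single observation,
* so clustering on coupleid cannot change the standard errors.
svyset coupleid [pweight=weight]
svydescribe
svy: tabulate health1 health2, se

* One row per partner: each PSU now contains two observations,
* and within-couple correlation enters the variance estimate.
reshape long health, i(coupleid) j(partner)
svyset coupleid [pweight=weight]
svydescribe
svy: mean health
```

In the long layout svydescribe should report 340 PSUs with two observations each, and the design df becomes 340 - 1 = 339. Which layout suits the analysis depends on the question being asked of the couple data.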