Calculating difference of proportions - 2 cross-sectional surveys

Brett Collins

Join Date: Nov 2021

Posts: 11
#1

16 Mar 2022, 11:31

Hi there,

I am planning to compare baseline and endline cross-sectional surveys that used complex sampling (multi-stage cluster sampling). Thanks to the kind users on this forum, I understand how to produce estimates for each individual survey using the svyset commands. My question is: what would be the best way to calculate the difference in proportions between the two surveys, the confidence interval for that difference, and a statistical test? The data are unpaired. The only guidance for Stata I could find is the 'prtest' command for two samples, which I don't think accounts for the complex sampling design.

Thanks,
-Brett
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

16 Mar 2022, 12:44

Brett:
you may want to take a look at -svy: tabulate twoway-.
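
A minimal sketch of what this command can report, using the nhanes2f example dataset from Stata's web examples (its survey design is already declared). The column option requests proportions within each column rather than overall cell proportions. For two surveys, the column variable would presumably be a survey indicator in a stacked dataset; that is an assumption here, not something stated above.

```stata
* Load an example dataset whose survey design is already svyset
webuse nhanes2f, clear

* Column proportions with design-based confidence intervals:
* each column (here, each sex) sums to 1 down the rows
svy: tabulate race sex, column ci
```

The design-based F statistic printed under the table tests independence of the row and column variables while taking the declared strata, PSUs, and weights into account.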

Kind regards,
Carlo
(Stata 19.0)
Brett Collins

Join Date: Nov 2021

Posts: 11
#3

19 Mar 2022, 07:55

Hi Carlo,

Thanks for your reply. I will check this out a bit further. It looks like what I would need.

Thanks again,
-Brett
Brett Collins

Join Date: Nov 2021

Posts: 11
#4

05 Jul 2022, 17:09

Originally posted by Carlo Lazzaro:

Brett:
you may want to take a look at -svy: tabulate twoway-.

Hi Carlo,

Going back to this suggestion, I calculated some estimates for variables as below:

svy: tabulate eb1 eb2, ci
(running tabulate on estimation sample)

Number of strata = 1                             Number of obs   =         784
Number of PSUs   = 38                            Population size =         784
                                                 Design df       =          37

-------------------------------------------------------
          |                     eb2
      eb1 |             0              1          Total
----------+--------------------------------------------
        0 |         .5829          .0765          .6594
          | [.5355,.6288]   [.059,.0988]  [.6126,.7034]
          |
        1 |         .0638          .2768          .3406
          | [.0418,.0962]  [.2484,.3071]  [.2966,.3874]
          |
    Total |         .6467          .3533              1
          | [.6112,.6806]  [.3194,.3888]
-------------------------------------------------------
Key: cell proportion
     [95% confidence interval for cell proportion]

  Pearson:
    Uncorrected   chi2(1)         =  374.0216
    Design-based  F(1, 37)        =  289.4519     P = 0.0000

I just have 2 questions you might be able to answer.
First, I notice the number of observations is the final number for eb2 and doesn't show eb1. Is this normal?

Secondly, does this output provide any insight other than the Pearson chi-squared test result? I couldn't understand what the two-way table was suggesting.

Thanks again,
-Brett
Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709
#5

06 Jul 2022, 00:01

Brett:
you should have obtained something similar to what follows (your output is hard to read because it is not reported between CODE delimiters):

Code:

. webuse nhanes2f

. svy: tabulate sex race , ci
(running tabulate on estimation sample)

Number of strata = 31                            Number of obs   =      10,337
Number of PSUs   = 62                            Population size = 117,023,659
                                                 Design df       =          31

----------------------------------------------------------------------
          |                            Race                          
      Sex |         White          Black          Other          Total
----------+-----------------------------------------------------------
     Male |         .4227          .0435          .0133          .4796
          | [.4064,.4392]  [.0321,.0589]  [.0048,.0363]  [.4678,.4914]
          |
   Female |         .4563          .0521           .012          .5204
          | [.4344,.4784]   [.0397,.068]  [.0061,.0237]  [.5086,.5322]
          |
    Total |          .879          .0956          .0254              1
          | [.8406,.9092]   [.0725,.125]  [.0108,.0585]              
----------------------------------------------------------------------
Key: Cell proportion
     [95% confidence interval for cell proportion]

  Pearson:
    Uncorrected   chi2(2)         =    4.5394
    Design-based  F(1.92, 59.66)  =    1.2559     P = 0.2913

.

That said:
- the Total row sums the cell proportions in each column (and the Total column sums them in each row);
- your Pearson result rejects the null hypothesis that your two variables are independent.
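
As a quick arithmetic check on the table in #4, the lines below add up the reported cell proportions. This is only a sketch that uses the rounded figures from that output, so the sums should match the Total row (.6467 and .3533) and the overall total of 1 up to rounding.

```stata
* Column totals: sum of the cell proportions in each eb2 column
display .5829 + .0638
display .0765 + .2768

* All four cell proportions together should be (approximately) 1
display .5829 + .0765 + .0638 + .2768
```

Because these are cell proportions, each is a share of the whole sample, not a share within a row or column.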

Kind regards,
Carlo
(Stata 19.0)


Brett Collins

Join Date: Nov 2021

Posts: 11
#6

06 Jul 2022, 10:12

Hi Carlo,

Appreciate the response and sorry for my output. I am still learning how to use the forums. In my example, eb1 and eb2 are the variables from 2 separate surveys at different points in time (baseline and endline). Would this still be a practical way to calculate chi-square to see whether there is a statistically significant association between baseline data and endline data and exposure (which is the intervention)? Would this also take into consideration the cluster sampling approach in both surveys?

Thanks,
-Brett
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#7

06 Jul 2022, 10:39

Brett:
I would say yes about your first question, but I do not know whether the cluster sampling approach is correctly captured by this test.
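
One way the design could be carried into the comparison is sketched below, under several assumptions that are not confirmed in this thread: the two surveys are stacked into one dataset (with real files, something like append using with the generate() option), the survey indicator is treated as a stratum, and PSU identifiers are unique across surveys. All variable names (wave, psu_id, outcome, wt) are hypothetical, and the data are simulated purely so the block runs on its own.

```stata
* Simulated stand-in for two stacked surveys (hypothetical names, fake data)
clear
set seed 2022
set obs 76
generate long psu_id = _n
* PSUs 1-38 play the role of baseline (wave 0), 39-76 of endline (wave 1)
generate byte wave = psu_id > 38
* Small cluster-level shift so respondents within a PSU are correlated
generate double u = rnormal(0, 0.05)
* 20 respondents per PSU
expand 20
generate byte outcome = runiform() < (0.30 + 0.05*wave + u)
* Equal weights here; real sampling weights would go in their place
generate double wt = 1

* Assumption: each survey is its own stratum, PSU ids unique across surveys
svyset psu_id [pweight = wt], strata(wave)

* Proportion with the outcome within each survey, plus design-based F test
svy: tabulate outcome wave, column ci

* Linear probability model: the coefficient on 1.wave is the endline minus
* baseline difference in proportions, with a design-based CI and t test
svy: regress outcome i.wave
```

With this setup the design degrees of freedom are the number of PSUs minus the number of strata. If each survey had its own strata, the strata variable would need to identify survey-by-stratum combinations instead of the survey alone.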

Kind regards,
Carlo
(Stata 19.0)
Brett Collins

Join Date: Nov 2021

Posts: 11
#8

07 Jul 2022, 07:38

Okay thanks for your support Carlo! Much appreciated!

-Brett
