  • Calculating difference of proportions - 2 cross-sectional surveys

    Hi there,

    I am planning to compare baseline and endline cross-sectional surveys that both used complex sampling (multi-stage cluster sampling). Thanks to insight from the kind users on this forum, I understand how to produce estimates for each individual survey using svyset commands. However, my question is: what would be the best way to calculate the difference in proportions between the two surveys, the confidence interval of that difference, and a statistical test? The data are unpaired. The only guidance for Stata I could find is the 'prtest' command for two samples, which I don't think accounts for the complex sampling.

    Thanks,
    -Brett

  • #2
    Brett:
    you may want to take a look at -svy: tabulate twoway-.
    Kind regards,
    Carlo
    (Stata 18.0 SE)

    • #3
      Hi Carlo,

      Thanks for your reply. I will check this out a bit further. It looks like what I would need.

      Thanks again,
      -Brett

      • #4
        Originally posted by Carlo Lazzaro
        Brett:
        you may want to take a look at -svy: tabulate twoway-.

        Hi Carlo,

        Going back to this suggestion, I calculated some estimates for variables as below:

        svy: tabulate eb1 eb2, ci
        (running tabulate on estimation sample)

        Number of strata =  1                  Number of obs   = 784
        Number of PSUs   = 38                  Population size = 784
                                               Design df       =  37

        -------------------------------------------------------
                  |                     eb2
              eb1 |             0              1          Total
        ----------+--------------------------------------------
                0 |         .5829          .0765          .6594
                  | [.5355,.6288]   [.059,.0988]  [.6126,.7034]
                  |
                1 |         .0638          .2768          .3406
                  | [.0418,.0962]  [.2484,.3071]  [.2966,.3874]
                  |
            Total |         .6467          .3533              1
                  | [.6112,.6806]  [.3194,.3888]
        -------------------------------------------------------
        Key: cell proportion
             [95% confidence interval for cell proportion]

          Pearson:
            Uncorrected   chi2(1)   =  374.0216
            Design-based  F(1, 37)  =  289.4519     P = 0.0000

        I just have 2 questions you might be able to answer.
        • First, I notice the number of observations is the final number for eb2 and doesn't show eb1. Is this normal?
        • Secondly, does this output provide any insight other than the Pearson chi-square test result? I couldn't understand what the two-way table was suggesting.
        Thanks again,
        -Brett

        • #5
          Brett:
          you should have obtained something similar to what follows (your output is hard to read because it is not reported between CODE delimiters):
          Code:
          . webuse nhanes2f
          
          . svy: tabulate sex race , ci
          (running tabulate on estimation sample)
          
          Number of strata = 31                            Number of obs   =      10,337
          Number of PSUs   = 62                            Population size = 117,023,659
                                                           Design df       =          31
          
          ----------------------------------------------------------------------
                    |                            Race                          
                Sex |         White          Black          Other          Total
          ----------+-----------------------------------------------------------
               Male |         .4227          .0435          .0133          .4796
                    | [.4064,.4392]  [.0321,.0589]  [.0048,.0363]  [.4678,.4914]
                    |
             Female |         .4563          .0521           .012          .5204
                    | [.4344,.4784]   [.0397,.068]  [.0061,.0237]  [.5086,.5322]
                    |
              Total |          .879          .0956          .0254              1
                    | [.8406,.9092]   [.0725,.125]  [.0108,.0585]              
          ----------------------------------------------------------------------
          Key: Cell proportion
               [95% confidence interval for cell proportion]
          
            Pearson:
              Uncorrected   chi2(2)         =    4.5394
              Design-based  F(1.92, 59.66)  =    1.2559     P = 0.2913
          
          .
          That said:
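          - the cell proportions add up to the marginal totals by row and by column (and to 1 over the whole table);
          - your design-based Pearson test (P = 0.0000) rejects the null hypothesis that your two variables are independent.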
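
          If cell proportions are hard to interpret, it may help to ask for column or row proportions instead. The following is only a sketch on the same example dataset, using the -column- and -row- options of -svy: tabulate twoway-:

          ```stata
          * Same example data as above; nhanes2f ships with its survey design declared
          webuse nhanes2f, clear
          svyset                              // display the declared design only

          * Column proportions with 95% CIs: share of each sex within each race group
          svy: tabulate sex race, column ci

          * Row proportions instead: racial composition within each sex
          svy: tabulate sex race, row ci
          ```

          With -column-, each column adds up to 1, so the entries can be read as proportions within each column category.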
          Kind regards,
          Carlo
          (Stata 18.0 SE)

          • #6
            Hi Carlo,

            I appreciate the response, and sorry about my output; I am still learning how to use the forum. In my example, eb1 and eb2 are the variables from 2 separate surveys at different points in time (baseline and endline). Would this still be a practical way to calculate a chi-square test of whether there is a statistically significant association between survey round (baseline vs. endline) and exposure (which is the intervention)? Would it also take into account the cluster sampling approach used in both surveys?

            Thanks,
            -Brett

            • #7
              Brett:
              I would say yes about your first question, but I do not know whether the cluster sampling approach is correctly captured by this test.
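
              For what it is worth, here is a minimal sketch of one possible setup, not a definitive recipe. It assumes the two surveys drew independent samples of clusters and are stored in separate files. All file and variable names (baseline.dta, endline.dta, eb, cluster, wt) are hypothetical; drop the weight if the design is self-weighting. The idea is to stack the two surveys, mark the survey round, and let the svy: prefix handle the clustering:

              ```stata
              * Hypothetical names: baseline.dta, endline.dta, eb (0/1 outcome),
              * cluster (PSU id), wt (sampling weight)
              use baseline, clear
              append using endline, generate(wave)    // wave = 0 baseline, 1 endline

              * Each survey round is its own stratum, so clusters are not pooled across rounds
              svyset cluster [pweight = wt], strata(wave)

              * Proportion of eb within each round, with design-based CIs and test
              svy: tabulate eb wave, column ci

              * Linear probability model: the coefficient on 1.wave is the
              * endline-minus-baseline difference in proportions, with its CI and t test
              svy: regress eb i.wave
              ```

              The design-based F reported by -svy: tabulate- is the Pearson statistic with the Rao-Scott second-order correction, so it uses whatever design has been declared with -svyset-. If either survey had its own strata, those would need to be combined with the round indicator before running -svyset-.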
              Kind regards,
              Carlo
              (Stata 18.0 SE)

              • #8
                Okay thanks for your support Carlo! Much appreciated!

                -Brett
