  • analysis of complex survey data

    Hi All,

    I am trying to estimate population prevalence of disease X using survey data. I created a new variable sampling weight calculated as an inverse probability of sample selection. After declaring survey design, I calculated my population prevalence. I have then found online a formula for calculation of population proportion from the survey using stratified PPS sampling (please, see below) and recalculated my results by hand. Using the formula below I obtained slightly different result. I now understand that Stata calculate population proportion by weighting the individual observations with disease by their respective weights to obtain the total population count which is then divided by the total N.
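
    Written out, the weighted calculation described above is a ratio of two weighted sums. This is only a sketch in my own notation (the symbols are assumptions, not taken from the attached documents): y_j is 1 if observation j has the disease and 0 otherwise, and w_j is its sampling weight.

    ```latex
    \hat{p} \;=\; \frac{\sum_{j} w_j \, y_j}{\sum_{j} w_j}
    ```

    The numerator estimates the number of diseased units in the population and the denominator estimates the total population size N.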

    My question is: why did the two ways of obtaining the population proportion yield slightly different results, and which one is correct? I thought there would be only one correct way of analysing data collected under a stratified sampling design.

    I would be grateful if anyone could clarify this for me.

    Thank you!

    Martina





  • #2
    It seems that the formula was not attached properly...

    Thank you.

    Martina

    Attached Files



    • #3
      The formula I used for the hand recalculation of the population prevalence is provided in the document "under stratified PPS sampling". The first document contains the formula that Stata uses to calculate the population prevalence.
      Attached Files



      • #4
        Martina, I think you would have better luck if you posted your Stata code and output. See pt. 12 of the FAQ. It would be especially good if you could post a reproducible example using the dataex command.

        The Stata manual includes the formulas used, so you could compare them with the ones you used.

        How different is slightly different? Is it small enough that it could just be rounding error on your part?

        In general, if Stata does something slightly different from what I expect, my default assumption is that Stata is smarter than I am. Occasionally, though, you will find bugs, or at least discover that alternative formulas and approaches are out there.
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          Dear Richard,

          Thank you for your kind reply. I apologise for not getting it right in my previous posts.

          Please see below how I obtained my population prevalence using Stata:

          Code:
          svyset farmid [pweight=herdsize_weight], strata(region_size_cat) vce(linearized) singleunit(certainty)
          Code:
          svy: prop  fascelisa_pn
          Code:
          Survey: Proportion estimation
          
          Number of strata =      17          Number of obs    =     224
          Number of PSUs   =     224          Population size  = 9308.67
                                              Design df        =     207
          
          --------------------------------------------------------------
                       |             Linearized
                       | Proportion   Std. Err.     [95% Conf. Interval]
          -------------+------------------------------------------------
          fascelisa_pn |
                     0 |   .4482787     .03431      .3806369    .5159205
                     1 |   .5517213     .03431      .4840795    .6193631
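
          To check what the point estimate from svy: proportion represents, here is a minimal sketch on made-up toy data (the variables stratum, y and w and all values are hypothetical, not my survey). It assumes a 0/1 outcome and treats each observation as its own PSU. It compares the svy estimate with a plain weighted mean, i.e. sum(w*y)/sum(w).

          ```stata
          * Toy data (hypothetical): 2 strata, 7 observations, 0/1 outcome y, weight w
          clear
          input byte stratum byte y double w
          1 1 10
          1 0 10
          1 1 10
          2 0 50
          2 0 50
          2 1 50
          2 0 50
          end

          * Declare the design: each observation is its own PSU (_n)
          svyset _n [pweight = w], strata(stratum)
          svy: proportion y

          * e(b) holds the estimated proportions for y==0 and y==1, in that order
          matrix b = e(b)
          scalar p_svy = b[1,2]

          * Same point estimate as a weighted mean: sum(w*y) / sum(w)
          quietly summarize y [aweight = w]
          scalar p_wmean = r(mean)

          display "svy: proportion = " %9.6f p_svy
          display "weighted mean   = " %9.6f p_wmean
          ```

          In this toy example both numbers equal 70/230 (about 0.304348): the weighted count of cases is 10 + 10 + 50 = 70 and the weights sum to 230. The survey design affects the standard errors, not the point estimate.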

          For the hand calculation I used a formula that I found on the internet. It is basically the sum over strata of (the proportion of individuals with disease outcome (1) in the stratum multiplied by the stratum weight), divided by the sum of the weights from all strata. The result I obtained was 61.5% (the last line of the table gives the sum of the weights, w_i = 838.09, and the sum of w_i*p_i = 516.31):

          Code:
          w_i         p_i         w_i*p_i
          ---------------------------------
          0           0           0
          51.26667    0.933333    47.84889
          200.1667    1           200.1667
          30.33333    1           30.33333
          15.3913     0.608696    9.36862
          35.77778    0.444444    15.90123
          8.6         0.8         6.88
          7.190476    0.809524    5.820862
          32.54545    0.636364    20.71074
          95.5        1           95.5
          64.6        0.266667    17.22667
          92          0           0
          34          0           0
          14          0.095238    1.333333
          37.21053    0.105263    3.916898
          38.8        0.6         23.28
          43.05556    0.388889    16.74383
          37.65217    0.565217    21.28166
          ---------------------------------
          838.0899                516.3127
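
          The hand calculation can be replayed in Stata as a quick check. This is only a sketch: the values are copied from the table, and the names wp, sum_w, sum_wp and p_hand are ones I made up for illustration.

          ```stata
          * Re-enter the per-stratum weights (w_i) and proportions (p_i) from the table
          clear
          input double w_i double p_i
          0        0
          51.26667 0.933333
          200.1667 1
          30.33333 1
          15.3913  0.608696
          35.77778 0.444444
          8.6      0.8
          7.190476 0.809524
          32.54545 0.636364
          95.5     1
          64.6     0.266667
          92       0
          34       0
          14       0.095238
          37.21053 0.105263
          38.8     0.6
          43.05556 0.388889
          37.65217 0.565217
          end

          * Hand formula: sum(w_i * p_i) / sum(w_i)
          generate double wp = w_i * p_i
          quietly summarize wp
          scalar sum_wp = r(sum)
          quietly summarize w_i
          scalar sum_w = r(sum)
          scalar p_hand = sum_wp / sum_w

          display "sum of w_i      = " %9.4f sum_w
          display "sum of w_i*p_i  = " %9.4f sum_wp
          display "hand proportion = " %6.4f p_hand
          ```

          This reproduces the two totals in the last line of the table and gives a ratio of about 0.616, close to the 61.5% quoted above. Note that the weights here sum to 838.09, whereas the Stata output reports a population size of 9308.67, so the two calculations do not appear to use the same weight totals.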

          I then realised that either there are other ways of calculating it, or I made a mistake in Stata when declaring the survey design or when calculating the sampling weights. I found the formula that Stata uses, and it is indeed different. But I am now wondering why there is such a big difference between the two results, as both formulas are intended for calculating a population prevalence (proportion).
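
          One possible way to line the two formulas up is sketched below. This is not a definitive answer, and the notation is my own assumption: it supposes the weight is constant within each stratum h (written w_h), with n_h sampled units and sample proportion p_h in that stratum, and y_{hj} equal to 1 for a diseased unit and 0 otherwise.

          ```latex
          \begin{aligned}
          \hat{p}_{\text{Stata}}
            &= \frac{\sum_{h}\sum_{j=1}^{n_h} w_{h}\, y_{hj}}{\sum_{h}\sum_{j=1}^{n_h} w_{h}}
             = \frac{\sum_{h} n_h\, w_h\, p_h}{\sum_{h} n_h\, w_h}, \\[6pt]
          \hat{p}_{\text{hand}}
            &= \frac{\sum_{h} w_h\, p_h}{\sum_{h} w_h}.
          \end{aligned}
          ```

          Under that assumption the two expressions coincide only when n_h is the same in every stratum. Otherwise the first weights each stratum by n_h*w_h and the second by w_h alone. If the assumption holds for my data, that would also fit the totals differing (9308.67 for the population size reported by Stata versus 838.09 for the sum of w_i in the table).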

          Thank you.

          Kind regards,

          Martina

