analysis of complex survey data

Martina Velasova

Join Date: Jul 2016

Posts: 4
#1

05 Jul 2016, 07:20

Hi All,

I am trying to estimate the population prevalence of disease X using survey data. I created a sampling-weight variable, calculated as the inverse of the probability of sample selection. After declaring the survey design, I calculated the population prevalence. I then found online a formula for calculating a population proportion from a survey with stratified PPS sampling (please see below) and recalculated my results by hand. Using that formula I obtained a slightly different result. I now understand that Stata calculates the population proportion by weighting each observation with the disease by its sampling weight to obtain the estimated population count of cases, which is then divided by the total N.
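In symbols, the weighted calculation described in the last sentence can be sketched as follows. The notation is assumed here for illustration and is not taken from the attached documents: y_j is 1 if sampled observation j has the disease and 0 otherwise, w_j is its sampling weight, and s is the set of sampled observations.

```latex
\hat{p} \;=\; \frac{\sum_{j \in s} w_j \, y_j}{\sum_{j \in s} w_j}
```

The numerator is the estimated population count of cases, and the denominator (the "total N") is the sum of the weights over the sampled observations, i.e. the estimated population size.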

My question is: why did the two ways of obtaining the population proportion yield slightly different results, and which one is correct? I thought there would be only one correct way of analysing data collected under a stratified sampling design.

I would be grateful if anyone could clarify this for me.

Thank you!

Martina
Martina Velasova

Join Date: Jul 2016

Posts: 4
#2

05 Jul 2016, 09:57

Seems that the formula was not attached properly...

Thank you.

Martina

Attached Files

formula fpr population prevalence estimation.docx (11.6 KB, 1 view)
Martina Velasova

Join Date: Jul 2016

Posts: 4
#3

05 Jul 2016, 15:46

The formula I used for the hand recalculation of the population prevalence is provided in the document "under stratified PPS sampling". The first document contains the formula that Stata uses to calculate the population prevalence.
Attached Files

population proportion under stratified PPS sampling.docx (18.1 KB, 1 view)
Richard Williams

Join Date: Apr 2014

Posts: 5008
#4

05 Jul 2016, 19:44

Martina, I think you would have better luck if you posted your Stata code and output. See pt. 12 of the FAQ. It would be especially good if you could post a replicable example using the dataex command.

The Stata manuals include the formulas used, so you could compare them with the ones you used.
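For readers unfamiliar with dataex, here is a minimal sketch of what such a call looks like. It uses the auto dataset that ships with Stata as a stand-in, since the poster's data are not available; the variable list and the observation range are arbitrary choices for illustration.

```stata
* Stand-in data: the auto dataset that ships with Stata
sysuse auto, clear

* Print the first 10 observations of a few variables as a copy-and-paste block
dataex make price mpg foreign in 1/10
```

The output is an input block that others can paste into their own Stata session to recreate the example data. On older installations dataex may first need to be installed from SSC.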

How different is slightly different? Is it small enough that it could just be rounding error on your part?

In general, if Stata does something slightly different from what I expect, my default assumption is that Stata is smarter than I am, although occasionally you will find bugs, or at least discover that alternative formulas and approaches are out there.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam

Martina Velasova

Join Date: Jul 2016
Posts: 4

06 Jul 2016, 16:50

Dear Richard,

Thank you for your kind reply. I apologise for not getting it right in my previous posts.

Please see below how I have obtained my population prevalence using Stata:

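Code:

* Declare the design: farms are the PSUs; pweights are inverse selection probabilities
svyset farmid [pweight=herdsize_weight], strata(region_size_cat) vce(linearized) singleunit(certainty)

Code:

* Estimate the population proportion in each category of fascelisa_pn
svy: proportion fascelisa_pn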

Code:

Survey: Proportion estimation

Number of strata =      17          Number of obs    =     224
Number of PSUs   =     224          Population size  = 9308.67
                                    Design df        =     207

--------------------------------------------------------------
             |             Linearized
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
fascelisa_pn |
           0 |   .4482787     .03431      .3806369    .5159205
           1 |   .5517213     .03431      .4840795    .6193631
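A way to see what the point estimate is made of, sketched on stand-in data rather than on the data in this thread: the snippet below assumes the nhanes2 example dataset (fetched with webuse, so it needs an internet connection, and assumed to come with its survey design already declared) and its variables highbp and finalwgt. It compares the svy estimate with a plain weighted mean of the 0/1 outcome.

```stata
* Stand-in data: NHANES II example dataset (assumed to be already svyset)
webuse nhanes2, clear
svyset                              // display the stored design settings

* Design-based estimate of the proportion with highbp == 1
svy: proportion highbp
matrix b = e(b)
scalar p_svy = b[1,2]               // second column holds the highbp == 1 category

* Same quantity from the raw weights: sum(w*y) / sum(w)
quietly summarize highbp [aweight=finalwgt]
display "svy estimate = " %8.6f p_svy "   weighted mean = " %8.6f r(mean)
```

The two displayed numbers should coincide, because the point estimate depends only on the observation-level weights; the strata and PSUs affect the standard errors.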

For the hand calculation I used a formula that I found on the internet. It is basically the sum over strata of (the proportion of individuals with disease outcome (1) in each stratum multiplied by that stratum's weight), divided by the sum of the weights from all strata. The result I obtained was 61.5% (the last line gives the sum of the weights, w_i = 838.08, and the sum of p_i*w_i = 516.31):

Code:

w_i         p_i         w_i*p_i
0           0           0
51.26667    0.933333    47.84889
200.1667    1           200.1667
30.33333    1           30.33333
15.3913     0.608696    9.36862
35.77778    0.444444    15.90123
8.6         0.8         6.88
7.190476    0.809524    5.820862
32.54545    0.636364    20.71074
95.5        1           95.5
64.6        0.266667    17.22667
92          0           0
34          0           0
14          0.095238    1.333333
37.21053    0.105263    3.916898
38.8        0.6         23.28
43.05556    0.388889    16.74383
37.65217    0.565217    21.28166
838.0899                516.3127
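
The arithmetic in the table can be re-checked in Stata itself. This is a sketch only: the variable names w, p and wp and the scalar names num and den are made up for illustration, and the values are typed in from the table as posted.

```stata
* Re-enter the stratum-level table (w = weight, p = proportion positive)
clear
input double w double p
0         0
51.26667  0.933333
200.1667  1
30.33333  1
15.3913   0.608696
35.77778  0.444444
8.6       0.8
7.190476  0.809524
32.54545  0.636364
95.5      1
64.6      0.266667
92        0
34        0
14        0.095238
37.21053  0.105263
38.8      0.6
43.05556  0.388889
37.65217  0.565217
end

* Numerator: sum over strata of w * p
generate double wp = w * p
quietly summarize wp
scalar num = r(sum)

* Denominator: sum of the stratum weights
quietly summarize w
scalar den = r(sum)

display "sum of w*p = " %9.4f num
display "sum of w   = " %9.4f den
display "ratio      = " %6.4f num/den
```

The two sums should reproduce the totals in the last line of the table, and the ratio comes out at about 0.616, close to the hand-calculated figure quoted in the post.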

I then realised that either there must be other ways of calculating it, or I made a mistake in Stata when declaring the survey design or when calculating the sampling weights. I found the formula Stata uses, and it is indeed different. But I am now wondering why there is such a big difference between the two results, as both formulas are intended for calculating population prevalence (proportion).
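
One possible source of the gap, offered as a sketch rather than a diagnosis, is that the two formulas weight the strata differently. Assume every sampled farm in stratum h carries the same weight w_h, and write n_h for the number of sampled farms and p_h for the sample proportion in that stratum (n_h, w_h, p_h and y_{hj} are notation introduced here, not taken from the attachments). The observation-level ratio estimator and the hand formula are then:

```latex
\hat{p}_{\text{obs}}
  = \frac{\sum_{h} \sum_{j \in h} w_h \, y_{hj}}{\sum_{h} \sum_{j \in h} w_h}
  = \frac{\sum_{h} n_h \, w_h \, p_h}{\sum_{h} n_h \, w_h},
\qquad
\hat{p}_{\text{hand}}
  = \frac{\sum_{h} w_h \, p_h}{\sum_{h} w_h}
```

The two expressions agree when n_h is the same in every stratum (or when all the p_h are equal). Otherwise, strata with more sampled farms count for more in the first expression. This reading would also be consistent with the Stata output reporting a population size of 9308.67 while the weights in the hand calculation sum to 838.09.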

Thank you.

Kind regards,

Martina
