calculate weights

Attila Nagy

Join Date: May 2016

Posts: 19
#1

20 May 2016, 16:12

Hi, I have a locally representative database whose sampling is similar to that of the European Health Interview Survey. How can I calculate weights? What is the formula? Do I need stratum-specific sampling probabilities based on the source population, and then, in the next step, should I use the nonrespondents to calculate the final weights? After calculating these, which weight type should I use, e.g. in summary statistics or regressions?
Thanks a lot, regards
Tags: data, survey, weighting
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

20 May 2016, 17:10

I'm unfamiliar with the European Health Survey. I can refer you to two sampling texts that cover construction of weights.

Groves, Robert M., Floyd J. Fowler, Mick P. Couper, James M. Lepkowski, Eleanor Singer, and Roger Tourangeau. 2009. Survey Methodology, 2nd ed. Hoboken, NJ: Wiley. Section 10.5.
Lohr, Sharon L. 2009. Sampling: Design and Analysis. Boston, MA: Cengage Brooks/Cole. Chapter 7 and Section 8.5.

Groves is my first recommendation. Lohr has more statistical theory along with its good examples.

Good luck!

Last edited by Steve Samuels; 20 May 2016, 17:52.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Attila Nagy

Join Date: May 2016

Posts: 19
#3

21 May 2016, 13:50

Thanks. I'll try to figure it out. So there is no simple formula? Or is it in one of these books?
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

21 May 2016, 16:31

I know nothing about your sample design: sampling frame, strata, sampling stages, probabilities at each stage, sampling method at each stage, ultimate analysis unit. Therefore I can't tell whether the formulas will be "simple" or not. (Ordinarily, the non-response weighting adjustment would not be "simple".)

Since you are emulating known surveys, post a link to a description of the design and analysis for one of those surveys. If necessary, write to the authors of publications from that survey; they are often willing to share information. Best would be for you to describe in detail the design of the local survey.

Groves et al. list weights with four different functions:

1. w1: First stage ratio adjustment to compensate for chance variation in size of primary sampling unit

2. w2: Compensation for unequal sampling probabilities: multiply probabilities at each stage to get final probability of selection, then invert to get final selection weight. Some surveys plan sampling so that final selection weights are equal (equal probability selection methods, or "epsem")

3. w3: Adjustment for non-response, for example weighting up responders by the inverse of their response probability, as estimated from a logistic regression of response

The weight up to that point is w* = w1 x w2 x w3

4. w4 (final weight): Post-stratify w* to match known population characteristics (sample balancing, raking). This can also partly compensate for a poor design at the expense of increasing standard errors. The community-contributed Stata commands ipfweight, ipfraking, survwgt rake, and calibrate can do this.
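
As a rough illustration of how w1, w2, and w3 combine, here is a minimal sketch. It assumes a dataset with one row per sampled unit (responders and non-responders), a 0/1 indicator responded, covariates agegp and gender known for everyone, and variables w1 and w2 already computed. All of these names are hypothetical, and the choice of predictors for the response model is only an example.

```stata
* Hypothetical variables: responded (0/1), agegp, gender, w1, w2
* Model the probability of response among all sampled units
logit responded i.agegp i.gender
predict double p_resp, pr

* Non-response adjustment: inverse of the estimated response probability
gen double w3 = 1/p_resp if responded == 1

* Weight before post-stratification (missing for non-responders)
gen double wstar = w1*w2*w3
```

Post-stratification or raking (w4) would then start from wstar.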

If you google "sampling weight construction" or related terms, you might also find some helpful suggestions.

Last edited by Steve Samuels; 21 May 2016, 17:00.

Attila Nagy

Join Date: May 2016

Posts: 19
#5

22 May 2016, 02:50

Thanks a lot. I am checking the design and descriptions.
Attila Nagy

Join Date: May 2016

Posts: 19
#6

26 May 2016, 12:09

I have a database of patients. The sample size is 1000 (100% response thanks to a spare sample/replacement, etc.).
20 doctors were involved. Out of 50 doctors, 20 were selected, and 50 patients were selected randomly by each doctor.

To use the survey module and calculate weights, should I use this one (svyset example):

One-stage clustered design with stratification
svyset su1 [pweight=pw], strata(strata)

where pw = N/n, i.e. the total number of patients (across the 50 GPs) in a given stratum (age group/gender) divided by the number in the sample? Should strata be the GP id code?
And su1?

Or should a different formula be used?

Thanks,
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

26 May 2016, 13:47

Thanks. Exactly what analyses do you intend to do? To describe experience in the population? To measure the association of some characteristics with other characteristics? To test hypotheses?

You say that "50 patients were selected randomly by each doctor". Exactly how did the doctors do this? Were they taught how to take a simple random sample (needs random ordering) or a systematic sample (needs random start)? Note that your statement about "100% response" is not technically correct: "response" applies to the initial selections only, and in any report you will have to give the fraction of patients for whom no substitutes were required as the original "response rate".

You have two stages of sampling.

The weight at each stage is the inverse of the sampling probability at that stage:
1. Select doctors.
If 20 doctors were selected from 50 with simple random sampling, the sampling probability is f1_i = 20/50 for each doctor i, so the weight is

w1_i = 50/20 = 2.5

2. Select patients. If 50 patients were selected by each doctor using a real random sampling method, the initial probability of selecting patient j of doctor i is f2_ij = 50/N_i, where N_i is the number of patients in the doctor's population. We'll assign the same probability to substitutes, although this is not completely accurate, as some of the substitutes approached might also not have responded.
Then the patient selection weight is:

w2_i = N_i/50

3. The final weight multiplies the two weights:

wtfinal_ij = w1_i x w2_i = 2.5 x N_i/50 = N_i/20

You'll want to create a Stata variable to hold N_i for each doctor; call it, say, "docpats".
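
A minimal sketch of that calculation, assuming docpats already holds N_i on every patient record; the names w1 and w2 are just illustrative:

```stata
* Stage 1: 20 of 50 doctors selected by simple random sampling
gen double w1 = 50/20

* Stage 2: 50 patients selected from the N_i patients of each doctor
gen double w2 = docpats/50

* Final weight is the product of the stage weights
gen double wtfinal = w1*w2

* Sanity check: the product should simplify to N_i/20
assert reldif(wtfinal, docpats/20) < 1e-12
```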

To estimate descriptive statistics (proportions, means, totals), use this svyset

Code:

gen fpc1 = 50 // total number of doctors
svyset gp_id [pw = wtfinal], fpc(fpc1) || _n, fpc(docpats)

For analyses of association (e.g. regression) and for hypothesis testing, do a new svyset

Code:

svyset gp_id [pw = wtfinal] || _n
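
To show how the two declarations might be used in practice, here is a hedged sketch. It assumes wtfinal, fpc1, and docpats exist as described above; the outcome variables sbp and smoker and the covariates agegp and gender are hypothetical names.

```stata
* Descriptive statistics: design with finite population corrections
svyset gp_id [pw = wtfinal], fpc(fpc1) || _n, fpc(docpats)
svy: mean sbp
svy: proportion smoker

* Association and hypothesis tests: same design without the fpc terms
svyset gp_id [pw = wtfinal] || _n
svy: logistic smoker i.agegp i.gender
```

Each svyset replaces the previous declaration, so the svy: commands pick up whichever design was set most recently.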

If you know the patient totals by age and gender in the entire population, not just the sample, you can post-stratify by age and gender. See the manual entry for poststratification (p. 53 of the Stata 14 SVY Manual).

Last edited by Steve Samuels; 26 May 2016, 13:57.

Attila Nagy

Join Date: May 2016

Posts: 19
#8

26 May 2016, 14:08

Thanks,
Yes, the sample was randomly ordered, and I need both descriptives and associations. The 50 doctors are representative of the source population, and I know the age/gender distribution both for the sample (20 doctors) and for all 50 doctors. So for the weighting itself I don't need the age/gender distribution or age-gender-specific weights, only for post-stratification? I know how to post-stratify, e.g. for gender, but I will check how to post-stratify for age and gender.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

26 May 2016, 14:21

Good! Yes, you need age and gender only for post-stratification. Easiest way: if there are, say 4 age groups and 2 genders, form a single variable age_gend with 8 categories. You can get this by:

Code:

egen age_gend= group(agegp gender)
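
Once age_gend exists, the population count for each category has to be attached before the post-stratified design can be declared. A sketch under stated assumptions: popcounts.dta is a hypothetical file with one row per age_gend category (coded the same way as the egen result) and the population total in a hypothetical variable age_gend_pop; sbp is a hypothetical outcome. The single-stage form of svyset is used here for simplicity.

```stata
* Attach the population total for each age-by-gender category
* (popcounts.dta and age_gend_pop are hypothetical names)
merge m:1 age_gend using popcounts, keep(match master) nogenerate

* Declare the design with post-stratification on age_gend
svyset gp_id [pw = wtfinal], poststrata(age_gend) postweight(age_gend_pop)

* Estimates now use the post-stratified weights
svy: mean sbp
```

Because egen group() numbers categories by sort order, it is worth checking with a tabulation that the codes in the data match those in the population file.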

If you have only the age category counts and the gender category counts, but not counts for their combinations, you'll need to do something called "sample raking", which requires downloading a contributed command.

Attila Nagy

Join Date: May 2016

Posts: 19
#10

26 May 2016, 14:23

Checked. So there is no need for age and gender during weighting, and I can create a variable like agegender, which is e.g. 21 for a 20-30 year old male. I then have to merge in the population numbers in a variable called, say, agegenderpop, so that e.g. agegender=21 (20-30, male) has agegenderpop=12500, the population total for that category. The poststrata variable is then agegender and the postweight variable is agegenderpop. Am I right? If so, do I need post-stratification for both descriptives and associations?
Attila Nagy

Join Date: May 2016

Posts: 19
#11

26 May 2016, 14:26

I have just read your previous comment. I know the ages, so I can create any age groups. I am thinking of 10-year age groups like 20-29, etc. Is that good, or should I use fewer categories? Age is 18+.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#12

26 May 2016, 14:39

You have a big sample size; ten-year age groups look okay to me; you could have some smaller ones, if you wished, just for the post-stratification. Note that the post-stratification and the analysis categories don't need to be identical.

Last edited by Steve Samuels; 26 May 2016, 14:42.

Attila Nagy

Join Date: May 2016

Posts: 19
#13

26 May 2016, 15:34

Thanks