  • Help with raking: am I using survwgt rake correctly?

    Hello, and thank you for taking the time to go through my question! First of all, I’m sorry if any of the terminology I use isn’t correct or precise; let me know if you need clarification. I’m doing the initial statistical analysis of an opt-in, and therefore non-probability, survey, and I need to construct weights so that it is “representative” of Chile’s national population. We characterize respondents with four sociodemographic variables (sex, socioeconomic group, age, and geographic zone), and population values for all four are known from the census. So I decided to try raking with the survwgt package, specifically the survwgt rake command.

    After writing the code, I set the resulting raked weight as my pweight (typing “svyset [pw=rakedweight]”). When I then typed, for example, “svy: tab sex”, the resulting proportions matched the population proportions for sex (.4895 men and .5105 women), which makes me think my code is correct. Moreover, for the questions we asked respondents, the weighted frequencies and percentages differ from the unweighted ones. The differences look sensible rather than dramatic, and they line up with how our sample’s sociodemographic characteristics differ from the population’s.
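    As a quick sanity check on those two numbers, here is a minimal Python sketch. Python is used only for illustration; the analysis itself is in Stata. The sketch recomputes the population shares from the census totals for sex that appear further down in this post:

```python
# Census totals for adults 18+ by sex (sex == 1 and sex == 2), as given in this post.
SEX_TOTALS = {1: 6512031, 2: 6791404}
POPULATION = 13303435

def population_shares(totals, population):
    """Return each category's share of the population."""
    return {cat: n / population for cat, n in totals.items()}

shares = population_shares(SEX_TOTALS, POPULATION)
for cat, share in sorted(shares.items()):
    print(f"sex == {cat}: {share:.4f}")
# sex == 1: 0.4895
# sex == 2: 0.5105
```

    If the raking converged, svy: tab sex should reproduce these shares.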

    However, before I create all the frequency/percentage tables that we will use for the graphics in our final report on the survey results, I wanted to share my code and what I did with you guys to make sure I constructed the weights correctly. Let me know what you think of my code and whether it looks like I used the survwgt package and commands correctly! Thank you

    First I created a weight variable called “weight1” that contains the adjustment factor for each category of each sociodemographic variable. The factor is the population proportion (from the census) divided by the sample proportion. Sex has two categories (male and female). For age we split respondents into three age groups, for region into three regions, and for socioeconomic group (gse) into five groups. To keep the data confidential, I’m not including the resulting values and have left a “?” instead. Taking sex as an example, I obtained them as follows. I divided the number of males in the population by the total population (population proportion). I divided the number of male respondents by the total number of respondents (sample proportion). Then I divided the population proportion by the sample proportion, which gives each value that goes in the “replace weight1 = ?” lines of my code.

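    * weight1: population proportion / sample proportion for each category
    * ("?" marks a confidential value that is omitted here).
    * Each replace overwrites weight1 for the respondents it matches, so every
    * respondent ends up with the ratio from the last condition they satisfy
    * (the gse ratio, for anyone with gse coded 1 to 5).
    gen weight1 = .
    replace weight1 = ? if sex == 1
    replace weight1 = ? if sex == 2
    replace weight1 = ? if age == 1
    replace weight1 = ? if age == 2
    replace weight1 = ? if age == 3
    replace weight1 = ? if region == 1
    replace weight1 = ? if region == 2
    replace weight1 = ? if region == 3
    replace weight1 = ? if gse == 1
    replace weight1 = ? if gse == 2
    replace weight1 = ? if gse == 3
    replace weight1 = ? if gse == 4
    replace weight1 = ? if gse == 5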
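    The ratio described above can be sketched in Python as follows. This is a minimal illustration that assumes hypothetical sample counts (420 men and 580 women out of 1,000 respondents), because the real values are confidential. Only the census totals come from this post, and `adjustment_factors` is a hypothetical helper name.

```python
def adjustment_factors(pop_totals, sample_counts):
    """Population proportion divided by sample proportion, per category of one variable."""
    pop_n = sum(pop_totals.values())
    sample_n = sum(sample_counts.values())
    return {
        cat: (pop_totals[cat] / pop_n) / (sample_counts[cat] / sample_n)
        for cat in pop_totals
    }

sex_pop = {1: 6512031, 2: 6791404}  # census totals from this post
sex_sample = {1: 420, 2: 580}       # HYPOTHETICAL sample counts, for illustration only

factors = adjustment_factors(sex_pop, sex_sample)
for cat, f in sorted(factors.items()):
    print(f"sex == {cat}: factor {f:.4f}")
```

    A factor above 1 means the category is under-represented in the sample. Each variable gets its own set of factors. Applying one variable's factors alone only fixes that variable's margin, which is why raking iterates over all four variables.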


    Then I created variables holding the population totals for each category of the four sociodemographic variables. The total population is 13,303,435, since it includes only people aged 18 and over. The marginal totals for each variable add up to that number.

    generate sex_tot = .
    replace sex_tot = 6512031 if sex == 1
    replace sex_tot = 6791404 if sex == 2

    generate age_tot = .
    replace age_tot = 4653488 if age == 1
    replace age_tot = 4751997 if age == 2
    replace age_tot = 3897950 if age == 3

    generate region_tot = .
    replace region_tot = 3046487 if region == 1
    replace region_tot = 5383900 if region == 2
    replace region_tot = 4873048 if region == 3

    generate gse_tot = .
    replace gse_tot = 1822571 if gse == 1
    replace gse_tot = 1529895 if gse == 2
    replace gse_tot = 3365769 if gse == 3
    replace gse_tot = 4908968 if gse == 4
    replace gse_tot = 1676232 if gse == 5
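    Raking can only satisfy all four sets of margins at once if they agree on the grand total. Here is a minimal Python check of that, using the totals from the code above:

```python
POPULATION = 13303435  # adults 18+

# Census totals per category, in the order of the codes 1, 2, ... used above.
MARGINAL_TOTALS = {
    "sex": [6512031, 6791404],
    "age": [4653488, 4751997, 3897950],
    "region": [3046487, 5383900, 4873048],
    "gse": [1822571, 1529895, 3365769, 4908968, 1676232],
}

for var, totals in MARGINAL_TOTALS.items():
    total = sum(totals)
    status = "OK" if total == POPULATION else f"MISMATCH ({total - POPULATION:+d})"
    print(f"{var}: {total} {status}")
# prints "OK" for all four variables
```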


    Finally, here’s my code using the survwgt rake command:

    survwgt rake weight1 , by(sex age region gse) totvars(sex_tot age_tot region_tot gse_tot) generate(rakedweight)

    svyset [pw=rakedweight]


    Is what I did correct?
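    For readers unfamiliar with raking, the idea behind survwgt rake can be sketched as iterative proportional fitting. The weights are repeatedly rescaled so that the weighted total of each category of one variable matches its population total, cycling through the variables until nothing changes. This is a minimal Python sketch of the textbook algorithm, not survwgt's actual code. The toy respondents are hypothetical, the control totals are the sex and region totals from this post, and the sketch assumes every category has at least one respondent.

```python
def rake(rows, weights, targets, max_iter=1000, tol=1e-10):
    """Rake starting weights to match population totals (iterative proportional fitting).

    rows:    one dict per respondent, e.g. {"sex": 1, "region": 2}
    weights: starting weight for each respondent
    targets: {variable: {category: population total}}
    Assumes every category in targets has at least one respondent.
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, totals in targets.items():
            # Current weighted total of each category of this variable.
            current = {cat: 0.0 for cat in totals}
            for row, wi in zip(rows, w):
                current[row[var]] += wi
            # Rescale so this variable's weighted totals equal the population totals.
            factors = {cat: totals[cat] / current[cat] for cat in totals}
            largest_change = max([largest_change] + [abs(f - 1.0) for f in factors.values()])
            w = [wi * factors[row[var]] for row, wi in zip(rows, w)]
        if largest_change < tol:  # a full pass barely changed anything: converged
            break
    return w


def weighted_total(rows, w, var, cat):
    """Sum of weights for respondents in one category of one variable."""
    return sum(wi for row, wi in zip(rows, w) if row[var] == cat)


targets = {
    "sex": {1: 6512031, 2: 6791404},
    "region": {1: 3046487, 2: 5383900, 3: 4873048},
}
# HYPOTHETICAL toy respondents, for illustration only.
rows = [
    {"sex": 1, "region": 1}, {"sex": 1, "region": 2}, {"sex": 1, "region": 3},
    {"sex": 2, "region": 1}, {"sex": 2, "region": 2}, {"sex": 2, "region": 2},
    {"sex": 2, "region": 3},
]
raked = rake(rows, [1.0] * len(rows), targets)
for var, totals in targets.items():
    for cat, target in totals.items():
        print(var, cat, round(weighted_total(rows, raked, var, cat)), target)
```

    Raking multiplies each respondent's starting weight by one factor per category. As a result, a starting weight that is constant within the categories of a raking variable (such as a population/sample ratio for one of those variables) is absorbed by the raking factors. The final weights then come out the same as when starting from 1, up to the convergence tolerance.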

    Then, to obtain the weighted frequencies and percentages for each question, I used the following code (the example is for question 1, “p1”):

    tabulate p1 [aw = rakedweight]

    Is it okay for me to use the raked weight variable I generated (rakedweight) as an aweight and tabulate like I did?
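    On the percentages themselves: a weighted percentage is just the share of the total weight that falls in each category, so it depends only on the relative size of the weights. My understanding is that Stata rescales aweights to sum to the number of observations. That changes the Freq. column of tabulate but not the percentages. Here is a minimal Python sketch of that invariance, with hypothetical answers and weights (the helper names are also hypothetical):

```python
from collections import defaultdict

def weighted_percentages(values, weights):
    """Percentage of the total weight falling in each category."""
    totals = defaultdict(float)
    for v, w in zip(values, weights):
        totals[v] += w
    grand = sum(totals.values())
    return {cat: 100.0 * t / grand for cat, t in sorted(totals.items())}

def rescale_to_n(weights):
    """Rescale weights to sum to the number of observations."""
    n, s = len(weights), sum(weights)
    return [w * n / s for w in weights]

p1 = [1, 2, 2, 1, 3, 2]  # HYPOTHETICAL answers to p1
rakedweight = [2100.0, 1800.5, 2500.0, 1650.0, 2300.0, 1950.0]  # HYPOTHETICAL raked weights

print(weighted_percentages(p1, rakedweight))
print(weighted_percentages(p1, rescale_to_n(rakedweight)))
```

    Both lines print the same percentages, up to floating-point rounding. What differs is the scale of the frequencies: rescaled weights sum to the sample size rather than to the population total.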

    Then, to obtain weighted cross-tabulations of each question by each sociodemographic variable, I used the following code (again for question 1, p1):

    svy: tabulate p1 sex, col
    svy: tabulate p1 age, col
    svy: tabulate p1 region, col
    svy: tabulate p1 gse, col


    Is that code okay as well?
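    The point estimates behind svy: tabulate ..., col are the column proportions of a weighted two-way table. Within each category of the column variable, each row's share is that row's weight divided by the category's total weight. Here is a minimal Python sketch with hypothetical data and a hypothetical helper name, showing the proportions as percentages. It reproduces only the point estimates, not the design-based standard errors or tests that svy: tabulate can also report:

```python
from collections import defaultdict

def weighted_column_percentages(row_var, col_var, weights):
    """Column percentages of a weighted two-way table, keyed by (row, column)."""
    cell = defaultdict(float)
    col_total = defaultdict(float)
    for r, c, w in zip(row_var, col_var, weights):
        cell[(r, c)] += w
        col_total[c] += w
    return {(r, c): 100.0 * t / col_total[c] for (r, c), t in sorted(cell.items())}

# HYPOTHETICAL data: answers to p1, sex, and raked weights for five respondents.
p1 = [1, 2, 1, 2, 2]
sex = [1, 1, 2, 2, 2]
w = [1.2, 0.8, 1.0, 0.9, 1.1]

for (answer, s), pct in weighted_column_percentages(p1, sex, w).items():
    print(f"p1 == {answer}, sex == {s}: {pct:.1f}%")
# p1 == 1, sex == 1: 60.0%
# p1 == 1, sex == 2: 33.3%
# p1 == 2, sex == 1: 40.0%
# p1 == 2, sex == 2: 66.7%
```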

    Thank you so much!!

  • #2
    Bumping this thread in the hope that someone experienced can answer.



    • #3
      Richard Valliant and Jill A. Dever wrote a book, Survey Weights: A Step-by-Step Guide to Calculation (https://www.stata.com/bookstore/survey-weights/), which I think will be helpful.
