  • Weight Calculation for OLS Model Utilizing Survey Data and Census Data

    Hello,

    This is my first post here, and I'm very glad to join Statalist. I've been a passive user of the forum for a year and a half, and I've always been impressed by how supportive this community is. Grateful to its members and encouraged by their generosity, I am posting my first question. I hope it doesn't violate the forum's guidelines; if it does, I apologize and will be quick to learn.

    My main problem:
    - My dataset comes from an online survey conducted in Poland. Its sex and age structure is unbalanced relative to the population it is supposed to represent. I would like to calculate a weight for each sex-age group, separately for each region. To that end, I have downloaded Poland's census data. I would then like to apply the weights in my regression models (OLS with robust standard errors).

    A problem of secondary importance:
    - Poland's municipality-level census data contains only the number of people by sex-age group, not by sex-age-municipality group (by "municipality" I mean the type of municipality, based on its urbanization and size). Given this, is it possible to weight my observations so that they match both the sex-age structure and the municipality-type structure of the population at the same time?
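
For context, matching two sets of margins at once without the joint table is what raking (iterative proportional fitting) is generally used for. The following is only a minimal sketch under stated assumptions, not a vetted procedure: `region`, `sexage`, `muni`, `target_sexage`, and `target_muni` are hypothetical variable names, the two target variables are assumed to hold the census share of the respondent's sex-age group and municipality type within the region, and every group is assumed to contain at least one respondent.

```stata
* Raking sketch (hypothetical variable names, see the assumptions above)
gen double rake_w = 1

forvalues iter = 1/50 {
    * Step 1: rescale so weighted sex-age shares match the census shares
    bysort region sexage: egen double cur = total(rake_w)
    bysort region: egen double tot = total(rake_w)
    replace rake_w = rake_w * target_sexage / (cur / tot)
    drop cur tot

    * Step 2: rescale so weighted municipality-type shares match the census shares
    bysort region muni: egen double cur = total(rake_w)
    bysort region: egen double tot = total(rake_w)
    replace rake_w = rake_w * target_muni / (cur / tot)
    drop cur tot
}
```

The fixed 50 passes are an arbitrary choice for illustration; in practice one would stop once the weighted sex-age shares stop changing between passes.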

    Questions:
    1. Is my calculation formula for weights correct given my regression model and the situation described?
    2. Am I using the right Stata command, given the calculation formula?

    My weight calculation (applied to each subsample of sex-age-region separately):
    prob_inverse = 1 / census_share

    where:
    census_share = (number of people of sex X, age range Y-Z, region A) / (population of region A)

    My Stata command:
    Code:
    reg  y x [pweight = prob_inverse], r
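
A quick way to check any candidate weight is to compare the weighted sample share of each sex-age cell within a region against its census share. This is a sketch only, assuming hypothetical variables `region`, `sex`, `agegrp`, and `census_share` exist alongside `prob_inverse`:

```stata
* Weighted share of each sex-age cell within its region
bysort region: egen double w_region = total(prob_inverse)
bysort region sex agegrp: egen double w_cell = total(prob_inverse)
gen double weighted_share = w_cell / w_region

* List one row per cell; if the weight reproduces the census structure,
* census_share and weighted_share should agree
egen byte one_per_cell = tag(region sex agegrp)
list region sex agegrp census_share weighted_share if one_per_cell, sepby(region)
```

The same check can be run on any other weight variable by substituting its name for `prob_inverse`.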

    (Alternatively, I have considered the following calculation:

    weight = census_share / sample_share

    where:
    census_share is the same as above;
    sample_share = (number of respondents of sex X, age range Y-Z, region A) / (all respondents from region A)

    However, I am not sure which Stata command the latter weight would correspond to.)
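
For illustration, the ratio above could be computed and passed to `regress` along the following lines. This is a sketch under assumptions, not a confirmed answer: `region`, `sex`, `agegrp`, `census_n` (census count for the respondent's sex-age-region cell), and `region_pop` (census population of the region) are hypothetical variables assumed to be already merged into the survey data, with one row per respondent.

```stata
* Census share of each sex-age cell within its region
gen double census_share = census_n / region_pop

* Sample share of each sex-age cell within its region
bysort region sex agegrp: gen long cell_n = _N
bysort region: gen long region_n = _N
gen double sample_share = cell_n / region_n

* Ratio weight: above 1 for under-represented cells, below 1 for over-represented ones
gen double ps_weight = census_share / sample_share

* pweight already implies robust standard errors; vce(robust) is stated for clarity
reg y x [pweight = ps_weight], vce(robust)
```

As far as I understand, `regress` results with `pweight` do not change when all weights are multiplied by a constant, so the ratio does not need to be rescaled before use.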

    I would really appreciate any help--not only direct answers, but also reading recommendations (preferably a practical guide that I can read and apply quickly). Thank you.
