  • Specifying stage-specific weights for a multilevel model using survey data

    I am using Stata 16.

    I am trying to fit a multilevel model with country random effects on survey data from multiple countries. The response is an unordered categorical variable, so I am using multinomial logit. Here is a sample of my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(identity age empl sex marstat urban educ) long country double wgt int WP12258 double(WP12258A WP12259)
    2    17    5    2    4    3    1    9    0.918483471    1    2017100001    7
    1    21    6    2    2    2    2    9    1.063579888    7    2017100007    7
    2    22    6    1    2    2    2    9    0.337893372    1    2017100001    10
    1    23    6    2    2    2    2    9    1.567769236    18    2017100018    10
    1    27    2    1    2    6    1    9    0.475945213    11    2017100011    16
    1    28    6    2    2    2    3    9    1.056964223    19    2017100019    16
    2    31    1    1    2    2    2    9    0.265830759    7    2017100007    24
    2    31    1    2    1    6    1    9    1.072376993    5    2017100005    25
    2    32    6    2    3    1    2    9    0.458895125    6    2017100006    33
    2    35    6    2    2    3    1    9    1.478014368    1    2017100001    44
    1    36    2    1    2    6    1    9    0.43331561    15    2017100015    45
    1    37    6    2    1    2    2    9    1.008920356    8    2017100008    53
    3    39    1    1    2    2    1    9    0.293730707    1    2017100001    61
    1    41    2    1    2    2    2    9    1.298432642    10    2017100010    61
    1    41    1    1    2    1    2    9    1.511035436    9    2017100009    80
    2    45    2    2    2    1    2    9    0.398746139    1    2017100001    81
    1    60    1    1    2    6    1    9    0.553875541    14    2017100014    87
    1    60    1    1    1    2    2    9    0.280035342    9    2017100009    91
    2    61    2    1    2    2    2    9    2.464917433    5    2017100005    92
    1    67    6    1    1    6    1    9    1.056964223    19    2017100019    96
    
    
    end
    label values identity WP22091
    label def WP22091 1 "Being a part of the city or area where you live", modify
    label def WP22091 2 "Being a part of this country", modify
    label def WP22091 3 "Being a part of the world", modify
    label values age WP1220
    label values empl EMP_2010
    label def EMP_2010 1 "Employed full time for an employer", modify
    label def EMP_2010 2 "Employed full time for self", modify
    label def EMP_2010 5 "Employed part time want full time", modify
    label def EMP_2010 6 "Out of workforce", modify
    label values sex WP1219
    label def WP1219 1 "Male", modify
    label def WP1219 2 "Female", modify
    label values marstat WP1223
    label def WP1223 1 "Single/Never been married", modify
    label def WP1223 2 "Married", modify
    label def WP1223 3 "Separated", modify
    label def WP1223 4 "Divorced", modify
    label values urban WP14
    label def WP14 1 "A rural area or on a farm", modify
    label def WP14 2 "A small town or village", modify
    label def WP14 3 "A large city", modify
    label def WP14 6 "A suburb of a large city", modify
    label values educ WP3117
    label def WP3117 1 "Completed elementary education or less (up to 8 years of basic education)", modify
    label def WP3117 2 "Secondary - 3 year TertiarySecondary education and some education beyond secondary education (9-15 years of educatio", modify
    label def WP3117 3 "Completed four years of education beyond high school and/or received a 4-year college degree.", modify
    label values country country
    label def country 9 "Andorra", modify
    I am treating my data as two-stage: a country is selected first with a constant probability of 1 (a simplifying assumption, since countries were not drawn randomly), and then each respondent has a final design weight (variable wgt). I also have the stratum and PSU IDs (WP12258A and WP12259). So far, my survey design declaration is

    Code:
    svyset [pweight = wgt], strata(WP12258A) psu(WP12259) singleunit(certainty)
    The only way I have found to fit a multinomial logit model with random effects to survey data in Stata 16 is through gsem. I am basing my approach on https://www.stata.com/stata-news/news29-2/xtmlogit/.


    This is the model I am trying to fit:

    Code:
    . svy: gsem (2.identity <- age i.empl i.sex i.marstat i.urban i.educ R1[country]) (3.identity
    >  <- age i.empl i.sex i.marstat i.urban i.educ R2[country]), mlogit
    (running gsem on estimation sample)
    survey final weights not allowed with multilevel models;
        a final weight variable was svyset using the [pw=exp] syntax, but multilevel models
        require that each stage-level weight variable is svyset using the stage's corresponding
        weight() option
    an error occurred when svy executed gsem
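
    For reference, the model specified by this gsem call can be written out as follows. This is a sketch of the standard random-intercept multinomial logit, assuming outcome 1 of identity is the base category (no equation is given for it) and that R1[country] and R2[country] act as normally distributed, mean-zero country-level random intercepts. The symbols x_ij, beta_k, and u_kj are notation introduced here for illustration, not Stata output.

    ```latex
    % i indexes respondents, j indexes countries, k indexes outcomes of identity
    \[
    \Pr(\text{identity}_{ij} = k \mid \mathbf{x}_{ij}, u_{2j}, u_{3j})
      = \frac{\exp(\eta_{kij})}{1 + \exp(\eta_{2ij}) + \exp(\eta_{3ij})},
      \qquad k \in \{1, 2, 3\},
    \]
    \[
    \eta_{1ij} = 0, \qquad
    \eta_{kij} = \mathbf{x}_{ij}'\boldsymbol{\beta}_k + u_{kj} \quad (k = 2, 3),
    \]
    % u_{2j} corresponds to R1[country], u_{3j} to R2[country]
    ```

    Here x_ij collects age and the indicator variables for empl, sex, marstat, urban, and educ, with a separate coefficient vector for each non-base outcome.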
    The error message asks me to specify stage-level weights, which I think in my case are wgt at the individual level and a constant 1 at the country level. I consulted the Stata manual and searched extensively online but could not figure out how to enter stage-specific weights, especially a constant 1, into svyset. Does anyone have any ideas?
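
    One way to see why a single final weight is not enough: in a two-level model the weights enter the log pseudolikelihood in two different places. The expression below is a sketch of the usual form for weighted multilevel estimation, assuming respondents i nested in countries j, a country-level weight w_j, and a respondent-level weight w_{i|j} that is conditional on the country. The notation (theta, f, phi) is introduced here for illustration.

    ```latex
    % w_j       : weight for country j (inverse selection probability of the country)
    % w_{i|j}   : weight for respondent i, conditional on country j being selected
    % f         : conditional density of the response given the random effects u_j
    % \phi      : density of the country-level random effects
    \[
    \log L(\theta) = \sum_{j} w_j \,
      \log \int \exp\Bigl\{ \sum_{i} w_{i|j}
      \log f\bigl(y_{ij} \mid \mathbf{x}_{ij}, \mathbf{u}_j; \theta\bigr) \Bigr\}
      \, \phi(\mathbf{u}_j) \, d\mathbf{u}_j
    \]
    ```

    A final weight is the product of w_j and w_{i|j}, so it cannot be placed in this expression without knowing how it splits between the two levels. Under the assumption that every country is selected with probability 1, w_j = 1 and the final weight wgt plays the role of w_{i|j}.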
    Last edited by Rouslan Karimov; 12 Apr 2023, 17:34.

  • #2
    After additional research, it appears the answer can be found here: https://www.statalist.org/forums/for...61#post1291461.

    For my specific case, the svyset declaration becomes

    Code:
    . gen countrywgt = 1
    
    . svyset WP12259, weight(wgt) strata(WP12258A) singleunit(certainty) || country, weight(countrywgt)
