Specifying stage-specific weights for a multilevel model using survey data

Rouslan Karimov

Join Date: Apr 2023
Posts: 8

12 Apr 2023, 17:24

I am using Stata 16.

I am trying to fit a multilevel model with country random effects on survey data from multiple countries. The response is an unordered categorical variable, so I am using multinomial logit. Here is a sample of my data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(identity age empl sex marstat urban educ) long country double wgt int WP12258 double(WP12258A WP12259)
2    17    5    2    4    3    1    9    0.918483471    1    2017100001    7
1    21    6    2    2    2    2    9    1.063579888    7    2017100007    7
2    22    6    1    2    2    2    9    0.337893372    1    2017100001    10
1    23    6    2    2    2    2    9    1.567769236    18    2017100018    10
1    27    2    1    2    6    1    9    0.475945213    11    2017100011    16
1    28    6    2    2    2    3    9    1.056964223    19    2017100019    16
2    31    1    1    2    2    2    9    0.265830759    7    2017100007    24
2    31    1    2    1    6    1    9    1.072376993    5    2017100005    25
2    32    6    2    3    1    2    9    0.458895125    6    2017100006    33
2    35    6    2    2    3    1    9    1.478014368    1    2017100001    44
1    36    2    1    2    6    1    9    0.43331561    15    2017100015    45
1    37    6    2    1    2    2    9    1.008920356    8    2017100008    53
3    39    1    1    2    2    1    9    0.293730707    1    2017100001    61
1    41    2    1    2    2    2    9    1.298432642    10    2017100010    61
1    41    1    1    2    1    2    9    1.511035436    9    2017100009    80
2    45    2    2    2    1    2    9    0.398746139    1    2017100001    81
1    60    1    1    2    6    1    9    0.553875541    14    2017100014    87
1    60    1    1    1    2    2    9    0.280035342    9    2017100009    91
2    61    2    1    2    2    2    9    2.464917433    5    2017100005    92
1    67    6    1    1    6    1    9    1.056964223    19    2017100019    96
end
label values identity WP22091
label def WP22091 1 "Being a part of the city or area where you live", modify
label def WP22091 2 "Being a part of this country", modify
label def WP22091 3 "Being a part of the world", modify
label values age WP1220
label values empl EMP_2010
label def EMP_2010 1 "Employed full time for an employer", modify
label def EMP_2010 2 "Employed full time for self", modify
label def EMP_2010 5 "Employed part time want full time", modify
label def EMP_2010 6 "Out of workforce", modify
label values sex WP1219
label def WP1219 1 "Male", modify
label def WP1219 2 "Female", modify
label values marstat WP1223
label def WP1223 1 "Single/Never been married", modify
label def WP1223 2 "Married", modify
label def WP1223 3 "Separated", modify
label def WP1223 4 "Divorced", modify
label values urban WP14
label def WP14 1 "A rural area or on a farm", modify
label def WP14 2 "A small town or village", modify
label def WP14 3 "A large city", modify
label def WP14 6 "A suburb of a large city", modify
label values educ WP3117
label def WP3117 1 "Completed elementary education or less (up to 8 years of basic education)", modify
label def WP3117 2 "Secondary - 3 year TertiarySecondary education and some education beyond secondary education (9-15 years of educatio", modify
label def WP3117 3 "Completed four years of education beyond high school and/or received a 4-year college degree.", modify
label values country country
label def country 9 "Andorra", modify

I am treating my data as a two-stage design: each country is selected first with probability 1 (a simplifying assumption, since countries were not drawn at random), and each respondent then has a final design weight (variable wgt). I also have the strata and PSU IDs. So far, my survey design declaration is:

Code:

svyset [pweight = wgt], strata(WP12258A) psu(WP12259) singleunit(certainty)

The only way I have found to fit a multinomial logit model with random effects to survey data in Stata 16 is through gsem. I am basing my specification on https://www.stata.com/stata-news/news29-2/xtmlogit/.

This is the model I am trying to fit:

Code:

. svy: gsem (2.identity <- age i.empl i.sex i.marstat i.urban i.educ R1[country]) (3.identity
>  <- age i.empl i.sex i.marstat i.urban i.educ R2[country]), mlogit
(running gsem on estimation sample)
survey final weights not allowed with multilevel models;
    a final weight variable was svyset using the [pw=exp] syntax, but multilevel models
    require that each stage-level weight variable is svyset using the stage's corresponding
    weight() option
an error occurred when svy executed gsem

The error message asks me to specify stage-level weights, which I think in my case are wgt at the first stage (individuals) and 1 at the second stage (countries). I consulted the Stata manual and searched extensively, but I could not work out how to enter stage-specific weights, especially a constant 1, into svyset. Does anyone have any ideas?

Last edited by Rouslan Karimov; 12 Apr 2023, 17:34.

Rouslan Karimov

Join Date: Apr 2023

Posts: 8
#2

14 Apr 2023, 11:10

After additional research, it appears the answer can be found here: https://www.statalist.org/forums/for...61#post1291461.

For my specific case, the svyset becomes

Code:

. gen countrywgt = 1
. svyset WP12259, weight(wgt) strata(WP12258A) singleunit(certainty) || country, weight(countrywgt)
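
For anyone landing here later, the pieces can be put together in a do-file roughly as follows. This is a minimal sketch, not a verified solution: it assumes the variables from the -dataex- listing in the first post are in memory, it assumes the gsem call from that post can be rerun unchanged once the design is declared with stage-level weights, and it uses /// line continuation, which works in do-files but not at the interactive prompt.

```stata
* Sketch only: assumes the variables from the -dataex- listing are in memory.
* Stage-level weights: wgt for the sampled units, a constant 1 for countries.
generate byte countrywgt = 1

* Declare each stage's weight through weight() rather than [pweight = wgt].
svyset WP12259, weight(wgt) strata(WP12258A) singleunit(certainty) ///
    || country, weight(countrywgt)

* Redisplay the stored design to check that both stages were registered.
svyset

* Refit the model from the first post (assumed unchanged).
svy: gsem (2.identity <- age i.empl i.sex i.marstat i.urban i.educ R1[country]) ///
          (3.identity <- age i.empl i.sex i.marstat i.urban i.educ R2[country]), mlogit
```

Typing svyset with no arguments only redisplays the current settings, so it is a cheap check that the final-weight declaration from the first post has been replaced before the model is refit.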