Dear all,
I am reposting my previous unanswered query with more information in the hope I might get a response this time.
My question essentially boils down to how to set -svyset- in order to run a 2-level logit model taking account of complex survey design (psu, strata, and weights).
To give you a bit of background, I have gone through a variety of material, including the Stata manuals (here & here) and the book 'Multilevel and Longitudinal Modeling Using Stata' (2005 ed.) by Sophia Rabe‐Hesketh and Anders Skrondal, but I am looking for someone to confirm (or correct) my understanding. I also tried to engage with another piece (Rabe‐Hesketh, S., and Skrondal, A. (2006). Multilevel modeling of complex survey data) but wasn't able to fully digest it, particularly once the discussion of needing weights at both levels was introduced. As I mentioned, I also sought assistance from Stata users via the forum a couple of months ago but unfortunately did not have much luck.
Put succinctly, I am trying to run a 2-level logistic regression taking into account complex survey design, but I'm not quite sure if my Stata code is correct. I have chosen to fit the model with -melogit- because I cannot use -svyset- (i.e. the svy: prefix) with -xtlogit-. I am using Stata 15.1.
My data is 2-level in that it reflects multiple observations over time (2009-2018) nested within an individual (identified by the pidp variable in the code below). My dependent variable, empstatus, is coded as 1=employed and 0=unemployed. My dataset comprises approximately 160,000 observations.
As stated above, what I am unsure about is how to set -svyset-. I think I have understood how to apply weights in a multilevel logit model, namely by using -melogit- with [pweight=weight] (the code is given below). However, unless I am mistaken, that code does not take the psu and stratification into account (identified by my 'psu' and 'strata' variables), which is what I would like to do. Please note that the values of the 'strata', 'psu' and 'weight' variables are supplied by the survey provider when I download the data, so I do not calculate them myself.
Code:
melogit empstatus i.gender age i.race [pweight=weight] || pidp: , allbase
From all the code I have played around with, Model 1 below seems to make the most sense to me (but as I said I'm not 100% certain it is indeed doing what I want).
Model 1:
Code:
svyset, clear
svyset psu, strata(strata) weight(weight) singleunit(scaled)
svy: melogit y x1 x2 x3 || pidp: , or
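However, I have also played around and found other possible ways I could set -svyset-, for example:
Model 2: SAME AS MODEL 1 BUT 'psu' REPLACED BY 'pidp' IN THE SVYSET COMMAND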
Code:
svyset, clear
svyset pidp, strata(strata) weight(weight) singleunit(scaled)
svy: melogit y x1 x2 x3 || pidp: , or
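Model 3: THIS MODEL SETS -SVYSET- FOLLOWING THE METHOD IN THE STATA MANUAL (link 1 in this email p.107) BY CREATING A NEW WEIGHT 'pst1s1'
Code:
svyset, clear
svyset pidp, weight(weight) singleunit(scaled) || _n, weight(pst1s1)
svy: melogit y x1 x2 x3 || pidp: , or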
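Formula used to create 'pst1s1':
Code:
sort pidp
generate sqw = weight * weight
by pidp: egen sumw = sum(weight)
by pidp: egen sumsqw = sum(sqw)
generate pst1s1 = weight*sumw/sumsqw
Please note: when I create the 'new weight', pst1s1, it ends up having a value of 1 or 0.999999 for all observations.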
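For what it is worth, writing the pst1s1 formula out by hand suggests one possible reason why it comes out as 1 (or 0.999999) for every observation in my data. This is only my own working, not something taken from the manual, and it rests on an assumption I have not checked: that 'weight' takes a single value within each pidp. The symbols i, j, n_i and w are my own notation.

```latex
% i indexes individuals (pidp); j indexes the n_i observations within individual i;
% w_{ij} is the value of 'weight' for observation j of individual i.
\[
\text{pst1s1}_{ij} \;=\; w_{ij}\,\frac{\sum_{k=1}^{n_i} w_{ik}}{\sum_{k=1}^{n_i} w_{ik}^{2}}
\]
% ASSUMPTION (not verified in my data): w_{ij} = w_i for every j, i.e. the
% weight is constant within an individual. Then the sums simplify:
\[
\text{pst1s1}_{ij} \;=\; w_i\,\frac{n_i\,w_i}{n_i\,w_i^{2}} \;=\; 1
\]
```

If that assumption holds, the values of 0.999999 would presumably just be rounding, since -generate- stores new variables as float by default. If 'weight' actually varies across waves within a person, this simplification does not apply.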
Model 4: THIS MODEL SETS -SVYSET- FOLLOWING THE METHOD IN THE STATA MANUAL (link 1 in this email p.107) BY CREATING A NEW WEIGHT 'pst1s1' (see above) BUT DIFFERENT TO MODEL 3, I INCLUDE 'strata' IN THE -SVYSET- COMMAND
Code:
svyset, clear
svyset pidp, strata(strata) weight(weight) singleunit(scaled) || _n, weight(pst1s1)
svy: melogit y x1 x2 x3 || pidp: , or
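Model 5: THIS MODEL IS THE SAME AS MODEL 4, THE ONLY DIFFERENCE BEING THAT 'pidp' IS REPLACED BY 'psu' IN THE CODE FOR -SVYSET-
Code:
svyset, clear
svyset psu, strata(strata) weight(weight) singleunit(scaled) || _n, weight(pst1s1)
svy: melogit y x1 x2 x3 || pidp: , or
Please note: Because the 'new weight', pst1s1, has a value of 1 or 0.999999 for all observations, the odds ratios and standard errors from Model 5 are identical to those in Model 1.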
Model 6: AN ALTERNATIVE METHOD WITHOUT USING SVYSET BUT ADDING ONLY WEIGHTS
Code:
melogit y x1 x2 x3 [pweight=weight] || pidp: , or
All the above commands run with no error in Stata 15.1, and all models (bar model 6) yield the same odds ratio coefficients with slight differences in the standard errors. That said, I'm not sure which is the correct command to execute, and I'm keen to understand what exactly I am doing rather than simply 'push buttons' in Stata.
Therefore, I would be very grateful if you could advise which of the above models is the correct one to execute if I want to run a 2-level logit regression taking into account complex survey design (i.e. strata, psu, & weights).
Thank you in advance, and have a pleasant day.
Samir Sweida-Metwally