Dear Statalisters:
I am facing the following issue.
I am using data from a multi-stage survey conducted by the Global Entrepreneurship Monitor (GEM), which describes the data in the following terms: “Our Adult Population Survey (APS) looks at the characteristics, motivations and ambitions of individuals starting businesses, as well as social attitudes towards entrepreneurship”.
GEM collects nationally representative samples of the adult population in a number of countries. To the best of my understanding, the countries are not randomly selected; instead, the inclusion of a country is an administrative decision and/or the result of such past decisions. Within each country, further stratified and clustered sampling is done. In the final stage, a random sample of individuals is selected within each cluster, within each stratum, within each country.
However, when GEM releases the data, it provides no information on these design specifics. Instead, it provides one record per individual surveyed (with a unique respondent ID) in all countries surveyed (along with a country ID) and the final weight for each individual. The method for the sample design weights is described here: http://gem-consortium.ns-client.xyz/wiki/1175
My situation is the following. I want to estimate a random-effects model (with a country-level random effect) using Stata’s “svy:” prefix, for the data from any given year.
The data would look like this:
country_ID  | year | respondent_ID | weight_a | var1     | var2 | var3 |
Netherlands | 2011 | 1             | 0.784929 | Retired, | 57   | 27   |
Netherlands | 2011 | 2             | 1.081878 | Retired, | 100  | 81   |
Netherlands | 2011 | 3             | 1.081878 | Retired, | 28   | 92   |
Netherlands | 2011 | 4             | 1.081878 | Not work | 37   | 6    |
Belgium     | 2011 | 5             | 0.75417  | Full: fu | 73   | 58   |
Belgium     | 2011 | 6             | 0.75417  |          | 76   | 72   |
Belgium     | 2011 | 7             | 0.75417  | Full: fu | 92   | 14   |
Belgium     | 2011 | 8             | 0.75417  | Full: fu | 22   | 92   |
France      | 2011 | 9             | 0.939495 | Full: fu | 53   | 96   |
France      | 2011 | 10            | 0.909229 | Homemake | 90   | 66   |
France      | 2011 | 11            | 1.021805 | Retired, | 1    | 82   |
France      | 2011 | 12            | 1.058208 | Full: fu | 13   | 19   |
France      | 2011 | 13            | 0.815568 | Retired, | 59   | 83   |
France      | 2011 | 14            | 1.001615 | Retired, | 20   | 60   |
Specifically, I am not sure how to proceed with the “svyset” command.
I have tried the following (where "R[country_ID]" is a latent variable that stands for the country “random effect”):
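Code:
svyset respondent_ID [pweight = weight_a], strata(country_ID)
then:
Code:
svy: gsem (var2 <- var1 var3 R[country_ID], family(gaussian) link(identity))
(Side note: I have my reasons, which are not directly related to the issue at hand, for wanting to use gsem.)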
However, I get an error message:
“survey final weights not allowed with multilevel models; a final weight variable was svyset using the [pw=exp] syntax, but multilevel models require that each stage-level weight variable is svyset using the stage's corresponding weight() option”
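I have scoured the web (and Statalist in particular) searching for a solution. A couple of people have posted a solution that seems reasonable. For example:
Code:
gen country_weights = 1
svyset country_ID, weight(country_weights) || respondent_ID, weight(weight_a)
This comes from: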
https://www.statalist.org/forums/for...-data-question
https://www.statalist.org/forums/for...-in-stata-13-1
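A minimal sketch of how the two-stage declaration suggested in those threads could be combined with the model above. It assumes the GEM extract for a single year is already in memory with the variable names from the example data. Using var3 as the only predictor is a placeholder choice, and none of this has been checked against the real GEM files.

```stata
* Sketch only: assumes one year of GEM data is in memory, with
* country_ID, respondent_ID, weight_a, var2 and var3 as in the example rows.

* Stage-1 weight: every country gets weight 1 (see the discussion that follows)
capture drop country_weights
generate byte country_weights = 1

* Declare one weight per stage, as the gsem error message asks for
svyset country_ID, weight(country_weights) || respondent_ID, weight(weight_a)

* Inspect what Stata now treats as strata and sampling units
svydescribe

* Refit the country random-intercept model under the new declaration
* (var3 is a placeholder predictor)
svy: gsem (var2 <- var3 R[country_ID], family(gaussian) link(identity))
```

With this declaration, svydescribe should list the countries as first-stage sampling units within a single stratum, which is exactly the design assumption questioned below.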
I find it strange to set up “country_ID” as the Stage 1 PSU because, as explained above, no sampling occurred at this stage: countries were selected a priori for other reasons. That is why I thought it made sense to treat them as “strata”. I can convince myself that the last svyset is correct by considering the meaning of “country_weights = 1”. If the weights are inversely proportional to the probability of being selected, then a weight of 1 implies a selection probability of 1, which is precisely the case when the countries were selected a priori. Can someone please shed some light on which of these is the correct “svyset” command?
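The weight argument can be written out as a small self-contained check. This is a sketch under the assumption that a stage-level weight is the inverse of that stage's inclusion probability; the four rows are copied from the example data, and p_country and w_implied are hypothetical helper names.

```stata
* Sketch only: four rows copied from the example data above
clear
input str12 country_ID respondent_ID double weight_a
"Netherlands" 1 0.784929
"Netherlands" 2 1.081878
"Belgium"     5 0.75417
"France"      9 0.939495
end

* Stage 1: each country is included with certainty, so its weight is 1
generate double country_weights = 1
generate double p_country = 1 / country_weights   // implied inclusion probability

* The implied overall weight is the product of the stage-level weights
generate double w_implied = country_weights * weight_a

* A stage-1 weight of 1 leaves the respondent weights unchanged
assert p_country == 1
assert w_implied == weight_a
```

The weight_a values in these rows are close to 1, so they look rescaled rather than raw inverse probabilities. That is only an observation from the rows shown, not something confirmed by the GEM documentation.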
Thank you in advance.