Dear Stata Forum,
I have a question about choosing the right model for individual-level survey data where the dependent variable is ordinal and the independent variables are tied to the sampling clusters.
Objective: I want to analyze whether proximity to a lithium mine increases the likelihood of protest. In the first step I want to run the analysis for Chile only; in the second step I want to include other countries.
Data and variables: I have individual-level survey data (Latinobarómetro, 1200 observations per country) for one year. I don't have detailed information about the sampling design. The codebook states that the data are based on a three-stage probability sample, and the data include variables for the region, city and commune of each respondent. The data also include a sampling weight for every individual.
My dependent variable is ordinal (10-point Likert scale). The main independent variables are (1) distance to the extraction projects (binary: within a specific radius or not; the same value for everyone in a city) and (2) the individual's evaluation of the water supply (5-point Likert scale; it varies across individuals but can be related to the city). Control variables are at the individual level (sex, age, etc.), the village level (size/population) and the country level (GNI, GDP, etc., for the second step of the analysis).
Model: Since my dependent variable is ordinal, I would choose an ordinal logistic regression model. The Brant test showed that the proportional odds assumption is violated for four variables, so I think I have to use a partial proportional odds model (gologit2).
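For reference, the partial proportional odds model that gologit2 fits can be written as below. This follows the notation of Williams' gologit2 documentation as I understand it. M is the number of outcome categories (10 here). X_{1i} holds the variables constrained to parallel lines (one coefficient vector β_1 for all cutpoints, as with pl(x2 x3)). X_{2i} holds the unconstrained variables, whose coefficients β_{2j} may differ for each cutpoint j.

```latex
P(Y_i > j) = \frac{\exp(\alpha_j + X_{1i}\beta_1 + X_{2i}\beta_{2j})}
                  {1 + \exp(\alpha_j + X_{1i}\beta_1 + X_{2i}\beta_{2j})},
\qquad j = 1, \dots, M-1
```

If every variable is in X_{1i}, this reduces to the ordinary proportional odds (ologit) model.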
Question: I am now wondering whether it is enough to estimate gologit2 with standard errors clustered at the city level* and country fixed effects (for the second step, where country-level controls are included). Alternatively, I may need a multilevel model, since the data come from a three-stage sample and the controls are at both the individual and the country level. I tried to weigh the factors named in this post regarding a similar question, but I am still unsure. I would be very grateful for your opinion on this case.
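For comparison with the single-level gologit2 approach, a multilevel alternative could be sketched as follows. This is only an illustration using the placeholder names y, x1, x2, x3, city_var and country_var from this post. Stata's meologit fits a random-intercept ordered logit that assumes proportional odds for every predictor. It would therefore not relax the constraints flagged by the Brant test, and it is run unweighted here.

```stata
* Sketch only: placeholder variable names as used in this post.

* Step 1 (Chile): individuals nested in cities, random intercept per city.
meologit y x1 x2 x3 || city_var:

* Step 2 (pooled countries): individuals nested in cities nested in countries.
* Proportional odds is still imposed on all predictors.
meologit y x1 x2 x3 || country_var: || city_var:
```

The levels are listed from the outermost (country) to the innermost (city).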
Thanks in advance for your help!
My idea of how the simplified Stata code for gologit2 with clustered standard errors could look:
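svyset city_var [pweight = w]
[run this first to declare the sampling clusters (cities as PSUs) and the weights; the svy: prefix then applies both, so no separate cluster() option is needed]
- First step: Chile only (assuming a Chile-only data file)
svy: gologit2 y x1 x2 x3, pl(x2 x3)
- Second step: more countries (country dummies go in the varlist, before the comma)
svy: gologit2 y x1 x2 x3 i.country_var, pl(x2 x3)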
*I have population data at the city level and calculated the distances from the cities to the extraction sites, so I would choose this level for clustering the standard errors. Technically, I also have information on the level below, the communes, but the commune variable is just a numeric code, and I can't find out the names of the communes.
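For the pooled, multi-country step, a unique cluster identifier should be built before svyset if the city codes are only unique within a country. I have not checked whether this is true for Latinobarómetro, so treat it as an assumption. Without a unique identifier, cities from different countries that share a code would be merged into one cluster. In this minimal sketch, city_id is a hypothetical new variable. Using country_var as strata is also an assumption (that each country is sampled separately), not something stated in the codebook.

```stata
* Hypothetical: build a city identifier that is unique across countries.
egen city_id = group(country_var city_var)

* Declare the design with the new ID as PSU and countries as strata (assumed).
svyset city_id [pweight = w], strata(country_var)

* Check the number of strata and PSUs and the observations per PSU.
svydescribe
```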