Dear Statalist
I have an issue that I expect is really simple, but I just can't find a solution for it.
I'm doing a cross-country analysis of welfare attitudes in Europe using multiple linear regression. I'm trying to determine the extent to which one country's population is unique in its attitudes towards the welfare state, on the basis of a few statements that approximately 1500 respondents in 20 countries have answered. I treat that one country as the reference (base) category and compare the point estimates of all the other countries with it. My challenge arises when I want to control for the Gini coefficient. The Gini coefficient is country-specific, not person-specific, so it is perfectly collinear with the set of country indicators, which forces Stata to omit it.
How can I solve this? I know that I can compare the country means of the relevant variables with the corresponding Gini coefficients, but then I won't be able to run the other analyses.
Here is an MWE:
Code:
clear
set obs 10
gen country=.
replace country = 1 in 1/3
replace country = 2 in 4/6
replace country = 3 in 7/10
gen value_variable =.
replace value_variable = 5 in 1
replace value_variable = 4 in 2
replace value_variable = 6 in 3
replace value_variable = 6 in 4
replace value_variable = 5 in 5
replace value_variable = 7 in 6
replace value_variable = 8 in 7
replace value_variable = 8 in 8
replace value_variable = 7 in 9
replace value_variable = 9 in 10
gen age=.
gen gini=.
replace gini = 32.1 if country==1
replace gini = 27.3 if country==2
replace gini = 40.1 if country==3
replace age = 54 in 1
replace age = 34 in 2
replace age = 22 in 3
replace age = 34 in 4
replace age = 65 in 5
replace age = 67 in 6
replace age = 43 in 7
replace age = 54 in 8
replace age = 12 in 9
replace age = 34 in 10
I've included 4 variables in the example above:
country - country code, ranging from 1 to 3. Note that several observations share the same code, meaning that the data are at the individual level, not at the country level.
value_variable - a hypothetical variable representing answers from 1 to 10 in a questionnaire. This is the dependent variable in the regression.
age - a hypothetical background variable that is used as a control.
gini - the Gini coefficient of the country in question.
The regression analysis looks like this when applied to the data above:
Code:
reg value_variable age ib1.country gini
Now, the results omit the gini variable. This is because gini is country-specific, not person-specific: it takes a single value within each country, so it is perfectly collinear with the country indicators.
How, my dear friends, can I control for the Gini coefficient in data that are vastly larger than the MWE posted above? I have about 55000 observations.
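To make the collinearity concrete, here is a minimal sketch of a check against the toy data. It assumes the MWE above is in memory and is only an illustration, not part of my real analysis. The first command should pass silently if gini never varies within a country, and I would expect the second to fit gini exactly (R-squared of 1), since gini is then just a linear combination of the constant and the country indicators.

```stata
* Assumes the MWE data above are in memory (toy example only).

* 1) gini is constant within each country:
*    after sorting by gini within country, the first and last values must match.
*    assert stops with an error if any country has more than one gini value.
bysort country (gini): assert gini[1] == gini[_N]

* 2) gini is fully explained by the country indicators alone,
*    which is why it cannot be estimated alongside them.
regress gini ib1.country
```

If both checks behave as described, the omission in the regression above follows from the data structure and not from anything specific to the toy numbers.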
Thank you very much for any help at all.
Kasper