  • Controlling for the Gini coefficient with individual-level data across countries

    Dear Statalist

    I have an issue that I expect is really simple; I just can't find a solution for it.

    I'm doing a cross-country analysis of welfare attitudes in Europe using multiple linear regression. I'm trying to determine the extent to which one country's population is unique in its attitudes towards the welfare state, based on a few survey statements that approximately 1500 respondents in 20 countries have answered. I treat that one country as the base (reference) category and compare the point estimates of all the other countries with it. My challenge arises when I want to control for the Gini coefficient. The Gini coefficient is country-specific, not person-specific, so it is perfectly collinear with the country dummies in the regression, which forces Stata to omit it.

    How can I solve this? I know that I can compare the country means of the relevant variables with each country's Gini coefficient, but then I won't be able to run the other analyses.
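For reference, here is a minimal Stata sketch of the country-level comparison mentioned above. It uses the hypothetical MWE variables from further down this post, re-entered here so the snippet runs on its own. The helper variables mean_value and first_obs are illustrative names, not part of the original data.

```stata
clear
* Hypothetical MWE data: gini repeated on every respondent's row
input country value_variable age gini
1 5 54 32.1
1 4 34 32.1
1 6 22 32.1
2 6 34 27.3
2 5 65 27.3
2 7 67 27.3
3 8 43 40.1
3 8 54 40.1
3 7 12 40.1
3 9 34 40.1
end

* Country mean of the outcome, copied to every respondent in that country
bysort country: egen mean_value = mean(value_variable)
* Flag one observation per country so each country is counted once
egen byte first_obs = tag(country)
list country mean_value gini if first_obs, noobs
* Correlation across the country-level pairs (mean outcome, gini)
correlate mean_value gini if first_obs
```

At this level the sample size is the number of countries (3 here, 20 in the real data), and individual-level controls such as age no longer enter, which is the limitation described above.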

    Here is a minimal working example (MWE):

    Code:
    clear
    * Individual-level data: 10 respondents in 3 countries
    input country value_variable age
    1 5 54
    1 4 34
    1 6 22
    2 6 34
    2 5 65
    2 7 67
    3 8 43
    3 8 54
    3 7 12
    3 9 34
    end
    * gini is a country-level variable: one value per country
    gen gini = .
    replace gini = 32.1 if country == 1
    replace gini = 27.3 if country == 2
    replace gini = 40.1 if country == 3
    I've included 4 variables in the example above:

    country - country code, ranging from 1 to 3. Several observations share the same code, i.e. the data are at the individual level, not the country level.
    value_variable - a hypothetical questionnaire item scored from 1 to 10. This is the dependent variable in the regression.
    age - a hypothetical background variable used as a control.
    gini - the Gini coefficient of the respondent's country.

    Applied to the example above, the regression looks like this:
    Code:
    reg value_variable age ib1.country gini

    Stata omits gini from the results because it is perfectly collinear with the country dummies: gini is country-specific, not person-specific.
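A quick way to confirm the collinearity, as a minimal sketch assuming the MWE data (re-entered below so the snippet runs on its own): regress gini on the country indicators. Because gini takes exactly one value per country, the dummies reproduce it exactly and the R-squared is 1, so gini carries no information beyond ib1.country. The scalar name r2_gini is an illustrative choice.

```stata
clear
* Hypothetical MWE data: gini repeated on every respondent's row
input country value_variable age gini
1 5 54 32.1
1 4 34 32.1
1 6 22 32.1
2 6 34 27.3
2 5 65 27.3
2 7 67 27.3
3 8 43 40.1
3 8 54 40.1
3 7 12 40.1
3 9 34 40.1
end

* gini is constant within country, so the country dummies fit it perfectly
regress gini i.country
scalar r2_gini = e(r2)
display "R-squared of gini on the country dummies: " r2_gini

* The original model: Stata reports gini as omitted because of collinearity
regress value_variable age ib1.country gini
```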

    How, my dear friends, can I control for the Gini coefficient in data that are much larger than the MWE above? I have about 55,000 observations.


    Thank you very much for any help at all.

    Kasper

    Last edited by Kasper Nielsen Denmark; 21 Jan 2018, 00:01. Reason: grammar

  • #2
    Hi, Kasper,

    As far as I can see, the country indicators and `gini' are always perfectly collinear. Please remove `country' from your regression to obtain an estimate of `gini'.
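A minimal sketch of this suggestion on the MWE data (re-entered so it runs on its own): with the country dummies dropped, gini is identified, at the cost of no longer comparing each country with a reference country.

```stata
clear
* Hypothetical MWE data: gini repeated on every respondent's row
input country value_variable age gini
1 5 54 32.1
1 4 34 32.1
1 6 22 32.1
2 6 34 27.3
2 5 65 27.3
2 7 67 27.3
3 8 43 40.1
3 8 54 40.1
3 7 12 40.1
3 9 34 40.1
end

* No country dummies: gini is now estimable
regress value_variable age gini
display "Coefficient on gini: " _b[gini]
```

In the MWE the gini coefficient rests on only three distinct values, one per country, so its estimate says little; with the real data it rests on 20.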


    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)



    • #3
      Hi River, thank you very much for your reply.

      I know that I can obtain an estimate of gini by excluding the country variable. The challenge is that I want to compare a specific country, for example country 1 in the code above, with all of the other countries by using it as the base (reference) category. This is why I specify ib1.country in the regression.

      The challenge is to control for gini, which is a country-specific variable, when analyzing individual-level data. So far, the only solution I've come up with is to divide the values of gini into ranges, making it an ordinal variable instead of a continuous one. I still wonder, though, whether there is another way to do it.
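Grouping gini into ranges does not remove the problem: a banded version is still constant within each country, so it is still collinear with the full set of country dummies. A minimal sketch on the MWE data; the 30-point cutoff and the names gini_high and sd_band are hypothetical choices for illustration.

```stata
clear
* Hypothetical MWE data: gini repeated on every respondent's row
input country value_variable age gini
1 5 54 32.1
1 4 34 32.1
1 6 22 32.1
2 6 34 27.3
2 5 65 27.3
2 7 67 27.3
3 8 43 40.1
3 8 54 40.1
3 7 12 40.1
3 9 34 40.1
end

* Hypothetical binary band: 1 if gini is above 30
generate byte gini_high = gini > 30
* Within-country standard deviation of the band is 0 in every country
bysort country: egen sd_band = sd(gini_high)
summarize sd_band
* The band is therefore still collinear with the country dummies
regress value_variable age ib1.country i.gini_high
```

In the regression output, 1.gini_high should be reported as omitted, just like the continuous gini.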

      Thank you very much for replying!

      Kasper



      • #4
        This is a variant of the standard fixed-effects issue: with fixed effects (your country dummies) you cannot estimate any variable that doesn't vary within panels (countries). There are some alternatives: random effects, the Mundlak estimator, and xthtaylor come to mind.
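To make the first two alternatives concrete, here is a minimal Stata sketch on simulated data. Everything in it is hypothetical: 20 countries with 100 respondents each, and the seed, coefficients, and helper names u, u_hat and mean_age are illustrative assumptions, not survey results. The random-intercept model estimates gini because countries enter as a random effect rather than as a full set of dummies; predict with the reffects option then gives each country's estimated deviation net of age and gini, which is one possible way to see whether a particular country stands out. The Mundlak (correlated random effects) variant adds the country mean of each individual-level covariate to a random-effects regression.

```stata
clear
set seed 12345
* Hypothetical country-level data: one Gini value per country
set obs 20
generate country = _n
generate gini = 25 + 15*runiform()
* Expand to 100 hypothetical respondents per country
expand 100
generate age = floor(18 + 63*runiform())
* Country-level random intercept, constant within each country
bysort country: generate u = rnormal(0, 0.5) if _n == 1
by country: replace u = u[1]
generate value_variable = 5 + 0.02*age + 0.05*gini + u + rnormal()

* Random-intercept (multilevel) model: gini is estimable
mixed value_variable age gini || country:
* Predicted country random effects (BLUPs), net of age and gini
predict u_hat, reffects

* Mundlak / correlated random effects: add the country mean of age
bysort country: egen double mean_age = mean(age)
xtset country
xtreg value_variable age mean_age gini, re
```

With 20 countries, any country-level coefficient such as gini is effectively estimated from 20 data points, regardless of the 55,000 respondents.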



        • #5
          Hi Kasper, I see your point. Phil suggested a way to get around this problem. Please also see http://www.stata-journal.com/article...article=st0283.
          Ho-Chuan (River) Huang
