  • Nested loop over year and group to generate a function of the conditional group mean

    Hello,

    I have a household panel data set containing individual data for 6 years. For each year, ca. 15000 individuals are divided into 50 groups. I also have a variable, say "income", that is a decimal for every individual.

    Now, for each year and group, I want to calculate for every individual i the sum of the differences between i's income and the incomes of all other individuals in the same group whose income is higher (respectively, lower) than i's, and then divide this sum by the number of individuals in the group minus 1.

    I tried various combinations of foreach and forvalues loops without success. The group sizes differ every year, and ca. 1% of the observations of "income" are missing in every group.

    I am grateful for any idea of how to implement a loop to generate the values for "higher" and "lower" income as described.

    Thanks and best wishes,
    Johannes


  • #2
    I'm not really sure I understand what you want, but it seems as if you want to calculate, for each individual in each year, the deviation between the individual's income and the mean income of all others in that individual's group. If so, you don't need any loops at all to do this.

    Code:
    egen group_total_income = total(income), by(group year)
    egen group_size = count(income), by(group year)
    gen others_total_income = group_total_income - income
    gen others_mean_income = others_total_income/(group_size-1)
    gen deviation = income - others_mean_income
    Note: if any of the groups contains only a single person in a given year, what you seek to calculate is undefined, and the above code will create a missing value for that person in that year.
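
    As a quick check of the code above, here is a minimal sketch on made-up data (one group-year with incomes 10, 20 and 30; the values are purely illustrative). It uses the same variable names as the code above, with the two intermediate steps folded into one line.

    Code:
    * toy data: a single group-year with three incomes (illustrative values only)
    clear
    input year group income
    2010 1 10
    2010 1 20
    2010 1 30
    end
    * group total and number of non-missing incomes
    egen group_total_income = total(income), by(group year)
    egen group_size = count(income), by(group year)
    * mean income of everyone else in the group, then the deviation from it
    gen others_mean_income = (group_total_income - income)/(group_size - 1)
    gen deviation = income - others_mean_income
    list income others_mean_income deviation
    Worked by hand, the others' means are 25, 20 and 15, so the deviations should be -15, 0 and 15.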


    • #3
      Thank you very much for your helpful reply Clyde!
      I think my problem is almost solved. I am looking for a way to adjust your code so that in the calculation of "others_total_income" only the incomes higher than individual i's income are considered.

      In other words, I want to calculate this function for every individual by group-year:

      lower_i = \frac{1}{group\_size - 1} \sum_{j \neq i} \max\{x_j - x_i, 0\}
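
      A small worked example may help, assuming a group with the made-up incomes 10, 20 and 30 and a divisor of group_size - 1 = 2. For x_i = 10 the sum is (20 - 10) + (30 - 10) = 30, giving 15; for x_i = 20 it is (30 - 20) = 10, giving 5; for x_i = 30 no income is higher, giving 0. Because only incomes above x_i contribute, the sum can also be written without the max:

      \sum_{j \neq i} \max\{x_j - x_i, 0\} = \sum_{j: x_j > x_i} x_j - x_i \cdot \#\{j : x_j > x_i\}

      That is, the total of the strictly higher incomes minus x_i times the number of strictly higher incomes. Incomes tied with x_i contribute zero either way.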

      I did not find a way to implement this in the code you suggested, but the code below should do the job if there is a way to address the correct cell in the variable "lower".

      Code:
      gen lower=0
      egen group_size = count(income), by(group, year)
      local total_count = 0
      
      foreach y of varlist year {
          foreach g of varlist group {
              quietly sum group_size if group==`g' & year==`y'
              forvalues i = 1(1)`r(max)' {
                  gen x`i'=(income - income[`i'])/`r(max)' if income>income[`i']
                  local count_var = `r(max)'
                  quietly sum x`i'
                  replace lower = r(sum) if lower[_n + `total_count']==lower[`i' + `total_count']
                  local total_count = `total_count'+`count_var'
                  local count_var = 0
                  drop x`i'
              }
          }
      }
      For instance, I tried to address the cell by
      Code:
       replace lower = r(sum) if lower[_n]==lower[`i']
      but this just rewrites the solutions into the first cells. It works for the first year and first group, but then it does not keep the count. Adding a local counter as above did not work because "lower[_n+`total_count']" was not calculated.


      Thank you very much for your help and happy holidays,
      Johannes
      Last edited by Johannes Eigner; 28 Dec 2014, 06:50.


      • #4
        So I think you can do this as follows:
        Code:
        gsort group year -income
        by group year: gen total_higher_incomes = sum(income)
        will get you the sum of all incomes greater than or equal to that of the index person in that group that year. I think you can take it from there.


        • #5
          Actually, it dawns on me that the code in my preceding post gives you the total of incomes higher than or equal to the index observation. To get only the total of incomes strictly higher:

          Code:
          gsort group year -income
          by group year: gen total_higher_or_equal_incomes = sum(income) // NOTE: gen, not egen; sum, not total
          by group year income, sort: egen count_equal_incomes = count(income) //NOTE: egen, not gen
          gen total_strictly_higher_incomes = total_higher_or_equal_incomes - income*count_equal_incomes
          Last edited by Clyde Schechter; 28 Dec 2014, 18:17.
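
          To carry the running-sum idea through to the measure requested in #3, a minimal sketch follows. It assumes the variables group, year and income from the thread; every other variable name (grp_total, grp_n, sum_le, n_le, gap_to_higher, gap_to_lower) is made up for illustration. It sorts ascending rather than descending so that tied incomes can be handled with "by group year income:", because a running sum takes a different value on each row within a block of tied incomes. The total of strictly higher incomes is then the group total minus the total of incomes less than or equal to one's own.

          Code:
          * group-year total and count over non-missing incomes
          egen double grp_total = total(income), by(group year)
          egen long grp_n = count(income), by(group year)
          * ascending sort within group-year; missing incomes sort last and add 0 to sum()
          bysort group year (income): gen double sum_le = sum(income)
          by group year: gen long n_le = sum(!missing(income))
          * within a block of tied incomes, give every row the block's last running value
          by group year income: replace sum_le = sum_le[_N]
          by group year income: replace n_le = n_le[_N]
          * sum over j of max(x_j - x_i, 0), divided by (n - 1)
          gen double gap_to_higher = ((grp_total - sum_le) - income*(grp_n - n_le))/(grp_n - 1) if !missing(income)
          * sum over j of max(x_i - x_j, 0), divided by (n - 1); ties contribute zero
          gen double gap_to_lower = (income*n_le - sum_le)/(grp_n - 1) if !missing(income)
          Worked by hand for a single group-year with incomes 10, 20, 20 and 30, gap_to_higher should be 13.33, 3.33, 3.33 and 0, and gap_to_lower should be 0, 3.33, 3.33 and 13.33. Observations with missing income, and groups with only one non-missing income, get missing values.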


          • #6
            Thank you so much for your help Clyde!

            I worked it out with the help of your second post. I used "duplicates tag income, gen()" to find the equal incomes. After that, it was easy to bring it into the form I need with Excel.
            Your last code does the job quickly and neatly and will save me a lot of work!

            Best wishes and happy new year!
            Johannes
