  • Calculating multiple Gini coefficients - Error: "Too many values"

    Hello everyone,
    I have to calculate many Gini coefficients for income (here: PrimaryIncome_E) at the regional level with Stata 15. My data set consists of 429 regions (here: GeographicRegion_E) with 3000 to 5000 observations per region. Additionally, it covers the period from 01.2014 to 06.2017 with quarterly observations, so that I have 14 points in time (here: PoolCutoffDate). I want to calculate Ginis with a loop for every region at every point in time, so that I have 14*429 = 6,006 Ginis in the end. I used the tool "ineqdec0" by Prof. Jenkins for the calculations and built the loop with the help of previous posts on the forum.
    However, I get the error message "no observations r(2000)" if I run the whole loop, or "too many values" without the loop. If I run the same code for the regions only, ignoring the time dimension, the calculation for the 429 regions takes some time, but in the end it all goes well and I get a Gini coefficient for each region.
    From my point of view, Stata seems to be unable to deal with so much data in memory. Does anyone have an idea how to deal with this issue?

    Thank you very much for your answers.

    With kind regards,

    Peter

    Code:
    gen gini = .
    egen group = group(GeographicRegion_E PoolCutoffDate)
    su group, meanonly
    forval i = 1/`r(max)' {
        ineqdec0 PrimaryIncome_E if group == `i'
        replace gini = r(gini) if group == `i'
    }

  • #2
    I do not know how you got a "too many values" error but the r(2000) error indicates that there are no observations in at least one of the groups. Rather than trying to fix your code, I'll suggest that you use rangerun (from SSC) to do the job.
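    For reference, the r(2000) stop in a loop of this kind could probably be avoided by wrapping the call in capture and storing the result only when the command succeeds. The following is a minimal sketch on hypothetical toy data (3 groups of 100 observations, the third with all incomes missing); it assumes ineqdec0 is installed from SSC and returns r(gini), as in the original loop.

```stata
* toy data (hypothetical): 3 groups, group 3 has only missing incomes
clear
set seed 12345
set obs 300
gen group = ceil(_n/100)
gen PrimaryIncome_E = runiform()
replace PrimaryIncome_E = . if group == 3

* loop over groups, skipping any group that triggers r(2000)
gen gini = .
su group, meanonly
forval i = 1/`r(max)' {
    capture ineqdec0 PrimaryIncome_E if group == `i'
    if _rc == 0 {
        quietly replace gini = r(gini) if group == `i'
    }
    else if _rc != 2000 {
        * any error other than "no observations" still stops the loop
        error _rc
    }
}

* one Gini per group; group 3 stays missing
tabstat gini, by(group)
```

    Groups that end up with a missing gini are the ones to inspect for missing or otherwise unusable income values.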

    With rangerun, you generate results for each observation in the data in memory by running a user-supplied program on a subset of the observations (in this case, all observations within the same group as the current observation). Your program will generate the same results for each observation in the group, so there is no need to compute the results for more than one observation per group. You can do this by creating a valid interval for only one observation per group. For that observation, the lower and upper bounds are both the group number. That will select all observations in the data with the same value for group as the current observation. For the other observations in the group, use an upper bound of -1. Since -1 is lower than any value stored in the group variable, this creates an invalid interval for all these repeat observations in the group. When rangerun encounters an invalid interval, it simply skips running the user's program and moves on to the next observation.
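    To make the interval bounds concrete, here is a small sketch on hypothetical toy data with two groups. It only builds and lists the upper bound and does not call rangerun; the variable name high is just an illustrative choice.

```stata
* toy data (hypothetical): group 1 has 2 obs, group 2 has 3 obs
clear
input group
1
1
2
2
2
end

* upper bound equals the group number for the first obs in each group,
* and -1 (an invalid interval) for every other obs in the group
bysort group: gen high = cond(_n == 1, group, -1)

list, sepby(group)
```

    With interval(group group high), only the observations where high equals group define a valid interval [group, high], so the user program would run once per group.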

    The following is a quick example with fake data that has 429 regions, each with 3000 to 5000 observations. Within each region, there are 14 cutoff dates. This should approximate the structure of your data. Just to show how the r(2000) error can occur, I replace the income variable with missing values in group 3. rangerun computes the results for all 6006 groups in about 25 seconds on my computer. The 3 individual spot checks at the end take 16 seconds to run!

    Code:
    * create fake data
    clear all
    set obs 429
    gen GeographicRegion_E = _n
    expand runiformint(3000,5000)
    bysort GeographicRegion_E: gen PoolCutoffDate = runiformint(1,14)
    bysort GeographicRegion_E PoolCutoffDate: gen long obs = _n
    gen PrimaryIncome_E = runiform()
    egen group = group(GeographicRegion_E PoolCutoffDate)
    
    * make group 3 all missing
    replace PrimaryIncome_E = . if group == 3
    
    * define program to calculate gini with only obs from a specific group
    program mygini
        ineqdec0 PrimaryIncome_E
        gen gini = r(gini)
    end
    
    * define a valid upper interval bound only for the first obs in the group
    bysort group (obs): gen high = cond(_n == 1, group, -1)
    
    rangerun mygini, interval(group group high)
    
    * list rangerun results for the first 3 groups
    list if inlist(group,1,2,3) & obs == 1
    
    * spot check for the first 3 groups
    ineqdec0 PrimaryIncome_E if group == 1
    ineqdec0 PrimaryIncome_E if group == 2
    ineqdec0 PrimaryIncome_E if group == 3
    and the results
    Code:
    . list if inlist(group,1,2,3) & obs == 1
    
             +----------------------------------------------------------------+
             | Geogra~E   PoolCu~e   obs   Primar~E   group   high       gini |
             |----------------------------------------------------------------|
          1. |        1          1     1     .85594       1      1   .3315018 |
        292. |        1          2     1   .0520792       2      2   .3036681 |
        599. |        1          3     1          .       3      3          . |
             +----------------------------------------------------------------+
    
    . 
    . * spot check for the first 3 groups
    . ineqdec0 PrimaryIncome_E if group == 1
     
    Percentile ratios
    
    ----------------------------------------------------------
      All obs |    p90/p10     p90/p50     p10/p50     p75/p25
    ----------+-----------------------------------------------
              |      9.892       1.837       0.186       2.737
    ----------------------------------------------------------
      
    Generalized Entropy index GE(2), and Gini coefficient
    
    ----------------------------------
      All obs |      GE(2)        Gini
    ----------+-----------------------
              |    0.16504     0.33150
    ----------------------------------
    
    . ineqdec0 PrimaryIncome_E if group == 2
     
    Percentile ratios
    
    ----------------------------------------------------------
      All obs |    p90/p10     p90/p50     p10/p50     p75/p25
    ----------+-----------------------------------------------
              |      6.591       1.659       0.252       2.616
    ----------------------------------------------------------
      
    Generalized Entropy index GE(2), and Gini coefficient
    
    ----------------------------------
      All obs |      GE(2)        Gini
    ----------+-----------------------
              |    0.13861     0.30367
    ----------------------------------
    
    . ineqdec0 PrimaryIncome_E if group == 3
    no observations
    r(2000);
