
  • Estimating and saving measures of inequality by groups

    I want to calculate some standard measures of inequality across different regions & store the results in a dataset, with one observation per region, for further analysis.
    statsby seems to be the obvious way to go. There are two user-written programs for measuring inequality, ainequal & inequal7. There are probably others.
    It seems like it should be straightforward, but I cannot get either to work. I guess I'm doing something dumb.

    For example, with ainequal (which stores the Gini in r(gini_1)):
    sysuse auto
    . statsby gini = r(gini_1) , by(foreign) saving(myresults1) : ainequal price
    (running ainequal on estimation sample)
    type mismatch
    r(109);

    Likewise with inequal7 (which stores the Gini as r(gini)):
    statsby gini = r(gini) , by(foreign) saving(myresults2) : inequal7 price
    (running inequal7 on estimation sample)
    type mismatch
    r(109);

    Neither ainequal nor inequal7 is "by"able.
    Anyone know a solution to this, whether with statsby or otherwise?
    Thanks
    Kevin
    I use Stata 15.

  • #2
    I would check out ineqdeco (SSC) by Stephen Jenkins, which seems closer to the state of the art.
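
    A minimal sketch of getting hold of the command, assuming internet access and that the package is still distributed on SSC under the name ineqdeco. Nothing here is specific to Kevin's data.

    ```stata
    * install the user-written command from SSC (needs internet access)
    ssc install ineqdeco, replace

    * confirm that Stata can now find it, then read the help file
    which ineqdeco
    help ineqdeco
    ```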



    • #3
      Nick exaggerates, but still ...

      ... check out -ineqdeco- and -ineqdec0- on SSC. (The latter includes zero and negative values for the outcome variable; the former works with positive values only.)

      [The programs Kevin cites are rather old and also may not calculate the Gini correctly (that is, deal with tied values properly), or handle weights, if I recall correctly.]

      Note the following:

      Code:
      . sysuse nlsw88, clear
      (NLSW, 1988 extract)
      
      . statsby gini = r(gini), by(race): ineqdeco wage
      (running ineqdeco on estimation sample)
      
            command:  ineqdeco wage
               gini:  r(gini)
                 by:  race
      
      Statsby groups
      ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
      ...
      
      . 
      end of do-file
      
      . list
      
           +------------------+
           |  race       gini |
           |------------------|
        1. | white   .3306955 |
        2. | black   .3295656 |
        3. | other   .3115918 |
           +------------------+
      But does Kevin actually need to save the subgroup Ginis to a separate dataset? They could be calculated and stored as local macros that could be called on as required:


      Code:
      . sysuse nlsw88, clear
      (NLSW, 1988 extract)
      
      . describe, short
      
      Contains data from C:\Program Files\Stata16\ado\base/n/nlsw88.dta
        obs:         2,246                          NLSW, 1988 extract
       vars:            17                          1 May 2018 22:52
      Sorted by: idcode
      
      . 
      . ineqdeco wage, by(race)
       
      Percentile ratios
      
      ----------------------------------------------------------
        All obs |    p90/p10     p90/p50     p10/p50     p75/p25
      ----------+-----------------------------------------------
                |      3.967       2.037       0.513       2.253
      ----------------------------------------------------------
        
      Generalized Entropy indices GE(a), where a = income difference
       sensitivity parameter, and Gini coefficient
      
      ----------------------------------------------------------------------
        All obs |     GE(-1)       GE(0)       GE(1)       GE(2)        Gini
      ----------+-----------------------------------------------------------
                |    0.19892     0.18123     0.19978     0.27444     0.33253
      ----------------------------------------------------------------------
         
      Atkinson indices, A(e), where e > 0 is the inequality aversion parameter
      
      ----------------------------------------------
        All obs |     A(0.5)        A(1)        A(2)
      ----------+-----------------------------------
                |    0.09062     0.16576     0.28461
      ----------------------------------------------
        
      Subgroup summary statistics, for each subgroup k = 1,...,K:
        
      
      -------------------------------------------------------------------------------------
           race |   Popn. share           Mean  Relative mean   Income share      log(mean)
      ----------+--------------------------------------------------------------------------
          white |       0.72885        8.08300        1.04069        0.75851        2.08976
          black |       0.25957        6.84456        0.88124        0.22875        1.92345
          other |       0.01158        8.55078        1.10092        0.01274        2.14602
      -------------------------------------------------------------------------------------
        
      Subgroup indices: GE_k(a) and Gini_k 
      
      ----------------------------------------------------------------------
           race |     GE(-1)       GE(0)       GE(1)       GE(2)        Gini
      ----------+-----------------------------------------------------------
          white |    0.19545     0.17909     0.19811     0.27123     0.33070
          black |    0.19335     0.17738     0.19622     0.27454     0.32957
          other |    0.22626     0.17528     0.16270     0.17845     0.31159
      ----------------------------------------------------------------------
        
      Within-group inequality, GE_W(a)
      
      ----------------------------------------------------------
        All obs |     GE(-1)       GE(0)       GE(1)       GE(2)
      ----------+-----------------------------------------------
                |    0.19621     0.17860     0.19722     0.27195
      ----------------------------------------------------------
                    
      Between-group inequality, GE_B(a):
      
      ----------------------------------------------------------
        All obs |     GE(-1)       GE(0)       GE(1)       GE(2)
      ----------+-----------------------------------------------
                |    0.00271     0.00263     0.00256     0.00249
      ----------------------------------------------------------
                    
      Subgroup Atkinson indices, A_k(e)
      
      ----------------------------------------------
           race |     A(0.5)        A(1)        A(2)
      ----------+-----------------------------------
          white |    0.08977     0.16397     0.28104
          black |    0.08885     0.16254     0.27886
          other |    0.08093     0.16078     0.31154
      ----------------------------------------------
        
      Within-group inequality, A_W(e)
      
      ----------------------------------------------
        All obs |     A(0.5)        A(1)        A(2)
      ----------+-----------------------------------
                |    0.08945     0.16360     0.28093
      ----------------------------------------------
       
      Between-group inequality, A_B(e)
      
      ----------------------------------------------
        All obs |     A(0.5)        A(1)        A(2)
      ----------+-----------------------------------
                |    0.00129     0.00258     0.00512
      ----------------------------------------------
      
      . return list // note: inequality indices stored for each group
      
      scalars:
                     r(ede2) =  5.556361943508938
                     r(ede1) =  6.479510699924358
                  r(edehalf) =  7.063117315757416
               r(between_a2) =  .0051237375531682
               r(between_a1) =  .0025810862267621
            r(between_ahalf) =  .0012881492710385
                r(within_a2) =  .2809302755005116
                r(within_a1) =  .1635997450888978
             r(within_ahalf) =  .0894458858602139
                     r(a2_3) =  .3115384469910878
                     r(a1_3) =  .1607819199562073
                  r(ahalf_3) =  .0809274094402956
                     r(a2_2) =  .2788603414859615
                     r(a1_2) =  .1625351905822754
                  r(ahalf_2) =  .088848127639054
                     r(a2_1) =  .281040237223204
                     r(a1_1) =  .1639681309461594
                  r(ahalf_1) =  .0897692801424109
              r(between_ge2) =  .0024928222080745
              r(between_ge1) =  .0025600077162691
              r(between_ge0) =  .0026324389107029
             r(between_gem1) =  .0027105009322837
               r(within_ge2) =  .2719458142999248
               r(within_ge1) =  .1972240126050694
               r(within_ge0) =  .1785999908377445
              r(within_gem1) =  .1962134637388422
                   r(sumw_3) =  26
                      r(v_3) =  .0115761353517364
                 r(lambda_3) =  1.100918933985472
                  r(theta_3) =  .0127443865911052
                 r(lgmean_3) =  2.146022653579506
                   r(mean_3) =  8.550781254584972
                   r(gini_3) =  .3115917651056431
                    r(ge2_3) =  .178445459813601
                    r(ge1_3) =  .1627015860067118
                    r(ge0_3) =  .1752846774480728
                   r(gem1_3) =  .2262569678942221
                   r(sumw_2) =  583
                      r(v_2) =  .2595725734639359
                 r(lambda_2) =  .8812414959279448
                  r(theta_2) =  .2287461229412252
                 r(lgmean_2) =  1.923453853077668
                   r(mean_2) =  6.844557788523352
                   r(gini_2) =  .3295655841130474
                    r(ge2_2) =  .2745420101707881
                    r(ge1_2) =  .1962204760088086
                    r(ge0_2) =  .1773760334434379
                   r(gem1_2) =  .1933469739138833
                   r(sumw_1) =  1637
                      r(v_1) =  .7288512911843277
                 r(lambda_1) =  1.040691701643483
                  r(theta_1) =  .7585094904676675
                 r(lgmean_1) =  2.089763017798653
                   r(mean_1) =  8.082999410320486
                   r(gini_1) =  .3306955523030369
                    r(ge2_1) =  .2712271343465578
                    r(ge1_1) =  .1981066940542421
                    r(ge0_1) =  .1790885463105681
                   r(gem1_1) =  .1954492113284315
                       r(a2) =  .2846146000512779
                       r(a1) =  .165758566366879
                    r(ahalf) =  .0906188154785861
                      r(ge2) =  .2744386365079999
                      r(ge1) =  .1997840203213393
                      r(ge0) =  .1812324297484487
                     r(gem1) =  .1989239646711261
                     r(gini) =  .3325258122691099
                   r(p75p50) =  1.530135539988736
                   r(p25p50) =  .6790615166417512
                   r(p10p50) =  .513468320887852
                   r(p90p50) =  2.037185006063599
                   r(p75p25) =  2.253309166385852
                   r(p90p10) =  3.967498915105508
                      r(p95) =  16.52978897094727
                      r(p90) =  12.77777481079102
                      r(p75) =  9.597423553466797
                      r(p50) =  6.272270202636719
                      r(p25) =  4.259257316589355
                      r(p10) =  3.220612049102783
                       r(p5) =  2.801002025604248
                      r(max) =  40.74658966064453
                      r(min) =  1.00495183467865
                        r(N) =  2246
                     r(sumw) =  2246
                       r(sd) =  5.755522859382768
                      r(Var) =  33.12604338487759
                     r(mean) =  7.76694903741006
      
      macros:
                   r(levels) : "1 2 3"
      
      .                         
      . ge rgini = .
      (2,246 missing values generated)
      
      . ge rmld = .
      (2,246 missing values generated)
      
      . levelsof race, local(race_cat)                  
      1 2 3
      
      . foreach r of local race_cat  {
        2.         ineqdeco wage if race == `r'
        3.         replace rgini = r(gini) if  race == `r'
        4.         replace rmld = r(ge0)  if  race == `r'
        5. }
       
      Percentile ratios
      
      ----------------------------------------------------------
        All obs |    p90/p10     p90/p50     p10/p50     p75/p25
      ----------+-----------------------------------------------
                |      3.929       2.007       0.511       2.160
      ----------------------------------------------------------
        
      Generalized Entropy indices GE(a), where a = income difference
       sensitivity parameter, and Gini coefficient
      
      ----------------------------------------------------------------------
        All obs |     GE(-1)       GE(0)       GE(1)       GE(2)        Gini
      ----------+-----------------------------------------------------------
                |    0.19545     0.17909     0.19811     0.27123     0.33070
      ----------------------------------------------------------------------
         
      Atkinson indices, A(e), where e > 0 is the inequality aversion parameter
      
      ----------------------------------------------
        All obs |     A(0.5)        A(1)        A(2)
      ----------+-----------------------------------
                |    0.08977     0.16397     0.28104
      ----------------------------------------------
      (1,637 real changes made)
      (1,637 real changes made)
       
      Percentile ratios
      
      ----------------------------------------------------------
        All obs |    p90/p10     p90/p50     p10/p50     p75/p25
      ----------+-----------------------------------------------
                |      4.025       2.136       0.531       2.237
      ----------------------------------------------------------
        
      Generalized Entropy indices GE(a), where a = income difference
       sensitivity parameter, and Gini coefficient
      
      ----------------------------------------------------------------------
        All obs |     GE(-1)       GE(0)       GE(1)       GE(2)        Gini
      ----------+-----------------------------------------------------------
                |    0.19335     0.17738     0.19622     0.27454     0.32957
      ----------------------------------------------------------------------
         
      Atkinson indices, A(e), where e > 0 is the inequality aversion parameter
      
      ----------------------------------------------
        All obs |     A(0.5)        A(1)        A(2)
      ----------+-----------------------------------
                |    0.08885     0.16254     0.27886
      ----------------------------------------------
      (583 real changes made)
      (583 real changes made)
       
      Percentile ratios
      
      ----------------------------------------------------------
        All obs |    p90/p10     p90/p50     p10/p50     p75/p25
      ----------+-----------------------------------------------
                |      4.408       1.690       0.383       2.315
      ----------------------------------------------------------
        
      Generalized Entropy indices GE(a), where a = income difference
       sensitivity parameter, and Gini coefficient
      
      ----------------------------------------------------------------------
        All obs |     GE(-1)       GE(0)       GE(1)       GE(2)        Gini
      ----------+-----------------------------------------------------------
                |    0.22626     0.17528     0.16270     0.17845     0.31159
      ----------------------------------------------------------------------
         
      Atkinson indices, A(e), where e > 0 is the inequality aversion parameter
      
      ----------------------------------------------
        All obs |     A(0.5)        A(1)        A(2)
      ----------+-----------------------------------
                |    0.08093     0.16078     0.31154
      ----------------------------------------------
      (26 real changes made)
      (26 real changes made)
      
      . 
      . ta race, su(rgini)
      
                  |          Summary of rgini
             race |        Mean   Std. Dev.       Freq.
      ------------+------------------------------------
            white |   .33069554           0       1,637
            black |   .32956558           0         583
            other |   .31159177           0          26
      ------------+------------------------------------
            Total |   .33018109   .00207206       2,246
      
      . // values are stored in the active dataset
      
      . ta race, su(rmld)
      
                  |           Summary of rmld
             race |        Mean   Std. Dev.       Freq.
      ------------+------------------------------------
            white |   .17908855           0       1,637
            black |   .17737603           0         583
            other |   .17528468           0          26
      ------------+------------------------------------
            Total |   .17859999   .00083089       2,246
      
      . tabstat rgini rmld, by(race)
      
      Summary statistics: mean
        by categories of: race (race)
      
        race |     rgini      rmld
      -------+--------------------
       white |  .3306955  .1790885
       black |  .3295656   .177376
       other |  .3115918  .1752847
      -------+--------------------
       Total |  .3301811     .1786
      ----------------------------
      
      . // but you can get -statsby- type results directly now, if wished:
      
      
      . collapse (mean) rgini rmld , by(race)
      
      . 
      . list
      
           +-----------------------------+
           |  race      rgini       rmld |
           |-----------------------------|
        1. | white   .3306955   .1790885 |
        2. | black   .3295656    .177376 |
        3. | other   .3115918   .1752847 |
           +-----------------------------+
      
      . describe
      
      Contains data
        obs:             3                          NLSW, 1988 extract
       vars:             3                          
                                                    (_dta has notes)
      -----------------------------------------------------------------------------------------------------------------------------------------------------------
                    storage   display    value
      variable name   type    format     label      variable label
      -----------------------------------------------------------------------------------------------------------------------------------------------------------
      race            byte    %8.0g      racelbl    race
      rgini           float   %9.0g                 (mean) rgini
      rmld            float   %9.0g                 (mean) rmld
      -----------------------------------------------------------------------------------------------------------------------------------------------------------
      Sorted by: race
           Note: Dataset has changed since last saved.
      PS Kevin: you know where I live (so to speak). Happy to discuss further via email.
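
      On the local-macro route mentioned earlier in this post, here is a minimal sketch. It assumes ineqdeco is installed from SSC and that the lines are run together in one do-file, since local macros disappear when the do-file ends. It uses only the r(levels), r(gini_k) and r(ge0_k) results visible in the return list above; the macro names gini_k and mld_k are hypothetical and chosen for illustration.

      ```stata
      sysuse nlsw88, clear

      * one call with by() leaves subgroup results behind as r(gini_1), r(ge0_1), ...
      quietly ineqdeco wage, by(race)

      * copy the list of subgroup codes before anything else can overwrite r()
      local levels `r(levels)'

      * store each subgroup Gini and MLD, i.e. GE(0), in its own local macro
      foreach k of local levels {
          local gini_`k' = r(gini_`k')
          local mld_`k'  = r(ge0_`k')
      }

      * call on the stored values as required
      display "Gini for race == 1: " %7.5f `gini_1'
      display "MLD  for race == 1: " %7.5f `mld_1'
      ```

      This avoids one ineqdeco call per group, at the cost of the values living only in macros rather than in a dataset.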



      • #4
        Thanks Nick & Steve, super helpful as ever. That should probably do. If not, I may 'visit'!



        • #5
          Here opposite statements are correct, as in quantum physics, supposedly, and most philosophy or social science.

          1. Official Stata would be better off with a clear, correct and comprehensive suite of commands for income inequality calculations.

          2. This is an area where many users work and several smart, experienced users have produced excellent commands over several years. I don't know of any Stata developer who's ever had comparable interests and expertise, so the company seems happy with the fact that the expertise is on the user side. That is explicit at https://www.stata.com/manuals/rinequality.pdf

          There are also some not so good commands. That is not really a criticism: in at least some cases, the authors' work did what they wanted it to do and they weren't trying to do anything else or be more comprehensive, and they moved on to other projects, or just faded away.

          The etiquette (which on other grounds I approve) is more or less that it is in order to point out definite errors but not very nice to comment that a particular command is not very good, rather odd, or distinctly limited in scope. Hence users are expected to feel their way towards finding the best commands.

          I'll put on a hat as an Editor of the Stata Journal and invite a comprehensive, authoritative review of income inequality calculations in Stata, which would be pretty much guaranteed instant classic status. No names here, but if you think I mean you, you're right.

          On the rather different topic of entropy, diversity, concentration and the like for categories, I started writing something myself early in lockdown, but no law of nature or society stops someone else getting there sooner.



          • #6
            Actually saving the results to a dataset is rather easy because statsby creates such a dataset:

            Here Steve was listing the dataset so you don't need to do anything fancy, just save.
            list

                 +------------------+
                 |  race       gini |
                 |------------------|
              1. | white   .3306955 |
              2. | black   .3295656 |
              3. | other   .3115918 |
                 +------------------+
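
            A minimal sketch of that saving step, assuming ineqdeco is installed from SSC. The file name myresults and the variable name mld are placeholders. With saving(), statsby writes the one-observation-per-group results to disk and leaves the data in memory untouched.

            ```stata
            sysuse nlsw88, clear

            * one observation per race, written straight to myresults.dta
            statsby gini = r(gini) mld = r(ge0), by(race) saving(myresults, replace): ineqdeco wage

            * load the saved results for further analysis
            use myresults, clear
            list
            ```

            If the results dataset is wanted in memory straight away rather than on disk, statsby's clear option can be used in place of saving().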
