  • ineqdeco: how to request one inequality only?

    Hi,

    I am using American Community Survey data to decompose national income inequality into within-PUMA and between-PUMA inequality. I am only interested in the generalized entropy measures and the Gini coefficient. However, I do not know how to ask the package ineqdeco to calculate just one coefficient. Because of the huge number of observations (over a million) and the huge number of groups (2000 PUMAs), it takes forever to calculate all the measures at once. Is there any advice on how to solve this issue? I would be very grateful.

  • #2
    ineqdeco (from SSC -- please follow Forum FAQ guidance and report the provenance of community-contributed commands), and its sibling ineqdec0 (also on SSC), produce all the various estimates -- as you observe. This cannot be changed and, as author, I am unwilling to revise the code to do what you want. In any case, please be more precise about timings than simply referring to "forever". If you look at the toy example below, you'll see ineqdeco generating inequality estimates for a sample of 1 million observations and for 2 subgroups (each 500,000), all within 40 seconds. [I am using a 5-year-old PC running Windows 10 and Stata 16/MP4.] When I up the number of subgroups to 2000, the run-time is also around 40 seconds. ineqdeco uses egen when doing calculations; that is what governs the speed. (Look also at the time taken in the second example to create the subgroup membership variable using egen.)

    So, on the evidence of what I've seen so far, "forever" is actually rather a short time. (I concede that run-time may be slower if you're using a less powerful computer than mine -- but mine isn't particularly powerful.)

    My recommendation: take a p% random sample (where p is 'small'), and get all your do-file code working properly. Then, and only then, apply your do-file code to the whole sample.
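The recommendation above could be sketched as follows. This is only an illustration under stated assumptions: hhincome and puma are hypothetical variable names standing in for the income and subgroup variables, the seed is arbitrary, and p = 1 is just one possible 'small' value. With about 2000 groups, a 1% sample leaves very few observations per group, so it is suited to debugging the do-file and timing the run rather than to producing final estimates.

```stata
* Sketch only: hhincome and puma are hypothetical variable names
preserve                      // keep the full data set recoverable
set seed 12345                // arbitrary seed, so the draw is reproducible
sample 1                      // keep a 1% random sample of observations

timer clear
timer on 1
quietly ineqdeco hhincome, by(puma)
timer off 1

display "Gini = " r(gini)     // one of the results ineqdeco leaves in r()
timer list 1                  // elapsed seconds for the decomposition
restore                       // back to the full sample
```

Once the do-file runs cleanly on the subsample, dropping the preserve, sample, and restore lines applies the same code to the whole sample.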

    Code:
    . cscript
    -------------------------------------------------------------------------BEGIN
    
    . set obs 1000000
    number of observations (_N) was 0, now 1,000,000
    
    . ge x = rweibull(2,1000)
    
    . ge y = _n <= 500000
    
    . su
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               x |  1,000,000    886.5044     463.434    1.03187    4094.88
               y |  1,000,000          .5    .5000003          0          1
    
    . timer clear
    
    . timer on 1
    
    . ineqdeco x, by(y)
     
    Percentile ratios
    
    ----------------------------------------------------------
      All obs |    p90/p10     p90/p50     p10/p50     p75/p25
    ----------+-----------------------------------------------
              |      4.674       1.823       0.390       2.196
    ----------------------------------------------------------
      
    Generalized Entropy indices GE(a), where a = income difference
     sensitivity parameter, and Gini coefficient
    
    ----------------------------------------------------------------------
      All obs |     GE(-1)       GE(0)       GE(1)       GE(2)        Gini
    ----------+-----------------------------------------------------------
              |    0.28679     0.16781     0.13902     0.13664     0.29289
    ----------------------------------------------------------------------
      
    Atkinson indices, A(e), where e > 0 is the inequality aversion parameter
    
    ----------------------------------------------
      All obs |     A(0.5)        A(1)        A(2)
    ----------+-----------------------------------
              |    0.07295     0.15448     0.36450
    ----------------------------------------------
      
    Subgroup summary statistics, for each subgroup k = 1,...,K:
      
    
    ----------------------------------------------------------------------------
            y |  Popn. share         Mean     __00000F Income share    log(mean)
    ----------+-----------------------------------------------------------------
            0 |      0.50000    886.41725      0.99990      0.49995      6.78719
            1 |      0.50000    886.59155      1.00010      0.50005      6.78738
    ----------------------------------------------------------------------------
      
    Subgroup indices: GE_k(a) and Gini_k
    
    ----------------------------------------------------------------------
            y |     GE(-1)       GE(0)       GE(1)       GE(2)        Gini
    ----------+-----------------------------------------------------------
            0 |    0.28804     0.16807     0.13920     0.13680     0.29307
            1 |    0.28553     0.16754     0.13883     0.13649     0.29270
    ----------------------------------------------------------------------
      
    Within-group inequality, GE_W(a)
    
    ----------------------------------------------------------
      All obs |     GE(-1)       GE(0)       GE(1)       GE(2)
    ----------+-----------------------------------------------
              |    0.28679     0.16781     0.13902     0.13664
    ----------------------------------------------------------
                  
    Between-group inequality, GE_B(a):
    
    ----------------------------------------------------------
      All obs |     GE(-1)       GE(0)       GE(1)       GE(2)
    ----------+-----------------------------------------------
              |    0.00000     0.00000     0.00000     0.00000
    ----------------------------------------------------------
                  
    Subgroup Atkinson indices, A_k(e)
    
    ----------------------------------------------
            y |     A(0.5)        A(1)        A(2)
    ----------+-----------------------------------
            0 |    0.07305     0.15470     0.36552
            1 |    0.07285     0.15426     0.36349
    ----------------------------------------------
      
    Within-group inequality, A_W(e)
    
    ----------------------------------------------
      All obs |     A(0.5)        A(1)        A(2)
    ----------+-----------------------------------
              |    0.07295     0.15448     0.36450
    ----------------------------------------------
     
    Between-group inequality, A_B(e)
    
    ----------------------------------------------
      All obs |     A(0.5)        A(1)        A(2)
    ----------+-----------------------------------
              |    0.00000     0.00000     0.00000
    ----------------------------------------------
    
    . timer off 1
    
    . timer list 1
       1:     39.99 /        1 =      39.9920
    
    .
    end of do-file
    
    . return list
    
    scalars:
                     r(t1) =  39.992
                    r(nt1) =  1
                   r(ede2) =  563.369397641752
                   r(ede1) =  749.5548518415387
                r(edehalf) =  821.8341376661725
             r(between_a2) =  2.86437773422e-06
             r(between_a1) =  6.51596371304e-08
          r(between_ahalf) =  1.10316945579e-08
              r(within_a2) =  .3645028597966497
              r(within_a1) =  .1544825961456613
           r(within_ahalf) =  .0729497267974666
                   r(a2_1) =  .3634898878742541
                   r(a1_1) =  .1542605459690094
                r(ahalf_1) =  .0728460480439089
                   r(a2_0) =  .3655160308852694
                   r(a1_0) =  .1547046899795532
                r(ahalf_0) =  .0730534259353579
            r(between_ge2) =  4.83173656985e-09
            r(between_ge1) =  4.83173669989e-09
            r(between_ge0) =  4.83171827671e-09
           r(between_gem1) =  4.83172323961e-09
             r(within_ge2) =  .1366418097587038
             r(within_ge1) =  .1390151382648356
             r(within_ge0) =  .1678065852066447
            r(within_gem1) =  .286787850764712
                 r(sumw_1) =  500000
                    r(v_1) =  .5
               r(lambda_1) =  1.00009830287314
                r(theta_1) =  .5000491514365698
               r(lgmean_1) =  6.787384387520166
                 r(mean_1) =  886.5915464947893
                 r(gini_1) =  .2927044679756636
                  r(ge2_1) =  .1364879065313959
                  r(ge1_1) =  .1388310267436618
                  r(ge0_1) =  .1675439332157659
                 r(gem1_1) =  .285533474606041
                 r(sumw_0) =  500000
                    r(v_0) =  .5
               r(lambda_0) =  .9999016971268778
                r(theta_0) =  .4999508485634389
               r(lgmean_0) =  6.787187781773271
                 r(mean_0) =  886.4172546355519
                 r(gini_0) =  .2930723617497785
                  r(ge2_0) =  .1367957708697531
                  r(ge1_0) =  .1391992859893657
                  r(ge0_0) =  .16806923719132
                 r(gem1_0) =  .2880419747998184
                     r(a2) =  .3645046800866486
                     r(a1) =  .1544826496476682
                  r(ahalf) =  .0729497370320574
                    r(ge2) =  .1366418145888031
                    r(ge1) =  .1390151430977779
                    r(ge0) =  .1678065900352397
                   r(gem1) =  .2867878556024105
                   r(gini) =  .2928885498146932
                 r(p75p50) =  1.414311769315346
                 r(p25p50) =  .6440391105946094
                 r(p10p50) =  .3900888450386837
                 r(p90p50) =  1.823261335109264
                 r(p75p25) =  2.196002922880852
                 r(p90p10) =  4.673964298898262
                    r(p95) =  1731.233825683594
                    r(p90) =  1518.33740234375
                    r(p75) =  1177.780944824219
                    r(p50) =  832.7590637207031
                    r(p25) =  536.3294067382813
                    r(p10) =  324.8500213623047
                     r(p5) =  226.8466720581055
                    r(max) =  4094.880126953125
                    r(min) =  1.031869769096375
                      r(N) =  1000000
                   r(sumw) =  1000000
                     r(sd) =  463.4340302316377
                    r(Var) =  214771.1003767385
                   r(mean) =  886.5044005651629
    
    macros:
                 r(levels) : "0 1"
    
    **** Now increase # groups to 2000
    
    
    . cscript
    -------------------------------------------------------------------------BEGIN 
    
    . set obs 1000000
    number of observations (_N) was 0, now 1,000,000
    
    . ge x = rweibull(2,1000)
    
    . timer clear
    
    . timer on 2
    
    . egen y = cut(x), group(2000) // create 2000 equal-sized groups
    
    . timer off 2
    
    . timer list 2
       2:     83.17 /        1 =      83.1690
    
    . su 
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               x |  1,000,000    886.1598    463.2065    1.07217   3665.663
               y |  1,000,000    999.5001    577.3505          0       1999
    
    . * ta y
    . timer on 1
    
    . quietly ineqdeco x, by(y) // suppress screen output, esp. bygroup listings
    
    . timer off 1
    
    . timer list 1
       1:     39.48 /        1 =      39.4830
    
    . di r(gini)
    .29286926
    
    . di r(ge0)
    .16798598
    
    . di r(ge1)
    .13904055
    
    . 
    . di r(ge0_0)
    .08424461
    
    . di r(ge0_1) // and so on ... for all 2000 subgroups
    .00603639
    
    . di r(ge0_1998)
    .00008771
    
    . di r(ge0_1999)
    .00126333
    
    . di r(within_ge0)
    .00004857
    
    . di r(between_ge0)
    .16793741
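Although ineqdeco always computes every index, the stored results listed above mean that only the measures of interest need to be reported. A minimal sketch, using the toy variables x and y from the log and result names taken from the return list shown there (the scalar names s_gini, s_ge0, s_ge0w, and s_ge0b are arbitrary choices for illustration):

```stata
* Sketch only: x and y are the toy variables from the log above
quietly ineqdeco x, by(y)

* Copy the results of interest so later commands cannot overwrite them
scalar s_gini = r(gini)
scalar s_ge0  = r(ge0)
scalar s_ge0w = r(within_ge0)
scalar s_ge0b = r(between_ge0)

display "Gini          = " %9.5f s_gini
display "GE(0) total   = " %9.5f s_ge0
display "GE(0) within  = " %9.5f s_ge0w
display "GE(0) between = " %9.5f s_ge0b
```

In the 2000-group run above, the within and between components of GE(0) add up to the total: .00004857 + .16793741 = .16798598.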
