
  • #16
    Can someone provide details on what decile does and how it should be used?



    • #17
      Olive: In turn, what is "decile" precisely? Are you referring or alluding to a particular Stata command or function? Are you asking about the statistical idea of a decile or deciles? You ask for details but provide none yourself. Please expand your question (greatly).



      • #18
        Hi Olive, how are you getting on with Nick's question?



        • #19
          Originally posted by David Benson:
          I'm not sure what your exact question is, but I made up some toy data to test things out:


          Hi everyone,

          Two questions:

          1. How would you interpret the results of sumdist?
          I attempted an interpretation below - this is using David's example.

          HTML Code:
            sumdist income if year==2010, n(10) qgp(gp)
          OUTPUT
          [Image: sumdist output table (1.png)]


          INTERPRETATION

          $17,974 is the first decile group.
          10% of the observations are smaller than it.

          $33,631 is the second decile group.
          20% of the observations are smaller than it.
          ...
          $113,743 is the ninth decile group.
          90% of the observations are smaller than it.


          2. How would you find the total income by decile?
          The output would show 10 series. Each series would sum the income of the bottom 10%, next 10%, ..., top 10%.
          I tried this code on David's example, but I don't think it is correct. It shows the counts and not the total income by decile.

          HTML Code:
          xtile dec = income, nq(10)
          tabstat income, stat(n) by(dec)
          Last edited by Malina Rokis; 19 Aug 2020, 13:20.



          • #20
            "$17,974 is the first decile" (not decile group), and 10% of the observations have an income less than the first decile. There are 9 deciles and they characterize 10 decile groups. -sumdist- shows you the share of total income that is held by each quantile group (decile group in this example), and saves results in r(). Also left behind in r() are the overall mean and the sample size (sum of weights). From the latter two stored results you can get an estimate of total income for the sample as a whole: total income = sum of weights x mean. This total x decile group share = total income held by that decile group.

            Code:
            . sysuse auto
            (1978 Automobile Data)
            
            . sumdist mpg, ng(5)
             
            Distributional summary statistics, 5 quantile groups
            
            ---------------------------------------------------------------------------
            Quantile  |
            group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
            ----------+----------------------------------------------------------------
                    1 |      17.000       85.000       17.132       17.132        3.649
                    2 |      19.000       95.000       19.924       37.056        7.892
                    3 |      22.000      110.000       17.449       54.505       11.608
                    4 |      25.000      125.000       18.401       72.906       15.527
                    5 |                                27.094      100.000       21.297
            ---------------------------------------------------------------------------
            Share = quantile group share of total mpg; 
            L(p)=cumulative group share; GL(p)=L(p)*mean(mpg)
            
            . return list
            
            scalars:
                            r(gl5) =  21.29729729729729
                          r(cush5) =  .9999999999999998
                            r(sh5) =  .2709390862944162
                          r(qrel4) =  1.25
                             r(q4) =  25
                            r(gl4) =  15.52702702702702
                          r(cush4) =  .7290609137055836
                            r(sh4) =  .1840101522842639
                          r(qrel3) =  1.1
                             r(q3) =  22
                            r(gl3) =  11.60810810810811
                          r(cush3) =  .5450507614213197
                            r(sh3) =  .174492385786802
                          r(qrel2) =  .95
                             r(q2) =  19
                            r(gl2) =  7.891891891891891
                          r(cush2) =  .3705583756345177
                            r(sh2) =  .199238578680203
                          r(qrel1) =  .85
                             r(q1) =  17
                            r(gl1) =  3.648648648648649
                          r(cush1) =  .1713197969543147
                            r(sh1) =  .1713197969543147
                           r(ngps) =  5
                            r(p95) =  34
                            r(p90) =  29
                            r(p75) =  25
                            r(p50) =  20
                            r(p25) =  18
                            r(p10) =  14
                             r(p5) =  14
                         r(median) =  20
                              r(N) =  74
                          r(sum_w) =  74
                           r(mean) =  21.2972972972973
            
            matrices:
                   r(relquantiles) :  1 x 4
                         r(shares) :  1 x 5
                      r(quantiles) :  1 x 4
            
            . di "total income held by poorest fifth = "  r(sh1) * r(N) * r(mean)
            total income held by poorest fifth = 270
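
A minimal sketch that extends the last line above to every quantile group, assuming -sumdist- is installed from SSC and that the stored results are named as in the -return list- output above. The local macro name total is my own choice for illustration.

```stata
* Assumes the user-written command sumdist is installed: ssc install sumdist
sysuse auto, clear
quietly sumdist mpg, ng(5)

* Sample total = sum of weights x overall mean (both left behind in r())
local total = r(sum_w) * r(mean)

* Group total = group share x sample total, for each of the 5 groups
forvalues g = 1/5 {
    display "total held by quantile group `g' = " r(sh`g') * `total'
}
```

Neither -local- nor -display- is r-class, so the results saved by -sumdist- should still be available inside the loop.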



            • #21
              Your reply helped, thanks Stephen!

              How would the exercise change if we were to find the total income by decile? Specifically, if the deciles are based on the population - so say there are 40M individuals, and we then want the total income of the 4M with the lowest income, next 4M with the lowest income, and so forth.

              For example, I adapted your code slightly and added an extra line of code. From these 74 observations, how would we divide the data into equally sized groups?

              HTML Code:
              sysuse auto
              sumdist mpg, ng(5) qgp(group)
              return list
              di "total income held by poorest fifth = " r(sh1) * r(N) * r(mean)
              tabstat mpg, stat(sum n) by(group)
              
              Summary for variables: mpg
                   by categories of: group (Quantile group)
              
                 group |       sum         N
              ---------+--------------------
                     1 |       270        18
                     2 |       314        17
                     3 |       275        13
                     4 |       290        12
                     5 |       427        14
              ---------+--------------------
                 Total |      1576        74
              ------------------------------
              Is there a better way to xtile?
              HTML Code:
              xtile gp=mpg, n(5)
              tab gp
              Last edited by Malina Rokis; 19 Aug 2020, 17:42.



              • #22
                Type -help sumdist- and you'll see that there's an option to derive quantile group membership! (As it happens, -sumdist- calls -xtile- to do its work, as a "viewsource sumdist.ado" would show you.) With a membership variable created (with values 1, ..., 10 for the decile group case), you can then loop to derive group totals.

                Code:
                . sysuse nlsw88.dta,
                (NLSW, 1988 extract)
                
                . sumdist wage, ng(10) qgp(group)
                 
                Distributional summary statistics, 10 quantile groups
                
                ---------------------------------------------------------------------------
                Quantile  |
                group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
                ----------+----------------------------------------------------------------
                        1 |       3.221       51.347        3.803        3.803        0.295
                        2 |       4.026       64.184        5.088        8.891        0.691
                        3 |       4.694       74.838        4.705       13.595        1.056
                        4 |       5.435       86.648        6.541       20.136        1.564
                        5 |       6.272       99.998        7.461       27.597        2.143
                        6 |       7.311      116.557        8.750       36.347        2.823
                        7 |       8.671      138.251       10.368       46.716        3.628
                        8 |      10.274      163.796       12.105       58.821        4.569
                        9 |      12.778      203.719       15.127       73.948        5.743
                       10 |                                26.052      100.000        7.767
                ---------------------------------------------------------------------------
                Share = quantile group share of total wage; 
                L(p)=cumulative group share; GL(p)=L(p)*mean(wage)
                
                . ta group
                
                   Quantile |
                      group |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          1 |        245       10.91       10.91
                          2 |        242       10.77       21.68
                          3 |        188        8.37       30.05
                          4 |        226       10.06       40.12
                          5 |        222        9.88       50.00
                          6 |        225       10.02       60.02
                          7 |        227       10.11       70.12
                          8 |        222        9.88       80.01
                          9 |        231       10.28       90.29
                         10 |        218        9.71      100.00
                ------------+-----------------------------------
                      Total |      2,246      100.00
                
                . ge gtotal = .
                (2,246 missing values generated)
                
                . forval g = 1/10 {
                  2.         sum wage if group == `g'
                  3.         replace gtotal = r(sum) if group == `g'
                  4. }
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        245    2.707919    .4634796   1.004952   3.220612
                (245 real changes made)
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        242    3.667434    .2614696   3.239966   4.025765
                (242 real changes made)
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        188    4.365382    .1798131   4.033815   4.694041
                (188 real changes made)
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        226    5.048622    .2051734   4.703177   5.434783
                (226 real changes made)
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        222     5.86305    .2401936   5.442833    6.27214
                (222 real changes made)
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        225    6.784099    .2933849   6.272401   7.310784
                (225 real changes made)
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        227    7.967731    .3787266   7.318838   8.671494
                (227 real changes made)
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        222    9.512104    .4832126   8.679548   10.27375
                (222 real changes made)
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        231    11.42343    .6994007   10.32206   12.77777
                (231 real changes made)
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        wage |        218    20.84741    9.051053   12.82608   40.74659
                (218 real changes made)
                
                . ta group , su(gtotal)
                
                   Quantile |          Summary of gtotal
                      group |        Mean   Std. Dev.       Freq.
                ------------+------------------------------------
                          1 |   663.44012           0         245
                          2 |   887.51904           0         242
                          3 |   820.69177           0         188
                          4 |   1140.9886           0         226
                          5 |    1301.597           0         222
                          6 |   1526.4224           0         225
                          7 |   1808.6749           0         227
                          8 |    2111.687           0         222
                          9 |   2638.8113           0         231
                         10 |   4544.7354           0         218
                ------------+------------------------------------
                      Total |   1737.1135   1093.8109       2,246
                
                . tabstat gtotal, by(group) stats(mean count)
                
                Summary for variables: gtotal
                     by categories of: group (Quantile group)
                
                   group |      mean         N
                ---------+--------------------
                       1 |  663.4401       245
                       2 |   887.519       242
                       3 |  820.6918       188
                       4 |  1140.989       226
                       5 |  1301.597       222
                       6 |  1526.422       225
                       7 |  1808.675       227
                       8 |  2111.687       222
                       9 |  2638.811       231
                      10 |  4544.735       218
                ---------+--------------------
                   Total |  1737.114      2246
                ------------------------------
                Notice, importantly, that the number of cases per decile group is not the same. This is a well-known (and common) issue, often discussed on Statalist (you can search), and arises because of the discrete nature of the finite sample data. Observe that the number of observations divided by ten does not give an integer result: 2246/10 = 224.6.
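
If groups of (nearly) equal size are wanted regardless of ties, one possible workaround is to rank the observations with ties broken arbitrarily and then cut the ranks into bins. This is only a sketch and not what -sumdist- or -xtile- do: the variable names rnk and eqgroup are hypothetical, and observations with identical values can end up in different groups, which may or may not be acceptable for your purpose.

```stata
sysuse auto, clear

* Rank observations on mpg; the unique option breaks ties arbitrarily
egen long rnk = rank(mpg), unique

* Number of nonmissing observations, left behind in r(N)
quietly count if !missing(mpg)

* Cut the ranks 1..N into 5 bins of (nearly) equal size
generate byte eqgroup = ceil(5 * rnk / r(N))

* Group totals and group sizes
tabstat mpg, stat(sum n) by(eqgroup)
```

Because ties are broken arbitrarily, the group totals can differ slightly from run to run unless the sort order is fixed beforehand.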

                I appreciate that you're new to Statalist -- welcome. But, please, read help files in their entirety before posting. And it'd be a good idea to read the Forum FAQ as well.



                • #23
                  Stephen Jenkins gives excellent advice as always.

                  Here's another way to do it, without a loop.

                  Code:
                  . sysuse nlsw88, clear
                  (NLSW, 1988 extract)
                  
                  . sumdist wage, ng(10) qgp(group)
                   
                  Distributional summary statistics, 10 quantile groups
                  
                  ---------------------------------------------------------------------------
                  Quantile  |
                  group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
                  ----------+----------------------------------------------------------------
                          1 |       3.221       51.347        3.803        3.803        0.295
                          2 |       4.026       64.184        5.088        8.891        0.691
                          3 |       4.694       74.838        4.705       13.595        1.056
                          4 |       5.435       86.648        6.541       20.136        1.564
                          5 |       6.272       99.998        7.461       27.597        2.143
                          6 |       7.311      116.557        8.750       36.347        2.823
                          7 |       8.671      138.251       10.368       46.716        3.628
                          8 |      10.274      163.796       12.105       58.821        4.569
                          9 |      12.778      203.719       15.127       73.948        5.743
                         10 |                                26.052      100.000        7.767
                  ---------------------------------------------------------------------------
                  Share = quantile group share of total wage;
                  L(p)=cumulative group share; GL(p)=L(p)*mean(wage)
                  
                  . egen mean = mean(wage), by(group)
                  
                  . egen count = count(wage), by(group)
                  
                  . gen gtotal = mean * count
                  
                  . tabdisp group, c(mean count gtotal)
                  
                  ----------------------------------------------
                  Quantile  |
                  group     |       mean       count      gtotal
                  ----------+-----------------------------------
                          1 |   2.707919         245    663.4401
                          2 |   3.667434         242    887.5191
                          3 |   4.365382         188    820.6918
                          4 |   5.048622         226    1140.989
                          5 |    5.86305         222    1301.597
                          6 |   6.784099         225    1526.422
                          7 |   7.967731         227    1808.675
                          8 |   9.512104         222    2111.687
                          9 |   11.42342         231    2638.811
                         10 |   20.84741         218    4544.735
                  ----------------------------------------------
                  Naturally the number of decimal places for mean and total wage is ridiculous, except to show that you get the same answers.
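
A further variant, sketched here with official commands only: -collapse- can produce the group totals and counts directly. It assumes that calling -xtile- yourself gives the same group membership as the qgp() option (plausible given that -sumdist- calls -xtile-, but not checked here); the names gtotal and n are illustrative.

```stata
sysuse nlsw88, clear

* Decile group membership (values 1, ..., 10)
xtile group = wage, nq(10)

* collapse replaces the data in memory, so wrap it in preserve/restore
preserve
collapse (sum) gtotal = wage (count) n = wage, by(group)
list group n gtotal, noobs
restore
```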

                  I will add a personal hobby-horse. Historically, deciles and other named quantiles are levels of a variable. The bins, classes or intervals they delimit are often also called deciles (or whatever). The ambiguity doesn't usually bite hard, but spelling out that you mean bins, classes or intervals is painless and will impress those you want to impress.
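
To make the distinction concrete, here is a small sketch using official commands: -_pctile- returns the nine deciles as levels of the variable, while -xtile- assigns each observation to one of the ten bins those levels delimit. The variable name decgroup is just for illustration.

```stata
sysuse nlsw88, clear

* The 9 deciles: levels of wage, stored in r(r1), ..., r(r9)
_pctile wage, nq(10)
forvalues i = 1/9 {
    display "decile `i' (a wage level) = " r(r`i')
}

* The 10 decile groups: bins delimited by those levels
xtile decgroup = wage, nq(10)
tabulate decgroup
```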

                  The fullest list of such terms I know, with dates of known first use, is at

                  https://stats.stackexchange.com/ques.../235334#235334

                  which itself updates a list published in the Stata Journal in 2016. Further contributions to the menagerie are most welcome.
                  Last edited by Nick Cox; 20 Aug 2020, 03:27.



                  • #24
                    Thank you very much Stephen and Nick!



                    • #25
                      Thank you all very much.



                      • #26
                        Hi everyone. Please, what is the Stata code for estimating descriptive statistics by quantile? I want nq(4).
                        Thanks



                        • #27
                          Hi everyone. Please, what is the Stata code for estimating descriptive statistics by quantile? I want nq(4). I have panel data.
                          Thanks
