  • Quartiles, Quintiles, Deciles, and Percentiles

    Dear All,
    I want to confirm the following:
    1. Quintile: divides the distribution into fifths.
    Code:
    sumdist x [aw=wght], n(5)
    2. Decile: divides the distribution into tenths.
    Code:
    sumdist x [aw=wght], n(10)
    3. Percentile: divides the distribution into hundredths.
    Code:
    sumdist x [aw=wght], n(100)
    4. Quartile: divides the distribution into quarters.
    Code:
    sumdist x [aw=wght], n(20)
    Are "decile" and "quantile" the same? Thanks.
    Last edited by Zuhumnan Dapel; 04 Apr 2015, 11:40.

  • #2
    Any help please?



    • #3
      Points 1, 2 and 3 are correct as written. The description in point 4 is correct but the code should be
      Code:
      sumdist x [aw=wght], n(4)
      "Quantile" encompasses all the others, and refers to the division of a distribution into any number of equal groups. The other terms are special cases of quantiles. For example, there is no commonly used term for dividing a distribution into 42 equal groups, so if we do so using
      Code:
      sumdist x [aw=wght], n(42)
      we then refer to the 42 quantiles where we might otherwise refer to the 4 quartiles or 10 deciles or 20 ventiles.

      From the dictionary on my Mac:
      quantile |ˈkwänˌtīl| noun Statistics
      each of any set of values of a variate that divide a frequency distribution into equal groups, each containing the same fraction of the total population.
      • any of the groups so produced, e.g., a quartile or percentile.
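
The same idea can be sketched with official Stata commands. This is only an illustration under stated assumptions: it uses the built-in xtile command rather than sumdist, and the auto dataset shipped with Stata rather than the data discussed in this thread. The variable names quart, decile and q42 are made up for the example.

```stata
* Sketch only: official -xtile- on the shipped auto data, not the thread's data
sysuse auto, clear

* Same variable, different numbers of quantile groups
xtile quart  = price, nquantiles(4)    // quartile groups
xtile decile = price, nquantiles(10)   // decile groups
xtile q42    = price, nquantiles(42)   // no special name: just "42 quantile groups"

* Group codes start at 1 and never exceed the requested number of groups
summarize quart decile q42
assert inrange(quart, 1, 4) & inrange(decile, 1, 10) & inrange(q42, 1, 42)
tabulate quart
```

Only the number passed to nquantiles() changes between the three calls, which is the sense in which quartiles and deciles are special cases of quantiles.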



      • #4
        This is clear to me. Thank you very much



        • #5
          Dear All, I need help with a couple of issues.
          Code:
          . sumdist income [aw=popwt2] if year==1980, n(10) qgp(gp)
          Distributional summary statistics, 10 quantile groups
          
          ---------------------------------------------------------------------------
          Quantile  |
          group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
          ----------+----------------------------------------------------------------
                  1 |     3350.59        38.03         1.68         1.68       242.19
                  2 |     4666.15        52.97         2.81         4.49       648.68
                  3 |     5992.97        68.03         3.73         8.22      1187.47
                  4 |     7096.29        80.55         4.52        12.75      1840.80
                  5 |     8809.27       100.00         5.47        18.21      2630.46
                  6 |    10689.07       121.34         6.70        24.91      3597.80
                  7 |    13486.91       153.10         8.25        33.17      4790.01
                  8 |    18280.20       207.51        10.68        43.85      6332.56
                  9 |    29028.91       329.53        15.71        59.55      8601.15
                 10 |                                 40.45       100.00     14442.95
          ---------------------------------------------------------------------------
          Share = quantile group share of total pcepm; 
          L(p)=cumulative group share; GL(p)=L(p)*mean(pcepm)
          Is 3350.59 the average income of the first decile? If so, what is the average income of the 10th decile, which is blank in the table of results above?
          Thanks,
          Dapel



          • #6
            Dear Dapel,

            I am not familiar with this command, but since you are not getting any help...

            My interpretation is that 3350.59 is the first decile, that is, a value such that 10% of the income observations are smaller than it. The 9th decile is 29028.91, so 10% of the observations are above it. There is no value in the last row because you only need 9 values to split the data in 10 parts (like you just need the median to split the sample in two). Quantiles do not involve averages at all.
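
A minimal sketch of this point, assuming the auto dataset shipped with Stata and the official _pctile command instead of sumdist (whose internals I have not checked). The local macro d1 is just a name chosen for the example.

```stata
* Sketch only: nine cut points define ten groups
sysuse auto, clear
_pctile price, nquantiles(10)
return list                       // r(r1) ... r(r9): nine cut points, no r(r10)

* A decile is a cut point, not an average; compare it with the group mean
local d1 = r(r1)
summarize price if price <= `d1', meanonly
display "first decile = " `d1' "   mean at or below it = " r(mean)
assert r(mean) <= `d1'            // a mean of values <= d1 cannot exceed d1
```

The final assert holds for any data, which is why a group mean above the group's upper cut point signals that something else is going on.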

            All the best,

            Joao



            • #7
              Joao is right. In fact, this is all documented directly in the help for sumdist (SSC, was STB, as you are asked to explain):

              sumdist estimates distributional summary statistics commonly used by income distribution
              analysts, complementing those available via pctile, xtile, and summarize, detail.
              Calculations are based on all non-missing values of varname. Use if if you wish to
              exclude values less than or equal to zero.

              For variable x and distribution function F(x), the statistics are:

              (1) quantiles k = 1,2,...,K-1, for K = # quantile groups;

              (2) the quantiles expressed as a percentage of median(x);

              (3) the quantile group shares of x in total x (expressed as a %);

              (4) the cumulative quantile group shares of total x (with cumulation in ascending order of
              x), i.e. the Lorenz ordinates L(p_k) at each p_k = F(x_k) for quantile points x_k
              (expressed as a %);

              (5) the generalised Lorenz ordinates at each p_k = F(x_k), i.e. GL(p_k) = mean(x)*L(p_k).
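
As a worked check of items (3) to (5), here is how I read the definitions, written out with notation that is mine rather than the help file's (G_k for quantile group k, w_i for the weight, K for the number of groups). The weighted-sum form of the share is my assumption about how weights enter. The numbers come from the table in #5.

```latex
% Notation is an assumption for illustration, not taken from the sumdist help
s_k = \frac{\sum_{i \in G_k} w_i x_i}{\sum_{i} w_i x_i}, \qquad
L(p_k) = \sum_{j=1}^{k} s_j, \qquad
GL(p_k) = \bar{x}\, L(p_k), \qquad
\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}

% At k = K the cumulative share is 1, so GL(p_K) = \bar{x}.
% From the table in #5: GL(p_{10}) = 14442.95, hence \bar{x} = 14442.95, and
GL(p_1) \approx 0.0168 \times 14442.95 \approx 242.6
% which agrees with the printed 242.19 up to rounding of the 1.68\% share.
```

So the last row of the GL(p) column is the overall mean, and none of the columns reports a mean within a quantile group.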



              • #8
                Thank you all very much.

                Is this Ok for the averages?
                Code:
                . mean income [aw=popwt2] if year==1980, over(quantgp80)
                
                Mean estimation                     Number of obs    =   10280
                
                            1: quantgp80 = 1
                            2: quantgp80 = 2
                            3: quantgp80 = 3
                            4: quantgp80 = 4
                            5: quantgp80 = 5
                            6: quantgp80 = 6
                            7: quantgp80 = 7
                            8: quantgp80 = 8
                            9: quantgp80 = 9
                           10: quantgp80 = 10
                
                --------------------------------------------------------------
                        Over |       Mean   Std. Err.     [95% Conf. Interval]
                -------------+------------------------------------------------
                pce          |
                           1 |   11320.03   192.9415      10941.83    11698.24
                           2 |   19012.88   80.62274      18854.84    19170.92
                           3 |   25202.12   69.64595       25065.6    25338.64
                           4 |   30402.86   64.84568      30275.75    30529.97
                           5 |   37114.57   85.54648      36946.88    37282.25
                           6 |   44980.68   98.77955      44787.05     45174.3
                           7 |   55895.25   128.6133      55643.15    56147.36
                           8 |   72392.55   197.6173      72005.18    72779.92
                           9 |   105989.8   341.6985      105320.1    106659.6
                          10 |   273538.6   5060.291      263619.4    283457.8
                --------------------------------------------------------------



                • #9
                  One very general point and one specific one:

                  1. We don't have your dataset, so it is optimistic to expect that we can check anything specific to that.

                  2. The upper limit of the lowest decile (bin) was about 3350. That's only consistent with a bin mean below it. It's inconsistent with a mean of 11320, so far as I can see. I don't see that using analytic weights could be responsible. Same applies to other bins.

                  To make progress, invent a very, very simple dataset for yourself with say 20 observations where you can check independently what is being done. Or choose a publicly accessible dataset that others can work with to check anything.
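
Something along these lines would do as a starting point. It is a sketch under assumptions: the 20 incomes are invented, and the groups are built with official xtile, so they need not match sumdist's qgp() groups in every case. The names gp, gp_mean and gp_max are arbitrary.

```stata
* Sketch only: a 20-observation dataset that can be checked by hand
clear
set obs 20
generate income = 1000 * _n            // 1000, 2000, ..., 20000

xtile gp = income, nquantiles(10)      // two observations per decile group
egen gp_mean = mean(income), by(gp)
egen gp_max  = max(income),  by(gp)

list income gp gp_mean gp_max, sepby(gp) noobs

* Every group mean must lie at or below the largest value in its group
assert gp_mean <= gp_max
```

With data this small, the group means (1500, 3500, ...) can be verified by eye before moving on to the real dataset.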





                  • #10
                    Thank you very much. With respect to point 1 in #9, here is the difference between the results in posts #5 and #8:
                    1. I created deciles using
                      Code:
                       sumdist income [aw=popwt2] if year==1980, n(10) qgp(quantgp80)
                      The variable holding the decile groups is named "quantgp80".
                    2. In #8, I obtained the average income over quantgp80, i.e. the deciles, using
                      Code:
                       mean income [aw=popwt2] if year==1980, over(quantgp80)
                      Is this okay if one is comparing the average income of the poorest 10% (11320.03) with that of the richest 10% (273538.6)?
                    Dapel



                    • #11
                      Dear All,
                      any further help on this?

                      Thanks,
                      Dapel



                      • #12
                        Back to the issue



                        • #13
                          Hello



                          • #14
                            I'm not sure what your exact question is, but I made up some toy data to test things out:

                            The only unfortunate thing is that I made it with n==100, so each bin will have exactly 10 obs.
                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input byte id int year long income
                              8 2010  10146
                             26 2010  11174
                             67 2010  12490
                             95 2010  14750
                             17 2010  15392
                             96 2010  15481
                             68 2010  16039
                             39 2010  16896
                              3 2010  17656
                             31 2010  17974
                             78 2010  18385
                             92 2010  18417
                             64 2010  21173
                             21 2010  22585
                             93 2010  25789
                             41 2010  29347
                             52 2010  32706
                             71 2010  33138
                              9 2010  33310
                             55 2010  33631
                             35 2010  34246
                             74 2010  34562
                             47 2010  36497
                             12 2010  39458
                             42 2010  44494
                             28 2010  44914
                             50 2010  45368
                             69 2010  46077
                             24 2010  46679
                             32 2010  47606
                             83 2010  48673
                             88 2010  50659
                              5 2010  53302
                             86 2010  53347
                             73 2010  54214
                             54 2010  54732
                             85 2010  55895
                             38 2010  56922
                             29 2010  57893
                             72 2010  60131
                             16 2010  63376
                             65 2010  64058
                             90 2010  65238
                             46 2010  66863
                             76 2010  67018
                            100 2010  68745
                             15 2010  69031
                             63 2010  69923
                              6 2010  71757
                             53 2010  72277
                             49 2010  74093
                             79 2010  74296
                             87 2010  75773
                             40 2010  76920
                             22 2010  77845
                             33 2010  78005
                             97 2010  80743
                             27 2010  80949
                             19 2010  83517
                              4 2010  83694
                             77 2010  83710
                             82 2010  84529
                             11 2010  84599
                             44 2010  87208
                             18 2010  87913
                             48 2010  88552
                             91 2010  88939
                              7 2010  89292
                             56 2010  89751
                             81 2010  89834
                             80 2010  90717
                             70 2010  92030
                             57 2010  94179
                             14 2010  95399
                             34 2010  95427
                             99 2010  99290
                             25 2010  99516
                              1 2010 101805
                             13 2010 102272
                             66 2010 103682
                             37 2010 104408
                             98 2010 104637
                             60 2010 105810
                             20 2010 108473
                             36 2010 110640
                             62 2010 111550
                             61 2010 111954
                             30 2010 112570
                             58 2010 113010
                             23 2010 113743
                             43 2010 113936
                             94 2010 114439
                             89 2010 115663
                             84 2010 116281
                              2 2010 116677
                             45 2010 116889
                             59 2010 121481
                             10 2010 121729
                             75 2010 122364
                             51 2010 122740
                            end
                            Code:
                            * Summary statistics
                            format income %10.0gc
                            summ income, detail format
                            
                                                       income
                            -------------------------------------------------------------
                                  Percentiles      Smallest
                             1%       10,660         10,146
                             5%     15,436.5         11,174
                            10%     18,179.5         12,490       Obs                 100
                            25%       44,704         14,750       Sum of Wgt.         100
                            
                            50%       73,185                      Mean           69,739.1
                                                    Largest       Std. Dev.      33,496.6
                            75%     97,358.5        121,481
                            90%      113,840        121,729       Variance       1.12e+09
                            95%      116,783        122,364       Skewness      -.1833772
                            99%      122,552        122,740       Kurtosis       1.858154
                            
                            
                            sumdist income if year==2010, n(10) qgp(gp)  // if year==2010 isn't going to matter here, since they are all year==2010
                            * So this gives the upper bound of each decile group (i.e. the top earner in the lowest decile group earned $17,974 in my made-up data)
                            *  $17,974 / $73,185 = 24.56% of the median
                            Distributional summary statistics, 10 quantile groups
                            
                            ---------------------------------------------------------------------------
                            Quantile  |
                            group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
                            ----------+----------------------------------------------------------------
                                    1 |   17974.000       24.560        2.122        2.122     1479.980
                                    2 |   33631.000       45.953        3.850        5.972     4164.790
                                    3 |   47606.000       65.049        6.021       11.993     8363.800
                                    4 |   60131.000       82.163        7.826       19.819    13821.480
                                    5 |   72277.000       98.759        9.726       29.545    20604.340
                                    6 |   83694.000      114.359       11.268       40.813    28462.690
                                    7 |   89834.000      122.749       12.537       53.350    37205.960
                                    8 |  103682.000      141.671       13.971       67.321    46949.130
                                    9 |  113743.000      155.418       15.727       83.048    57917.080
                                   10 |                                16.952      100.000    69739.070
                            ---------------------------------------------------------------------------
                            Share = quantile group share of total income;
                            L(p)=cumulative group share; GL(p)=L(p)*mean(income)
                            
                            
                            mean income if year==2010, over(gp)
                            
                            Mean estimation                   Number of obs   =        100
                            
                            --------------------------------------------------------------
                                    Over |       Mean   Std. Err.     [95% Conf. Interval]
                            -------------+------------------------------------------------
                            income       |
                                       1 |    14799.8   850.6192      13111.99    16487.61
                                       2 |    26848.1   2005.814      22868.13    30828.07
                                       3 |    41990.1    1660.62      38695.07    45285.13
                                       4 |    54576.8    1067.61      52458.43    56695.17
                                       5 |    67828.6   967.4502      65908.97    69748.23
                                       6 |    78583.5     1109.6      76381.81    80785.19
                                       7 |    87432.7   735.7576       85972.8     88892.6
                                       8 |    97431.7   1423.667      94606.84    100256.6
                                       9 |   109679.5   1131.362      107434.6    111924.4
                                      10 |   118219.9   1093.665      116049.8      120390
                            --------------------------------------------------------------
                            
                            
                            tabstat income, by(gp) stats(n mean median min max sum) format(%10.1gc)
                            // this confirms that bottom decile goes from $10,146 to $17,974, with mean==$14,799.8
                            // 147,998 / 6,973,907 = 0.02122 which matches the Share% from the sumdist table above
                            Summary for variables: income
                                 by categories of: gp (Quantile group)
                            
                                  gp |         N      mean       p50       min       max       sum
                            ---------+-----------------------------------------------------------------
                                   1 |        10    14,800    15,437    10,146    17,974     147,998
                                   2 |        10    26,848    27,568    18,385    33,631     268,481
                                   3 |        10    41,990    44,704    34,246    47,606     419,901
                                   4 |        10    54,577    54,473    48,673    60,131     545,768
                                   5 |        10    67,829    67,882    63,376    72,277     678,286
                                   6 |        10    78,584    77,925    74,093    83,694     785,835
                                   7 |        10    87,433    88,233    83,710    89,834     874,327
                                   8 |        10    97,432    97,359    90,717   103,682     974,317
                                   9 |        10   109,680   111,095   104,408   113,743   1,096,795
                                  10 |        10   118,220   116,783   113,936   122,740   1,182,199
                            ---------+-----------------------------------------------------------------
                               Total |       100    69,739    73,185    10,146   122,740   6,973,907
                            ---------------------------------------------------------------------------
                            Code:
                            * Creating my own summary stats for each gp (decile, in this case)
                            sort income id
                            bysort gp (income): gen n = _n
                            egen gp_avg = mean(income), by(gp)
                            egen gp_count = count(income), by(gp)
                            egen gp_min = min(income), by(gp)
                            egen gp_max = max(income), by(gp)
                            
                            . list n income gp gp_avg gp_count gp_min gp_max if gp<=2, sepby(gp) noobs
                            
                              +--------------------------------------------------------------+
                              |  n   income   gp     gp_avg   gp_count     gp_min     gp_max |
                              |--------------------------------------------------------------|
                              |  1   10,146    1   14,799.8         10   10,146.0   17,974.0 |
                              |  2   11,174    1   14,799.8         10   10,146.0   17,974.0 |
                              |  3   12,490    1   14,799.8         10   10,146.0   17,974.0 |
                              |  4   14,750    1   14,799.8         10   10,146.0   17,974.0 |
                              |  5   15,392    1   14,799.8         10   10,146.0   17,974.0 |
                              |  6   15,481    1   14,799.8         10   10,146.0   17,974.0 |
                              |  7   16,039    1   14,799.8         10   10,146.0   17,974.0 |
                              |  8   16,896    1   14,799.8         10   10,146.0   17,974.0 |
                              |  9   17,656    1   14,799.8         10   10,146.0   17,974.0 |
                              | 10   17,974    1   14,799.8         10   10,146.0   17,974.0 |
                              |--------------------------------------------------------------|
                              |  1   18,385    2   26,848.1         10   18,385.0   33,631.0 |
                              |  2   18,417    2   26,848.1         10   18,385.0   33,631.0 |
                              |  3   21,173    2   26,848.1         10   18,385.0   33,631.0 |
                              |  4   22,585    2   26,848.1         10   18,385.0   33,631.0 |
                              |  5   25,789    2   26,848.1         10   18,385.0   33,631.0 |
                              |  6   29,347    2   26,848.1         10   18,385.0   33,631.0 |
                              |  7   32,706    2   26,848.1         10   18,385.0   33,631.0 |
                              |  8   33,138    2   26,848.1         10   18,385.0   33,631.0 |
                              |  9   33,310    2   26,848.1         10   18,385.0   33,631.0 |
                              | 10   33,631    2   26,848.1         10   18,385.0   33,631.0 |
                              +--------------------------------------------------------------+
                            
                            
                            . list n income gp gp_avg gp_count gp_min gp_max if inlist(gp, 9, 10), sepby(gp) noobs
                            
                              +------------------------------------------------------------------+
                              |  n    income   gp      gp_avg   gp_count      gp_min      gp_max |
                              |------------------------------------------------------------------|
                              |  1   104,408    9   109,679.5         10   104,408.0   113,743.0 |
                              |  2   104,637    9   109,679.5         10   104,408.0   113,743.0 |
                              |  3   105,810    9   109,679.5         10   104,408.0   113,743.0 |
                              |  4   108,473    9   109,679.5         10   104,408.0   113,743.0 |
                              |  5   110,640    9   109,679.5         10   104,408.0   113,743.0 |
                              |  6   111,550    9   109,679.5         10   104,408.0   113,743.0 |
                              |  7   111,954    9   109,679.5         10   104,408.0   113,743.0 |
                              |  8   112,570    9   109,679.5         10   104,408.0   113,743.0 |
                              |  9   113,010    9   109,679.5         10   104,408.0   113,743.0 |
                              | 10   113,743    9   109,679.5         10   104,408.0   113,743.0 |
                              |------------------------------------------------------------------|
                              |  1   113,936   10   118,219.9         10   113,936.0   122,740.0 |
                              |  2   114,439   10   118,219.9         10   113,936.0   122,740.0 |
                              |  3   115,663   10   118,219.9         10   113,936.0   122,740.0 |
                              |  4   116,281   10   118,219.9         10   113,936.0   122,740.0 |
                              |  5   116,677   10   118,219.9         10   113,936.0   122,740.0 |
                              |  6   116,889   10   118,219.9         10   113,936.0   122,740.0 |
                              |  7   121,481   10   118,219.9         10   113,936.0   122,740.0 |
                              |  8   121,729   10   118,219.9         10   113,936.0   122,740.0 |
                              |  9   122,364   10   118,219.9         10   113,936.0   122,740.0 |
                              | 10   122,740   10   118,219.9         10   113,936.0   122,740.0 |
                              +------------------------------------------------------------------+

                            Hope that helps!
                            Last edited by David Benson; 02 Jan 2019, 22:45.



                            • #15
                              Wow! This is really comprehensive. Will study it. Thanks a million!

