  • Unequal Quintiles using xtile command

    I have to create five quintiles from monthly expenditure data. I used the following code

    xtile quintile=expenditure [aw=sample weights], n(5)

    I also tried it without weights, but both versions gave an unusually low percentage for the second quintile. It looks like this:


     5 quantiles |
         of mpce |      Freq.     Percent        Cum.
    ------------+-----------------------------------
               1 | 30,205,892       32.10       32.10
               2 |  9,371,869        9.96       42.06
               3 | 18,560,534       19.73       61.79
               4 | 17,351,454       18.44       80.23
               5 | 18,604,018       19.77      100.00
    ------------+-----------------------------------
           Total | 94,093,767      100.00

    What should I do?

  • #2
    Values that are equal (to each other) must be assigned to the same bin. The problem is discussed in (e.g.) https://www.stata-journal.com/article.html?article=pr0054 and https://www.stata-journal.com/articl...article=dm0095


    • #3
      Thank you for the references.
      So, just to check my understanding: xtile doesn't divide households into five equal parts, but rather decides the cut-off points of each quintile based on cumulative percent?


      • #4
        Exactly. That's what's explained in the references, and even if dm0095 is behind a paywall pr0054 will not be.
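
        For anyone who wants to see the cut points directly, here is a minimal sketch. The nlswork dataset and the local macro name cut1 are just my choices for illustration, not anything from the question. As far as I know, _pctile leaves the cut points in r(r1), r(r2), and so on.

        ```stata
        webuse nlswork, clear

        * cut points that would split age into 10 groups; left behind in r(r1) ... r(r9)
        _pctile age, nquantiles(10)
        return list

        * keep the first cut point, then count how many observations fall at or below it
        local cut1 = r(r1)
        count if age <= `cut1'
        ```

        If many observations share the value at a cut point, that count will not be close to one tenth of the sample.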

        Let's first mention that equal groups may be impossible simply because the number of values is not a multiple of 5: a sample of 19 could at best go into 4 bins of 4 and one of 3. That's a small deal. (And clearly the same problem can bite with any other number of groups.)
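
        A toy check of the divisibility point, as a sketch only: the variable names x and bin5 are made up for the purpose, and the data are simply the integers 1 to 19.

        ```stata
        clear
        set obs 19

        * 19 distinct values, so ties play no part here
        generate x = _n

        xtile bin5 = x, nquantiles(5)
        tabulate bin5
        ```

        With 19 observations the five bins cannot all have the same frequency, whatever rule is used.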

        The bigger deal is that xtile follows the rule that the same value must be assigned to the same bin. Researchers are often surprised at how often that bites and yields bins of very unequal size, but ties can be common for all sorts of reasons, including integer values and rounding conventions. Skewed distributions typically make the problem worse.
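
        To see ties bite in a small example, here is a sketch using the auto dataset shipped with Stata. That dataset and the new variable name rep5 are my choices for illustration. rep78 takes only the integer values 1 to 5, so ties are heavy.

        ```stata
        sysuse auto, clear

        * ask for 5 bins from a variable with only 5 distinct values
        xtile rep5 = rep78, nquantiles(5)

        * cross-tabulate the original values against the bins they were assigned
        tabulate rep78 rep5
        ```

        Each distinct value of rep78 should land in exactly one bin, and you should not be surprised if the bins are very unequal or if some bin numbers are not used at all.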

        The only legitimate trick in town is negating the variable and making xtile reverse direction by assigning bins from the top down.

        Running this script and looking at the tables and graphs will help make it clear.



        Code:
        . webuse nlswork, clear
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . xtile age10=age, nq(10)
        
        . tab age10
        
                 10 |
          quantiles |
             of age |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |      4,122       14.46       14.46
                  2 |      3,062       10.74       25.20
                  3 |      1,636        5.74       30.94
                  4 |      2,980       10.45       41.39
                  5 |      2,567        9.00       50.39
                  6 |      3,614       12.68       63.07
                  7 |      2,357        8.27       71.34
                  8 |      3,543       12.43       83.76
                  9 |      1,824        6.40       90.16
                 10 |      2,805        9.84      100.00
        ------------+-----------------------------------
              Total |     28,510      100.00
        
        . histogram age, discrete freq by(age10, row(2))
        
        . gen negage = -age
        (24 missing values generated)
        
        . xtile age10_2=negage, nq(10)
        
        . replace age10_2 = 11 - age10_2
        (28,510 real changes made)
        
        . tab age10_2
        
                 10 |
          quantiles |
          of negage |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |      2,805        9.84        9.84
                  2 |      2,775        9.73       19.57
                  3 |      1,604        5.63       25.20
                  4 |      3,202       11.23       36.43
                  5 |      2,731        9.58       46.01
                  6 |      3,662       12.84       58.85
                  7 |      2,314        8.12       66.97
                  8 |      3,677       12.90       79.87
                  9 |      2,067        7.25       87.12
                 10 |      3,673       12.88      100.00
        ------------+-----------------------------------
              Total |     28,510      100.00
        
        . histogram age, discrete freq by(age10_2, row(2))

        An even bigger deal is why this binning is thought a good thing to do in any case.


        • #5
          This was really helpful. Thank you so much.

          Regards
