Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • quintile construction for panel data

    Dear forum members-

    I have panel data and wanted to create tercile of wlscore for each country by rounds.
    Example data is below. I am trying with
    Code:
     bys country rounds: xtile wanted = wlscore, nq(3)
    syntax but it does not allow by option.

    Any help is appreciated

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 country byte(rounds id) float wlscore
    "EP" 2 14 .25
    "EP" 2 13 .55
    "EP" 2 12 .34
    "EP" 2 11 .95
    "EP" 2 10 .86
    "EP" 2  9 .73
    "EP" 2  8 .94
    "EP" 2  7 .63
    "EP" 2  6 .36
    "EP" 2  5 .57
    "EP" 2  4 .55
    "EP" 2  3 .71
    "EP" 2  2 .81
    "EP" 2  1 .95
    "EP" 1 14 .72
    "EP" 1 13  .2
    "EP" 1 12 .94
    "EP" 1 11 .73
    "EP" 1 10 .64
    "EP" 1  9 .18
    "EP" 1  8 .47
    "EP" 1  7 .51
    "EP" 1  6  .4
    "EP" 1  5 .64
    "EP" 1  4 .17
    "EP" 1  3 .61
    "EP" 1  2 .81
    "EP" 1  1 .58
    "IN" 2 14 .19
    "IN" 2 13 .59
    "IN" 2 12 .58
    "IN" 2 11 .36
    "IN" 2 10 .13
    "IN" 2  9 .77
    "IN" 2  8 .78
    "IN" 2  7 .18
    "IN" 2  6  .9
    "IN" 2  5 .33
    "IN" 2  4 .89
    "IN" 2  3 .13
    "IN" 2  2 .98
    "IN" 2  1 .77
    "IN" 1 14 .61
    "IN" 1 13 .63
    "IN" 1 12 .31
    "IN" 1 11 .05
    "IN" 1 10  .1
    "IN" 1  9 .06
    "IN" 1  8 .14
    "IN" 1  7 .58
    "IN" 1  6  .7
    "IN" 1  5 .07
    "IN" 1  4 .14
    "IN" 1  3 .66
    "IN" 1  2 .08
    "IN" 1  1 .92
    end
    Last edited by Mukesh Punia; 18 Jun 2025, 05:21.
    Best regards,
    Mukesh

  • #2
    tercile or tertile => two summary measures, three groups. First third, second third, third third.

    quartile = three summary measures, four groups

    quintile = four summary measures, five groups.

    More detail: https://stats.stackexchange.com/ques...half-a-percent

    Which is it that you want?

    My own personal view is that quantile binning is vastly oversold and over-used, unless the context is something like best .. worst performing stocks, but an easy recipe is to bin the percentile ranks.

    More detail: https://www.stata.com/support/faqs/s...ing-positions/

    Percentile ranks, carefully calculated, ignore missing values and treat tied values identically. Here is an example of tertile groups. For any other number of quantile bins,
    change 3 as desired.

    Code:
    . webuse nlswork, clear
    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
    
    . egen rank = rank(ln_wage), by(union)
    
    . egen count = count(ln_wage), by(union)
    
    . gen pcrank = (rank - 0.5) / count
    
    . gen tertile = ceil(3 * pcrank)
    
    . tab tertile union
    
               |      1 if union
       tertile |         0          1 |     Total
    -----------+----------------------+----------
             1 |     4,911      1,503 |     6,414 
             2 |     4,910      1,504 |     6,414 
             3 |     4,907      1,503 |     6,410 
    -----------+----------------------+----------
         Total |    14,728      4,510 |    19,238

    Comment


    • #3
      Aah! Do I really missed something relevant that creates confusion here!!

      Dear Nick my query is -

      I want to create a new variable (index) that categories (from lowest to highest values of wlscore) in three groups by each round for each country.

      simple but too lengthy solution I can see is-

      Code:
       xtile wantedR1IN = wlscore if country=="IN" & rounds ==1 , nq(3)
      // this will create a variable that classify observation of round '1' of country 'IN' in three terciles (bommton, middle, top)

      Then again for country "IN" round 2
      Code:
       xtile wantedR2IN = wlscore if country=="IN" & rounds ==2 , nq(3)
      // and so on..

      but in this way i will end up with (two country * two rounds) 2*2 =4 variables

      Hope this clarifies the issue.
      Best regards,
      Mukesh

      Comment


      • #4
        You can have any number of variables fed to the by() options. That detail is up to you.

        I think you're likely to lose a great deal of detail and are in effect degrading the data, which is a different set of issues.

        Comment


        • #5
          Dear Nick really sorry. I still think there is confusion regarding what I wanted to and what you are suggesting. May be, I am not able to communicate clearly.

          This is a common practice in survey data. For example, for wealth index a wealth score (using PCA or other relevant method) is created, then five quintile are created indicating lowest to highest. Usually with equal number of observations.

          Can you please have a quick look at bottom of the table 3 page number -4. This is a small 4 page document.

          https://www.younglives.org.uk/sites/...et-Oct17_0.pdf

          Thank you
          Best regards,
          Mukesh

          Comment


          • #6
            Originally posted by Nick Cox View Post
            You can have any number of variables fed to the by() options. That detail is up to you.

            I think you're likely to lose a great deal of detail and are in effect degrading the data, which is a different set of issues.
            No,

            Code:
            bys country rounds: xtile wanted = wlscore, nq(3)
            Stata gives; xtile may not be combined with by I am using Stata 18
            Best regards,
            Mukesh

            Comment


            • #7
              Not no, because we are referring to different commands.

              I naturally agree: xtile does not support a by() option or indeed a by: prefix.

              I am saying: So what? The calculation you want is essentially trivial once you have calculated a percentile rank.

              I hoped that it was clear that my reference to by() options cannot apply to any command that does not support them.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str2 country byte(rounds id) float wlscore
              "EP" 2 14 .25
              "EP" 2 13 .55
              "EP" 2 12 .34
              "EP" 2 11 .95
              "EP" 2 10 .86
              "EP" 2  9 .73
              "EP" 2  8 .94
              "EP" 2  7 .63
              "EP" 2  6 .36
              "EP" 2  5 .57
              "EP" 2  4 .55
              "EP" 2  3 .71
              "EP" 2  2 .81
              "EP" 2  1 .95
              "EP" 1 14 .72
              "EP" 1 13  .2
              "EP" 1 12 .94
              "EP" 1 11 .73
              "EP" 1 10 .64
              "EP" 1  9 .18
              "EP" 1  8 .47
              "EP" 1  7 .51
              "EP" 1  6  .4
              "EP" 1  5 .64
              "EP" 1  4 .17
              "EP" 1  3 .61
              "EP" 1  2 .81
              "EP" 1  1 .58
              "IN" 2 14 .19
              "IN" 2 13 .59
              "IN" 2 12 .58
              "IN" 2 11 .36
              "IN" 2 10 .13
              "IN" 2  9 .77
              "IN" 2  8 .78
              "IN" 2  7 .18
              "IN" 2  6  .9
              "IN" 2  5 .33
              "IN" 2  4 .89
              "IN" 2  3 .13
              "IN" 2  2 .98
              "IN" 2  1 .77
              "IN" 1 14 .61
              "IN" 1 13 .63
              "IN" 1 12 .31
              "IN" 1 11 .05
              "IN" 1 10  .1
              "IN" 1  9 .06
              "IN" 1  8 .14
              "IN" 1  7 .58
              "IN" 1  6  .7
              "IN" 1  5 .07
              "IN" 1  4 .14
              "IN" 1  3 .66
              "IN" 1  2 .08
              "IN" 1  1 .92
              end
              
              egen rank = rank(wlscore), by(country rounds) 
              
              egen count = count(wlscore), by(country rounds)
              
              gen tertile = ceil(3 * (rank - 0.5) / count)
              
              sort country rounds wlscore 
              
              list, sepby(country rounds tertile) 
              
              
                   +----------------------------------------------------------+
                   | country   rounds   id   wlscore   rank   count   tertile |
                   |----------------------------------------------------------|
                1. |      EP        1    4       .17      1      14         1 |
                2. |      EP        1    9       .18      2      14         1 |
                3. |      EP        1   13        .2      3      14         1 |
                4. |      EP        1    6        .4      4      14         1 |
                5. |      EP        1    8       .47      5      14         1 |
                   |----------------------------------------------------------|
                6. |      EP        1    7       .51      6      14         2 |
                7. |      EP        1    1       .58      7      14         2 |
                8. |      EP        1    3       .61      8      14         2 |
                9. |      EP        1   10       .64    9.5      14         2 |
               10. |      EP        1    5       .64    9.5      14         2 |
                   |----------------------------------------------------------|
               11. |      EP        1   14       .72     11      14         3 |
               12. |      EP        1   11       .73     12      14         3 |
               13. |      EP        1    2       .81     13      14         3 |
               14. |      EP        1   12       .94     14      14         3 |
                   |----------------------------------------------------------|
               15. |      EP        2   14       .25      1      14         1 |
               16. |      EP        2   12       .34      2      14         1 |
               17. |      EP        2    6       .36      3      14         1 |
               18. |      EP        2    4       .55    4.5      14         1 |
               19. |      EP        2   13       .55    4.5      14         1 |
                   |----------------------------------------------------------|
               20. |      EP        2    5       .57      6      14         2 |
               21. |      EP        2    7       .63      7      14         2 |
               22. |      EP        2    3       .71      8      14         2 |
               23. |      EP        2    9       .73      9      14         2 |
                   |----------------------------------------------------------|
               24. |      EP        2    2       .81     10      14         3 |
               25. |      EP        2   10       .86     11      14         3 |
               26. |      EP        2    8       .94     12      14         3 |
               27. |      EP        2   11       .95   13.5      14         3 |
               28. |      EP        2    1       .95   13.5      14         3 |
                   |----------------------------------------------------------|
               29. |      IN        1   11       .05      1      14         1 |
               30. |      IN        1    9       .06      2      14         1 |
               31. |      IN        1    5       .07      3      14         1 |
               32. |      IN        1    2       .08      4      14         1 |
               33. |      IN        1   10        .1      5      14         1 |
                   |----------------------------------------------------------|
               34. |      IN        1    4       .14    6.5      14         2 |
               35. |      IN        1    8       .14    6.5      14         2 |
               36. |      IN        1   12       .31      8      14         2 |
               37. |      IN        1    7       .58      9      14         2 |
                   |----------------------------------------------------------|
               38. |      IN        1   14       .61     10      14         3 |
               39. |      IN        1   13       .63     11      14         3 |
               40. |      IN        1    3       .66     12      14         3 |
               41. |      IN        1    6        .7     13      14         3 |
               42. |      IN        1    1       .92     14      14         3 |
                   |----------------------------------------------------------|
               43. |      IN        2   10       .13    1.5      14         1 |
               44. |      IN        2    3       .13    1.5      14         1 |
               45. |      IN        2    7       .18      3      14         1 |
               46. |      IN        2   14       .19      4      14         1 |
               47. |      IN        2    5       .33      5      14         1 |
                   |----------------------------------------------------------|
               48. |      IN        2   11       .36      6      14         2 |
               49. |      IN        2   12       .58      7      14         2 |
               50. |      IN        2   13       .59      8      14         2 |
               51. |      IN        2    1       .77    9.5      14         2 |
               52. |      IN        2    9       .77    9.5      14         2 |
                   |----------------------------------------------------------|
               53. |      IN        2    8       .78     11      14         3 |
               54. |      IN        2    4       .89     12      14         3 |
               55. |      IN        2    6        .9     13      14         3 |
               56. |      IN        2    2       .98     14      14         3 |
                   +----------------------------------------------------------+

              Comment


              • #8
                Thank you - Nick Cox for your great support.
                Best regards,
                Mukesh

                Comment


                • #9
                  Thanks for your appreciation, but I want to flag that if placing values in context is important, percentile rank is a good summary and degrading that to quantile groups is, as said, a degradation. For example, a very small change might mean a jump between quantile bins, but a rather large one no jump at all.

                  The popularity of such a method is no guide to its merits.

                  Comment

                  Working...
                  X