  • Ranking a Variable into Terciles (Top, Middle and Bottom)

    Greetings everyone! Dear Statalist members, I want to divide two variables in my data set into terciles (ranks) and then run a regression analysis. How can I rank a variable, i.e. firm size or firm age? Kindly guide me in this regard. Regards, Sattar Khan

  • #2
    Sattar, is the following example code what you want? The variable "rank" stores tercile indicators for variable "mpg".

    Code:
    sysuse auto, clear
    sort mpg
    gen rank = ceil(_n*3/_N)
    Code:
    . list mpg rank
    
         +------------+
         | mpg   rank |
         |------------|
      1. |  12      1 |
      2. |  12      1 |
      3. |  14      1 |
      4. |  14      1 |
      5. |  14      1 |
         |------------|
      6. |  14      1 |
      7. |  14      1 |
      8. |  14      1 |
      9. |  15      1 |
     10. |  15      1 |
         |------------|
     11. |  16      1 |
     12. |  16      1 |
     13. |  16      1 |
     14. |  16      1 |
     15. |  17      1 |
         |------------|
     16. |  17      1 |
     17. |  17      1 |
     18. |  17      1 |
     19. |  18      1 |
     20. |  18      1 |
         |------------|
     21. |  18      1 |
     22. |  18      1 |
     23. |  18      1 |
     24. |  18      1 |
     25. |  18      2 |
         |------------|
     26. |  18      2 |
     27. |  18      2 |
     28. |  19      2 |
     29. |  19      2 |
     30. |  19      2 |
         |------------|
     31. |  19      2 |
     32. |  19      2 |
     33. |  19      2 |
     34. |  19      2 |
     35. |  19      2 |
         |------------|
     36. |  20      2 |
     37. |  20      2 |
     38. |  20      2 |
     39. |  21      2 |
     40. |  21      2 |
         |------------|
     41. |  21      2 |
     42. |  21      2 |
     43. |  21      2 |
     44. |  22      2 |
     45. |  22      2 |
         |------------|
     46. |  22      2 |
     47. |  22      2 |
     48. |  22      2 |
     49. |  23      2 |
     50. |  23      3 |
         |------------|
     51. |  23      3 |
     52. |  24      3 |
     53. |  24      3 |
     54. |  24      3 |
     55. |  24      3 |
         |------------|
     56. |  25      3 |
     57. |  25      3 |
     58. |  25      3 |
     59. |  25      3 |
     60. |  25      3 |
         |------------|
     61. |  26      3 |
     62. |  26      3 |
     63. |  26      3 |
     64. |  28      3 |
     65. |  28      3 |
         |------------|
     66. |  28      3 |
     67. |  29      3 |
     68. |  30      3 |
     69. |  30      3 |
     70. |  31      3 |
         |------------|
     71. |  34      3 |
     72. |  35      3 |
     73. |  35      3 |
     74. |  41      3 |
         +------------+

    • #3
      Thank you, dear Wang, that works. Could you please guide me on how to run a regression on the rank variable that I created with your code? The rank variable is an independent variable. Thanks in advance.

      • #4
        Sattar, the rank variable is a categorical variable, so you may need to split it into dummies to use it as an independent variable. Below is an example following my code in #2; the purpose of the code is to show the mean difference in mpg among rank groups, though I am not sure that is what you asked for.

        Code:
        sysuse auto, clear
        
        sort mpg
        gen rank = ceil(_n*3/_N)
        
        reg mpg i.rank
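
        A minimal sketch of what splitting into dummies looks like in practice. It assumes price as an illustrative outcome and rank_ as a hypothetical stub for the indicator names; neither comes from the example above. Creating the indicators by hand and leaving out the first one should give the same fit as the factor-variable notation i.rank.

        Code:
        sysuse auto, clear
        sort mpg
        gen rank = ceil(_n*3/_N)

        * create one 0/1 indicator per rank: rank_1, rank_2, rank_3
        tabulate rank, generate(rank_)

        * leave out rank_1, so rank 1 is the reference group
        regress price rank_2 rank_3

        * factor-variable notation does the same without creating new variables
        regress price i.rank

        The i. prefix is usually the more convenient choice, because Stata then knows the indicators belong to one categorical variable.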

        • #5
          HTML Code:
          . reg RM familyceo familycfo familycoo acindep acexp acsize acmeet i.rank
          note: familycfo omitted because of collinearity
          note: familycoo omitted because of collinearity
          
                Source |       SS           df       MS      Number of obs   =       996
          -------------+----------------------------------   F(7, 988)       =     38.25
                 Model |  15889449.6         7  2269921.37   Prob > F        =    0.0000
              Residual |  58638964.8       988   59351.179   R-squared       =    0.2132
          -------------+----------------------------------   Adj R-squared   =    0.2076
                 Total |  74528414.4       995  74902.9291   Root MSE        =    243.62
          
          ------------------------------------------------------------------------------
                    RM |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
             familyceo |  -67.92872   15.82893    -4.29   0.000    -98.99092   -36.86653
             familycfo |          0  (omitted)
             familycoo |          0  (omitted)
               acindep |   83.61437   12.85515     6.50   0.000     58.38782    108.8409
                 acexp |   51.88908    8.15929     6.36   0.000     35.87755    67.90061
                acsize |   93.11662   18.39982     5.06   0.000      57.0094    129.2238
                acmeet |   58.95815   19.83276     2.97   0.003     20.03898    97.87731
                       |
                  rank |
                    2  |  -71.91733   19.61793    -3.67   0.000    -110.4149   -33.41973
                    3  |  -21.64432   19.24852    -1.12   0.261    -59.41699    16.12836
                       |
                 _cons |  -890.5891   95.97802    -9.28   0.000    -1078.933   -702.2449
          ------------------------------------------------------------------------------
          Dear Wang, the above is the regression result, but it does not show a result for rank 1. How can we display the results for all three ranks?
          Thank you in advance.

          • #6
            Beware of missing values

            Code:
            . sysuse nlsw88, clear
            (NLSW, 1988 extract)
            
            .
            . // not taking care of missing values
            . sort hours
            
            . gen rank = ceil(_n*3/_N)
            
            . list hours rank in -10/l
            
                  +--------------+
                  | hours   rank |
                  |--------------|
            2237. |    70      3 |
            2238. |    70      3 |
            2239. |    75      3 |
            2240. |    75      3 |
            2241. |    80      3 |
                  |--------------|
            2242. |    80      3 |
            2243. |     .      3 |
            2244. |     .      3 |
            2245. |     .      3 |
            2246. |     .      3 |
                  +--------------+
            
            .
            . drop rank
            
            .
            . // taking care of missing values
            . sort hours
            
            . count if !missing(hours)
              2,242
            
            . gen rank = ceil(_n*3/r(N)) in 1/`r(N)'
            (4 missing values generated)
            
            . list hours rank in -10/l
            
                  +--------------+
                  | hours   rank |
                  |--------------|
            2237. |    70      3 |
            2238. |    70      3 |
            2239. |    75      3 |
            2240. |    75      3 |
            2241. |    80      3 |
                  |--------------|
            2242. |    80      3 |
            2243. |     .      . |
            2244. |     .      . |
            2245. |     .      . |
            2246. |     .      . |
                  +--------------+
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------

            • #7
              Sir, how can we run the regression so that all three ranks are displayed?

              • #8
                Originally posted by Sattar Khan
                it does not show a result for rank 1. How can we display the results for all three ranks?
                That is how indicator (dummy) variables work: they show the difference in expected outcome relative to the baseline. The difference in expected outcome of the baseline with respect to the baseline is obviously 0, so that is usually not shown. You can show it, and thus show all levels, by adding the baselevels option. The parameter is trivial, as said above, but it can be useful to remind us what the reference category is.
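
                A short sketch of the baselevels option, using the auto data and the rank variable from #2. The outcome price is only an illustrative assumption. The last command also shows the ib#. operator, which changes the reference category.

                Code:
                sysuse auto, clear
                sort mpg
                gen rank = ceil(_n*3/_N)

                * default display: the base level (rank 1) is not listed
                regress price i.rank

                * baselevels adds a row for the base level, marked as the base
                regress price i.rank, baselevels

                * ib2. makes rank 2 the reference category instead of rank 1
                regress price ib2.rank, baselevels

                In each case the coefficients for the other ranks are differences from whichever rank is the reference.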

                You can look at this Stata tip for an alternative way of presenting all three categories: https://journals.sagepub.com/doi/pdf...867X1201200111 . Although I do think that this way of presenting results can occasionally be useful, I don't think it is right for you. You really need to understand what you are doing, or else you are going to make errors without realizing it and everything you have written has suddenly become wrong. Better be safe, and stick to the default way of dealing with categorical variables.

                • #9
                  Maarten Buis, thanks for pointing out the issue of missing values and for the interpretation in #8.

                  • #10
                    Unfortunately, ranking needs to account for ties as well as missing values.

                    It makes no sense to assign the same value to different groups.

                    This graph makes the problem with code like that in #2 explicit (the point is also clear from the listing in #2).

                    xtile does have limitations, yet it does address these points!
                    [Image: tertile.png]
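
                    One way to check for the ties problem without a graph, sketched under the assumption that the data and code are those of #2. The variable name split is hypothetical. In the listing in #2, cars with mpg 18 appear in ranks 1 and 2, and cars with mpg 23 appear in ranks 2 and 3, so those are the values this check should flag.

                    Code:
                    sysuse auto, clear
                    sort mpg
                    gen rank = ceil(_n*3/_N)

                    * a value of mpg with counts in two columns has been split across groups
                    tabulate mpg rank

                    * flag values of mpg whose observations fall in more than one group
                    bysort mpg (rank): gen byte split = rank[1] != rank[_N]
                    tabulate mpg if split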

                    Last edited by Nick Cox; 10 Nov 2021, 07:50.

                    • #11
                      Yes, Nick, totally agree. -xtile- is better!

                      • #12
                        Dear Nick, could you show us the procedure for doing this in my case?

                        • #13
                          Maybe you did not realize that the xtile Nick referred to is just a regular Stata command. No need to install anything; if you have Stata, then you have xtile.

                          So the procedure is the same as with any other Stata command: type help xtile and see if you can get it to work. If that does not work, write to us again, telling us exactly what you tried, what Stata told you in return, and why you think that is not what you want.
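
                          As a starting point, here is a minimal sketch with the auto data. The variable name tercile and the outcome price are illustrative assumptions; in your own data you would replace mpg with your firm size or firm age variable. xtile puts equal values in the same group and leaves the new variable missing where the original is missing.

                          Code:
                          sysuse auto, clear

                          * nquantiles(3) requests terciles of mpg
                          xtile tercile = mpg, nquantiles(3)

                          * check group sizes and the range of mpg within each group
                          tabulate tercile
                          tabstat mpg, by(tercile) statistics(min max n)

                          * use the terciles as a categorical predictor
                          regress price i.tercile

                          Because tied values stay together, the three groups need not be exactly equal in size.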