  • Ranking means from highest to lowest after bysort occupation: sum

    Dear all,

    I would like to know how to rank means from highest to lowest after the following command:
    Code:
    bysort occupation: sum im
    where im is an immigrant status indicator. The command gives me the proportion of immigrants in each occupation.
    I would like to rank these different proportions from highest to lowest.

    Moreover, once the proportions are ranked from top to bottom, I would like to know the overall cumulative percentage of immigrants for each occupation. An example might help. The columns are occupation, number of immigrants, number of natives, proportion of immigrants, and cumulative percentage of immigrants (defined as the number of immigrants in a given occupation divided by the total number of immigrants in all occupations, say 1000):
    teacher, 50, 50, 0.5, 0.05.

    I do not think I can get one single command that does it all, but any help would be greatly appreciated.

    Best wishes,

    Nico

  • #2
    This is a common problem that should be written up somewhere as an FAQ or Stata Journal Tip (if it isn't already).

    First off, summarize shows you the means, but at most the last mean calculated remains in memory afterwards (as r(mean)), so that doesn't help much, if at all.
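
    A minimal sketch of that point, with the auto data as a stand-in for the real data: under by: the r() results are overwritten group by group, so only those for the last group survive.

    Code:
    sysuse auto, clear
    bysort rep78: summarize price
    * r() now holds the results for the last by-group only
    return list
    display r(mean)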

    Step 1: Calculate the means in a new variable.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . egen mean = mean(price), by(rep78)
    
    . tabdisp rep78, c(mean)
    
    ----------------------
    Repair    |
    Record    |
    1978      |       mean
    ----------+-----------
            1 |     4564.5
            2 |   5967.625
            3 |   6429.233
            4 |     6071.5
            5 |       5913
            . |     6430.4
    ----------------------
    What should you do if you want the ranking to run the other way? It is easy: you can call up mean(-price) instead, or reverse the ranking from below. What may or may not be obvious is that ranks from 1 to 5 (say) are reversed by 6 - rank.
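
    Both routes can be sketched as follows, using the group() function from Step 2. The variable names negmean, group, and revgroup are made up for illustration, and missing rep78 is kept as a category, so there are 6 groups here rather than 5.

    Code:
    sysuse auto, clear
    egen mean = mean(price), by(rep78)

    * route 1: group on the negated mean, so group 1 has the highest mean
    egen negmean = mean(-price), by(rep78)
    egen group = group(negmean rep78), missing
    tabdisp group, c(mean)

    * route 2: reverse an existing grouping as (number of groups + 1) - group
    summarize group, meanonly
    generate revgroup = r(max) + 1 - group
    tabdisp revgroup, c(mean)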

    Step 2: Rank groups of observations on the means. Twist that may bite: two or more categories can have the same mean, so you need to break ties according to the categories they occur for, which is why rep78 is also given to group(). Another twist that may bite: if you DON'T want missings on the category to be included, don't specify the missing option.

    Code:
      
    
    . egen group = group(mean rep78), missing
    
    . tabdisp group , c(mean)
    
    ----------------------
    group(mea |
    n rep78)  |       mean
    ----------+-----------
            1 |     4564.5
            2 |       5913
            3 |   5967.625
            4 |     6071.5
            5 |   6429.233
            6 |     6430.4
    ----------------------
    So far so good, but which group is which in terms of the original categories?

    You can write code to loop over the categories and assign value labels accordingly, and that would be the most fun you could have without laughing. Alternatively,
    labmask from the Stata Journal will do that for you. If you have not installed it, search labmask will give you links.

    Code:
    . labmask group, values(rep78)
    
    . tabdisp group , c(mean)
    
    ----------------------
    group(mea |
    n rep78)  |       mean
    ----------+-----------
            1 |     4564.5
            5 |       5913
            2 |   5967.625
            4 |     6071.5
            3 |   6429.233
            . |     6430.4
    ----------------------

    Optionally, clean up the variable label for the new variable.
    Last edited by Nick Cox; 12 Feb 2021, 06:49.


    • #3
      I lost OP after he said "Moreover," but another approach apart from what Nick showed us might be built around -list-. And a very useful Stata Tip on how to make sophisticated tables using -list- is this one:
      Harrison, David A. "Stata tip 34: Tabulation by listing." Stata Journal 6, no. 3 (2006): 425-427.

      E.g., what OP wants before he said "Moreover":

      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . egen mean = mean(price), by(rep)
      
      . egen tag = tag(rep), missing
      
      . gsort -mean
      
      . list mean if tag, sep(0)
      
           +----------+
           |     mean |
           |----------|
        5. |   6430.4 |
       21. | 6429.233 |
       39. |   6071.5 |
       57. | 5967.625 |
       71. |     5913 |
       74. |   4564.5 |
           +----------+


      • #4
        Joro's excellent idea can be taken a little further, as in


        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . egen mean = mean(price), by(rep)
        
        . egen tag = tag(rep), missing
        
        . gsort -mean
        
        . format mean %2.1f 
        
        . list rep78 mean if tag, sep(0) noobs 
        
          +----------------+
          | rep78     mean |
          |----------------|
          |     .   6430.4 |
          |     3   6429.2 |
          |     4   6071.5 |
          |     2   5967.6 |
          |     5   5913.0 |
          |     1   4564.5 |
          +----------------+
        and naturally you can omit the missing category if you wish.

        I see now that I seized on part of #1 only, but there is more technique at https://www.stata.com/support/faqs/d...e-frequencies/
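
        For the part of #1 after "Moreover", one possible sketch follows. It assumes im is coded 0/1. Here foreign and rep78 from the auto data stand in for im and occupation, and all the new variable names are made up for illustration. Note that collapse replaces the data in memory, so preserve first if you need the original observations back.

        Code:
        sysuse auto, clear
        drop if missing(rep78)

        * one observation per category: immigrants, total, proportion
        collapse (sum) n_im = foreign (count) n_all = foreign (mean) prop = foreign, by(rep78)
        generate n_nat = n_all - n_im

        * rank from highest to lowest proportion, ties broken by category
        gsort -prop rep78

        * share of all immigrants, and its running total down the ranking
        egen total_im = total(n_im)
        generate share = n_im / total_im
        generate cumshare = sum(share)

        list rep78 n_im n_nat prop share cumshare, sep(0) noobs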


        • #5
          Dear Nick and Joro,

          thank you so much for taking the time, I very much appreciate your precise suggestions.

          Great help, thanks.

          Best wishes and stay healthy!

          Nico




