  • myaxis available from SSC: reorder categorical variables, especially for later table or graph use

    Thanks to Kit Baum as always, a new command called myaxis is now downloadable from SSC. Stata 8.2 is required.

    I will subvert the usual order and give examples first and then the overall story. Naturally you can bail out whenever you wish, and I lost some people already at the title.

    rep78 in the auto data is an ordered (ordinal, graded) variable, but for my purposes I will pretend that it isn't. myaxis maps such a variable to a new variable according to some sort criterion. Here we will just sort on counts (frequencies), largest first. tabulate already has a sort option to do that, but this is just to get us started.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . myaxis wanted=rep78, sort(count) descending
    
    . tab wanted
    
         Repair |
    Record 1978 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |         30       43.48       43.48
              4 |         18       26.09       69.57
              5 |         11       15.94       85.51
              2 |          8       11.59       97.10
              1 |          2        2.90      100.00
    ------------+-----------------------------------
          Total |         69      100.00
    
    . tab wanted, nola
    
         Repair |
    Record 1978 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         30       43.48       43.48
              2 |         18       26.09       69.57
              3 |         11       15.94       85.51
              4 |          8       11.59       97.10
              5 |          2        2.90      100.00
    ------------+-----------------------------------
          Total |         69      100.00
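
    For comparison, the built-in route mentioned above can be sketched as follows. This assumes only official Stata's tabulate with its sort option. It changes the display order but creates no new variable, so the ordering is not carried over to other commands.

    ```stata
    sysuse auto, clear

    * built-in alternative: rows listed in descending order of frequency;
    * no new variable is created, so the order is not reusable elsewhere
    tabulate rep78, sort
    ```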

    Let's suppose instead that we wanted to sort on mean mpg for each category.

    Code:
    . myaxis wanted2=rep78, sort(mean mpg) descending
    
    . tab wanted2, su(mpg)
    
         Repair |      Summary of Mileage (mpg)
    Record 1978 |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
              5 |   27.363636   8.7323849          11
              4 |   21.666667   4.9348699          18
              1 |          21   4.2426407           2
              3 |   19.433333   4.1413252          30
              2 |      19.125   3.7583241           8
    ------------+------------------------------------
          Total |   21.289855   5.8664085          69

    As a further twist, we might want to sort on a subset's values, because we are looking ahead to a two-way table or graph. And yes, we should have insisted earlier on a less barbarous display format, so we fix that here too:

    Code:
    . myaxis wanted3=rep78, sort(mean mpg) subset(foreign==1) descending
    
    . format mpg %2.1f
    
    . tab wanted3 foreign , su(mpg) nost nofreq
    
                              Means of Mileage (mpg)
    
        Repair |
        Record |      Car type
          1978 |  Domestic    Foreign |     Total
    -----------+----------------------+----------
             5 |      32.0       26.3 |      27.4
             4 |      18.4       24.9 |      21.7
             3 |      19.0       23.3 |      19.4
             1 |      21.0          . |      21.0
             2 |      19.1          . |      19.1
    -----------+----------------------+----------
         Total |      19.5       25.3 |      21.3

    Note that myaxis does not fall over when there is nothing to summarize, as with foreign cars for repair records 1 and 2.

    One more. Here is another categorical variable:

    Code:
    . webuse nlsw88, clear
    (NLSW, 1988 extract)
    
    . 
    . myaxis wanted=industry, sort(median wage) descending
    
    . tabstat wage, s(median mean) by(wanted) format(%3.2f)
    
    Summary for variables: wage
         by categories of: wanted (industry)
    
              wanted |       p50      mean
    -----------------+--------------------
    Transport/Comm/U |     10.12     11.44
    Public Administr |      8.40      9.15
              Mining |      8.09     15.35
    Finance/Ins/Real |      7.05      9.84
    Professional Ser |      6.70      7.87
        Construction |      6.69      7.56
       Manufacturing |      6.19      7.50
    Business/Repair  |      5.33      7.52
    Ag/Forestry/Fish |      4.53      5.62
    Wholesale/Retail |      4.53      6.13
    Entertainment/Re |      4.23      6.72
    Personal Service |      3.89      4.40
    -----------------+--------------------
               Total |      6.28      7.78
    --------------------------------------

    All the examples so far show table output, but FWIW my personal motivation was mostly graphical, hence the name of the command.

    There is already a graphical example at https://www.statalist.org/forums/for...using-by/page2 where a myaxis call replaces three lines of code with one. See posts #16 and #17.
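
    As a hedged illustration of the graphical use (my own sketch, not taken from the linked thread), the new variable can serve directly as a graph axis. The twoway options below are standard official Stata. The range 1/12 assumes the 12 nonmissing industry categories shown in the tabstat output above.

    ```stata
    webuse nlsw88, clear
    myaxis wanted = industry, sort(median wage) descending

    * wanted takes values 1 to 12 here; show its value labels on the y axis
    * and reverse the axis so that the highest median wage is at the top
    twoway scatter wanted wage, msymbol(Oh)                     ///
        ylabel(1/12, valuelabel angle(horizontal) noticks)      ///
        yscale(reverse) ytitle("")
    ```

    Any command that respects value labels and numeric order should pick up the same ordering, which is the point of creating a variable rather than sorting on the fly.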

    (I did toy with the idea of calling the command just axis, but a predictable disadvantage of that would be that (say) search axis would reasonably turn up much other stuff. A few people may recall that an egen function axis() has been in egenmore for a while (since 2004), but I didn't feel obliged to pick up all its functionality and I did feel obliged to extend support in other directions.)

    So, the deal here is:

    myaxis maps an existing "categorical" variable, meaning usually a numeric variable with integer codes and value labels, or equivalently a string variable, to a new variable with integer values 1 up and with value labels, sorted according to a specified criterion.

    The command name myaxis is to be parsed "my axis". The second element "axis" arises from a leading application of the command. You have a categorical variable that would define an axis of a graph, or one dimension of a table (the rows, or the columns, say), but the existing order of categories is not ideal. Some graph and table commands offer sorting on the fly, but this command may help wherever other commands do not offer that.

    The problem is split by myaxis into these parts:

    1. Calculation of a numeric variable on which to sort categories. myaxis treats this as an application of egen. Note: If a variable already exists that defines the sort order and is constant within categories, then asking for (say) its minimum, mean, or maximum within each category will suffice.

    2. Deciding whether you want ascending order (the default) or descending order (highest value goes first). Descending order requires negation of the variable from #1.

    3. Mapping your categorical variable to integers 1 up. The group() function of egen does the work here, but myaxis is careful to split ties according to the original variable. (For example: suppose nominal categories A, B, C, D, E have frequencies 7, 7, 42, 3, 1 and you want them sorted by frequency. You don't want A and B lumped together because they have the same frequency.)

    4. Fixing a variable label. myaxis uses a new variable label if supplied; otherwise, the original variable label; and, if that does not exist, the original variable name.

    5. Fixing value labels. This is even more important than #4 for helpful display in a graph or table. myaxis uses the original value labels if defined and otherwise the original string or numeric values.

    All of those steps are easy in principle, but some are fiddly in practice, so myaxis bundles them together on your behalf.
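
    To make the five steps concrete, here is a rough by-hand sketch for sorting rep78 on mean mpg, highest first. It is an illustration of the steps as described, not the actual myaxis code. The names crit and manual are made up for the example, and the labelling loop assumes (as is true for rep78) that the original variable is numeric with no value labels.

    ```stata
    sysuse auto, clear

    * 1. sort criterion, constant within categories
    egen double crit = mean(mpg), by(rep78)

    * 2. descending order wanted, so negate
    replace crit = -crit

    * 3. map to integers 1 up; including rep78 splits any ties on crit
    egen manual = group(crit rep78)

    * 4. variable label: copy the original
    label variable manual "`: variable label rep78'"

    * 5. value labels: rep78 has none, so use its numeric values as text
    levelsof manual, local(levels)
    foreach k of local levels {
        summarize rep78 if manual == `k', meanonly
        label define manual `k' "`r(min)'", add
    }
    label values manual manual

    tabulate manual, summarize(mpg)
    ```

    If the sketch is right, the final table should list the categories in the same order as the table for wanted2 above. The single myaxis call is evidently less fiddly.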

  • #2
    Now written up in the Stata Journal and immediately accessible at https://journals.sagepub.com/doi/pdf...6867X211045582

    • #3
      How to order weighted percentages? Using "myaxis" or "collect get"

      Code:
      sysuse auto, clear
      myaxis wanted=rep78, sort(count) descending
      tab wanted
      tab wanted, nola

      collect get: table () (wanted) () [aweight = weight], statistic(frequency) statistic(percent) nformat(%9.1f)

      • #4
        myaxis doesn't support weights, but if you calculate (e.g.) a weighted mean in advance you could feed that variable as the sorting criterion.
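
        A minimal sketch of that idea, applied to the weighted frequencies asked about in #3: compute the sum of weights within each category first, then sort on it. The variable names wfreq and wanted are arbitrary. Because wfreq is constant within categories, asking for its mean within each category just returns its value. A weighted mean computed in advance could be fed to sort() in the same way.

        ```stata
        sysuse auto, clear

        * sum of weights within each category of rep78 (the weighted frequency)
        egen double wfreq = total(weight), by(rep78)

        * wfreq is constant within categories, so mean returns it unchanged
        myaxis wanted = rep78, sort(mean wfreq) descending

        tabulate wanted [aweight = weight]
        ```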
