Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sorting twoway table by row percent (not column frequency) of “no” responses

    I am using tabulate Stata 15.1
    I have generated a twoway table using the variables irsd_decile and australian_born

    tab irsd_decile australian_born
    /*
    SEIFA IRSD | born in Australia
    decile | no yes | Total
    -----------+----------------------+----------
    1 | 32,574 16,802 | 49,376
    2 | 7,163 8,471 | 15,634
    3 | 18,445 19,662 | 38,107
    4 | 3,163 7,888 | 11,051
    5 | 27,803 17,212 | 45,015
    6 | 21,791 18,719 | 40,510
    7 | 24,494 24,310 | 48,804
    8 | 14,295 17,095 | 31,390
    9 | 13,830 19,886 | 33,716
    10 | 3,701 8,648 | 12,349
    -----------+----------------------+----------
    Total | 167,259 158,693 | 325,952
    */

    *I can manually see, when looking at column frequency, that the top 5 "no" responses correspond to deciles 1,5,7,6
    *the following code, using a matrix, will do the sort and list deciles in order of descending frequency of "no" responses
    tab irsd_decile australian_born, matrow(R) matcell(C)
    matrix A=R,C,C[1...,1]+C[1...,2]
    mata: st_matrix("A", sort(st_matrix("A"),-2))
    matrix list A

    /*
    A[10,4]
    c1 c2 c3 c4
    r1 1 32574 16802 49376
    r2 5 27803 17212 45015
    r3 7 24494 24310 48804
    r4 6 21791 18719 40510
    r5 3 18445 19662 38107
    r6 8 14295 17095 31390
    r7 9 13830 19886 33716
    r8 2 7163 8471 15634
    r9 10 3701 8648 12349
    r10 4 3163 7888 11051
    */


    *However, I am actually wanting to sort by row percent of "no" responses, not by column frequency
    tabulate irsd_decile australian_born, row nofreq
    /*
    SEIFA IRSD | born in Australia
    decile | no yes | Total
    -----------+----------------------+----------
    1 | 65.97 34.03 | 100.00
    2 | 45.82 54.18 | 100.00
    3 | 48.40 51.60 | 100.00
    4 | 28.62 71.38 | 100.00
    5 | 61.76 38.24 | 100.00
    6 | 53.79 46.21 | 100.00
    7 | 50.19 49.81 | 100.00
    8 | 45.54 54.46 | 100.00
    9 | 41.02 58.98 | 100.00
    10 | 29.97 70.03 | 100.00
    -----------+----------------------+----------
    Total | 51.31 48.69 | 100.00
    */

    *I can manually see, when looking at row percent, that the top 5 "no" responses correspond to deciles 1,5,6,7
    *I have tried using the following code, but get a 'conformability error' message
    *While I can continue to the end of the code, the sort is by column frequency
    tab irsd_decile australian_born, row nofreq matcol(C) matrow(R)
    matrix A=R,C,C[1...,1]+C[1...,2]
    mata: st_matrix("A", sort(st_matrix("A"),-2))
    matrix list A


    Is there a way to sort by row percent without using matrices?
    If not, I would appreciate any advice on where I am going wrong with my matrix code.

    If anyone is able to help, thanks in advance,
    Angela Joe

    A small sample of data from my large data set is provided:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(irsd_decile australian_born)
     7 0
     7 0
     7 0
     7 0
     7 0
     6 0
     7 1
    10 0
    10 0
     9 1
     7 1
     3 1
     3 1
     6 1
     6 1
     6 1
     9 1
     9 1
     6 0
    10 0
    10 0
     2 1
     7 0
     7 0
     7 0
     6 1
     6 1
     6 1
     7 0
     6 1
     1 0
     1 0
     8 1
     5 0
     8 0
     8 0
     3 1
     2 1
     7 1
     2 1
     7 1
     1 1
     1 1
     6 0
     6 0
     6 0
     6 0
     6 0
     6 0
     1 0
    end
    label values australian_born aussie
    label def aussie 0 "no", modify
    label def aussie 1 "yes", modify


  • #2
    This is what myaxis from the Stata Journal can do.

    Announced here within https://www.statalist.org/forums/for...e-or-graph-use on 19 March 2021

    and written up and immediately accessible (no paywall) at https://journals.sagepub.com/doi/pdf...6867X211045582

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float irsd_decile byte australian_born float freq
     1 0 32574
     1 1 16802
     2 0  7163
     2 1  8471
     3 0 18445
     3 1 19662
     4 0  3163
     4 1  7888
     5 0 27803
     5 1 17212
     6 0 21791
     6 1 18719
     7 0 24494
     7 1 24310
     8 0 14295
     8 1 17095
     9 0 13830
     9 1 19886
    10 0  3701
    10 1  8648
    end
    
    expand freq 
    
    myaxis wanted=irsd_decile, sort(mean australian_born) 
    
    tab wanted australian_born, row 
    
    
    +----------------+
    | Key            |
    |----------------|
    |   frequency    |
    | row percentage |
    +----------------+
    
    irsd_decil |    australian_born
             e |         0          1 |     Total
    -----------+----------------------+----------
             1 |    32,574     16,802 |    49,376 
               |     65.97      34.03 |    100.00 
    -----------+----------------------+----------
             5 |    27,803     17,212 |    45,015 
               |     61.76      38.24 |    100.00 
    -----------+----------------------+----------
             6 |    21,791     18,719 |    40,510 
               |     53.79      46.21 |    100.00 
    -----------+----------------------+----------
             7 |    24,494     24,310 |    48,804 
               |     50.19      49.81 |    100.00 
    -----------+----------------------+----------
             3 |    18,445     19,662 |    38,107 
               |     48.40      51.60 |    100.00 
    -----------+----------------------+----------
             2 |     7,163      8,471 |    15,634 
               |     45.82      54.18 |    100.00 
    -----------+----------------------+----------
             8 |    14,295     17,095 |    31,390 
               |     45.54      54.46 |    100.00 
    -----------+----------------------+----------
             9 |    13,830     19,886 |    33,716 
               |     41.02      58.98 |    100.00 
    -----------+----------------------+----------
            10 |     3,701      8,648 |    12,349 
               |     29.97      70.03 |    100.00 
    -----------+----------------------+----------
             4 |     3,163      7,888 |    11,051 
               |     28.62      71.38 |    100.00 
    -----------+----------------------+----------
         Total |   167,259    158,693 |   325,952 
               |     51.31      48.69 |    100.00

    Comment


    • #3
      Thanks so much Nick, myaxis does exactly what I want - thanks for providing code to get me started. Best regards, Angela

      Comment


      • #4
        You're welcome, but here a graph seems easier to think about than either version of the table.


        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float irsd_decile byte australian_born float freq
         1 0 32574
         1 1 16802
         2 0  7163
         2 1  8471
         3 0 18445
         3 1 19662
         4 0  3163
         4 1  7888
         5 0 27803
         5 1 17212
         6 0 21791
         6 1 18719
         7 0 24494
         7 1 24310
         8 0 14295
         8 1 17095
         9 0 13830
         9 1 19886
        10 0  3701
        10 1  8648
        end
        
        expand freq 
        
        statsby, by(irsd_decile) clear : ci proportions australian_born 
        
        twoway scatter mean irsd, mc(blue) || rcap ub lb irsd , lcolor(blue) scheme(s1color) xla(1/10)  ///
        yla(.30 "30" .4 "40" .5 "50" .6 "60" .7 "70", ang(h)) ytitle(% Australian born) xtitle(decile bin) ///
        subtitle(means and 95% confidence intervals) legend(off)
        Click image for larger version

Name:	angela_joe.png
Views:	2
Size:	16.2 KB
ID:	1655770
        Attached Files

        Comment


        • #5
          Great idea – much better for immediate impact than displaying a table. Thanks for introducing me to the statsby command which I’ve not used before.

          Comment


          • #6
            Watch out; your decile bins are far from equally populated.

            Comment


            • #7
              Here's a basic histogram. Such unevenness is diagnostic of a very lumpy or spiky distribution fed into a binning command. Much more discussion in the posts (and especially links) in a recent thread. https://www.statalist.org/forums/for...ach-percentile


              Click image for larger version

Name:	bad_binning.png
Views:	1
Size:	22.8 KB
ID:	1656016

              Comment


              • #8
                Dear Nick Cox, How can I do with command myaxis if I am wanting to sort by row percent of "missing" responses?

                A small sample of data based on Angela Joe is provided as below:
                Code:
                clear
                input float irsd_decile byte australian_born float freq
                 1 0 32574
                 1 1 16802
                 1 . 23
                 2 0  7163
                 2 1  8471
                 2 .  34
                 3 0 18445
                 3 1 19662
                 3 . 56
                 4 0  3163
                 4 1  7888
                 4 .  45
                 5 0 27803
                 5 1 17212
                 5 . 47
                 6 0 21791
                 6 1 18719
                 6 . 64
                 7 0 24494
                 7 1 24310
                 7 . 39
                 8 0 14295
                 8 1 17095
                 9 0 13830
                 9 1 19886
                 9 . 99
                10 0  3701
                10 1  8648
                10 .  1000
                end
                
                expand freq


                Code:
                . tab irsd_decile australian_born,row missing
                
                +----------------+
                | Key            |
                |----------------|
                |   frequency    |
                | row percentage |
                +----------------+
                
                irsd_decil |         australian_born
                         e |         0          1          . |     Total
                -----------+---------------------------------+----------
                         1 |    32,574     16,802         23 |    49,399 
                           |     65.94      34.01       0.05 |    100.00 
                -----------+---------------------------------+----------
                         2 |     7,163      8,471         34 |    15,668 
                           |     45.72      54.07       0.22 |    100.00 
                -----------+---------------------------------+----------
                         3 |    18,445     19,662         56 |    38,163 
                           |     48.33      51.52       0.15 |    100.00 
                -----------+---------------------------------+----------
                         4 |     3,163      7,888         45 |    11,096 
                           |     28.51      71.09       0.41 |    100.00 
                -----------+---------------------------------+----------
                         5 |    27,803     17,212         47 |    45,062 
                           |     61.70      38.20       0.10 |    100.00 
                -----------+---------------------------------+----------
                         6 |    21,791     18,719         64 |    40,574 
                           |     53.71      46.14       0.16 |    100.00 
                -----------+---------------------------------+----------
                         7 |    24,494     24,310         39 |    48,843 
                           |     50.15      49.77       0.08 |    100.00 
                -----------+---------------------------------+----------
                         8 |    14,295     17,095          0 |    31,390 
                           |     45.54      54.46       0.00 |    100.00 
                -----------+---------------------------------+----------
                         9 |    13,830     19,886         99 |    33,815 
                           |     40.90      58.81       0.29 |    100.00 
                -----------+---------------------------------+----------
                        10 |     3,701      8,648      1,000 |    13,349 
                           |     27.72      64.78       7.49 |    100.00 
                -----------+---------------------------------+----------
                     Total |   167,259    158,693      1,407 |   327,359 
                           |     51.09      48.48       0.43 |    100.00
                We can see that the missing percent of irsd_decile: 8 is zero, the lowest one. How to reorder the categorical variable irsd_decile according the percent of missing option? Many thanks.

                Comment


                • #9
                  Code:
                  . egen pc_missing = mean(100 * missing(australian_born)), by(irsd_decile)
                  
                  . 
                  . myaxis order=irsd_decile, sort(mean pc_missing)
                  
                  . 
                  . tabdisp order, c(pc_missing) format(%4.3f)
                  
                  ----------------------
                  irsd_deci |
                  le        | pc_missing
                  ----------+-----------
                          8 |      0.000
                          1 |      0.047
                          7 |      0.080
                          5 |      0.104
                          3 |      0.147
                          6 |      0.158
                          2 |      0.217
                          9 |      0.293
                          4 |      0.406
                         10 |      7.491
                  ----------------------

                  Comment


                  • #10
                    Originally posted by Nick Cox View Post
                    Code:
                    . egen pc_missing = mean(100 * missing(australian_born)), by(irsd_decile)
                    
                    .
                    . myaxis order=irsd_decile, sort(mean pc_missing)
                    
                    .
                    . tabdisp order, c(pc_missing) format(%4.3f)
                    
                    ----------------------
                    irsd_deci |
                    le | pc_missing
                    ----------+-----------
                    8 | 0.000
                    1 | 0.047
                    7 | 0.080
                    5 | 0.104
                    3 | 0.147
                    6 | 0.158
                    2 | 0.217
                    9 | 0.293
                    4 | 0.406
                    10 | 7.491
                    ----------------------
                    Dear Nick, your codes solved my question perfectly. -myaxis- is a very useful command. Thanks your command and your code examples for my question.

                    Comment

                    Working...
                    X