Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collapsing to get column percentages

    I always get column vs. row percentages backwards, but I am trying to calculate row percentages (I think) and when I am collapsing (see code below) it gives me column percentages (I think).

    For example, with this code:
    Code:
    collapse (percent) perc = id2 (mean) white black hispanic male group1 group2 group3 (count) id2 if !missing(attend), by(attend)
    I get this as my output:
    attend perc white black hispanic male group1 group2 group3
    0 92.98 .13 .34 .457 .501 .38 .316 .297
    1 7.016 .081 .44 .402 .50 .190 .304 .50
    So this reads as: "of those who attended, 8% were white, 44% were black, and 40% were hispanic".

    But what I really want is this (which I just made up):
    attend perc white black hispanic male group1 group2 group3
    0 92 95 90 94 50 88 90 96
    1 8 5 10 6 50 12 10 4

    So this reads as "5% of white individuals attended". Is there a way to achieve this with a modified version of the collapse code that I used above? Example data here below.


    (Note: the numbers in the top table may be different than this example data, since this is a very large dataset with roughly 1M observations and we just have a very small slice of data below).

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(attend black hispanic white male group1 group2 group3 id2)
    1 1 0 0 0 0 0 1  3498678
    0 1 0 0 1 0 1 0  498298
    0 1 0 0 0 0 1 0  598558
    1 1 0 0 1 0 0 1  498568
    0 0 0 0 0 0 0 1  908788
    0 1 0 0 1 0 0 1  928798
    1 1 0 0 0 0 1 0  498948
    1 1 0 0 0 0 0 1  499108
    0 1 0 0 1 0 0 1 1007078
    0 1 0 0 0 0 0 1 1007148
    0 1 0 0 1 0 1 0  999988
    0 0 0 1 0 0 0 1  999398
    0 1 0 0 0 0 1 0 1007778
    0 1 0 0 0 0 1 0 1007878
    1 1 0 0 1 0 1 0 1010178
    0 1 0 0 0 0 0 1 1000248
    0 0 0 1 0 0 1 0  999758
    0 0 1 0 0 0 1 0 1008358
    0 1 0 0 0 0 1 0 1000668
    0 1 0 0 0 0 0 1 1008418
    0 0 0 1 0 0 0 1 1010718
    0 0 0 1 1 0 1 0 1010928
    0 0 0 1 0 0 1 0 1001658
    0 0 1 0 0 0 0 1 1011018
    0 1 0 0 1 0 1 0 1003168
    0 1 0 0 0 0 0 1 1008968
    1 1 0 0 0 0 0 1 103528
    0 1 0 0 1 0 0 1 12088
    0 1 0 0 0 0 1 0 111398
    1 1 0 0 1 0 1 0 1017768
    0 1 0 0 1 0 1 0 1011468
    0 1 0 0 0 0 1 0 105538
    0 0 0 0 0 0 0 1 118098
    0 1 0 0 0 0 0 1 1018208
    1 0 0 1 1 0 0 1 1005868
    0 0 0 1 1 0 0 1 1013788
    0 1 0 0 0 0 0 1 1022618
    1 0 0 0 1 0 1 0 1013928
    0 0 1 0 0 0 1 0 1005888
    0 0 0 0 0 0 1 0 1002808
    0 0 0 0 1 0 1 0 1014548
    0 0 0 1 1 0 1 0 1014558
    1 0 1 0 1 0 0 1 1014588
    0 0 0 0 1 0 0 1 1014638
    0 0 1 0 0 0 1 0 1006078
    0 0 1 0 0 0 0 1 1018898
    0 0 0 1 0 0 0 1 1015458
    0 0 1 0 1 0 1 0 1070248
    1 1 0 0 1 0 1 0 1019198
    0 0 1 0 1 0 1 0 1019288
    0 0 1 0 0 0 1 0 1019778
    0 0 1 0 0 0 0 1 1019888
    0 0 0 1 1 0 1 0 1015938
    0 1 0 0 1 0 0 1 1020278
    0 0 1 0 1 0 1 0 1016028
    1 0 1 0 1 0 1 0 1016238
    0 0 1 0 0 0 0 1 1045148
    0 0 1 0 0 0 0 1 1105568
    0 0 1 0 1 0 1 0 1105578
    1 0 1 0 0 0 1 0 1074808
    1 1 0 0 0 0 0 1 1048028
    0 1 0 0 0 0 0 1 1033758
    0 0 1 0 0 0 1 0 1037358
    1 1 0 0 0 0 0 1 1098818
    0 0 0 1 1 0 0 1 1045328
    0 1 0 0 1 0 0 1 1105588
    0 1 0 0 0 0 0 1 1035038
    1 0 0 0 0 0 1 0 1030148
    0 0 0 1 1 0 1 0 1076538
    1 0 0 0 1 0 0 1 1030238
    0 0 1 0 0 0 1 0 1031428
    0 0 1 0 1 0 1 0 1035228
    0 0 1 0 1 0 1 0 1098248
    0 0 1 0 1 0 1 0 1035438
    0 0 0 0 0 0 0 1 1086638
    0 1 0 0 0 0 0 1 1064348
    0 1 0 0 0 0 1 0 1053608
    0 1 0 0 0 0 0 1 1078298
    0 1 0 0 0 0 0 1 1048478
    0 1 0 0 0 0 1 0 1051638
    0 1 0 0 0 0 0 1 1062848
    0 0 1 0 1 0 1 0 1101448
    0 0 1 0 0 0 1 0 1027228
    0 1 0 0 0 0 1 0 1066538
    0 1 0 0 0 0 0 1 1025848
    0 1 0 0 0 0 0 1 1024628
    0 1 0 0 0 0 1 0 1026928
    0 1 0 0 1 0 0 1 1026258
    0 1 0 0 1 0 1 0 1025888
    0 0 1 0 1 0 1 0 1024668
    0 1 0 0 0 0 1 0 1076174
    0 1 0 0 1 0 1 0 1040404
    0 0 1 0 1 0 0 1 1017804
    0 0 1 0 1 0 1 0 1040485
    0 0 1 0 1 0 1 0 1061646
    0 0 0 1 0 0 1 0 1089547
    0 0 1 0 0 0 0 1 2052265
    0 0 1 0 1 0 0 1 2034166
    0 0 1 0 1 0 0 1 1054195
    0 0 0 0 0 0 0 1 1048755
    end

  • #2
    why collapse?

    Code:
    tab2 attend white , co nofreq

    Comment


    • #3
      With your example data, I did this:


      Code:
      collapse (sum) black hispanic white male group1 group2 group3, by(attend)
      
      rename (black-group3) (freq=)
      
      reshape long freq, i(attend) j(which) string 
      
      tab which attend [fw=freq], row 
      
      +----------------+
      | Key            |
      |----------------|
      |   frequency    |
      | row percentage |
      +----------------+
      
                 |        attend
           which |         0          1 |     Total
      -----------+----------------------+----------
           black |        37         10 |        47 
                 |     78.72      21.28 |    100.00 
      -----------+----------------------+----------
          group2 |        44          8 |        52 
                 |     84.62      15.38 |    100.00 
      -----------+----------------------+----------
          group3 |        39          9 |        48 
                 |     81.25      18.75 |    100.00 
      -----------+----------------------+----------
        hispanic |        27          3 |        30 
                 |     90.00      10.00 |    100.00 
      -----------+----------------------+----------
            male |        34          9 |        43 
                 |     79.07      20.93 |    100.00 
      -----------+----------------------+----------
           white |        12          1 |        13 
                 |     92.31       7.69 |    100.00 
      -----------+----------------------+----------
           Total |       193         40 |       233 
                 |     82.83      17.17 |    100.00
      .

      Comment


      • #4
        Sorry, I left out an important aspect that is relevant to both of your responses: I am doing this for 3 different variables (attended, commented, replied), and so the reason I collapse is that I essentially create a new data frame. I then save each of the 3 data frames and append them to each other to create one final data frame, which can be exported as a table.

        I could, of course, use -tab- and then just copy and paste each of the 3 resulting tables (or use estout or something similar). Nick Cox, your solution seems to be somewhere in the middle, and I will continue to pursue it. Thanks both.

        Comment


        • #5
          I think the collapse gets you the correct mean when X = 1. Then you can compute notX as 1-mean(X when X = 1). The funky number on notX is due to the fact that not all observations are either X or notX.

          Comment


          • #6
            The collapse isn't incorrect, it's just calculating what is effectively the row percentage instead of the column percentage.

            Comment


            • #7
              Nick Cox two brief formatting option questions: 1) is it possible to control the order that the variables are shown in the table with the final -tab-? 2) is it possible to flip the order of the columns? To have 1 be the first column shown and 0 be the second column?

              Comment


              • #8
                You need new variables, such as 1 - attend and a re-ordering of your row variable.

                In each case define value labels for your new order.

                Suppose you had a string variable old with values apple banana orange and you prefer orange to go first.

                Code:
                label def new 1 "orange" 2 "apple" 3 "banana"
                
                encode old, gen(new) label(new)
                For more elaborate re-orderings, see https://journals.sagepub.com/doi/pdf...6867X211045582
                Last edited by Nick Cox; 24 Jan 2023, 14:55.

                Comment

                Working...
                X