Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Any way to save row percentages output as a matrix?

    Hello Statalists,
    I am wondering if there is any way to save -tabulate var1 var2, nofreq row- output as a matrix?
    In this case, the cell values in the matrix are the row percentages instead of frequency.
    And I would like to have the figures with two digits following the decimal point and no total values included in the matrix.

    For example,

    . tab var1 var2, row nofreq

    var2
    var1 1 2 3 4 5 Total

    1 48.83 13.93 5.30 6.66 25.28 100.00
    2 42.02 16.26 5.83 9.51 26.38 100.00
    3 37.57 14.02 8.09 10.40 29.91 100.00
    4 41.48 15.31 5.56 11.36 26.30 100.00
    5 32.68 15.39 6.75 13.56 31.63 100.00

    Total 39.08 14.93 6.39 10.95 28.66 100.00


    The matrix I am expecting:

    . mat list mat1

    mat1[5,5]
    var11 var12 var13 var14 var15
    var21 .49 .14 .05 .07 .25
    var22 .42 .16 .06 .10 .26
    var23 .39 .14 .08 .10 .30
    var24 .41 .15 .06 .11 .26
    var25 .33 .15 .07 .14 .32



    I have been working on this issue (and also searching for solution) for days and cannot figure out.
    This is my first time to post. If I posted in the wrong place or got the format wrong, please let me know, I would like to edit it.
    Thank you very much in advance.

  • #2
    Perhaps someone else on Statalist knows of a direct way of doing what you need and can give you a better answer than this one.

    My reading of help tabulate is that the best you can do is use the matcell option to save the frequencies into a matrix, and then use matrix operators to calculate the row percentages, as the example below does for a 3x3 matrix of counts.

    Code:
    . matrix list freq
    
    freq[3,3]
        c1  c2  c3
    r1  10  20  30
    r2   5   5   5
    r3   9   4   1
    
    . matrix input one = (1\1\1)
    
    . matrix rowtot = freq*one
    
    . matrix list rowtot
    
    rowtot[3,1]
        c1
    r1  60
    r2  15
    r3  14
    
    . matrix rowpct = inv(diag(rowtot))*freq
    
    . matrix list rowpct
    
    rowpct[3,3]
               c1         c2         c3
    r1  .16666667  .33333333         .5
    r2  .33333333  .33333333  .33333333
    r3  .64285714  .28571429  .07142857
    ​

    Comment


    • #3
      Hi William,

      It's working! Problem solved. I was thinking about creating a rowtotal variable or matrix but could not sort out the later part.
      Thank you very much.

      Comment


      • #4
        Here's another way to do it, cosmetic decoration aside.

        Code:
         
        . sysuse auto, clear 
        (1978 Automobile Data)
        
        . tab for rep78, matcell(freq)
        
                   |                   Repair Record 1978
          Car type |         1          2          3          4          5 |     Total
        -----------+-------------------------------------------------------+----------
          Domestic |         2          8         27          9          2 |        48 
           Foreign |         0          0          3          9          9 |        21 
        -----------+-------------------------------------------------------+----------
             Total |         2          8         30         18         11 |        69 
        
        
        . mata: st_matrix("freq", (st_matrix("freq")  :/ rowsum(st_matrix("freq"))))
        
        . mat li freq , format("%3.2f")
        
        freq[2,5]
              c1    c2    c3    c4    c5
        r1  0.04  0.17  0.56  0.19  0.04
        r2  0.00  0.00  0.14  0.43  0.43

        Comment


        • #5
          Hi Nick,
          That's a very neat solution. I have tried and it's working perfectly.

          Thank you very much indeed.

          Comment


          • #6
            Nick's solution has one little problem that may or may not affect you, Min Zhang.
            If your freq is zero because of some condition, the tabulate command will omit it from output, and having a saved matrix as in above you would not be able to tell from the resulting matrix, which one (or multiple) were omitted.

            Compare Nick's output with:
            Code:
            . tab for rep78, matcell(freq), if price>6000
            
                       |             Repair Record 1978
              Car type |         2          3          4          5 |     Total
            -----------+--------------------------------------------+----------
              Domestic |         2          8          3          0 |        13
               Foreign |         0          1          5          2 |         8
            -----------+--------------------------------------------+----------
                 Total |         2          9          8          2 |        21
            
            
            . mat l freq
            
            freq[2,4]
                c1  c2  c3  c4
            r1   2   8   3   0
            r2   0   1   5   2
            Depending on whether you are doing a single tabulation or multiple, this may or may not be a problem. But if you ever decide to "stitch" those multiple matrices together, you need to make sure they are not only of same dimensions, but also matching categories. See also help for mat_rapp and mat_capp.

            Best, Sergiy Radyakin

            Comment


            • #7
              Sergiy is naturally right. I was answering the question asked, which was about matrices. But a different answer would point out that you might be much better off with a reduced dataset of counts obtained e.g. contract

              Comment


              • #8
                So just to add to this thread,

                If i want to use
                Code:
                putdocx
                with Nick Cox code but want both frequencies and persent in same matrix. How would i go about that?

                I have done:
                Code:
                .
                sysuse auto, clear
                
                tab for rep78, matcell(freq)
                
                mata: st_matrix("freq", (st_matrix("freq") :/ rowsum(st_matrix("freq"))*100))
                
                mat li freq
                
                putdocx begin
                
                putdocx table sum = matrix("freq"), rownames colnames
                
                putdocx table sum(1,1) = ("car type")
                
                putdocx table sum(1,2) = ("DOM")
                
                putdocx table sum(1,3) = ("FOR")
                
                putdocx table sum(2,1) = ("1")
                
                putdocx table sum(3,1) = ("2")
                
                putdocx save test.docx, replace
                But I would like to have a row between DOM and FOR with the frequencies for DOM and below FOR with frequencies for FOR in total numbers.

                How is that done?

                Thank you,

                Lars

                Comment

                Working...
                X