
  • Cross frequencies of dummy

    Dear All,
    I'm looking for a solution for this issue:
    I have a dataset with a set of 500 dummy variables for around 60,000 observations.
    I need to create a symmetric matrix with the dummy variables as both rows and columns, where each intersection gives the count of observations for which both dummies equal 1. Something like this:
            Dummy1  Dummy2  Dummy3
    Dummy1       0       4       6
    Dummy2       4       0       2
    Dummy3       6       2       0
    Thank you for your help

  • #2
    It appears that you are trying to do some form of matrix multiplication here. Your diagonal entries are wrong, though: the number of times a dummy equals one simultaneously with itself is simply the count of ones for that dummy. Maybe the following helps:


    Code:
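    clear
    set obs 20
    set seed 1234
    * simulate five 0/1 dummies
    forvalues i=1/5 {
        gen dummy`i' = runiformint(0, 1)
    }

    * store the dummies as matrix D, then form the cross-product D'D
    mkmat dummy1-dummy5, matrix(D)
    mat list D
    mat define DD = D'*D
    mat list DD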


    Code:
    . mat list D
    
    D[20,5]
         dummy1  dummy2  dummy3  dummy4  dummy5
     r1       1       0       0       1       1
     r2       0       1       1       1       0
     r3       1       0       1       1       1
     r4       1       1       1       0       1
     r5       0       0       0       1       0
     r6       1       1       0       0       1
     r7       1       1       0       1       1
     r8       1       0       0       0       0
     r9       0       0       1       1       0
    r10       1       1       1       0       0
    r11       1       1       1       0       0
    r12       0       0       1       0       0
    r13       0       1       0       0       0
    r14       1       1       1       0       1
    r15       0       1       0       1       1
    r16       1       0       0       0       0
    r17       0       0       0       0       0
    r18       0       1       0       1       1
    r19       0       1       0       0       1
    r20       1       1       1       0       0
    
    
    
    . mat list DD
    
    symmetric DD[5,5]
            dummy1  dummy2  dummy3  dummy4  dummy5
    dummy1      11
    dummy2       7      12
    dummy3       6       6       9
    dummy4       3       4       3       8
    dummy5       6       7       3       5       9
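
    The reason the product gives the counts can be written out as follows. The notation \(d_{ij}\), for the value of dummy \(j\) in observation \(i\), is introduced here only for illustration and is not used elsewhere in the thread. Each entry of D'D is

    \[
    (D'D)_{jk} = \sum_{i=1}^{n} d_{ij}\, d_{ik} = \#\{\, i : d_{ij} = 1 \text{ and } d_{ik} = 1 \,\},
    \qquad
    (D'D)_{jj} = \sum_{i=1}^{n} d_{ij}^{2} = \sum_{i=1}^{n} d_{ij}.
    \]

    In the output above, for example, the entry 11 for dummy1 with itself is the number of rows of D where dummy1 equals 1, and the entry 7 for dummy2 with dummy1 is the number of rows where both equal 1.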



    • #3
      Thank you Andrew,
      It works; the only problem is that it gives me an error for matsize. Apparently, the maximum number of matrix rows is 1100, while I have 60,000.



      • #4
        Use Mata for the computation and return the result to Stata. With 500 variables, the matrix that you need will be of dimension 500 \(\times\) 500, which is quite manageable. My computer takes less than 30 seconds to complete the following:


        Code:
        clear
        set obs 60000
        set seed 1234
        forvalues i=1/500 {
            gen dummy`i' = runiformint(0, 1)
        }

        mata
        // copy all variables into Mata, form D'D, and return it to Stata
        D = st_data(., .)
        DD = D'*D
        st_matrix("DD", DD)
        end

        * Best to save the matrix as a dataset
        gen row = _n
        svmat DD
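
        If the zero diagonal shown in the original question is what is wanted, a possible variation is sketched below on a small simulated dataset. It is an untested sketch under assumptions: the matrix name DD0 is hypothetical, the data in memory are assumed to contain only the dummies, and st_view() and cross() are swapped in for st_data() and D'*D simply to avoid holding a second copy of the data.

        Code:
        clear
        set obs 20
        set seed 1234
        forvalues i=1/5 {
            gen dummy`i' = runiformint(0, 1)
        }

        mata
        // view onto all variables in memory (assumed to be only the dummies)
        st_view(D = ., ., .)
        // cross(D, D) computes D'D
        DD = cross(D, D)
        // subtract the diagonal so that only pairwise counts remain
        DD = DD - diag(diagonal(DD))
        // DD0 is a hypothetical name for the Stata matrix
        st_matrix("DD0", DD)
        end

        mat list DD0

        The off-diagonal entries are unchanged, so they should match those of DD in the earlier example.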
