
  • Correlation of x and y, for all combinations of categorical variables A and B

    I would like to calculate the Pearson (or other) correlation between two variables, for every combination of two categorical variables A and B. For example, the correlation between education and income, for every combination of decade of birth and seven categories of religious affiliation. If the sample included respondents born in 7 different decades, then there would be 49 correlations -- one for each cell defined by the intersection of A & B.

    I know I can loop through each combination and format the saved estimates, but I am wondering whether there is an existing option, ado-file or utility that would generate an A x B table with the corresponding pairwise correlation (always Rxy) in each cell. Berkeley's SDA analysis tool does precisely this: the procedure is called "Comparison of Correlations" and is analogous to the old SPSS crossbreak procedure for means. If someone has already created such a utility, I hope they can point me in the right direction.

  • #2
    Here are two methods. statsby is classical and remains underused; it in effect collapses the dataset to a dataset of results. rangestat is from SSC (ssc install rangestat); it adds new variables to the existing dataset.


    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . statsby, by(foreign rep78) : corr mpg weight
    (running correlate on estimation sample)
    
          Command: correlate mpg weight
                N: r(N)
              rho: r(rho)
               By: foreign rep78
    
    Statsby groups
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
    ........
    
    . tabdisp rep78 foreign, c(N rho)
    
    --------------------------------
    Repair    |
    record    |      Car origin     
    1978      |  Domestic    Foreign
    ----------+---------------------
            1 |         2           
              |        -1           
              | 
            2 |         8           
              | -.9548609           
              | 
            3 |        27          3
              | -.8002311   -.976221
              | 
            4 |         9          9
              | -.8794044  -.6801057
              | 
            5 |         2          9
              |        -1  -.8518452
    --------------------------------
    
    . sysuse auto, clear
    (1978 automobile data)
    
    . egen group = group(foreign rep78)
    (5 missing values generated)
    
    . rangestat (corr) mpg weight, int(group 0 0)
    
    . tabdisp rep78 foreign, c(corr_nobs corr_x)
    
    ----------------------------------
    Repair    |
    record    |       Car origin      
    1978      |   Domestic     Foreign
    ----------+-----------------------
            1 |          2            
              |         -1            
              | 
            2 |          8            
              | -.95486089            
              | 
            3 |         27           3
              |  -.8002311  -.97622104
              | 
            4 |          9           9
              | -.87940444  -.68010571
              | 
            5 |          2           9
              |         -1   -.8518452
              | 
            . |                       
              |                       
    ----------------------------------
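
    One caveat on both tables: a correlation computed from only 2 observations is necessarily -1 or 1 (or missing if either variable is constant in that cell), so the cells with N = 2 above carry no information. A minimal sketch of one way to blank such cells before tabulating follows. It repeats the rangestat steps so that it stands alone, and the cutoff of 3 observations is an arbitrary assumption for illustration, not something recommended in this thread.

    Code:
    sysuse auto, clear
    egen group = group(foreign rep78)
    rangestat (corr) mpg weight, int(group 0 0)
    * assumption: treat cells with fewer than 3 observations as uninformative
    replace corr_x = . if corr_nobs < 3
    tabdisp rep78 foreign, c(corr_nobs corr_x)

    After statsby, the same idea would be replace rho = . if N < 3 before calling tabdisp.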


    • #3
      Perfect! That's exactly what I was looking for.
      Thanks Nick!
