  • table of correlations between same two variables, but on different subsamples

    My goal is to have Stata display (ideally, export to LaTeX) an M-by-N "correlation table" in which each cell is not the correlation between variables m and n, but the correlation between the same two variables x and y, computed on the subsample defined by row category m (of 1...M) and column category n (of 1...N). (The cells should contain the correlation coefficient and a t- or p-value, or significance stars.)

    In the example below, I want to investigate how the correlation between price and mpg varies between longer and shorter cars, and between foreign and domestic cars. The categories M and N are not categories of x and y, but of other variables, as below:
    Code:
    set more off
    clear*
    sysuse auto
    
    gen large = 0
    replace large = 1 if length > 190
    
    sort large foreign
    by large foreign: pwcorr price mpg
    Effectively, I need just the results from
    Code:
    by large foreign: pwcorr price mpg
    but displayed in table form and exported.
    I followed the suggestion here: https://www.statalist.org/forums/for...ed-as-a-matrix
    Code:
    clear*
    sysuse auto
    
    capture program drop mycorr
    program define mycorr
         corr price weight
         gen rho = r(rho)
         keep rep78 rho
         keep in 1
         exit
    end
    
    runby mycorr, by(rep78)
    
    list, noobs clean
    and then outputting to Excel. But this handles categories along one dimension only, and I cannot figure out how to extend it to two dimensions.

  • #2
    You can put more than one variable in the -by()- option of -runby-. Read -help runby-. You will see that illustrated in the example "partitioning a file into subfiles."
    Last edited by Clyde Schechter; 04 Sep 2018, 13:01.

    • #3
      Thank you. Yes, this works:
      Code:
      set more off
      clear*
      sysuse auto
      
      gen large = 0
      replace large = 1 if length > 190
      
      capture program drop mycorr
      program define mycorr
          corr price mpg
          gen rho = r(rho)
          keep large foreign rho
          keep in 1
          exit
      end
      
      runby mycorr, by(large foreign)
      
      list, noobs clean
      I am now wondering whether there is a way, e.g. with a modified -corrtex- command, to get a 2x2 table instead of a 4x1 column of coefficients?
      I see that the "partitioning a file into subfiles" section explains how to save two 2x1 columns that I would then combine manually, but that seems impractical, especially as the number of categories increases.

      • #4
        My reference to that example in -help runby- was just to illustrate that you can use more than one variable in the -by()- option.

        I don't know of any -corrtex- command. There's no such command in official Stata. If it's community-contributed, you could modify it if you know how to do that. But you can get your correlations rearranged without it as follows:

        Code:
        set more off
        clear*
        sysuse auto
        
        gen large = 0
        replace large = 1 if length > 190
        
        capture program drop mycorr
        program define mycorr
            corr price mpg
            gen rho = r(rho)
            keep large foreign rho
            keep in 1
            exit
        end
        
        runby mycorr, by(large foreign)
        
        list, noobs clean
        
        reshape wide rho, i(foreign) j(large)
        forvalues i = 0/1 {
            rename rho`i' large`i'
        }
        If your goal here is to create a cross-tabulation for human eyes, this will do it. You can export it to a text file or a spreadsheet if you like (-help export delimited-, -help export excel-).

        If, however, you plan to do additional analysis on these results, it is probably better to leave them the way they were at the end of -runby-, because most Stata commands work better (or only) with data in long layout.
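
        For the export step mentioned above, here is a minimal sketch that uses only official commands. Note that it swaps in -statsby- for -runby- so that nothing needs to be installed; that substitution is my own and is not the method in the code above. The file names corr_table.csv and corr_table.xlsx and the variable names rho_short and rho_large are placeholders chosen for illustration.

        ```stata
        sysuse auto, clear
        gen byte large = length > 190

        * -statsby- replaces the data in memory with one row per
        * large-by-foreign group, holding r(rho) from -correlate-.
        statsby rho = r(rho), by(large foreign) clear: correlate price mpg

        * Wide layout: one row per value of foreign, one column per value of large.
        reshape wide rho, i(foreign) j(large)
        rename rho0 rho_short
        rename rho1 rho_large

        * Hypothetical output file names; adjust the paths as needed.
        export delimited using "corr_table.csv", replace
        export excel using "corr_table.xlsx", firstrow(variables) replace
        ```

        The -firstrow(variables)- option writes the variable names as the header row of the spreadsheet. If further analysis is planned, the same -statsby- result can be exported before the -reshape- step to keep the long layout.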

        • #5
          Thank you, that works. Can you also show how to obtain the significance of the correlation coefficient, in the same way as -rho- and -N-?
          I can replace -corr- with -pwcorr- in your code, which has the -sig- option, but the p-value does not seem to be stored in r():
          Code:
          . pwcorr ind_HHI $fam , sig
          
                       |  ind_HHI   ownfam
          -------------+------------------
               ind_HHI |   1.0000
                       |
                       |
                ownfam |   0.0475   1.0000
                       |   0.0000
                       |
          
          . return list
          
          scalars:
                            r(N) =  20595
                          r(rho) =  .0474957089324448

          • #6
            Are you using an old version of Stata? In my setup, -pwcorr- also returns two matrices, r(C) and r(sig), the latter containing the p-value.

            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            . pwcorr price mpg, sig
            
                         |    price      mpg
            -------------+------------------
                   price |   1.0000 
                         |
                         |
                     mpg |  -0.4686   1.0000 
                         |   0.0000
                         |
            
            . return list
            
            scalars:
                              r(N) =  74
                            r(rho) =  -.4685966881951871
            
            matrices:
                              r(C) :  2 x 2
                            r(sig) :  2 x 2
            
            . matrix list r(sig)
            
            symmetric r(sig)[2,2]
                       price        mpg
            price          0
              mpg  .00002546
            So, you can do this:

            Code:
            set more off
            clear*
            sysuse auto
            
            gen large = 0
            replace large = 1 if length > 190
            
            capture program drop mycorr
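            program define mycorr
                pwcorr price mpg, sig
                return list                 // debugging aid; can be removed
                matrix S = r(sig)           // copy the p-value matrix before r() is overwritten
                gen rho = r(rho)
                keep large foreign rho
                gen pvalue = S[2, 1]        // p-value for the price-mpg correlation
                keep in 1                   // one row per by-group
                exit
            end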
            
            runby mycorr, by(large foreign)
            
            list, noobs clean
            
            reshape wide rho pvalue, i(foreign) j(large)
            forvalues i = 0/1 {
                rename rho`i' large`i'_rho
                rename pvalue`i' large`i'_p
            }
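            The original question also asked for significance stars. One possible way to get them, sketched here under some assumptions, is to build a display string per cell while the data are still in long layout and only then reshape. The cutoffs 0.01, 0.05 and 0.10, the program name mycorr_stars and the column names len_le190 and len_gt190 are illustrative choices of mine. As above, this assumes -runby- is installed (-ssc install runby-) and that -pwcorr- returns r(sig) in your version of Stata.

            ```stata
            clear*
            sysuse auto

            gen byte large = length > 190

            capture program drop mycorr_stars
            program define mycorr_stars
                pwcorr price mpg, sig
                matrix S = r(sig)
                gen rho = r(rho)
                gen pvalue = S[2, 1]
                keep large foreign rho pvalue
                keep in 1
            end

            runby mycorr_stars, by(large foreign)

            * Stars by p-value; a missing p-value compares as larger than any
            * number, so such a cell gets no stars.
            gen str3 stars = cond(pvalue < 0.01, "***", cond(pvalue < 0.05, "**", cond(pvalue < 0.10, "*", "")))
            gen str12 cell = trim(string(rho, "%6.3f")) + stars

            * One string cell per subsample, arranged as foreign (rows) by large (columns).
            keep large foreign cell
            reshape wide cell, i(foreign) j(large)
            rename cell0 len_le190
            rename cell1 len_gt190

            list, noobs clean
            ```

            The resulting string table can be exported with -export delimited- or -export excel- like any other dataset. Groups with very few cars can produce a missing p-value, in which case the cell shows the coefficient without stars.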
