Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Export correlations by group (row by row and with a label for each group)

    Hello,

    I need to run correlations between variables - for simplicity let's say only between two variables X and Y - by company (I have around 200 companies).
    I would like to export just the correlation the two variables (without the 1s) in a separate row for each company, and include the number of obs used and the company name.
    Also, if there are no observations, I would like to keep the row with the label so that I know which companies don't have enough obs.

    I've tried this, among other things, but cannot make it vertical (row by row) and I cannot add firm names, and for the missing ones I just get an error and the program stops.


    foreach i in 111 112 122 124 {
    eststo: estpost corr X Y if firm==`i'
    }
    esttab using "Table2_Corr15.csv", wide not nonumber nonotes b(%3.2f) nolines obslast label compress


    Thanks a lot!



  • #2
    One way to do this is
    Code:
    capture program drop one_firm
    program define one_firm
        corr X Y
        gen rho = r(rho)
        gen n_obs = r(N)
        keep firm rho n_obs
        keep in 1
        exit
    end
    
    runby one_firm, by(firm)
    
    format rho %3.2f
    export delimited using "Table2_Corr15.csv", replace datafmt
    -runby- is written by Robert Picard and me, and is available from SSC.

    Comment


    • #3
      That works great! Thanks so much!!
      One last thing, how can I add the significance, either as a p-value in a separate column or with stars?

      Comment


      • #4
        Code:
        capture program drop one_firm
        program define one_firm
            tempname M
            pwcorr X Y, sig
            gen rho = r(rho)
            gen n_obs = r(N)
            matrix `M' = r(sig)
            gen p_value = `M'[2, 1]
            keep firm rho n_obs p_value
            keep in 1
            exit
        end
        
        runby one_firm, by(firm)
        
        format rho %3.2f
        format p_value %05.3f
        export delimited using "Table2_Corr15.csv", replace datafmt
        This will give you the p-values. I am among those people who has little use for p-values and almost none for the concept of statistical significance. I regard "significance stars" to be such an egregious form of statistical malpractice that I will not show anybody how to do them.

        The American Statistical Association has recommended that the concept of statistical significance be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr.


        Comment


        • #5
          I see, point well taken. Thank you!
          In case I am asked to provide them anyway, I tried the code but the runby command output says that all by-groups have errors. I was able to run them with the earlier code though.

          Comment


          • #6
            I tested the code using the -auto.dta- data set that comes with Stata, just rename some variables. It ran with no errors showing.

            Code:
            .  sysuse auto, clear
            (1978 Automobile Data)
            
            .  rename price X
            
            .  rename mpg Y
            
            .  tostring rep78, gen(firm)
            firm generated as str1
            
            . 
            .  capture program drop one_firm
            
            . program define one_firm
              1.     tempname M
              2.     pwcorr X Y, sig
              3.     gen rho = r(rho)
              4.     gen n_obs = r(N)
              5.     matrix `M' = r(sig)
              6.     gen p_value = `M'[2, 1]
              7.     keep firm rho n_obs p_value
              8.     keep in 1
              9.     exit
             10. end
            
            .
            . runby one_firm, by(firm)
            
            --------------------------------------
            Number of by-groups    =             6
            by-groups with errors  =             0
            by-groups with no data =             0
            Observations processed =            74
            Observations saved     =             6
            --------------------------------------
            So there is something different about your data, or something went wrong in your copying the code. Please post back with a data example that produces errors and the exact code you ran so I can try to troubleshoot. Use -dataex- to show the example, and place the code between code delimiters. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

            Comment


            • #7
              Thank you for the prompt responses.
              I was able to figure it out!
              Much appreciated!
              Last edited by Anna Star; 19 Feb 2020, 20:25.

              Comment

              Working...
              X