Export correlations by group (row by row and with a label for each group)

Anna Star

Join Date: May 2018

Posts: 12
#1

Export correlations by group (row by row and with a label for each group)

19 Feb 2020, 17:07

Hello,

I need to run correlations between variables - for simplicity let's say only between two variables X and Y - by company (I have around 200 companies).
I would like to export just the correlation the two variables (without the 1s) in a separate row for each company, and include the number of obs used and the company name.
Also, if there are no observations, I would like to keep the row with the label so that I know which companies don't have enough obs.

I've tried this, among other things, but cannot make it vertical (row by row) and I cannot add firm names, and for the missing ones I just get an error and the program stops.

foreach i in 111 112 122 124 {
eststo: estpost corr X Y if firm==`i'
}
esttab using "Table2_Corr15.csv", wide not nonumber nonotes b(%3.2f) nolines obslast label compress

Thanks a lot!
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30091

19 Feb 2020, 18:22

One way to do this is

Code:

capture program drop one_firm
program define one_firm
    corr X Y
    gen rho = r(rho)
    gen n_obs = r(N)
    keep firm rho n_obs
    keep in 1
    exit
end

runby one_firm, by(firm)

format rho %3.2f
export delimited using "Table2_Corr15.csv", replace datafmt

-runby- is written by Robert Picard and me, and is available from SSC.

Comment

Anna Star

Join Date: May 2018

Posts: 12
#3

19 Feb 2020, 18:37

That works great! Thanks so much!!
One last thing, how can I add the significance, either as a p-value in a separate column or with stars?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#4

19 Feb 2020, 19:16

Code:

capture program drop one_firm program define one_firm tempname M pwcorr X Y, sig gen rho = r(rho) gen n_obs = r(N) matrix `M' = r(sig) gen p_value = `M'[2, 1] keep firm rho n_obs p_value keep in 1 exit end runby one_firm, by(firm) format rho %3.2f format p_value %05.3f export delimited using "Table2_Corr15.csv", replace datafmt

This will give you the p-values. I am among those people who has little use for p-values and almost none for the concept of statistical significance. I regard "significance stars" to be such an egregious form of statistical malpractice that I will not show anybody how to do them.

The American Statistical Association has recommended that the concept of statistical significance be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr.
1 like
Comment
Anna Star

Join Date: May 2018

Posts: 12
#5

19 Feb 2020, 19:35

I see, point well taken. Thank you!
In case I am asked to provide them anyway, I tried the code but the runby command output says that all by-groups have errors. I was able to run them with the earlier code though.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#6

19 Feb 2020, 19:41

I tested the code using the -auto.dta- data set that comes with Stata, just rename some variables. It ran with no errors showing.

Code:

. sysuse auto, clear (1978 Automobile Data) . rename price X . rename mpg Y . tostring rep78, gen(firm) firm generated as str1 . . capture program drop one_firm . program define one_firm 1. tempname M 2. pwcorr X Y, sig 3. gen rho = r(rho) 4. gen n_obs = r(N) 5. matrix `M' = r(sig) 6. gen p_value = `M'[2, 1] 7. keep firm rho n_obs p_value 8. keep in 1 9. exit 10. end . . runby one_firm, by(firm) -------------------------------------- Number of by-groups = 6 by-groups with errors = 0 by-groups with no data = 0 Observations processed = 74 Observations saved = 6 --------------------------------------

So there is something different about your data, or something went wrong in your copying the code. Please post back with a data example that produces errors and the exact code you ran so I can try to troubleshoot. Use -dataex- to show the example, and place the code between code delimiters. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Comment
Anna Star

Join Date: May 2018

Posts: 12
#7

19 Feb 2020, 20:07

Thank you for the prompt responses.
I was able to figure it out!
Much appreciated!

Last edited by Anna Star; 19 Feb 2020, 20:25.
Comment

Announcement

Export correlations by group (row by row and with a label for each group)

Comment

Comment

Comment

Comment

Comment

Comment