  • Panel Summary Statistics and Correlations across time and then across firms

    Hi all,

    My panel data has N = 745 firms and T = 226 weeks, but it is unbalanced. I am following Da, Engelberg and Gao (2011), "In Search of Attention", The Journal of Finance.

    I want to compute the correlation among several variables as follows: "correlations are first computed in the time series for each stock with a minimum of 1 year of data and then averaged across stocks".

    If I run -by permno, sort: correlate svi log_size log_abs_ret-, Stata reports 745 correlation matrices, but I don't know how to average each correlation coefficient across all 745 firms and report a single matrix. Perhaps this requires coding. I also don't know how to use [if] to exclude firms with fewer than 52 weeks of data.

    Finally, I'd also like to apply that method to several summary statistics (e.g. Q1, Q3, skewness and kurtosis). xtsummarize has the between and within data but this does not give me what I need.

    Thank you,
    Joao

  • #2
    -help statsby-

    In Stata, saving results as matrices tends not to be very helpful unless you are planning to do some matrix algebra with the results. The situation you describe sounds like it is better handled by saving the results in a data set that has one variable that identifies the firm, and the other variables are the various correlations you want to calculate. -statsby- will get you there. Since you want to exclude firms with fewer than 52 observations, you should include the number of observations as another variable you ask -statsby- to put in the results data set. Then you can just -summ if n_obs >= 52- (where I'm assuming that you have called the variable with number of observations n_obs).
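
    A minimal sketch of that approach for a single pair of variables, assuming the firm identifier is permno (as in post #1). The names rho and n_obs are just illustrative choices for the saved results:

    ```stata
    * One row per firm: the time-series correlation and the number of weeks behind it.
    * Note: -clear- replaces the data in memory with the results data set.
    statsby rho=r(rho) n_obs=r(N), by(permno) clear: correlate svi log_size

    * Average the firm-level correlations over firms with at least 52 weeks
    summarize rho if n_obs >= 52
    ```

    After -summarize-, the cross-firm average is in r(mean).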

    You can use this same approach for your summary statistics. To get the quartiles, skewness, and kurtosis you need to specify the -detail- option to -summarize-. All of these results are returned in r().
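
    For one variable this might look roughly as follows. Again, permno is assumed to be the firm identifier, and q1, q3, skew, kurt and n_obs are illustrative names. The /// line continuation only works in a do-file:

    ```stata
    * One row per firm: quartiles, skewness, kurtosis and number of weeks for svi
    statsby q1=r(p25) q3=r(p75) skew=r(skewness) kurt=r(kurtosis) n_obs=r(N), ///
        by(permno) clear: summarize svi, detail

    * Cross-firm averages over firms with at least 52 weeks
    summarize q1 q3 skew kurt if n_obs >= 52
    ```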

    • #3
      Thank you for the quick reply.
      I've just tried -statsby- and it works. However, it does not work well with -correlate-, because r(rho) only stores the correlation between the first and second variables, so I can only use -statsby- with 2 variables at a time. In total I need 70 correlation coefficients, so this is only feasible if I code Stata to do it all automatically. Do you agree? Or do you see a better solution?

      • #4
        So instead of -statsby: corr...- you write a wrapper program that loops over the pairs of variables, and you run that under -statsby-. Just to give you a sense of how it might look, it would go something like this:

        Code:
        capture program drop multicorr
        program define multicorr, rclass
            // Returns r(rI_J) and r(NI_J) for every pair (I, J) of variables in varlist
            syntax varlist [if]
            // marksample excludes observations with a missing value in ANY listed variable
            marksample touse
            local nvars: word count `varlist'
            forvalues i = 1/`nvars' {
                forvalues j = `=`i'+1'/`nvars' {
                    local v1: word `i' of `varlist'
                    local v2: word `j' of `varlist'
                    quietly correlate `v1' `v2' if `touse'
                    return scalar N`i'_`j' = r(N)
                    return scalar r`i'_`j' = r(rho)
                }
            }
        end

        local vbles svi log_size log_abs_ret // etc.
        local nvars: word count `vbles'

        // Build the -statsby- expression list: one N and one r per pair of variables
        local exp_list
        forvalues i = 1/`nvars' {
            forvalues j = `=`i'+1'/`nvars' {
                local exp_list `exp_list' N`i'_`j'=r(N`i'_`j')
                local exp_list `exp_list' r`i'_`j'=r(r`i'_`j')
            }
        }

        // One row per firm in results.dta; permno is the firm identifier from post #1
        statsby `exp_list', saving(results, replace) by(permno): multicorr `vbles'

        Note: This is just an illustration of how it might look. I have not checked this code carefully for syntax errors or typos, but it should point you in the right direction for automating the handling of a large number of variables.
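
        Once results.dta exists, the averaging step could be sketched as below. This is equally unchecked, and it assumes the three variables listed above, so that the saved variables are named r1_2, N1_2, r1_3, and so on:

        ```stata
        use results, clear

        * Assumption: number of variables that were passed to multicorr
        local nvars 3

        * Discard a firm's correlation if it rests on fewer than 52 weekly observations
        forvalues i = 1/`nvars' {
            forvalues j = `=`i'+1'/`nvars' {
                replace r`i'_`j' = . if N`i'_`j' < 52
            }
        }

        * Average each correlation across the remaining firms: one row, one variable per pair
        collapse (mean) r*_*
        list
        ```

        The single row that is left holds the entries of the averaged correlation matrix, with r1_2 being the average correlation between the first and second variables in the list.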

        • #5
          Thank you, I will work based on that code.
