  • Calculate correlation within groups

    Dear Statafriends,

    Upfront apologies if this question has been asked before.

    I want to know how to calculate correlations of a single variable across groups. My data are set up as panel data, and I am interested in the correlations of that one variable between each pair of groups.
    Code:
    by siccd, sort: correlate amihud
    This code does not do what I want: it only gives me, for each group, the correlation of the variable with itself.

    To clarify, I would like to get this result for my Amihud-variable:
    SICCD   1311     5432     7011     etc..
    1311    [corr]   [corr]   [corr]   [corr]
    5432    [corr]   [corr]   [corr]   [corr]
    etc..   [corr]   [corr]   [corr]   [corr]

    Hopefully one of you could help me on my way. Many thanks in advance!

  • #2
    Sorry, but I don't understand at all. By definition, a correlation measures a linear relationship between two variables. Please clarify, possibly by showing a small amount of data (within CODE blocks; see the FAQ) together with the "correlation" that you want.


    • #3
      Thanks for your reply, Rich Goldstein. I understand that correlation measures a linear relationship between two variables. What I basically want is to calculate the correlation between amihud when siccd=1311 and amihud when siccd=2834, matched on the time indicator ym, and to do that for every pair of the siccd codes I have (about 30).
      Code:
      ym    siccd    amihud
      1970m1     1311    .0000999
      1970m2     1311    .0001162
      1970m3     1311    .0001302
      1970m4     1311    .0001816
      1970m5     1311    .0002902
      1970m6     1311    .0003375
      1970m7     1311    .0003942
      1970m8     1311    .0003738
      1970m9     1311    .0001559
      1970m10    1311    .0002205
      1970m11    1311    .0002894
      1970m12    1311    .0001932
      1970m1     2834    .0000301
      1970m2     2834    .0000221
      1970m3     2834    .0000269
      1970m4     2834    .0000352
      1970m5     2834    .0000535
      1970m6     2834    .0000463
      1970m7     2834    .0000553
      1970m8     2834    .0000512
      1970m9     2834    .0000447
      1970m10    2834    .00004
      1970m11    2834    .0000493
      1970m12    2834    .0000278
      I hope I have stated it more clearly this time.


      • #4
        For this purpose, and unusually, you would find it easier to work after reshape wide, so that each siccd code gets its own amihud variable.

        That said, watch out. Correlations between time series are just correlations. Any P-values are garbage because independence of observations no longer holds.
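
        A minimal sketch of the reshape wide approach, assuming the variable names ym, siccd and amihud from #3. The six rows are a subset of that listing, used here only so the example runs on its own; with real data you would skip the input step.

        ```stata
        * Sketch only: variable names follow #3; the rows are a subset of that listing
        clear
        input str7 ym int siccd float amihud
        1970m1 1311 .0000999
        1970m2 1311 .0001162
        1970m3 1311 .0001302
        1970m1 2834 .0000301
        1970m2 2834 .0000221
        1970m3 2834 .0000269
        end

        * one row per month, one amihud column per SIC code (amihud1311, amihud2834)
        reshape wide amihud, i(ym) j(siccd)

        * correlation matrix across all the SIC-code columns
        correlate amihud*
        ```

        Note that correlate drops any month in which at least one of the listed columns is missing. If the industries do not all cover the same months, pwcorr amihud* uses all available pairs instead.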


        • #5
          Thanks, Nick Cox! That is exactly what I needed. Luckily, statistical significance is not an issue for what I want to do with the correlations, but thanks for mentioning it.


          • #6
            In this case you could consider using the intraclass correlation coefficient (ICC), which would be given as "rho" in the model output from running the code below.
            Code:
            version 14.0
            
            clear *
            set more off
            input str7 ym   int siccd  float  amihud
            1970m1     1311    .0000999
            1970m2     1311    .0001162
            1970m3     1311    .0001302
            1970m4     1311    .0001816
            1970m5     1311    .0002902
            1970m6     1311    .0003375
            1970m7     1311    .0003942
            1970m8     1311    .0003738
            1970m9     1311    .0001559
            1970m10    1311    .0002205
            1970m11    1311    .0002894
            1970m12    1311    .0001932
            1970m1     2834    .0000301
            1970m2     2834    .0000221
            1970m3     2834    .0000269
            1970m4     2834    .0000352
            1970m5     2834    .0000535
            1970m6     2834    .0000463
            1970m7     2834    .0000553
            1970m8     2834    .0000512
            1970m9     2834    .0000447
            1970m10    2834    .00004
            1970m11    2834    .0000493
            1970m12    2834    .0000278
            end
            
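            * convert the string month (e.g. "1970m1") to a Stata monthly date
            generate int tim = monthly(ym, "YM")
            
            * fixed-effects regression with month dummies; siccd is the panel identifier
            xtreg amihud i.tim, i(siccd) fe
            * e(rho) is the fraction of variance due to the panel-level effect
            display in smcl as text "ICC = " as result e(rho)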
            
            exit


            • #7
              Apologies for my late response, Joseph. Thanks for the information!
