  • Correlation with Panel Data

    Hi, I'm new to Stata and to working with panel data. I want to report a descriptive statistics table that includes the correlations between variables. I'm not sure how to estimate the correlation between two variables in a panel dataset. I think it must differ from the "corr" command that is used for cross-sectional data.

    Can anybody help me?

    Thank you!

  • #2
    statauseroperations (please, as per FAQ, notice the preference for full real names in this forum):
    do you mean correlation between variables or between coefficients?
    If the latter is what you're after:
    Code:
    estat vce, corr
    after panel data regression will do the trick.

    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hi Carlo,

      I meant correlation between variables. Thank you!



      • #4
        statauseroperations (please, as per FAQ, notice the preference for full real names in this forum):
        admittedly, I have never come across this issue. Hence, please take what follows as a tentative answer:
        Code:
        collapse (sum) var1 var2 , by(panel_id)
        corr var1 var2
        If your data call for a non-parametric measure (i.e., a rank correlation coefficient), you can take a look at -help spearman- or -help ktau-.

        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you very much, Carlo!

          Here is the question: imagine we have a panel dataset that contains annual assets (var1) and annual performance (var2). We simply need to find the correlation between var1 and var2. If we collapse these variables, I think we'll lose information, because that converts our panel data to a cross-section.

          I also briefly read about Spearman's rank correlation coefficient. It seems fine for cases with no repeated data values, e.g., a cross-sectional format; right?

          Best regards,
          p.s. I understood your comment about my full real name in this forum.



          • #6
            As you have been asked repeatedly to change your identifier, please do so. Some of us care about this as good practice, and won't support members who differ.

            That said, nothing stops you calculating a correlation for all the values of two variables in a dataset, including several different panels. It's hard to know how to interpret that unless you keep track also of whether the panels are similar or different in their correlation properties. Even pooling panels with similar correlation can be messy if the panels differ in their means.

            Precisely nothing makes Spearman correlation either more or less appropriate for panel data than Pearson correlation.
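
            To illustrate the point about panels that differ in their means, here is a minimal simulated sketch. Everything in it is hypothetical (the variable names panel_id, x and y, the seed, and all parameter values): two panels share the same negative within-panel relationship but sit at different levels, so the pooled correlation should come out positive while each within-panel correlation is negative.

            ```stata
            * Simulated data: panel_id, x, y and all parameter values are hypothetical
            clear
            set seed 12345
            set obs 200
            * Two panels with 100 observations each
            gen panel_id = ceil(_n/100)
            * Panel means of x and y both differ by 5 between the two panels
            gen x = rnormal() + 5*panel_id
            gen y = -0.8*(x - 5*panel_id) + rnormal(0, 0.5) + 5*panel_id
            * Pooled correlation over both panels
            corr x y
            * Correlation separately within each panel
            bysort panel_id: corr x y
            ```

            Comparing the two sets of output shows why a single pooled coefficient is hard to interpret unless the panel-specific results are reported alongside it.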



            • #7
              statauseroperations (please, as per the FAQ and after three understood kind reminders, it's time to re-register with your full real name. Just click on the Contact us button at the bottom of the page and follow the instructions):
              after some practice with a Stata dataset: if you're interested in two variables only, -collapse- is probably not needed:
              Code:
              use "http://www.stata-press.com/data/r13/nlswork.dta", clear
              xtset idcode year
              corr ln_wage union
              display r(rho)^2
              * same result with the fixed-effect specification
              xtreg ln_wage union, re
              display e(r2_o)
              As you will see, -display r(rho)^2- and -display e(r2_o)- give the same result (with a single regressor, the overall R-squared is the squared correlation between the two variables).

              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                The basic idea is that when running regression models, covariances within the same variable/case across time/space will mess up estimates. But simple bivariate correlations are just the total covariance, standardized. So for this purpose, it doesn't matter how much is due to time/space issues and how much is due to substantive issues. It only gets chopped up into separate pieces with more complex models.
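
                The "total covariance" idea can be written out with a standard decomposition. This is a sketch in notation that does not appear in the thread: $x_{it}$ and $y_{it}$ are the two variables for panel $i$ at time $t$, $T_i$ is the number of observations in panel $i$, $N = \sum_i T_i$, $\bar{x}_i$ and $\bar{y}_i$ are panel means, and $\bar{x}$ and $\bar{y}$ are overall means.

                ```latex
                \frac{1}{N}\sum_{i}\sum_{t=1}^{T_i}(x_{it}-\bar{x})(y_{it}-\bar{y})
                  = \underbrace{\frac{1}{N}\sum_{i}\sum_{t=1}^{T_i}(x_{it}-\bar{x}_i)(y_{it}-\bar{y}_i)}_{\text{within}}
                  + \underbrace{\frac{1}{N}\sum_{i} T_i\,(\bar{x}_i-\bar{x})(\bar{y}_i-\bar{y})}_{\text{between}}
                ```

                The cross terms vanish because deviations from a panel mean sum to zero within each panel. The pooled correlation standardizes the left-hand side, so it mixes the within and between parts without distinguishing them.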



                • #9
                  You can use the -xtsum- command.

                  It reports the mean, SD, min, max and number of observations.

                  - The mean is the same throughout.

                  - The standard deviation differs among overall, between and within. What Carlo suggests in #4 is a 'between' calculation.
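
                  As a rough sketch of both ideas, assuming the nlswork example data used earlier in the thread and picking ln_wage and tenure purely for illustration: -xtsum- gives the overall/between/within breakdown, and correlating the panel means gives a 'between' correlation. Note that this sketch collapses with (mean) rather than the (sum) used in #4.

                  ```stata
                  * Assumption: nlswork example data; ln_wage and tenure chosen only for illustration
                  use "http://www.stata-press.com/data/r13/nlswork.dta", clear
                  xtset idcode year
                  * Overall, between and within summary statistics
                  xtsum ln_wage tenure
                  * 'Between' correlation: correlate the panel means, then bring the panel data back
                  preserve
                  collapse (mean) ln_wage tenure, by(idcode)
                  corr ln_wage tenure
                  restore
                  ```

                  Using -preserve- and -restore- keeps the original panel data in memory, which addresses the concern about -collapse- discarding it.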



                  • #10
                    Generally (in most cases) what you want is to study the within-panel variation of the variables over time. This is why you use a fixed-effects estimator. If this is the case, then you want a within-panel correlation coefficient.

                    To get it, you just demean the variables within panels, which puts all panels on a level playing field.
                    There are two ways to do this. The first is with coding:

                    1) Suppose you have 3 variables x1, x2, x3 and the panel variable is region; then type:

                    bysort region: egen m1 = mean(x1)
                    bysort region: egen m2 = mean(x2)
                    bysort region: egen m3 = mean(x3)
                    gen dx1 = x1 - m1
                    gen dx2 = x2 - m2
                    gen dx3 = x3 - m3
                    reg dx1 dx2 dx3
                    corr dx1 dx2 dx3

                    2) You can simply use the xtdata command, but save your data first (under a different name if needed), because the command replaces the dataset in memory with the transformed variables:
                    xtdata x1 x2 x3, fe
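
                    A minimal sketch comparing the two routes, assuming the nlswork example data from earlier in the thread, with ln_wage and tenure standing in for x1 and x2 and idcode for region. The helper names m_w, m_t, d_w, d_t and rho_manual are hypothetical. The two correlations should agree up to rounding, because -xtdata, fe- only adds the overall mean back to the demeaned values.

                    ```stata
                    * Assumption: nlswork example data; variable choice is for illustration only
                    use "http://www.stata-press.com/data/r13/nlswork.dta", clear
                    xtset idcode year
                    * Use the same observations in both routes
                    keep if !missing(ln_wage, tenure)
                    * Route 1: demean by hand within each panel
                    bysort idcode: egen double m_w = mean(ln_wage)
                    bysort idcode: egen double m_t = mean(tenure)
                    gen double d_w = ln_wage - m_w
                    gen double d_t = tenure - m_t
                    corr d_w d_t
                    scalar rho_manual = r(rho)
                    * Route 2: xtdata replaces the data in memory; the clear option is
                    * needed here because the dataset has changed since it was loaded
                    xtdata ln_wage tenure, fe clear
                    corr ln_wage tenure
                    display rho_manual "  " r(rho)
                    ```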

