
  • Pairwise correlations on groups of variables

    Hello!

    Initially, I had a dataset in long format with the following variables: year, country, region, GDP_pc_change.

    I'm interested in computing the pairwise correlations of the change in GDP per capita across pairs of regions. Hence, I reshaped the data into wide format, resulting in the following structure (see below).

    Code:
    clear
    input int year float(GDP_pc_changeAL011 GDP_pc_changeAL012 GDP_pc_changeAL013 GDP_pc_changeAT127 GDP_pc_changeAT130 GDP_pc_changeAT211)
    1981 . . . .41 -1.69 .64
    1982 . . . 4.5 4.23 1.2
    1983 . . . 1.97 5.94 .27
    1984 . . . 2.82 .29 2.03
    1985 . . . 2.06 1.6 .28
    1986 . . . 5.05 4.61 2.31
    1987 . . . 1.82 2.79 -1.38
    1988 . . . 5.52 .57 3.24
    1989 . . . 5.17 .96 7.52
    1990 . . . 8.37 1.87 5.16
    1991 . . . 5.06 .56 5.09
    1992 . . . 3.53 .49 1.64
    1993 . . . 1.34 3.44 -.68
    1994 . . . 6.4 -1.63 3.46
    1995 . . . 3.11 19.78 -3.07
    1996 . . . 3.11 -1.28 3.06
    1997 . . . 5.43 -2.44 1.63
    1998 . . . 4.94 1.67 4.77
    1999 . . . 4.33 2.99 3.8
    2000 . . . 8.32 1.15 -.4
    2001 . . . -4.51 1.04 .53
    2002 . . . .6 2.51 1.63
    2003 . . . -.35 -.67 .97
    2004 . . . 7.04 1.12 4.09
    2005 . . . 1.86 1.88 2.74
    2006 . . . 4.77 3.8 1.43
    2007 . . . 5.17 2.41 5.24
    2008 . . . 1.71 .82 1.87
    2009 5.29 3.93 4.92 -2.67 -1.62 -5.69
    2010 8.13 9.67 4.72 -.38 1.44 3.6
    2011 -5.79 -2.99 6.79 1.58 2.24 4.13
    2012 12.96 .93 -3.61 -.43 -.2 -1.68
    2013 .13 1.5 5.2 -1.14 -.4 -.98
    2014 2.09 -1.35 -12.86 3.48 -.5 .45
    2015 5.35 7.53 -5.24 3.62 .5 -.5
    2016 4.56 3.2 -1.96 1.32 2.8 .36
    2017 -.61 7.11 2.62 4.53 .29 4.49
    2018 4.48 2.25 4.23 4.27 3.39 3.63
    2019 -2.74 4.76 -2.89 3.5 .66 1.19
    2020 -6.08 -4.89 -.74 -13.58 -6.62 -4.15
    2021 9.59 7.83 6.79 8.7 3.89 6.27
    2022 -.4 1.39 .71 4.27 5.81 2.87
    2023 1.14 2.99 2.3 .41 1.11 1.88
    2024 1.41 3.3 2.6 -.24 .46 1.22
    end
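For reference, the long-to-wide step described at the top of the post might look like the sketch below. It assumes the long data hold `year`, `country`, `region` (a string such as "AL011") and `GDP_pc_change`; the four values are copied from the data example above. `reshape wide` requires every variable that is not being reshaped to be constant within `year`, and `country` is not, which is why it has to be dropped first.

```stata
* Assumed long layout: year, country, region (string), GDP_pc_change
* Values are taken from the 2009/2010 rows of the data example above
clear
input int year str2 country str5 region float GDP_pc_change
2009 "AL" "AL011" 5.29
2009 "AT" "AT127" -2.67
2010 "AL" "AL011" 8.13
2010 "AT" "AT127" -.38
end

* country varies within year, so reshape wide would stop with an error;
* it can be recovered later from the first two letters of the region code
drop country

* One observation per year, one GDP_pc_change<region> variable per region
reshape wide GDP_pc_change, i(year) j(region) string
describe
```

This is also why the country variable "gets lost" in the wide data: there is no single country value per year to keep.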

    I have a few issues on which I would very much appreciate help:
    • I need the change in pairwise correlation coefficients between a first and a second time period. Do I get this by running pwcorr GDP_pc_change* on each sub-sample separately and then subtracting one matrix from the other?
    • I need the correlations of GDP_pc_change, first only for pairs of regions that belong to the same country, and second only for pairs of regions that belong to different countries. The country code is part of every variable name; e.g., in "GDP_pc_changeAL011", AL is the country code. I'm unsure how to do this.
    • Lastly, I'm having trouble transforming the matrix of correlation coefficients into a variable that I can use with the kdensity command.
    Sorry for being clueless on these fronts. I've tried for many hours but can't solve it. Thanks!!


  • #2
    In your example, you don't explain or show how countries and regions are indicated in your data, so I don't know how to help with that.

    As for your question about differences in the correlation matrix over time, yes, I'd do it as you describe:
    Code:
    // All variables, 1981-2000 vs. 2001 onward. Maybe not what you want, but it should show the technique.
    pwcorr GDP* if year <= 2000
    mat Time1 = r(C)
    pwcorr GDP* if year > 2000
    mat Time2 = r(C)
    mat Diff = Time2 - Time1

    "...transforming the matrix with correlation coefficients into a variable"

    Your observations pertain to different years, but your correlation matrices would go across many different years, and we don't know how you would want them to match up. Also, your matrices are two-dimensional, but a variable is a column vector, which creates a similar conceptual difficulty for me. If you showed a small example of how you would want the correlations to be represented as variables, and therefore match up with observations, that would make it possible to help you.
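If the goal is one observation per region pair (which is what the follow-up in #3 describes), one possible layout is to walk the lower triangle of `Diff` and post each element to a new dataset, together with a flag for whether both regions share a country. A minimal sketch, assuming Stata 16 or later (for frames) and that the country code is the two characters right after the 13-character prefix `GDP_pc_change`, as in `GDP_pc_changeAL011`. The data are simulated with `rnormal()` only so the block runs on its own; the frame `pairs` and the variables `var1`, `var2`, `same_country` and `dcorr` are illustrative names.

```stata
* Simulated stand-in for the wide data in #1 (illustration only)
capture frame change default
clear
set seed 12345
set obs 44
gen int year = 1980 + _n
foreach r in AL011 AL012 AT127 AT130 AT211 {
    gen float GDP_pc_change`r' = rnormal(2, 3)
}

* Period-specific correlation matrices and their difference, as above
pwcorr GDP_pc_change* if year <= 2000
matrix Time1 = r(C)
pwcorr GDP_pc_change* if year > 2000
matrix Time2 = r(C)
matrix Diff = Time2 - Time1

* Rows/columns of Diff follow the order of the variables given to pwcorr
unab names : GDP_pc_change*
local k : word count `names'

* New frame with one observation per pair of regions (lower triangle, i > j)
capture frame drop pairs
frame create pairs str32 var1 str32 var2 byte same_country double dcorr
forvalues i = 2/`k' {
    local im1 = `i' - 1
    forvalues j = 1/`im1' {
        local v1 : word `i' of `names'
        local v2 : word `j' of `names'
        * Country code = characters 14-15 of the name (after "GDP_pc_change")
        local same = substr("`v1'", 14, 2) == substr("`v2'", 14, 2)
        frame post pairs ("`v1'") ("`v2'") (`same') (Diff[`i', `j'])
    }
}

* Switch to the pair-level data and compare the two distributions
frame change pairs
twoway (kdensity dcorr if same_country == 1) (kdensity dcorr if same_country == 0), ///
    legend(order(1 "Same country" 2 "Different countries"))
```

With real data, pairs that share no non-missing years in one of the periods (e.g., the AL regions before 2009 in the example) will have a missing `dcorr`, and `kdensity` simply ignores those.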



    • #3
      Thanks a lot for your response!

      Maybe the best I can do is explain better what I'm trying to achieve: I want to estimate a matrix of pairwise linear correlation coefficients of the yearly GDP growth rate between European regions, for each of two time periods. Then I want to compute the change in these correlation coefficients between the two periods and plot the kernel density of these changes, first for pairs of regions belonging to the same country and second for pairs of regions belonging to different countries.
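An alternative that avoids matrices altogether is to stay in long format, where the country variable is never lost: pair every region with every other region observed in the same year, then let `statsby` compute one correlation per pair and period. A minimal sketch, assuming long data with `year`, `country`, `region` (string) and `GDP_pc_change`, and splitting the periods at 2000/2001 as in #2. The values are simulated only so the block runs; `period`, `growth1`, `growth2`, `rho`, `dcorr` and `same_country` are illustrative names.

```stata
* Simulated long-format data: one row per region and year (illustration only)
clear
set seed 2024
set obs 6
gen str5 region = word("AL011 AL012 AL013 AT127 AT130 AT211", _n)
gen str2 country = substr(region, 1, 2)
expand 44
bysort region: gen int year = 1980 + _n
gen float GDP_pc_change = rnormal(2, 3)
gen byte period = 1 + (year > 2000)

* Partner copy of the data with renamed variables
preserve
keep year region country GDP_pc_change
rename (region country GDP_pc_change) (region2 country2 growth2)
tempfile partner
save `partner'
restore

* All region pairs observed in the same year; keep each unordered pair once
joinby year using `partner'
keep if region < region2
rename GDP_pc_change growth1

* One correlation per pair and period; correlate uses only years where both
* regions are non-missing, i.e. pairwise deletion
statsby rho=r(rho) n=r(N), by(region region2 country country2 period) clear: ///
    correlate growth1 growth2

* Change between the periods, and the same-country flag
reshape wide rho n, i(region region2) j(period)
gen double dcorr = rho2 - rho1
gen byte same_country = country == country2

twoway (kdensity dcorr if same_country == 1) (kdensity dcorr if same_country == 0), ///
    legend(order(1 "Same country" 2 "Different countries"))
```

The design choice here is that each pair is a row from the start, so the same-country comparison is just a variable, and `n1`/`n2` record how many years each correlation is based on.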

      I have a country variable that, in long format, sits next to the region and GDP pc change variables. But when I reshape to wide format, which I thought was necessary to run pwcorr on the GDP pc change of all regions, the country variable gets lost. Nevertheless, I can still read the country from the variable names, since every name has the form GDP_pc_change[region code] and the region code starts with the two-letter country code (e.g., GDP_pc_changeAL011 belongs to AL).
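Because the codes are in the names, the country association can be recovered after the reshape. A minimal sketch, assuming every variable is named `GDP_pc_change` followed by a region code whose first two letters are the country (as in `GDP_pc_changeAL011`): it collects the distinct country codes and then uses a wildcard to get one within-country correlation matrix per country. The data are simulated with `rnormal()` only so the block runs; `allvars`, `countries` and `cc` are illustrative macro names.

```stata
* Simulated stand-in with the variable names from the data example in #1
clear
set seed 7
set obs 44
gen int year = 1980 + _n
foreach r in AL011 AL012 AL013 AT127 AT130 AT211 {
    gen float GDP_pc_change`r' = rnormal(2, 3)
}

* Collect the distinct two-letter country codes (characters 14-15 of each name)
unab allvars : GDP_pc_change*
local countries
foreach v of local allvars {
    local cc = substr("`v'", 14, 2)
    local countries : list countries | cc
}
display "`countries'"

* Within-country correlation matrix for each country, via a wildcard on the code
foreach cc of local countries {
    pwcorr GDP_pc_change`cc'*
}
```

These matrices only contain same-country pairs; the cross-country pairs still need a per-pair flag built from the two names.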

      I hope this helps?

      Thanks a lot for reading this!!
