Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Correlation Coefficient and logarithmic variables

    Hi,

    I am working on some data containing 1400 observations over a 7 year period, whereas the observations is for weekdays.

    I am looking at historical price data on both commodity and the index, meaning I have 2 variables. I have understood that Pearsons correlation coefficient is the most common and that logarithmic returns is the most common for prices over time. We wish to find evidence that the two variables are uncorrelated over the 7 year period, and that the two variables are negatively correlated when examining sub-periods from the 7 year period.

    While reading on the Internet I find that trustworthy economists say that the historical correlation between these two variables should be ´uncorrelated" (close to 0), which means from the start of commodity being traded in higher volumes from 2013.

    When doing a Pearson correlation coefficient in Stata, after generating logarithmic variables, for the whole period I find that the correlation is 0.747, statistically significant at a 5 % level. Meaning the value is way too high according to what economists say.

    In contrast, when I am not using logarithmic variables I get that the Pearsons correlation coefficient is 0.041, which is a number I really wish to want. This coefficient is in contrast not statistically significant at any level.

    So my question is, is it in this case wrong generating logarithmic variables in this case? Or am I better off eventually using another correlation method? Generally I am wondering if logarithmic returns is the right path for me here or not, also regarding what correlation coefficient to use.

    Thank you in advance.

    Guest
    Last edited by sladmin; 16 Nov 2020, 05:37. Reason: anonymize original poster

  • #2
    I forgot to paste a data example. Here is a data example for my data below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str13 date double(commodity index) float(gold Ln commodity Ln index Ln Gold)
    "Oct 06, 2020"  10621.66 3360.95 1877.13  9.270651  8.119979 7.537499
    "Oct 05, 2020"     10804 3408.63  1912.9  9.287672  8.134066 7.556376
    "Oct 02, 2020"  10585.16 3348.44  1898.7  9.267208   8.11625 7.548925
    "Oct 01, 2020"  10623.33  3380.8 1905.05  9.270808  8.125868 7.552264
    "Sep 30, 2020"  10787.62    3363 1885.44  9.286155  8.120588 7.541916
    "Sep 29, 2020"  10848.83 3335.47 1897.69  9.291813  8.112369 7.548393
    "Sep 28, 2020"  10721.33  3351.6 1880.85   9.27999  8.117193 7.539479
    "Sep 25, 2020"  10702.29 3298.46 1860.05  9.278213  8.101211 7.528358
    "Sep 24, 2020"  10745.55 3246.59 1868.04  9.282247  8.085361 7.532645
    "Sep 23, 2020"  10225.86 3236.92 1863.55  9.232676  8.082377 7.530239
    "Sep 22, 2020"  10538.46 3315.57 1899.28  9.262787  8.106385  7.54923
    "Sep 21, 2020"  10462.26 3281.06 1912.57  9.255529 8.0959215 7.556203
    "Sep 18, 2020"  10944.59 3319.47 1949.69  9.300601   8.10756 7.575426
    "Sep 17, 2020"  10948.99 3357.01  1942.6 9.3010025  8.118806 7.571783
    "Sep 16, 2020"   10974.9 3385.49 1959.32  9.303366  8.127254 7.580353
    "Sep 15, 2020"  10796.95  3401.2 1955.52  9.287019  8.131884 7.578412
    "sep. 14, 2020" 10680.84 3383.54 1956.37  9.276207 8.1266775 7.578846
    "sep. 11, 2020" 10400.91 3340.97  1941.5  9.249649  8.114017 7.571216
    "sep. 10, 2020" 10363.14 3339.19 1954.07  9.246011  8.113483  7.57767
    "sep. 09, 2020" 10242.35 3398.96 1946.66  9.234286  8.131225  7.57387
    "sep. 08, 2020" 10131.52 3331.84 1930.93  9.223407   8.11128 7.565757
    "sep. 07, 2020" 10369.56 3380.59 1928.82   9.24663  8.125806 7.564664
    "sep. 04, 2020" 10511.81 3426.96 1932.45  9.260255  8.139429 7.566544
    "sep. 03, 2020"  10245.3 3455.06 1930.52  9.234574  8.147595 7.565545
    "sep. 02, 2020" 11414.03 3580.84 1942.62  9.342599  8.183352 7.571793
    "sep. 01, 2020" 11970.48 3526.65 1970.14  9.390199  8.168103  7.58586
    "aug. 31, 2020" 11680.82 3500.31 1969.75  9.365704  8.160606 7.585662
    "aug. 28, 2020"  11542.5 3508.01 1964.49  9.353791  8.162805 7.582988
    "aug. 27, 2020"  11323.4 3484.55 1928.91  9.334626  8.156095  7.56471
    "aug. 26, 2020" 11488.36 3478.73 1953.45   9.34909  8.154423 7.577353
    "aug. 25, 2020" 11366.14 3443.62 1927.94  9.338394  8.144279 7.564208
    "aug. 24, 2020"  11774.6 3431.28 1932.45    9.3737  8.140689 7.566544
    "aug. 21, 2020" 11592.49 3397.16  1939.4  9.358112  8.130695 7.570134
    "aug. 20, 2020" 11878.37 3385.51  1942.6  9.382474   8.12726 7.571783
    "aug. 19, 2020" 11758.28 3374.85 1929.54  9.372313  8.124106 7.565037
    "aug. 18, 2020" 11991.23 3389.78 2000.95  9.391931   8.12852 7.601377
    "aug. 17, 2020"  12254.4 3381.99 1986.45   9.41364   8.12622 7.594104
    "aug. 14, 2020" 11768.87 3372.85 1943.76  9.373213  8.123513  7.57238
    "aug. 13, 2020" 11784.14 3373.43 1953.03   9.37451  8.123685 7.577137
    "aug. 12, 2020" 11584.93 3380.35 1917.82   9.35746  8.125734 7.558944
    "aug. 11, 2020" 11410.53 3333.69 1911.25  9.342292 8.1118355 7.555513
    "aug. 10, 2020" 11878.11 3360.47 2027.26  9.382453  8.119836  7.61444
    "aug. 07, 2020" 11601.47 3351.28 2034.62  9.358887  8.117098 7.618064
    "aug. 06, 2020" 11779.77 3349.16 2063.19  9.374139  8.116465 7.632009
    "aug. 05, 2020" 11747.02 3327.77  2039.4  9.371355  8.110058 7.620411
    "aug. 04, 2020" 11205.89 3306.51 2018.07  9.324195  8.103648 7.609897
    "aug. 03, 2020" 11246.35 3294.61 1977.26  9.327799  8.100043 7.589468
    "jul. 31, 2020" 11323.47 3271.12  1975.9  9.334633  8.092888 7.588779
    "jul. 30, 2020" 11111.21 3246.22  1960.5   9.31571  8.085246 7.580955
    "jul. 29, 2020" 11100.47 3258.44 1970.73  9.314743  8.089004 7.586159
    "jul. 28, 2020" 10912.82 3218.44 1959.51  9.297693  8.076652  7.58045
    "jul. 27, 2020" 10990.87 3239.41 1942.42   9.30482  8.083146  7.57169
    "jul. 24, 2020"  9536.89 3215.63 1901.46  9.162923  8.075779 7.550377
    "jul. 23, 2020"  9581.07 3235.66 1887.44  9.167544  8.081988 7.542977
    "jul. 22, 2020"  9525.36 3276.02 1872.07  9.161713  8.094384   7.5348
    "jul. 21, 2020"  9374.89  3257.3 1842.22   9.14579  8.088654 7.518727
    "jul. 20, 2020"  9164.23 3251.84 1816.09  9.123063  8.086976 7.504441
    "jul. 17, 2020"  9151.39 3224.73  1809.9  9.121661  8.078605 7.501027
    "jul. 16, 2020"  9132.23 3215.57 1797.05  9.119565   8.07576 7.493902
    "jul. 15, 2020"  9192.84 3226.56  1811.9  9.126181  8.079172 7.502131
    "jul. 14, 2020"  9243.21 3197.52 1807.79  9.131644   8.07013  7.49986
    "jul. 13, 2020"  9243.61 3155.22 1803.39  9.131688  8.056813 7.497424
    "jul. 10, 2020"  9278.81 3185.04  1798.9 9.1354885   8.06622 7.494931
    "jul. 09, 2020"  9277.97 3152.05 1803.19  9.135398  8.055808 7.497313
    "jul. 08, 2020"  9428.33 3169.94 1810.78  9.151474  8.061468 7.501513
    "jul. 07, 2020"  9252.28 3145.32 1795.26  9.132626  8.053671 7.492905
    "jul. 06, 2020"  9375.47 3179.72  1783.9  9.145852 8.0645485 7.486557
    "jul. 03, 2020"   9087.3 3150.94 1775.05  9.114634  8.055456 7.481584
    "jul. 02, 2020"  9123.41 3130.01 1776.14  9.118599  8.048792 7.482198
    "jul. 01, 2020"  9228.33 3115.86 1770.31 9.1300335   8.04426  7.47891
    "jun. 30, 2020"  9137.99 3100.29 1781.67  9.120195  8.039251 7.485306
    "jun. 29, 2020"  9190.85 3053.24 1772.28  9.125964  8.023958 7.480022
    "jun. 26, 2020"  9162.92 3009.05 1771.47   9.12292  8.009379 7.479565
    "jun. 25, 2020"  9264.81 3083.76 1761.81  9.133979  8.033905 7.474097
    "jun. 24, 2020"  9313.61 3050.33 1762.43  9.139232 8.0230055 7.474449
    "jun. 23, 2020"  9629.66 3131.29 1767.54  9.172604    8.0492 7.477344
    "jun. 22, 2020"  9648.72 3117.86 1755.44  9.174581  8.044902 7.470475
    "jun. 19, 2020"  9288.02 3097.74 1743.61   9.13648  8.038428 7.463713
    "jun. 18, 2020"  9411.84 3115.34 1722.99  9.149724  8.044093 7.451817
    "jun. 17, 2020"  9480.26 3113.49 1727.04  9.156967    8.0435 7.454164
    "jun. 16, 2020"  9538.02 3124.74 1727.46  9.163041  8.047107 7.454407
    "jun. 15, 2020"   9450.7 3066.59 1725.53  9.153844  8.028321  7.45329
    "jun. 12, 2020"  9480.84 3041.31  1730.8  9.157028  8.020043 7.456339
    "jun. 11, 2020"  9321.78  3002.1 1727.51  9.140109  8.007068 7.454436
    "jun. 10, 2020"  9870.09 3190.14  1737.1  9.197265  8.067821 7.459972
    "jun. 09, 2020"   9795.7 3207.18 1714.57  9.189699  8.073148 7.446918
    "jun. 08, 2020"  9771.49 3232.39 1697.96  9.187224  8.080977 7.437183
    "jun. 05, 2020"  9665.53 3193.93 1685.83  9.176321  8.069008 7.430013
    "jun. 04, 2020"  9800.64 3112.35 1711.83  9.190203  8.043134 7.445318
    "jun. 03, 2020"  9656.72 3122.87 1698.64  9.175409  8.046508 7.437583
    "jun. 02, 2020"   9529.8 3080.82 1727.96  9.162179  8.032951 7.454697
    "jun. 01, 2020" 10167.27 3055.73 1740.33  9.226929  8.024774  7.46183
    "May 29, 2020"   9439.12 3044.31  1728.3  9.152618  8.021029 7.454894
    "May 28, 2020"   9525.75 3029.73  1719.4  9.161754  8.016229 7.449731
    "May 27, 2020"   9181.02 3036.13 1709.26  9.124893  8.018339 7.443816
    "May 26, 2020"   8835.05 2991.77 1712.45  9.086482   8.00362  7.44568
    "May 25, 2020"   8906.93  2975.5 1729.57  9.094584  7.998168 7.455628
    "May 22, 2020"   9182.58 2955.45 1736.07  9.125064  7.991406 7.459379
    "May 21, 2020"   9081.76 2948.51 1726.33  9.114023  7.989055 7.453753
    "May 20, 2020"   9522.98 2971.61  1749.5  9.161463  7.996859 7.467085
    end
    The first string variable to the left is the corresponding dates. The two first variables thereafter are commodity and index which I have been referring to. Also the logarithmic values for commodity and index in the fifth and sixth column.

    Comment


    • #3
      That's not dataex output as what is presented as variable names is hacked around. 7 columns of data require 7 distinct and legal variable names.

      I'd look carefully at a plot of your logged data. Occasionally a puzzlingly high correlation is a side-effect of outliers at bottom left created by the transformation.
      Last edited by Nick Cox; 02 Nov 2020, 03:18.

      Comment


      • #4
        Originally posted by Nick Cox View Post
        That's not dataex output as what is presented as variable names is hacked around. 7 columns of data require 7 distinct and legal variable names.

        I'd look carefully at a plot of your logged data. Occasionally a puzzlingly high correlation is a side-effect of outliers at bottom left created by the transformation.
        Thank you for your answer. I am providing a new datatex:

        Code:
              Date              commodity     index
        "Oct 06, 2020"   10621.66.  3360.95
        "Oct 05, 2020"   10804        3408.63
        "Oct 02, 2020"   10585.16   3348.44
        "Oct 01, 2020"   10623.33   3380.8
        "Sep 30, 2020"  10787.62   3363
        "Sep 29, 2020"  10848.83   3335.47
        "Sep 28, 2020"  10721.33   3351.6
        "Sep 25, 2020"  10702.29   3298.46
        "Sep 24, 2020"  10745.55   3246.59
        "Sep 23, 2020"  10225.86   3236.92
        "Sep 22, 2020"  10538.46   3315.57
        "Sep 21, 2020"  10462.26   3281.06
        "Sep 18, 2020"  10944.59   3319.47
        "Sep 17, 2020"  10948.99   3357.01
        "Sep 16, 2020"  10974.9     3385.49
        "Sep 15, 2020"  10796.95   3401.2
        end
        By running a scatter plot I find outliers, how do I detect these in my dataset?

        Comment


        • #5
          By running a scatter plot I find outliers, how do I detect these in my dataset?
          I read that backwards. Given two variables, plot them and think about what you see: then outliers are those data points that cause surprise given your understanding of the data and the generation process.

          Comment


          • #6
            Originally posted by Nick Cox View Post

            I read that backwards. Given two variables, plot them and think about what you see: then outliers are those data points that cause surprise given your understanding of the data and the generation process.
            Ok, makes sense.

            I still do not get to an understanding of my correlation coefficient. I have removed the outliers which where numbers written wrong due to commas.

            Still I do not get the correlation I was expecting. Could the misunderstanding be due to other factors as well?

            Regards
            Guest
            Last edited by sladmin; 16 Nov 2020, 05:35. Reason: anonymize original poster

            Comment

            Working...
            X