How do I test the correlation or difference between two variables in panel data?

Joar Vea

Join Date: Apr 2023

Posts: 4
#1

How do I test the correlation or difference between two variables in panel data?

26 Apr 2023, 12:50

Hi.

I am looking at the correlation between the spot price and price of futures contracts bought 1-, 2- and 3 years in advance, for power.
I have this dataset in a long format, where I use DeliveryYear as the panel variable, and YearsSinceBuy as the time variable, where as you see indicated by the dummy variable, YearsSinceBuy == 0 is Spot price.

The idea is to find out how good a predictor the futures prices are when compared to the achieved spot price for their respective delivery year.
I am not too familiar with panel data or Stata, and I am therefore wondering if anyone could help me with finding simple ways of testing this.

I should mention also, that the spot price for 2022 was abnormally high, is there any way to control for extremes like this or would it be better to omit that year?

What do you think of doing xtreg AvgPrice Spot and using test Spot afterwards?

Code:

* Example generated by -dataex-. For more info, type help dataex clear input int DeliveryYear byte YearsSinceBuy double AvgPrice byte Spot 2012 0 31.197706284152982 1 2012 1 46.745378486055806 0 2012 2 42.54924603174602 0 2012 3 40.13875 0 2013 0 38.10402027397258 1 2013 1 38.534016064257024 0 2013 2 46.045296442687736 0 2013 3 43.40555555555555 0 end label values Spot Spot label def Spot 0 "Termin", modify label def Spot 1 "Spot", modify
Tags: categorical, panel data, regression

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17741

27 Apr 2023, 00:29

Joar:
welcome to this forum.
Provided that I'm not totally clear with what you're after, I would go:

Code:

. xtset DeliveryYear YearsSinceBuy

Panel variable: DeliveryYear (strongly balanced)
 Time variable: YearsSinceBuy, 0 to 3
         Delta: 1 unit


. xtreg AvgPrice i.Spot, fe

Fixed-effects (within) regression               Number of obs     =          8
Group variable: DeliveryYear                    Number of groups  =          2

R-squared:                                      Obs per group:
     Within  = 0.5870                                         min =          4
     Between =      .                                         avg =        4.0
     Overall = 0.5747                                         max =          4

                                                F(1,5)            =       7.11
corr(u_i, Xb) = 0.0000                          Prob > F          =     0.0446

------------------------------------------------------------------------------
    AvgPrice | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        Spot |
       Spot  |  -8.252177   3.095585    -2.67   0.045    -16.20963   -.2947236
       _cons |   42.90304   1.547792    27.72   0.000     38.92431    46.88177
-------------+----------------------------------------------------------------
     sigma_u |  .96481318
     sigma_e |  3.7913014
         rho |  .06082161   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(1, 5) = 0.26                        Prob > F = 0.6324


. mat list e(b)

e(b)[1,3]
            0b.          1.            
          Spot        Spot       _cons
y1           0  -8.2521772    42.90304

. testparm 1.Spot

 ( 1)  1.Spot = 0

       F(  1,     5) =    7.11
            Prob > F =    0.0446

.

In addition:
1) a single predictor is surely nit enough to have a correctly specified model;
2) depending on the number of your panels (say=>30), clustered-robust standard errors may be worth considering.

Kind regards,
Carlo
(Stata 19.0)

Comment

Joar Vea

Join Date: Apr 2023

Posts: 4
#3

29 Apr 2023, 10:28

Thank you for the reply.

To clarify I'd like to see the relationship/correlation between Spot and Futures prices, all else equal, to determine whether they are the same, which could mean that futures prices are good estimators of the spot price.
I am not familiar with panel data, and my first idea was to run a simple t-test (ttest AvgPrice, by(Spot)) but that would just take the means of both for all years and compare them to eachother.

There are 11 panels, for 44 observations in the dataset. When running the regression you supplied, I get very low R-squared, and the Spot coefficient's p-value is 0.110.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17741
#4

30 Apr 2023, 05:43

Joar:
1) as 11 panels and 4 waves of data are not a big deal to share, you could post your whole dataset on the list via CODE delimiters, showing what you typed/would like to type.
As an aside, a low R-sq (BTW: please clarify the R-sq you're referring to, as Stata returns 3 of them after -xtreg-) may depend on a poor specification of the regression model.

Kind regards,
Carlo
(Stata 19.0)
1 like
Comment

Joar Vea

Join Date: Apr 2023
Posts: 4

02 May 2023, 12:27

Sure, Carlo.
I have updated the dataset to contain panels for delivery years 2006 to 2022, this gives 17 panels for a total of 68 observations.

As I mentioned, I am not too familiar with these types of datasets, and I am therefore open for suggestions as to what tests can be performed to give answers to the abovementioned question.
When referring to the R-sq of the regressions, I looked at the within value, the R-sq-between value was just a dot, I don't know what that means but I've also encountered it when trying other tests such as -xtcointtest- on the same data.

To try and clarify what I want; I'd like to give an answer to whether futures prices can be considered an accurate/good predictor of the spot price. Additionally I'd like to examine how this differs between contracts bought 1-, 2- or 3 years before the delivery year.
I will also mention the spot price for 2022 which is very extreme (partly because of the ongoing war in Ukraine) and since N is so low it skews the data a lot, would you suggest to exclude this year?

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int DeliveryYear byte YearsSinceBuy double AvgPrice byte Spot
2006 1 32.998353413654606 0
2007 1   46.4149193548387 0
2008 1  45.23298387096774 0
2009 1  54.70235999999999 0
2010 1  35.94349593495935 0
2011 1  45.22139999999998 0
2012 1 46.745378486055806 0
2013 1 38.534016064257024 0
2014 1  36.24012145748988 0
2015 1  31.52004032258065 0
2016 1 25.253453815261043 0
2017 1  22.58972111553786 0
2018 1 25.118473895582326 0
2019 1   34.9610931174089 0
2020 1 36.205182186234815 0
2021 1  22.97255999999999 0
2022 1 35.875119999999995 0
2006 2 28.265617529880473 0
2007 2 32.222270916334665 0
2008 2  43.23183999999997 0
2009 2 45.720359999999964 0
2010 2  53.43777777777774 0
2011 2 37.955020242914976 0
2012 2  42.54924603174602 0
2013 2 46.045296442687736 0
2014 2  39.56282868525894 0
2015 2   34.7657831325301 0
2016 2 30.809399999999982 0
2017 2 25.682470119521923 0
2018 2 20.745019762845857 0
2019 2  23.32350597609561 0
2020 2 30.588634538152622 0
2021 2 33.672730923694786 0
2022 2 25.413650793650802 0
2006 3 24.951481481481494 0
2007 3 28.410478087649402 0
2008 3   32.0741434262948 0
2009 3  42.67704000000001 0
2010 3  45.87084000000001 0
2011 3 53.712341269841275 0
2012 3           40.13875 0
2013 3  43.40555555555555 0
2014 3 45.853003952569175 0
2015 3  39.64872509960158 0
2016 3  33.89502008032127 0
2017 3             29.751 0
2018 3 25.721713147410348 0
2019 3  19.90885375494073 0
2020 3 23.373705179282865 0
2021 3 29.863815261044163 0
2022 3  32.69654618473897 0
2006 0  48.59288136986301 1
2007 0 27.925720000000005 1
2008 0  44.72556229508197 1
2009 0  35.01665589041095 1
2010 0  53.05859835616436 1
2011 0  47.04855150684931 1
2012 0 31.197706284152982 1
2013 0  38.10402027397258 1
2014 0 29.607325205479455 1
2015 0  20.97740493150685 1
2016 0 26.910692349726762 1
2017 0  29.40819835616437 1
2018 0  43.98806465753426 1
2019 0  38.94020054794522 1
2020 0 10.927720765027312 1
2021 0 62.310626301369915 1
2022 0  135.8672789041096 1
end
label values YearsSinceBuy spotyr
label def spotyr 0 "Spot", modify
label values Spot spot
label def spot 0 "Futures", modify
label def spot 1 "Spot", modify

Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17741

03 May 2023, 00:29

Joar:
the issue here is that you do not have a panel-wise effect:

Code:

. xtset DeliveryYear YearsSinceBuy

Panel variable: DeliveryYear (strongly balanced)
 Time variable: YearsSinceBuy, 0 to 3
         Delta: 1 unit

. xtreg AvgPrice i.Spot, fe

Fixed-effects (within) regression               Number of obs     =         68
Group variable: DeliveryYear                    Number of groups  =         17

R-squared:                                      Obs per group:
     Within  = 0.0570                                         min =          4
     Between =      .                                         avg =        4.0
     Overall = 0.0408                                         max =          4

                                                F(1,50)           =       3.02
corr(u_i, Xb) = 0.0000                          Prob > F          =     0.0882

------------------------------------------------------------------------------
    AvgPrice | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        Spot |
       Spot  |   7.281343   4.187125     1.74   0.088    -1.128744    15.69143
       _cons |   35.34261   2.093562    16.88   0.000     31.13757    39.54765
-------------+----------------------------------------------------------------
     sigma_u |  8.5968476
     sigma_e |  14.951025
         rho |  .24847381   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(16, 50) = 1.32                      Prob > F = 0.2212

However, switchng to pooled regression does not change anything, as the model is poorly specified (the fitted^2 shows informative power; therefore, something is missing in your model):

Code:

. reg AvgPrice i.Spot i.DeliveryYear

      Source |       SS           df       MS      Number of obs   =        68
-------------+----------------------------------   F(17, 50)       =      1.42
       Model |   5405.9494        17  317.997024   Prob > F        =    0.1660
    Residual |   11176.658        50   223.53316   R-squared       =    0.3260
-------------+----------------------------------   Adj R-squared   =    0.0968
       Total |  16582.6074        67  247.501603   Root MSE        =    14.951

------------------------------------------------------------------------------
    AvgPrice | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        Spot |
       Spot  |   7.281343   4.187125     1.74   0.088    -1.128744    15.69143
             |
DeliveryYear |
       2007  |   .0412636   10.57197     0.00   0.997    -21.19317    21.27569
       2008  |   7.614049   10.57197     0.72   0.475    -13.62038    28.84848
       2009  |   10.82702   10.57197     1.02   0.311    -10.40741    32.06145
       2010  |   13.37559   10.57197     1.27   0.212    -7.858835    34.61002
       2011  |   12.28224   10.57197     1.16   0.251    -8.952185    33.51667
       2012  |   6.455687   10.57197     0.61   0.544    -14.77874    27.69012
       2013  |   7.820139   10.57197     0.74   0.463    -13.41429    29.05457
       2014  |   4.113736   10.57197     0.39   0.699    -17.12069    25.34817
       2015  |  -1.974095   10.57197    -0.19   0.853    -23.20852    19.26033
       2016  |  -4.484942   10.57197    -0.42   0.673    -25.71937    16.74949
       2017  |  -6.844236   10.57197    -0.65   0.520    -28.07867    14.39019
       2018  |  -4.808766   10.57197    -0.45   0.651     -26.0432    16.42566
       2019  |   -4.41867   10.57197    -0.42   0.678     -25.6531    16.81576
       2020  |  -8.428273   10.57197    -0.80   0.429     -29.6627    12.80616
       2021  |    3.50285   10.57197     0.33   0.742    -17.73158    24.73728
       2022  |   23.76107   10.57197     2.25   0.029     2.526636     44.9955
             |
       _cons |   31.88175   7.548446     4.22   0.000     16.72025    47.04325
------------------------------------------------------------------------------

. linktest

      Source |       SS           df       MS      Number of obs   =        68
-------------+----------------------------------   F(2, 65)        =     19.20
       Model |  6159.06473         2  3079.53236   Prob > F        =    0.0000
    Residual |  10423.5427        65  160.362195   R-squared       =    0.3714
-------------+----------------------------------   Adj R-squared   =    0.3521
       Total |  16582.6074        67  247.501603   Root MSE        =    12.663

------------------------------------------------------------------------------
    AvgPrice | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        _hat |  -1.649454    1.23465    -1.34   0.186    -4.115221    .8163119
      _hatsq |   .0334331   .0154275     2.17   0.034     .0026222     .064244
       _cons |   49.62967   23.82854     2.08   0.041     2.040781    97.21856
------------------------------------------------------------------------------

That said, you can stick with the (2-sided) -ttest-, that confirms what above:

Code:

. ttest AvgPrice, by(Spot)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
 Futures |      51    35.34261    1.275976    9.112292    32.77974    37.90548
    Spot |      17    42.62395    6.573939    27.10504    28.68783    56.56008
---------+--------------------------------------------------------------------
Combined |      68    37.16295    1.907808    15.73218    33.35495    40.97095
---------+--------------------------------------------------------------------
    diff |           -7.281343    4.347724               -15.96185    1.399169
------------------------------------------------------------------------------
    diff = mean(Futures) - mean(Spot)                             t =  -1.6747
H0: diff = 0                                     Degrees of freedom =       66

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0494         Pr(|T| > |t|) = 0.0987          Pr(T > t) = 0.9506

Eventually, I do not get your statement about the mean comparison in -ttest-, as OLS too coinsiders the conditional mean of the regressand.

Kind regards,
Carlo
(Stata 19.0)

Comment

Joar Vea

Join Date: Apr 2023

Posts: 4
#7

03 May 2023, 10:43

Many thanks for the in-depth reply Carlo.

I can see that my initial wording regarding -ttest- was a bit misleading. What I was referring to is that it would not take into account the grouping of panels by DeliveryYear. I was under the impression that using panel data regression would measure the spot price for each panel against the futures price related to it.
Also I was worried about the assumptions for using -ttest- not being met, since Spot is skewed a bit by the extreme values in 2021 and 22, as well as only having 17 observations (i.e. no CLT).

Anyhow, I used -robvar- and did four Welch's t-tests, one for each YearsSinceBuy and one for the whole dataset (like your's, above). And concluded with H0 not being discarded in any of them.
I guess I was looking for something that would give a significant result.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17741
#8

03 May 2023, 11:01

Joar:
while -ttest- is robust about non-normality, you may want to confirm your results via a -bootstrap- ttest.
As an aside, if I may, please note that "fishing expeditions" to find out something statistical signiifcant are not considered scientific. Results are what they are and with a sample size of 17 observations, it is wise to stop at descriptive statistics.

Kind regards,
Carlo
(Stata 19.0)
Comment

Announcement