Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How do I test the correlation or difference between two variables in panel data?

    Hi.

    I am looking at the correlation between the spot price and price of futures contracts bought 1-, 2- and 3 years in advance, for power.
    I have this dataset in a long format, where I use DeliveryYear as the panel variable, and YearsSinceBuy as the time variable, where as you see indicated by the dummy variable, YearsSinceBuy == 0 is Spot price.

    The idea is to find out how good a predictor the futures prices are when compared to the achieved spot price for their respective delivery year.
    I am not too familiar with panel data or Stata, and I am therefore wondering if anyone could help me with finding simple ways of testing this.

    I should mention also, that the spot price for 2022 was abnormally high, is there any way to control for extremes like this or would it be better to omit that year?

    What do you think of doing xtreg AvgPrice Spot and using test Spot afterwards?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int DeliveryYear byte YearsSinceBuy double AvgPrice byte Spot
    2012 0 31.197706284152982 1
    2012 1 46.745378486055806 0
    2012 2  42.54924603174602 0
    2012 3           40.13875 0
    2013 0  38.10402027397258 1
    2013 1 38.534016064257024 0
    2013 2 46.045296442687736 0
    2013 3  43.40555555555555 0
    end
    label values Spot Spot
    label def Spot 0 "Termin", modify
    label def Spot 1 "Spot", modify

  • #2
    Joar:
    welcome to this forum.
    Provided that I'm not totally clear with what you're after, I would go:
    Code:
    . xtset DeliveryYear YearsSinceBuy
    
    Panel variable: DeliveryYear (strongly balanced)
     Time variable: YearsSinceBuy, 0 to 3
             Delta: 1 unit
    
    
    . xtreg AvgPrice i.Spot, fe
    
    Fixed-effects (within) regression               Number of obs     =          8
    Group variable: DeliveryYear                    Number of groups  =          2
    
    R-squared:                                      Obs per group:
         Within  = 0.5870                                         min =          4
         Between =      .                                         avg =        4.0
         Overall = 0.5747                                         max =          4
    
                                                    F(1,5)            =       7.11
    corr(u_i, Xb) = 0.0000                          Prob > F          =     0.0446
    
    ------------------------------------------------------------------------------
        AvgPrice | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            Spot |
           Spot  |  -8.252177   3.095585    -2.67   0.045    -16.20963   -.2947236
           _cons |   42.90304   1.547792    27.72   0.000     38.92431    46.88177
    -------------+----------------------------------------------------------------
         sigma_u |  .96481318
         sigma_e |  3.7913014
             rho |  .06082161   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(1, 5) = 0.26                        Prob > F = 0.6324
    
    
    . mat list e(b)
    
    e(b)[1,3]
                0b.          1.            
              Spot        Spot       _cons
    y1           0  -8.2521772    42.90304
    
    . testparm 1.Spot
    
     ( 1)  1.Spot = 0
    
           F(  1,     5) =    7.11
                Prob > F =    0.0446
    
    .
    In addition:
    1) a single predictor is surely nit enough to have a correctly specified model;
    2) depending on the number of your panels (say=>30), clustered-robust standard errors may be worth considering.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Thank you for the reply.

      To clarify I'd like to see the relationship/correlation between Spot and Futures prices, all else equal, to determine whether they are the same, which could mean that futures prices are good estimators of the spot price.
      I am not familiar with panel data, and my first idea was to run a simple t-test (ttest AvgPrice, by(Spot)) but that would just take the means of both for all years and compare them to eachother.

      There are 11 panels, for 44 observations in the dataset. When running the regression you supplied, I get very low R-squared, and the Spot coefficient's p-value is 0.110.

      Comment


      • #4
        Joar:
        1) as 11 panels and 4 waves of data are not a big deal to share, you could post your whole dataset on the list via CODE delimiters, showing what you typed/would like to type.
        As an aside, a low R-sq (BTW: please clarify the R-sq you're referring to, as Stata returns 3 of them after -xtreg-) may depend on a poor specification of the regression model.
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Sure, Carlo.
          I have updated the dataset to contain panels for delivery years 2006 to 2022, this gives 17 panels for a total of 68 observations.

          As I mentioned, I am not too familiar with these types of datasets, and I am therefore open for suggestions as to what tests can be performed to give answers to the abovementioned question.
          When referring to the R-sq of the regressions, I looked at the within value, the R-sq-between value was just a dot, I don't know what that means but I've also encountered it when trying other tests such as -xtcointtest- on the same data.

          To try and clarify what I want; I'd like to give an answer to whether futures prices can be considered an accurate/good predictor of the spot price. Additionally I'd like to examine how this differs between contracts bought 1-, 2- or 3 years before the delivery year.
          I will also mention the spot price for 2022 which is very extreme (partly because of the ongoing war in Ukraine) and since N is so low it skews the data a lot, would you suggest to exclude this year?

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input int DeliveryYear byte YearsSinceBuy double AvgPrice byte Spot
          2006 1 32.998353413654606 0
          2007 1   46.4149193548387 0
          2008 1  45.23298387096774 0
          2009 1  54.70235999999999 0
          2010 1  35.94349593495935 0
          2011 1  45.22139999999998 0
          2012 1 46.745378486055806 0
          2013 1 38.534016064257024 0
          2014 1  36.24012145748988 0
          2015 1  31.52004032258065 0
          2016 1 25.253453815261043 0
          2017 1  22.58972111553786 0
          2018 1 25.118473895582326 0
          2019 1   34.9610931174089 0
          2020 1 36.205182186234815 0
          2021 1  22.97255999999999 0
          2022 1 35.875119999999995 0
          2006 2 28.265617529880473 0
          2007 2 32.222270916334665 0
          2008 2  43.23183999999997 0
          2009 2 45.720359999999964 0
          2010 2  53.43777777777774 0
          2011 2 37.955020242914976 0
          2012 2  42.54924603174602 0
          2013 2 46.045296442687736 0
          2014 2  39.56282868525894 0
          2015 2   34.7657831325301 0
          2016 2 30.809399999999982 0
          2017 2 25.682470119521923 0
          2018 2 20.745019762845857 0
          2019 2  23.32350597609561 0
          2020 2 30.588634538152622 0
          2021 2 33.672730923694786 0
          2022 2 25.413650793650802 0
          2006 3 24.951481481481494 0
          2007 3 28.410478087649402 0
          2008 3   32.0741434262948 0
          2009 3  42.67704000000001 0
          2010 3  45.87084000000001 0
          2011 3 53.712341269841275 0
          2012 3           40.13875 0
          2013 3  43.40555555555555 0
          2014 3 45.853003952569175 0
          2015 3  39.64872509960158 0
          2016 3  33.89502008032127 0
          2017 3             29.751 0
          2018 3 25.721713147410348 0
          2019 3  19.90885375494073 0
          2020 3 23.373705179282865 0
          2021 3 29.863815261044163 0
          2022 3  32.69654618473897 0
          2006 0  48.59288136986301 1
          2007 0 27.925720000000005 1
          2008 0  44.72556229508197 1
          2009 0  35.01665589041095 1
          2010 0  53.05859835616436 1
          2011 0  47.04855150684931 1
          2012 0 31.197706284152982 1
          2013 0  38.10402027397258 1
          2014 0 29.607325205479455 1
          2015 0  20.97740493150685 1
          2016 0 26.910692349726762 1
          2017 0  29.40819835616437 1
          2018 0  43.98806465753426 1
          2019 0  38.94020054794522 1
          2020 0 10.927720765027312 1
          2021 0 62.310626301369915 1
          2022 0  135.8672789041096 1
          end
          label values YearsSinceBuy spotyr
          label def spotyr 0 "Spot", modify
          label values Spot spot
          label def spot 0 "Futures", modify
          label def spot 1 "Spot", modify

          Comment


          • #6
            Joar:
            the issue here is that you do not have a panel-wise effect:
            Code:
            . xtset DeliveryYear YearsSinceBuy
            
            Panel variable: DeliveryYear (strongly balanced)
             Time variable: YearsSinceBuy, 0 to 3
                     Delta: 1 unit
            
            . xtreg AvgPrice i.Spot, fe
            
            Fixed-effects (within) regression               Number of obs     =         68
            Group variable: DeliveryYear                    Number of groups  =         17
            
            R-squared:                                      Obs per group:
                 Within  = 0.0570                                         min =          4
                 Between =      .                                         avg =        4.0
                 Overall = 0.0408                                         max =          4
            
                                                            F(1,50)           =       3.02
            corr(u_i, Xb) = 0.0000                          Prob > F          =     0.0882
            
            ------------------------------------------------------------------------------
                AvgPrice | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                    Spot |
                   Spot  |   7.281343   4.187125     1.74   0.088    -1.128744    15.69143
                   _cons |   35.34261   2.093562    16.88   0.000     31.13757    39.54765
            -------------+----------------------------------------------------------------
                 sigma_u |  8.5968476
                 sigma_e |  14.951025
                     rho |  .24847381   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            F test that all u_i=0: F(16, 50) = 1.32                      Prob > F = 0.2212
            However, switchng to pooled regression does not change anything, as the model is poorly specified (the fitted^2 shows informative power; therefore, something is missing in your model):
            Code:
            . reg AvgPrice i.Spot i.DeliveryYear
            
                  Source |       SS           df       MS      Number of obs   =        68
            -------------+----------------------------------   F(17, 50)       =      1.42
                   Model |   5405.9494        17  317.997024   Prob > F        =    0.1660
                Residual |   11176.658        50   223.53316   R-squared       =    0.3260
            -------------+----------------------------------   Adj R-squared   =    0.0968
                   Total |  16582.6074        67  247.501603   Root MSE        =    14.951
            
            ------------------------------------------------------------------------------
                AvgPrice | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                    Spot |
                   Spot  |   7.281343   4.187125     1.74   0.088    -1.128744    15.69143
                         |
            DeliveryYear |
                   2007  |   .0412636   10.57197     0.00   0.997    -21.19317    21.27569
                   2008  |   7.614049   10.57197     0.72   0.475    -13.62038    28.84848
                   2009  |   10.82702   10.57197     1.02   0.311    -10.40741    32.06145
                   2010  |   13.37559   10.57197     1.27   0.212    -7.858835    34.61002
                   2011  |   12.28224   10.57197     1.16   0.251    -8.952185    33.51667
                   2012  |   6.455687   10.57197     0.61   0.544    -14.77874    27.69012
                   2013  |   7.820139   10.57197     0.74   0.463    -13.41429    29.05457
                   2014  |   4.113736   10.57197     0.39   0.699    -17.12069    25.34817
                   2015  |  -1.974095   10.57197    -0.19   0.853    -23.20852    19.26033
                   2016  |  -4.484942   10.57197    -0.42   0.673    -25.71937    16.74949
                   2017  |  -6.844236   10.57197    -0.65   0.520    -28.07867    14.39019
                   2018  |  -4.808766   10.57197    -0.45   0.651     -26.0432    16.42566
                   2019  |   -4.41867   10.57197    -0.42   0.678     -25.6531    16.81576
                   2020  |  -8.428273   10.57197    -0.80   0.429     -29.6627    12.80616
                   2021  |    3.50285   10.57197     0.33   0.742    -17.73158    24.73728
                   2022  |   23.76107   10.57197     2.25   0.029     2.526636     44.9955
                         |
                   _cons |   31.88175   7.548446     4.22   0.000     16.72025    47.04325
            ------------------------------------------------------------------------------
            
            . linktest
            
                  Source |       SS           df       MS      Number of obs   =        68
            -------------+----------------------------------   F(2, 65)        =     19.20
                   Model |  6159.06473         2  3079.53236   Prob > F        =    0.0000
                Residual |  10423.5427        65  160.362195   R-squared       =    0.3714
            -------------+----------------------------------   Adj R-squared   =    0.3521
                   Total |  16582.6074        67  247.501603   Root MSE        =    12.663
            
            ------------------------------------------------------------------------------
                AvgPrice | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                    _hat |  -1.649454    1.23465    -1.34   0.186    -4.115221    .8163119
                  _hatsq |   .0334331   .0154275     2.17   0.034     .0026222     .064244
                   _cons |   49.62967   23.82854     2.08   0.041     2.040781    97.21856
            ------------------------------------------------------------------------------
            That said, you can stick with the (2-sided) -ttest-, that confirms what above:
            Code:
            . ttest AvgPrice, by(Spot)
            
            Two-sample t test with equal variances
            ------------------------------------------------------------------------------
               Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
            ---------+--------------------------------------------------------------------
             Futures |      51    35.34261    1.275976    9.112292    32.77974    37.90548
                Spot |      17    42.62395    6.573939    27.10504    28.68783    56.56008
            ---------+--------------------------------------------------------------------
            Combined |      68    37.16295    1.907808    15.73218    33.35495    40.97095
            ---------+--------------------------------------------------------------------
                diff |           -7.281343    4.347724               -15.96185    1.399169
            ------------------------------------------------------------------------------
                diff = mean(Futures) - mean(Spot)                             t =  -1.6747
            H0: diff = 0                                     Degrees of freedom =       66
            
                Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
             Pr(T < t) = 0.0494         Pr(|T| > |t|) = 0.0987          Pr(T > t) = 0.9506
            Eventually, I do not get your statement about the mean comparison in -ttest-, as OLS too coinsiders the conditional mean of the regressand.
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Many thanks for the in-depth reply Carlo.

              I can see that my initial wording regarding -ttest- was a bit misleading. What I was referring to is that it would not take into account the grouping of panels by DeliveryYear. I was under the impression that using panel data regression would measure the spot price for each panel against the futures price related to it.
              Also I was worried about the assumptions for using -ttest- not being met, since Spot is skewed a bit by the extreme values in 2021 and 22, as well as only having 17 observations (i.e. no CLT).

              Anyhow, I used -robvar- and did four Welch's t-tests, one for each YearsSinceBuy and one for the whole dataset (like your's, above). And concluded with H0 not being discarded in any of them.
              I guess I was looking for something that would give a significant result.

              Comment


              • #8
                Joar:
                while -ttest- is robust about non-normality, you may want to confirm your results via a -bootstrap- ttest.
                As an aside, if I may, please note that "fishing expeditions" to find out something statistical signiifcant are not considered scientific. Results are what they are and with a sample size of 17 observations, it is wise to stop at descriptive statistics.
                Kind regards,
                Carlo
                (Stata 19.0)

                Comment

                Working...
                X