
  • Addressing autocorrelation in T>N unbalanced panel data

    Good morning and all the best wishes for 2023,

    I have an unbalanced panel data set with T>N. Specifically, it contains 59 forward contracts (the panel id) in the gas market, with daily price data from January 2019 to August 2022 (the time id). The forward contracts differ only in their maturity date, which ranges monthly from February 2019 to December 2023. Since most contracts mature during my sample period, the panel is unbalanced: for example, the forward maturing in April 2019 only has data from January 2019 to the end of March 2019, because it no longer exists after that. With this dataset I am running two models, estimated by FE and by pooled OLS: one predicting forward prices and one predicting forward prices minus spot prices. However, I am struggling with autocorrelation.

    It seems quite clear that the models suffer from autocorrelation, since today's price depends on yesterday's price (e.g. due to shocks not captured by the model). To address this, I found that many people use vce(cluster clusterid), with either the panel id or the time id as clusterid. However, in my context I am not sure which of the two is more suitable. If there is a shock in the gas market that the model does not capture, all existing forward contracts are affected at the same time, for a number of periods, which makes me think I should cluster on time. However, forwards near maturity are affected more strongly by such a shock than forwards maturing further in the future, so the effect differs per panel id, which makes me think I should cluster on the panel id.
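    To make the two options concrete, here is a minimal sketch using the variable names from the -dataex- example below (it assumes that example is loaded and is meant to be run from a do-file, since the /// continuations do not work interactively). Clustering by contract allows arbitrary serial correlation within each contract but no correlation between contracts; clustering by trading day allows arbitrary correlation across contracts on the same day but none between days. Note that -xtreg, fe- only accepts a cluster variable within which the panels are nested, so the day-clustered version uses -regress- with contract dummies instead.

```stata
* Sketch: assumes the -dataex- example in this thread is in memory
xtset ForwardIDNotchronological bcal_date

* (a) Contract fixed effects, SEs clustered by contract:
*     arbitrary serial correlation within each contract over time
xtreg Last StorageDifferentialpp HDD TimeToMaturityinDays, fe ///
    vce(cluster ForwardIDNotchronological)

* (b) Contract fixed effects as dummies, SEs clustered by trading day:
*     arbitrary correlation across contracts on the same day,
*     but no correlation between different days
regress Last StorageDifferentialpp HDD TimeToMaturityinDays ///
    i.ForwardIDNotchronological, vce(cluster bcal_date)
```

    Neither choice covers both patterns the post describes (a common shock that also persists over several days), which is what motivates the Driscoll-Kraay suggestion later in the thread.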

    What would be the best approach to clustering the standard errors? I have also noticed the -newey- and -newey2- commands for time-series data, which could be applied to panel data with the force option. Would this be a better approach than choosing between clustering on the panel id or the time id? Another alternative would be to include day dummies, but since my dataset spans more than 900 days and the magnitude of the shock differs per forward, I believe this is not a suitable solution. In addition, the last days of my dataset only have 16 observations each (i.e. in August 2022 only the forwards maturing between September 2022 and December 2023 remain).
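    For comparison, a hedged sketch of the Newey-West route on the same example data, assuming your version of -newey- accepts -xtset- panel data (check -help newey-). Because the example's bcal_date is a business-calendar date, consecutive trading days are one unit apart, so force should not be needed here. As I understand it, Newey-West on a panel only uses lags within each contract, so it addresses serial correlation but not the common shock hitting all contracts on the same day. The value lag(5) is an arbitrary illustration, not a recommendation.

```stata
* Sketch: assumes the -dataex- example in this thread is in memory
xtset ForwardIDNotchronological bcal_date

* Newey-West (Bartlett kernel) SEs, contract dummies as fixed effects;
* lag(5) is an illustrative bandwidth only
newey Last StorageDifferentialpp HDD TimeToMaturityinDays ///
    i.ForwardIDNotchronological, lag(5)
```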

    Hopefully, someone can help me on this matter. I want to avoid using lagged forward prices as an explanatory variable, as I believe this significantly biases the remaining estimates.

    Thank you in advance,

    Best,
    Stefan

    Example of data (note that I have not included all explanatory variables here):
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long bcal_date byte ForwardIDNotchronological str6 Forward double(Last FminS StorageDifferentialpp HDD) float TimeToMaturityinDays
     0 1 "Apr-19" 21.073  -1.2649116516113281  4.154918685181064 14.956843224212369  85
     1 1 "Apr-19"   20.9   -.9370609521865845 3.9909131097125794 17.365878184036973  84
     2 1 "Apr-19" 21.375   -.7146892547607422 3.9044374292301898  17.07259541833382  83
     3 1 "Apr-19" 20.504  -.18523378670215607  4.202203443949603 14.724739683093697  80
     4 1 "Apr-19" 20.958   -.8720952868461609  4.246697818940581  15.69467289307796  79
     5 1 "Apr-19" 20.573   -.6182765364646912 4.0693012133582895 15.787276165430324  78
     6 1 "Apr-19" 20.524  -1.4630166292190552 3.8222247533405196 16.466272011858738  77
     7 1 "Apr-19"  20.94  -.49131256341934204 3.8908445034195216  17.29398776301633  76
     8 1 "Apr-19" 20.525   -.8998981714248657 4.7744742013453685  12.99215511607172  73
     9 1 "Apr-19" 20.898  -.34439921379089355  4.501237387327883  13.51847291901612  72
    10 1 "Apr-19"     21   -.7477397918701172  4.562905460719746 14.261926277053735  71
    11 1 "Apr-19" 21.919   -.8131643533706665  4.664170117632693 13.503087491300732  70
    12 1 "Apr-19" 21.725   -.5850738286972046  4.681253153778364 15.081026880807627  69
    13 1 "Apr-19" 21.162   -.6925017237663269  4.650215965856718 19.088969293989685  66
    14 1 "Apr-19" 21.184   -.5423412322998047  4.411772354184307 18.388939906045923  65
    15 1 "Apr-19" 21.505  -.49548637866973877 3.9245698934585627  18.89483980943654  64
    16 1 "Apr-19" 20.709   -.5355201959609985 3.7186900750836083 17.282377100504643  63
    17 1 "Apr-19" 20.436   -.4014385938644409 3.5360547008440046  18.10689224762214  62
    18 1 "Apr-19" 20.135   -.4560987949371338 3.7687662984285386 14.874321544088811  59
    19 1 "Apr-19" 20.206   -.6376794576644897  3.566352326421418 15.974830294731296  58
    20 1 "Apr-19" 19.802  -.38830000162124634   3.29833195203042 17.493967571237057  57
    21 1 "Apr-19" 19.477   -.5069153308868408  2.981014316737751  16.76584064261391  56
    22 1 "Apr-19" 19.068   -.1927688866853714  2.799392523755795 14.597080048901223  55
    23 1 "Apr-19" 19.149   -.2338964194059372  2.637593057319798 15.876250634486649  52
    24 1 "Apr-19" 18.938    .2525656521320343 2.5209027401734807   17.5473240535079  51
    25 1 "Apr-19" 18.676  -.35440123081207275  2.337756503306138 16.447075361083506  50
    26 1 "Apr-19" 18.631  .023129897192120552 2.6294237438272807  15.99565273565843  49
    27 1 "Apr-19" 17.987   -.3012709617614746 2.8100871063979294 13.490421687925783  48
    28 1 "Apr-19" 17.897  -.39309626817703247 3.4759679360977955 12.706924209106468  45
    29 1 "Apr-19" 17.635   .16231514513492584  3.497102498871546 13.527162965627497  44
    30 1 "Apr-19" 17.897   -.2247193604707718   3.50327612733059 14.245016166563882  43
    31 1 "Apr-19" 17.671  -.09852362424135208  3.608145819989433 14.409795158178559  42
    32 1 "Apr-19" 18.039    .0692562684416771 3.8185903648961848 13.959856574551733  41
    33 1 "Apr-19" 17.379  .011648467741906643  4.835302601213643 12.870984868418315  38
    34 1 "Apr-19" 17.735  -.16109336912631989 5.0138404787663955 13.067572059304243  37
    35 1 "Apr-19" 18.149   .03270895406603813  5.149972499649319 12.789025705567967  36
    36 1 "Apr-19"  17.64    .1207510381937027  4.163221041799048 12.821580226979757  35
    37 1 "Apr-19" 17.382    .4333782494068146  4.583521163617449 10.535112088816401  34
    38 1 "Apr-19" 17.188    .3177798092365265  5.516217922493244 11.949117863930839  31
    39 1 "Apr-19" 17.223  -.12606196105480194   5.86440717274691 10.364240168499009  30
    40 1 "Apr-19" 17.686  -.11533438414335251  6.211936207064883   9.82799481215315  29
    41 1 "Apr-19" 17.814   .03559957817196846   6.55425331481887  9.590275637042588  28
    42 1 "Apr-19" 17.321  -.13209639489650726  6.988544178189588  10.20542306151627  27
    43 1 "Apr-19"   17.4   -.1615520417690277  8.468005955620184   9.06985082134012  24
    44 1 "Apr-19" 17.181   .07196260243654251  8.679538929426261   9.74850269941683  23
    45 1 "Apr-19" 16.851  -.10439474135637283  8.951534139386425  9.303878171684856  22
    46 1 "Apr-19" 16.868  -.07230380177497864  9.341177956424795  7.570277884266103  21
    47 1 "Apr-19" 16.729 -.059389207512140274  9.523276257364127    8.7250182756284  20
    48 1 "Apr-19" 16.623   -.2671271860599518 10.147491233144601 10.819773317361037  17
    49 1 "Apr-19" 16.209    -.305850625038147 10.123554188792166 11.202640743987685  16
    50 1 "Apr-19" 15.725  -.18612995743751526 10.129295641529223 10.754927092076983  15
    51 1 "Apr-19" 15.603   .07213623821735382 10.186246922094943 10.899830951251657  14
    52 1 "Apr-19" 15.252    .3901197016239166 10.384121232516453  8.913920156042902  13
    53 1 "Apr-19"  14.98  -.43232014775276184 10.916931431107729 10.549083139608882  10
    54 1 "Apr-19" 15.003  -.23918463289737701 10.870011381154077 11.520664442985296   9
    55 1 "Apr-19" 15.213   .11791962385177612  10.90696353719413 11.119413828431366   8
    56 1 "Apr-19" 14.563   .00572518166154623 11.049927434990375 10.741380673223665   7
    57 1 "Apr-19" 14.454     .254000186920166 11.391536955374848  8.704985444694895   6
    58 1 "Apr-19" 14.185  -.16500037908554077 12.438116395037829  8.539151775709938   3
    59 1 "Apr-19" 14.384  -.31306058168411255 12.498370361280914 10.685778164916533   2
    60 1 "Apr-19" 14.994  .052969496697187424 12.539270947648069  10.62525399933752   1
    61 1 "Apr-19" 14.428   .23794034123420715 12.561212696076007  9.232591255840084   0
     0 2 "Apr-20" 19.873   -2.464911699295044  4.154918685181064 14.956843224212369 453
     1 2 "Apr-20" 19.768   -2.069061040878296 3.9909131097125794 17.365878184036973 452
     2 2 "Apr-20" 20.279  -1.8106892108917236 3.9044374292301898  17.07259541833382 451
     3 2 "Apr-20" 19.987   -.7022337913513184  4.202203443949603 14.724739683093697 448
     4 2 "Apr-20" 20.263  -1.5670952796936035  4.246697818940581  15.69467289307796 447
     5 2 "Apr-20" 20.078  -1.1132766008377075 4.0693012133582895 15.787276165430324 446
     6 2 "Apr-20" 20.133   -1.854016661643982 3.8222247533405196 16.466272011858738 445
     7 2 "Apr-20" 20.362   -1.069312572479248 3.8908445034195216  17.29398776301633 444
     8 2 "Apr-20" 20.225    -1.19989812374115 4.7744742013453685  12.99215511607172 441
     9 2 "Apr-20" 20.467   -.7753992080688477  4.501237387327883  13.51847291901612 440
    10 2 "Apr-20" 20.737   -1.010739803314209  4.562905460719746 14.261926277053735 439
    11 2 "Apr-20" 21.292    -1.44016432762146  4.664170117632693 13.503087491300732 438
    12 2 "Apr-20" 21.495   -.8150738477706909  4.681253153778364 15.081026880807627 437
    13 2 "Apr-20" 21.247    -.607501745223999  4.650215965856718 19.088969293989685 434
    14 2 "Apr-20"  21.17   -.5563412308692932  4.411772354184307 18.388939906045923 433
    15 2 "Apr-20" 21.369   -.6314863562583923 3.9245698934585627  18.89483980943654 432
    16 2 "Apr-20" 20.846  -.39852020144462585 3.7186900750836083 17.282377100504643 431
    17 2 "Apr-20" 20.859   .02156141586601734 3.5360547008440046  18.10689224762214 430
    18 2 "Apr-20" 20.695    .1039012148976326 3.7687662984285386 14.874321544088811 427
    19 2 "Apr-20" 20.864  .020320571959018707  3.566352326421418 15.974830294731296 426
    20 2 "Apr-20"  20.77    .5796999931335449   3.29833195203042 17.493967571237057 425
    21 2 "Apr-20" 20.654    .6700846552848816  2.981014316737751  16.76584064261391 424
    22 2 "Apr-20" 20.498   1.2372311353683472  2.799392523755795 14.597080048901223 423
    23 2 "Apr-20" 20.455   1.0721036195755005  2.637593057319798 15.876250634486649 420
    24 2 "Apr-20" 20.359   1.6735656261444092 2.5209027401734807   17.5473240535079 419
    25 2 "Apr-20" 20.263   1.2325987815856934  2.337756503306138 16.447075361083506 418
    26 2 "Apr-20" 20.182   1.5741299390792847 2.6294237438272807  15.99565273565843 417
    27 2 "Apr-20" 19.818   1.5297290086746216 2.8100871063979294 13.490421687925783 416
    28 2 "Apr-20" 19.837   1.5469037294387817 3.4759679360977955 12.706924209106468 413
    29 2 "Apr-20" 19.647   2.1743152141571045  3.497102498871546 13.527162965627497 412
    30 2 "Apr-20" 19.932    1.810280680656433   3.50327612733059 14.245016166563882 411
    31 2 "Apr-20"  19.81   2.0404763221740723  3.608145819989433 14.409795158178559 410
    32 2 "Apr-20" 20.141   2.1712563037872314 3.8185903648961848 13.959856574551733 409
    33 2 "Apr-20" 19.692    2.324648380279541  4.835302601213643 12.870984868418315 406
    34 2 "Apr-20" 20.073   2.1769065856933594 5.0138404787663955 13.067572059304243 405
    35 2 "Apr-20" 20.302    2.185708999633789  5.149972499649319 12.789025705567967 404
    36 2 "Apr-20" 20.016    2.496751070022583  4.163221041799048 12.821580226979757 403
    37 2 "Apr-20" 19.904    2.955378293991089  4.583521163617449 10.535112088816401 402
    end
    format %tbcalendar bcal_date


    Last edited by Stefan Tijink; 02 Jan 2023, 04:38.

  • #2
    I wouldn't use Newey-West (1987) standard errors; that's just the Bartlett kernel. Andrews (1991) showed that amongst kernels guaranteeing positive semidefinite variance estimates, the quadratic spectral performs best (much better than Bartlett). There's also the option of the circular block bootstrap (Politis and Romano, 1992) or the wild cluster bootstrap (Roodman, I can't remember the year).
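    To make the kernel point concrete, here is a sketch of the generic kernel HAC estimator in the spirit of Andrews (1991), with the Bartlett and quadratic spectral weight functions. The symbols (\hat V_t for the estimated moment contributions, S_T for the bandwidth) are labels chosen here for illustration. Stata's -newey- with lag(m) uses Bartlett weights 1 - j/(m+1), i.e. S_T = m + 1.

```latex
% Kernel HAC estimate of the long-run variance, bandwidth S_T
\hat{\Omega} = \sum_{j=-(T-1)}^{T-1} k\!\left(\frac{j}{S_T}\right)\hat{\Gamma}_j,
\qquad
\hat{\Gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T} \hat{V}_t \hat{V}_{t-j}' \ (j \ge 0),
\qquad
\hat{\Gamma}_{-j} = \hat{\Gamma}_j'

% Bartlett (Newey-West): linearly declining weights, zero beyond the bandwidth
k_{BT}(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1 \\ 0 & \text{otherwise} \end{cases}

% Quadratic spectral: no truncation, every lag gets a declining weight
k_{QS}(x) = \frac{25}{12\pi^2 x^2}
  \left( \frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5) \right),
\qquad k_{QS}(0) = 1
```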

    For clustering standard errors, I would try the community-contributed command -summclust- with the panel ID as the cluster variable, in order to allow for serial correlation (across time) within each contract.

    Given that you've got T>N (by a lot), I would definitely also try Driscoll-Kraay (1998) standard errors.
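    For reference, a sketch of the Driscoll-Kraay covariance estimator as it is commonly written (e.g. in Hoechle's 2007 Stata Journal article on -xtscc-); the notation is assumed here. The key step is summing the moment contributions over whichever contracts exist on day t, so N(t) may shrink to 16 at the end of this sample, and then applying Newey-West weights to that single time series. This is why it handles both the common daily shock and its persistence, and why it is attractive when T is large relative to N.

```latex
% Moment contributions, summed over the N(t) contracts observed on day t
h_{it}(\hat\beta) = x_{it}\,\hat\varepsilon_{it},
\qquad
h_t(\hat\beta) = \sum_{i=1}^{N(t)} h_{it}(\hat\beta)

% Newey-West on the time series h_t with lag length m(T)
\hat\Omega_j = \sum_{t=j+1}^{T} h_t(\hat\beta)\, h_{t-j}(\hat\beta)',
\qquad
w(j, m(T)) = 1 - \frac{j}{m(T) + 1}

\hat S_T = \hat\Omega_0
  + \sum_{j=1}^{m(T)} w(j, m(T)) \left[ \hat\Omega_j + \hat\Omega_j' \right]

% Sandwich covariance of the pooled (or within-transformed) OLS estimator
\widehat{V}(\hat\beta) = (X'X)^{-1}\, \hat S_T\, (X'X)^{-1}
```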

    So you have multiple options.

    Concerning this comment: "Another alternative would be to include day dummies but since my dataset has >900 days and the magnitude of the shock differs per forward, I believe this is not a suitable solution. In addition, the last days of my dataset only have 16 observations each (that is in August 2022 only the forwards ending between September 2022 and December 2023 are remaining)."

    If you're after causal inference, you'll have to account for time fixed-effects; failing to account for time will most certainly bias your coefficients.
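    A minimal sketch of adding day fixed effects on top of contract fixed effects, assuming the -dataex- example is loaded and the code is run from a do-file. One caveat visible in the example data: StorageDifferentialpp and HDD take the same value for both contracts on each date, and TimeToMaturityinDays differs between the two contracts by a constant (368 days), so with contract and day effects these regressors would be omitted as collinear; only regressors that vary across contracts within a day stay identified. With more than 900 days, the community-contributed -reghdfe- (absorb() option) is usually much faster than explicit dummies.

```stata
* Sketch: assumes the -dataex- example in this thread is in memory
xtset ForwardIDNotchronological bcal_date

* Contract FE via the within transformation plus one dummy per trading day;
* SEs clustered by contract. Date-only regressors drop out as collinear.
xtreg Last StorageDifferentialpp HDD TimeToMaturityinDays i.bcal_date, fe ///
    vce(cluster ForwardIDNotchronological)
```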

    To better help you, we'd need to know a lot more about your research question and the equations you've tried running.



    • #3
      For such a dataset the user-written command -xtscc- seems appropriate. It estimates linear regressions with Driscoll-Kraay standard errors, which are robust to arbitrary cross-sectional correlation, to Newey-West-type serial correlation, and to arbitrary heteroskedasticity.
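      A minimal sketch of -xtscc- on the example data, assuming it has been installed from SSC and the -dataex- example is loaded. lag(5) is an arbitrary illustrative value; if lag() is omitted, -xtscc- picks a default lag length by a rule of thumb. I am also assuming FminS is the forward-minus-spot variable used in the second model.

```stata
* Requires the community-contributed -xtscc-; install once with:
* ssc install xtscc

* Sketch: assumes the -dataex- example in this thread is in memory
xtset ForwardIDNotchronological bcal_date

* Model 1: contract fixed effects, Driscoll-Kraay SEs
xtscc Last StorageDifferentialpp HDD TimeToMaturityinDays, fe lag(5)

* Model 2: pooled OLS (the -xtscc- default), Driscoll-Kraay SEs
xtscc FminS StorageDifferentialpp HDD TimeToMaturityinDays, lag(5)
```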

      If you want to do GLS, check out -xtgls-, and if you want robust standard errors after it, check out my post-estimation command for -xtgls-, -xtglsr-.



      • #4
        Thank you for the replies.

        I have read up on it, and Driscoll-Kraay standard errors indeed seem most suitable.

        I have also tried -xtgls-, but it did not seem to work because my panel is unbalanced.

        Thanks!



        • #5
          Yes, -xtgls- is a bit fussy. OLS with a robust variance estimator is always a good option.
