  • Impact of one time series variable with a different frequency on another variable with a different frequency

    I am trying to understand the effect of GDP, inflation and NFP data on the value of the stock market.
    I have daily data on the S&P 500, quarterly data on GDP and inflation, and monthly data on NFP.
    I have combined these in Stata; however, on days when there is no data announcement, GDP, CPI and NFP have missing values.

    When I do the following:

    regress sp500 yoygdp yoycpi momnfpchange

    It only uses observations when there is data available for all the variables.

    Is there a way I can analyse the impact of the data announcements using all days for which I have stock market data?

    See data below:

    input str10 date float(sp500 yoygdp yoycpi) int momnfpchange float date2
    "19/07/2010" 1071.25 . . . 18462
    "20/07/2010" 1083.48 . . . 18463
    "21/07/2010" 1069.59 . . . 18464
    "22/07/2010" 1093.67 . . . 18465
    "23/07/2010" 1102.66 . . . 18466
    "26/07/2010" 1115.01 . . . 18469
    "27/07/2010" 1113.84 . . . 18470
    "28/07/2010" 1106.13 . . . 18471
    "29/07/2010" 1101.53 . . . 18472
    "30/07/2010" 1101.6 . . . 18473
    "02/08/2010" 1125.86 . . . 18476
    "03/08/2010" 1120.46 . . . 18477
    "04/08/2010" 1127.24 . . . 18478
    "05/08/2010" 1125.81 . . . 18479
    "06/08/2010" 1121.64 . . . 18480
    "09/08/2010" 1127.79 . . . 18483
    "10/08/2010" 1121.06 . . . 18484
    "11/08/2010" 1089.47 . . . 18485
    "12/08/2010" 1083.61 . . . 18486
    "13/08/2010" 1079.25 . . . 18487
    "16/08/2010" 1079.38 . . . 18490
    "17/08/2010" 1092.54 . . . 18491
    "18/08/2010" 1094.16 . . . 18492
    "19/08/2010" 1075.63 . . . 18493
    "20/08/2010" 1071.69 . . . 18494
    "23/08/2010" 1067.36 . . . 18497
    "24/08/2010" 1051.87 . . . 18498
    "25/08/2010" 1055.33 . . . 18499
    "26/08/2010" 1047.22 . . . 18500
    "27/08/2010" 1064.59 . . . 18501
    "30/08/2010" 1048.92 . . . 18504
    "31/08/2010" 1049.33 . 1.1 -36 18505
    "01/09/2010" 1080.29 . . . 18506
    "02/09/2010" 1090.1 . . . 18507
    "03/09/2010" 1104.51 . . . 18508
    "06/09/2010" . . . . 18511
    "07/09/2010" 1091.84 . . . 18512
    "08/09/2010" 1098.87 . . . 18513
    "09/09/2010" 1104.18 . . . 18514
    "10/09/2010" 1109.55 . . . 18515
    "13/09/2010" 1121.9 . . . 18518
    "14/09/2010" 1121.1 . . . 18519
    "15/09/2010" 1125.07 . . . 18520
    "16/09/2010" 1124.66 . . . 18521
    "17/09/2010" 1125.59 . . . 18522
    "20/09/2010" 1142.71 . . . 18525
    "21/09/2010" 1139.78 . . . 18526
    "22/09/2010" 1134.28 . . . 18527
    "23/09/2010" 1124.83 . . . 18528
    "24/09/2010" 1148.67 . . . 18529

  • #2
    To do this, you have to supply values for the variables that are missing on those dates. Assuming that original data sources with daily values of these variables are not available, you have to decide on some kind of imputation process. Candidates include last observation carried forward; linear interpolation; regressing the non-daily variables on the daily ones for the dates where they are available and then using the predicted values from those regressions; and other, more complicated models.

    All of these approaches have drawbacks and add some degree of unreality to the data and the analysis. When you fill in the missing data like this, you create the illusion of an analysis based on complete data, but not the reality: the analysis is based on partial data plus a model (which may or may not be valid) of the missing values. At the very least, the imputed values will have less variation than the real (unobserved) values, and this will upwardly bias your regression coefficients, even if your imputed values are in fact unbiased estimates of the unobserved values.

    Multiple imputation overcomes this last problem, but is probably not really applicable to a situation like yours where only a tiny fraction of the data is actually observed.
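
    For concreteness, here is a minimal sketch of the two simplest options mentioned above, last observation carried forward and linear interpolation. It uses the variable names from the question (date2, yoycpi); the toy rows are an assumption for illustration only: 1.1 is the 31/08/2010 value from the posted data, while 1.4 is a made-up later announcement. The new variable names yoycpi_locf and yoycpi_lin are likewise hypothetical.

```stata
* Toy data: 1.1 is the CPI value from the question (31/08/2010);
* 1.4 is a made-up later announcement, for illustration only
clear
input float(date2 yoycpi)
18505 1.1
18506 .
18507 .
18508 1.4
end
format date2 %td
sort date2

* Last observation carried forward: replace works row by row, so each
* newly filled value is available to the next missing row
gen yoycpi_locf = yoycpi
replace yoycpi_locf = yoycpi_locf[_n-1] if missing(yoycpi_locf)

* Linear interpolation between announcements, with date2 as the x variable
ipolate yoycpi date2, gen(yoycpi_lin)

list, noobs
```

    Note that interpolation uses the next announcement, which the market did not know on the intervening days, so carrying the last value forward is arguably closer to what was observable at the time. Either way, the caveats above still apply.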

    The choice of the least inappropriate way to fill in missing data is a substantive matter, and you should review the literature in your discipline (which I guess is economics or finance) to see how this kind of situation is normally handled. It's not my discipline, so I can't advise you, but it would not surprise me to learn that the usual solution is not to fill in the missing values at all but rather to aggregate the sp500 variable up to a coarser time level that matches the sparser data on the other variables and then run the regression at that level.
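
    If aggregation turns out to be the accepted approach, it might look something like the sketch below, which collapses the daily data to monthly. The rows are a handful taken from the posted data; yoygdp is left out because it is missing throughout the posted excerpt. The names mdate, sp500_mean and sp500_last are hypothetical, and whether the monthly mean or the month-end value of sp500 is the right summary is itself a substantive choice.

```stata
* A few rows from the posted data (yoygdp omitted: all missing in the excerpt)
clear
input float(date2 sp500 yoycpi) int momnfpchange
18472 1101.53 .   .
18473 1101.6  .   .
18504 1048.92 .   .
18505 1049.33 1.1 -36
18506 1080.29 .   .
18507 1090.1  .   .
end
format date2 %td

* Monthly date derived from the daily date
gen mdate = mofd(date2)
format mdate %tm

* One row per month: mean and last nonmissing sp500, plus the first
* nonmissing announcement value in that month (if any)
collapse (mean) sp500_mean = sp500 (lastnm) sp500_last = sp500 (firstnm) yoycpi momnfpchange, by(mdate)

tsset mdate
list, noobs

* With real data spanning many months, one could then run, for example:
* regress sp500_last yoycpi momnfpchange
```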


    • #3
      True. I am facing a similar problem: I have daily stock returns, but all the firm-level control variables are at annual frequency. I tried to convert the daily stock returns to annual by taking the average daily return for each year, but I doubt that is correct. Usually the daily average is multiplied by 252 (trading days), but that inflates my coefficients. Another option is the holding period return: (closing price at year end - opening price at the start of the year) / opening price. I don't know which technique is most appropriate. Some suggest MIDAS (Mixed Data Sampling), which addresses the issue of having variables at different frequencies, while some papers simply ignore the issue. I know of a paper in JFE where the authors had monthly stock returns and annual control variables; they reported the regression results and interpreted the coefficients by multiplying by 12. I am eager to know the appropriate solution.
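
      To make the comparison concrete, here is a small sketch of the holding period return next to two ways of annualizing daily returns. The prices are made up for illustration, the variable names (price, ret, hpr, compounded, sum_ret) are hypothetical, and the first observed price of each year stands in for the opening price, which is an assumption.

```stata
* Made-up daily closing prices in two calendar years (illustration only)
clear
input float(date2 price)
18266 100
18267 102
18268 101
18630 110
18631 121
end
format date2 %td
gen year = year(date2)

* Daily simple returns within each year (missing on each year's first day)
bysort year (date2): gen double ret = price / price[_n-1] - 1

* Holding period return: first to last observed price of the year
by year: gen double hpr = price[_N] / price[1] - 1

* Compounding the daily returns reproduces the holding period return
by year: gen double cumlog = sum(ln(1 + ret))
by year: gen double compounded = exp(cumlog[_N]) - 1

* Summing daily returns (mean times number of days) is only an approximation
by year: egen double sum_ret = total(ret)

list year date2 price ret hpr compounded sum_ret, noobs sepby(year)
```

      In this toy example the compounded daily returns match the holding period return exactly, while the simple sum does not, which is one reason scaling an average daily return by the number of trading days can give different numbers. Which measure suits a given regression remains a question for the literature.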
