Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Time series with quarterly data


    Apologies if this is a basic question.

    I am looking at time series and modelling data on sales in different regions. Some of the regions have yearly data while others offer quarterly breakdown of sales.

    I want to make sure I can compare the slopes in a meaningful way so was planning to run the analyses on the quarterly data (if available) and rescale the time variable to estimate change in sales by year (to compare the slopes with region with quarterly data). For peace of mind, I then compared the results when using data collapsed at year level. The results are different!

    I am not sure where I am going wrong!

    I provide some simplified code based on available data for reference. Can anyone spot where I am going wrong?

    Code:
    clear
    use http://www.stata-press.com/data/r16/friedman2.dta
    
    g year = yofd(dofq(time))
    keep if year>=1980
    
    g timen=_n
    regress consump timen
    
    * Rescale time for yearly estimate
    replace timen=timen/4
    regress consump timen
    
    * Analyses on year data 
    collapse (sum) consump, by(year)
    g timen=_n
    regress consump timen






  • #2
    You can only sum when collapsing if you have data for four quarters each year. Then the slope coefficient will be scaled by a factor of 4. Taking the mean is safe as it will disregard missing quarters in a year if they exist. In your case, 1998 has 3 quarters.

    Comment


    • #3
      HI Andrew Musau , thanks for your reply and apologies I did not notice that 1998 only had 3 quarters. In my data I have 4 quarters for all years.

      Can I just check though (and again I apologise if this is a silly question), why the results entering each quarter as a timepoint differ from when I collapse(sum) the data by year and then scale the slope by a factor of 4? By rescaling I am obtaining slope by quarter, correct?

      Is there a way to obtain equivalent results to #2 without having to first collapse the data- is approach #3 correct?

      Code:
      clear
      use http://www.stata-press.com/data/r16/friedman2.dta
      g year = yofd(dofq(time))
      keep if year>=1980
      drop if year==1998 //incomplete
      
      * #1: Using raw data
      g timen=_n
      regress consump timen
      
      * #2: Collapsing to yearly data
      collapse (sum) consump, by(year)
      g timen=_n replace timen=timen*4 //rescale here
      regress consump timen
      
      #3
      clear
      use http://www.stata-press.com/data/r16/friedman2.dta
      g year = yofd(dofq(time))
      keep if year>=1980
      drop if year==1998
      g timen=_n
      replace timen=timen/4
      regress consump timen
      Last edited by Jen Ward; 13 Jul 2022, 08:22.

      Comment


      • #4
        Correct. The OLS coefficient represents the change in the outcome for a unit change in the regressor, holding fixed other regressors. So if a unit is a quarter, then that is what you get. The multiplication by 4 approximates the quarter slope coefficient, but we are changing the data and precision starts to bite at some level. We can get closer using integer values. But note that this discussion relates only to the estimated slope coefficient, there are different implications in terms of the estimated standard errors.

        Code:
        clear
        use http://www.stata-press.com/data/r16/friedman2.dta
        g year = yofd(dofq(time))
        keep if year>=1980
        drop if year==1998 //incomplete
        
        * #2: Collapsing to yearly data
        recast double consump
        replace consump=int(consump)
        collapse (sum) consump, by(year)
        g timen=_n
        replace timen=timen*4 //rescale here
        regress consump timen
        
        #3
        clear
        use http://www.stata-press.com/data/r16/friedman2.dta
        replace consump=int(consump)
        g year = yofd(dofq(time))
        keep if year>=1980
        drop if year==1998
        g timen=_n
        replace timen=timen/4
        regress consump timen
        Res.:

        Code:
        #2
        
        . regress consump timen
        
              Source |       SS           df       MS      Number of obs   =        18
        -------------+----------------------------------   F(1, 16)        =   5649.35
               Model |   374422800         1   374422800   Prob > F        =    0.0000
            Residual |  1060434.41        16  66277.1508   R-squared       =    0.9972
        -------------+----------------------------------   Adj R-squared   =    0.9970
               Total |   375483234        17  22087249.1   Root MSE        =    257.44
        
        ------------------------------------------------------------------------------
             consump |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               timen |    219.773   2.923984    75.16   0.000     213.5744    225.9715
               _cons |   5661.294    126.601    44.72   0.000     5392.912    5929.676
        ------------------------------------------------------------------------------
        
        .
        . #3
        
        . regress consump timen
        
              Source |       SS           df       MS      Number of obs   =        72
        -------------+----------------------------------   F(1, 70)        =  23439.76
               Model |  93878608.9         1  93878608.9   Prob > F        =    0.0000
            Residual |   280357.07        70    4005.101   R-squared       =    0.9970
        -------------+----------------------------------   Adj R-squared   =    0.9970
               Total |    94158966        71  1326182.62   Root MSE        =    63.286
        
        ------------------------------------------------------------------------------
             consump |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               timen |   219.7744   1.435491   153.10   0.000     216.9114    222.6374
               _cons |   1497.725   15.07337    99.36   0.000     1467.662    1527.788
        ------------------------------------------------------------------------------

        Comment


        • #5
          Thank you so much for your help with this Andrew Musau

          SO to check #2 and #3 estimate the slope coefficient for quarter. Is the SE smaller in #3 as we are using more granular data which makes for better estimates?

          Is #1 wrong? it originally seemed like the correct way to model the quarter. The time variable being coded 1, 2, 3 etc. I would have thought the change unit would reflect the quarter... why does the -replace timen=timen/4- produce the wanted results instead?

          Comment


          • #6
            Originally posted by Jen Ward View Post
            Thank you so much for your help with this Andrew Musau

            SO to check #2 and #3 estimate the slope coefficient for quarter.
            A unit is a year if we divide by 4. Consider that when we tsset data and are using quarterly data, we need to have a quarterly time variable or declare

            Code:
            xtset time, quarterly
            Absent of such a declaration and having time values in integer values, Stata will treat the value 1 as one time unit. In your data, you want to shift from quarterly to annual data, so your reference is a year which is 4 quarters. One way to let Stata understand that there are 4 quarters in a year with no tssettings is to divide the integer time values by 4.


            Is the SE smaller in #3 as we are using more granular data which makes for better estimates?
            Absolutely. Remember that the calculation of the standard error has the sample size in the denominator, so with more observations, you get smaller standard errors.
            Last edited by Andrew Musau; 13 Jul 2022, 11:59.

            Comment


            • #7
              To add on #6, if you want a quarterly coefficient rather than annual, use integer values \(1, 2, 3, \cdots\) for the time variable on the original data. Then if you collapse (sum) by year, the scaling coefficient will be 16. The collapsing aggregates data over 4 quarters and a year has 4 quarters, hence \(4\times 4\).

              Code:
              clear
              use http://www.stata-press.com/data/r16/friedman2.dta
              g year = yofd(dofq(time))
              keep if year>=1980
              drop if year==1998 //incomplete
              
              * #2: Collapsing to yearly data
              recast double consump
              replace consump=int(consump)
              collapse (sum) consump, by(year)
              g timen=_n*16
              regress consump timen
              
              #3
              clear
              use http://www.stata-press.com/data/r16/friedman2.dta
              replace consump=int(consump)
              g year = yofd(dofq(time))
              keep if year>=1980
              drop if year==1998
              g timen=_n
              regress consump timen
              Res.:

              Code:
              . g timen=_n*16
              
              . regress consump timen
              
                    Source |       SS           df       MS      Number of obs   =        18
              -------------+----------------------------------   F(1, 16)        =   5649.35
                     Model |   374422800         1   374422800   Prob > F        =    0.0000
                  Residual |  1060434.41        16  66277.1508   R-squared       =    0.9972
              -------------+----------------------------------   Adj R-squared   =    0.9970
                     Total |   375483234        17  22087249.1   Root MSE        =    257.44
              
              ------------------------------------------------------------------------------
                   consump |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     timen |   54.94324   .7309961    75.16   0.000      53.3936    56.49288
                     _cons |   5661.294    126.601    44.72   0.000     5392.912    5929.676
              ------------------------------------------------------------------------------
              
              .
              . #3
              
              . g timen=_n
              
              . regress consump timen
              
                    Source |       SS           df       MS      Number of obs   =        72
              -------------+----------------------------------   F(1, 70)        =  23439.76
                     Model |  93878608.9         1  93878608.9   Prob > F        =    0.0000
                  Residual |   280357.07        70    4005.101   R-squared       =    0.9970
              -------------+----------------------------------   Adj R-squared   =    0.9970
                     Total |    94158966        71  1326182.62   Root MSE        =    63.286
              
              ------------------------------------------------------------------------------
                   consump |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     timen |    54.9436   .3588728   153.10   0.000     54.22785    55.65935
                     _cons |   1497.725   15.07337    99.36   0.000     1467.662    1527.788
              ------------------------------------------------------------------------------
              Last edited by Andrew Musau; 13 Jul 2022, 13:05.

              Comment


              • #8
                Thanks Andrew Musau - this is really helpful!

                One final questions, for regions where I have yearly data would I then have to divide the slope by 16? I was dividing by 4 but based on #7, it should be 16 - is this correct?

                Comment


                • #9
                  It depends on how the outcome is measured. For example, if the outcome is income accumulated (summed) over the year, then you multiply by 16. If it is averaged over quarters, then you multiply by 4. Note that this will work for flows but not stocks. Using the same analogy, if the outcome is wealth recorded at the end of the year, then there is no way to recover the quarterly coefficient because the annual amount is measured at a point of time and not over a period of time.

                  Comment


                  • #10
                    Thanks again Andrew Musau !

                    I still can't get my head around the /16 (with accumulated data). Do I first divide by 4 to get an average for quarter sales and further divide by 4 to to get quarter unit and so mimic the tsset command?

                    Comment


                    • #11
                      Let's try one more example. Assume that investment for company 1 in the Grunfeld data is measured quarterly, and we generate a variable that measures the change in investment from one quarter to the next.

                      Code:
                      webuse grunfeld, clear
                      keep if company==1
                      keep invest time
                      tsset time
                      egen year= seq(), block(4)
                      g double change= D.invest/L.invest
                      keep if year>=2
                      replace time=_n
                      *REGRESSION ON QUARTERLY CHANGES (DELTA= 1 QUARTER)
                      regress change time
                      
                      *AGGREGATE (SUM) CHANGE AND COLLAPSE TO GET ANNUAL DATA
                      collapse (sum) change, by(year)
                      gen time= _n*16
                      regress change time
                      Res.:

                      Code:
                      . *REGRESSION ON QUARTERLY RETURNS (DELTA= 1 QUARTER)
                      
                      .
                      . regress change time
                      
                            Source |       SS           df       MS      Number of obs   =        16
                      -------------+----------------------------------   F(1, 14)        =      0.06
                             Model |  .001861061         1  .001861061   Prob > F        =    0.8099
                          Residual |  .433682528        14  .030977323   R-squared       =    0.0043
                      -------------+----------------------------------   Adj R-squared   =   -0.0669
                             Total |  .435543588        15  .029036239   Root MSE        =      .176
                      
                      ------------------------------------------------------------------------------
                            change |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                              time |   .0023396   .0095451     0.25   0.810    -.0181327    .0228119
                             _cons |     .10808   .0922972     1.17   0.261    -.0898777    .3060377
                      ------------------------------------------------------------------------------
                      
                      .
                      .
                      .
                      . *AGGREGATE (SUM) RETURN AND COLLAPSE TO GET ANNUAL DATA
                      
                      .
                      . collapse (sum) change, by(year)
                      
                      .
                      . gen time= _n*16
                      
                      .
                      . regress change time
                      
                            Source |       SS           df       MS      Number of obs   =         4
                      -------------+----------------------------------   F(1, 2)         =      0.03
                             Model |  .007502298         1  .007502298   Prob > F        =    0.8804
                          Residual |  .517037002         2  .258518501   R-squared       =    0.0143
                      -------------+----------------------------------   Adj R-squared   =   -0.4785
                             Total |    .5245393         3  .174846433   Root MSE        =    .50845
                      
                      ------------------------------------------------------------------------------
                            change |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                              time |    .002421   .0142115     0.17   0.880    -.0587263    .0635683
                             _cons |   .4150267    .622718     0.67   0.574    -2.264313    3.094366
                      ------------------------------------------------------------------------------

                      Now to the flip side, if we have annual data on a flow, the question is what does it represent? As above, is it the aggregate change in investment over 4 quarters or it is it the average quarterly change in investment? If the former, you can approximate the quarterly coefficient by multiplying the trend by 16 (first reversing the aggregation and secondly, moving from annual to quarterly). If it is averaged, you multiply the trend by 4 (to move from annual to quarterly - see below).

                      Code:
                      webuse grunfeld, clear
                      keep if company==1
                      keep invest time
                      tsset time
                      egen year= seq(), block(4)
                      g double change= D.invest/L.invest
                      keep if year>=2
                      replace time=_n
                      *REGRESSION ON QUARTERLY CHANGES (DELTA= 1 QUARTER)
                      regress change time
                      
                      *AVERAGE CHANGE AND COLLAPSE TO GET ANNUAL DATA
                      collapse (mean) change, by(year)
                      gen time= _n*4
                      regress change time
                      Res.:

                      Code:
                      . regress change time
                      
                            Source |       SS           df       MS      Number of obs   =        16
                      -------------+----------------------------------   F(1, 14)        =      0.06
                             Model |  .001861061         1  .001861061   Prob > F        =    0.8099
                          Residual |  .433682528        14  .030977323   R-squared       =    0.0043
                      -------------+----------------------------------   Adj R-squared   =   -0.0669
                             Total |  .435543588        15  .029036239   Root MSE        =      .176
                      
                      ------------------------------------------------------------------------------
                            change |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                              time |   .0023396   .0095451     0.25   0.810    -.0181327    .0228119
                             _cons |     .10808   .0922972     1.17   0.261    -.0898777    .3060377
                      ------------------------------------------------------------------------------
                      
                      .
                      .
                      .
                      . *AVERAGE CHANGE AND COLLAPSE TO GET ANNUAL DATA
                      
                      .
                      . collapse (mean) change, by(year)
                      
                      .
                      . gen time= _n*4
                      
                      .
                      . regress change time
                      
                            Source |       SS           df       MS      Number of obs   =         4
                      -------------+----------------------------------   F(1, 2)         =      0.03
                             Model |  .000468894         1  .000468894   Prob > F        =    0.8804
                          Residual |  .032314813         2  .016157406   R-squared       =    0.0143
                      -------------+----------------------------------   Adj R-squared   =   -0.4785
                             Total |  .032783706         3  .010927902   Root MSE        =    .12711
                      
                      ------------------------------------------------------------------------------
                            change |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                              time |    .002421   .0142115     0.17   0.880    -.0587263    .0635683
                             _cons |   .1037567   .1556795     0.67   0.574    -.5660782    .7735915
                      ------------------------------------------------------------------------------

                      Comment


                      • #12
                        Thank you!!! I very much appreciate you taking the time to answer all my question and provide examples. It is all much clearer now

                        Comment

                        Working...
                        X