
  • Moving average with panel data

    Hello all,

    Below is an example of the data I am working with:

    Code:
    input float(year quarter miles)
    2000 1 5
    2000 1 2
    2000 1 7
    2000 2 3
    2000 2 6 
    2000 3 8
    2000 3 9
    2000 3 2
    2000 3 1
    2000 3 7
    2000 4 8
    2000 4 9
    2000 4 4
    2000 4 5
    2000 4 3
    2000 4 4
    2001 1 2
    2001 1 2
    2001 2 3
    2001 2 4
    2001 3 5
    2001 3 6
    2001 3 4
    2001 4 6
    2001 4 3
    2001 4 2
    2001 4 5
    2002 1 3
    2002 1 7
    2002 1 7
    2002 1 4
    2002 2 4
    2002 2 6
    2002 3 7
    2002 3 5
    2002 4 5
    2002 4 3
    2002 4 2
    I would like to generate a new variable called "moving_average" that creates a moving average for the miles variable by year and quarter. For instance, 2000Q1 should have its own moving average, 2000Q2 should have its own moving average, etc. I would appreciate any assistance with this!

    Thanks,
    Anoush

  • #2
    I don't understand what you want. A moving average has a window specifying how many lagged and leading observations are included around each observation (and whether the current observation is included). You say nothing about that. Perhaps you mean you want a running average in each year? That would start with the first observation of a year being its own running average. Then for the second observation we take the average of the first two observations. Then for the third observation the average of the first three, etc. until we reach the end of the year. If that's what you meant:
    Code:
    sort year quarter, stable
    by year: gen running_average = sum(miles)/_n
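As a quick check of the running-average logic described above, here is a minimal sketch on a five-row toy subset (the subset is chosen only for illustration and is not the full data from #1). Within each year, `sum(miles)` is the cumulative sum and `_n` is the observation number, so their ratio is the mean of all observations so far in that year.

```stata
* Toy data: three observations in 2000, two in 2001 (illustrative subset)
clear
input float(year quarter miles)
2000 1 5
2000 1 2
2000 1 7
2001 1 2
2001 1 2
end

* Keep the original order within year and quarter
sort year quarter, stable

* Cumulative sum divided by the within-year observation number
by year: gen running_average = sum(miles)/_n

list, sepby(year) noobs
```

The second observation of 2000 should show (5 + 2)/2 = 3.5, and the running average restarts at the first observation of 2001.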

    • #3
      Clyde Schechter thank you for your response. I apologize, I must have misunderstood the meaning of moving average.

      I suppose for the moving average I would like to use two lagged observations, the current observation, and two leading observations. Is there a way to calculate this? I would appreciate any help!

      Anoush

      • #4
        Yes. It's just a little more complicated:
        Code:
        gen int qdate = yq(year, quarter)
        format qdate %tq
        
        sort qdate, stable
        gen long seq = _n
        tsset seq
        
        levelsof year, local(years)
        gen moving_average = .
        foreach y of local years {
            tssmooth ma temp = miles if year == `y', window(2 1 2)
            replace moving_average = temp if year == `y'
            drop temp
        }
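To see what `window(2 1 2)` asks for, here is a minimal sketch on a hypothetical seven-point series (the variables `t`, `y`, and `y_ma` are made up for illustration). The three numbers are the count of lagged terms, whether the current observation is included (1 or 0), and the count of leading terms, so interior points are plain means of five values. How the first and last two observations are treated is best checked in the listing rather than assumed.

```stata
* Hypothetical series y = 1, 2, ..., 7 on a simple time index
clear
set obs 7
gen t = _n
tsset t
gen y = t

* Two lags, the current value, and two leads, equally weighted
tssmooth ma y_ma = y, window(2 1 2)

list t y y_ma, noobs
```

At `t == 4` the smoothed value is the mean of 2, 3, 4, 5, 6, namely 4.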

        • #5
          Thank you so much, Clyde Schechter. You are always so helpful!

          Anoush

          • #6
            There are two important wrinkles here.

            First, there are unequal numbers of observations in the example at each date, which suggests that a (better) alternative is to take means before smoothing. That can be done with a collapse, then a smooth, then a merge.

            Second, equal weights as in

            (1/5) * [value at t - 2] + (1/5) * [value at t - 1] + (1/5) * [value at t] + (1/5) * [value at t + 1] + (1/5) * [value at t + 2]

            have no virtue beyond extreme simplicity.

            There are many alternatives, of which the method called Hanning by John W. Tukey, (1/4) previous + (1/2) this + (1/4) next, is simple enough, and can be extended to any odd number of weights by convolution, so yielding weights in the proportions

            1 2 1
            1 4 6 4 1
            1 6 15 20 15 6 1

            and so on. These are just binomial coefficients and these smoothers are often called binomial filters or smoothers. Just about any time series book has a chapter on smoothing before proceeding to allowed principles of witchcraft such as ARIMA modelling. The story starts with

            1. If the aim of smoothing is to summarize values around here then values near here carry more information than those further away.

            2. A smoother needs to be thought of in terms of what happens in the frequency domain and Hanning and its binomial siblings are good enough for most fairly simple purposes.

            Here is that done for the data in #1 with a bonus: I show how a spike is smoothed, thus exposing the weights to view.


            Code:
            clear
            
            input float(year quarter miles)
            2000 1 5
            2000 1 2
            2000 1 7
            2000 2 3
            2000 2 6
            2000 3 8
            2000 3 9
            2000 3 2
            2000 3 1
            2000 3 7
            2000 4 8
            2000 4 9
            2000 4 4
            2000 4 5
            2000 4 3
            2000 4 4
            2001 1 2
            2001 1 2
            2001 2 3
            2001 2 4
            2001 3 5
            2001 3 6
            2001 3 4
            2001 4 6
            2001 4 3
            2001 4 2
            2001 4 5
            2002 1 3
            2002 1 7
            2002 1 7
            2002 1 4
            2002 2 4
            2002 2 6
            2002 3 7
            2002 3 5
            2002 4 5
            2002 4 3
            2002 4 2
            end
            
            tab year quarter
            
            save anoush, replace
            
            collapse miles, by(year quarter)  
            gen qdate = yq(year, quarter)
            tsset qdate
            
            gen spike = _n == floor(_N/2)
            tssmooth nl miles_s=miles, smoother(HH)
            tssmooth nl spike_s=spike, smoother(HH)
            
            list, sepby(year)
            
            merge 1:m year quarter using anoush

            Code:
                 +----------------------------------------------------------------+
                 | year   quarter      miles   qdate   spike    miles_s   spike_s |
                 |----------------------------------------------------------------|
              1. | 2000         1   4.666667     160       0   4.666667         0 |
              2. | 2000         2        4.5     161       0       4.85         0 |
              3. | 2000         3        5.4     162       0   4.941667         0 |
              4. | 2000         4        5.5     163       0     4.4125     .0625 |
                 |----------------------------------------------------------------|
              5. | 2001         1          2     164       0       3.65       .25 |
              6. | 2001         2        3.5     165       1    3.65625      .375 |
              7. | 2001         3          5     166       0   4.203125       .25 |
              8. | 2001         4          4     167       0    4.59375     .0625 |
                 |----------------------------------------------------------------|
              9. | 2002         1       5.25     168       0    4.90625         0 |
             10. | 2002         2          5     169       0   5.145833         0 |
             11. | 2002         3          6     170       0   4.703125         0 |
             12. | 2002         4   3.333333     171       0   3.333333         0 |
                 +----------------------------------------------------------------+
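The spike_s column shows the weights .0625, .25, .375, .25, .0625, which are 1 4 6 4 1 divided by 16. A minimal self-contained sketch of the same idea follows, assuming that the `weights()` option of `tssmooth ma` is an acceptable stand-in for `tssmooth nl, smoother(HH)` away from the ends of the series (the two commands may treat end values differently). The variables `t`, `spike`, `h1`, and `h2` are hypothetical. Applying the Hanning weights 1 2 1 twice is the convolution that yields 1 4 6 4 1.

```stata
* Hypothetical 11-point series with a single spike in the middle
clear
set obs 11
gen t = _n
tsset t
gen spike = (t == 6)

* One pass of Hanning: weights 1 2 1, normalized to sum to 1
tssmooth ma h1 = spike, weights(1 <2> 1)

* A second pass convolves 1 2 1 with itself
tssmooth ma h2 = h1, weights(1 <2> 1)

list t spike h1 h2, noobs
```

After one pass the spike becomes .25, .5, .25; after two passes it becomes .0625, .25, .375, .25, .0625, matching the binomial proportions 1 4 6 4 1.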


            More discussion at

            Code:
            . search bsmplot, sj historical
            
            Search of official help files, FAQs, Examples, and Stata Journals
            
            SJ-4-4 gr22_1 . . . . . . . . . . . . . . . . . Software update for bsmplot
            (help bsmplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
            Q4/04 SJ 4(4):490
            binomial smoothing plot program rewritten so that it now
            produces Stata 8 graphs
            
            STB-35 gr22 . . . . . . . . . . . . . . . . . . . . Binomial smoothing plot
            (help bsmplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
            1/97 pp.7--9; STB Reprints Vol 6, pp.36--38
            produce a plot of both yvar and the result of smoothing yvar
            by a binomial filter against xvar
            Last edited by Nick Cox; 04 Jun 2022, 02:09.

            • #7
              Nick Cox thank you so much! Very helpful!
