Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating Moving Average in Panel Data

    Hi, I am trying to calculate Moving Average in my panel data. I am a bit confused between various stata packages such as tsegen, tssmooth and rangestat. At present, I am using rangestat using the following code:

    I have the unbalanced panel of around 3 million observations.

    xtset gvkey date
    rangestat (mean) mean_class = class, interval(dom -15 -1) by(gvkey)
    gen class_delta2 = ln(class+1) - ln(mean_class+1)
    replace class_delta2 = 0 if class_delta2 == .

    I get the following result:

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double gvkey float date double class float(dom class_delta1) double mean_class float class_delta2
    1004 18998 1 6 0 7.614545454545454 -1.460305
    1004 19002 1 10 0 7.592369477911647 -1.4577273
    1004 19005 2 13 0 8.172932330827068 -1.1176447
    1004 19008 1 16 0 8.637785800240675 -1.5725442
    1004 19010 2 18 0 8.895432692307692 -1.193461
    1004 19015 2 23 0 9.40621266427718 -1.2437907
    1004 19023 1 31 0 9.390361445783132 -1.6477314
    1004 19024 1 1 0 . 0
    1004 19026 1 3 0 7.743119266055046 -1.47512
    1004 19029 1 6 0 7.614545454545454 -1.460305
    1004 19033 1 10 0 7.592369477911647 -1.4577273
    1004 19038 3 15 0 8.452319587628866 -.8599658
    1004 19040 5 17 0 8.78820697954272 -.4894188
    1004 19043 1 20 0 9.221419975932612 -1.6313384
    1004 19044 1 21 0 9.277844311377246 -1.6368433
    1004 19045 1 22 0 9.342925659472423 -1.6431556
    1004 19046 1 23 0 9.40621266427718 -1.649256
    1004 19047 3 24 .6931472 9.657517899761336 -.9799712
    1004 19050 1 27 0 9.767220902612827 -1.6833593
    1004 19051 1 28 0 9.730310262529834 -1.6799253
    1004 19053 1 1 0 . 0
    1004 19054 1 2 0 7.6 -1.458615
    1004 19058 1 6 0 7.614545454545454 -1.460305
    1004 19061 3 9 0 7.497737556561086 -.7535056
    1004 19062 1 10 -.6931472 7.592369477911647 -1.4577273
    1004 19064 2 12 0 7.891268533772652 -1.0864575
    1004 19070 1 18 0 8.895432692307692 -1.598926
    1004 19071 4 19 .9162908 9.040767386091128 -.6972156
    1004 19072 5 20 .18232156 9.221419975932612 -.53272605
    1004 19073 8 21 .4054651 9.277844311377246 -.13276596
    end
    format %td date

    Problem: I am not sure whether I am using the right package or not. I just need to calculate the average of previous 15 days, what happens to the last observations in the sample? How their average will be calculated? How to deal with the missing values and dates while calculating the average?

    Your time and advise highly appreciated.

  • #2
    You can use any of these programs and get the results you want.

    But I'm not sure you've used -rangestsat- correctly. By specifying -interval(dom, -15 -1)- you are not getting the average of the previous 15 days. You are getting the average of all days where the day of the month is 1 to 15 days lower than the current day of the month, regardless of what month the observation comes from. So, for example, if we have an observation dated 6jan2012, your "moving" observation will include all observations for the same gvkey that occur on the 1st, 2nd, 3rd, 4th, or 5th of any month in any year. That is perfectly legal, but I'd be very surprised if that's what you want. I think what you want is:

    Code:
    rangestat (mean) mean_class = class, interval(date -15 -1) by(gvkey)
    This will calculate the mean value of class for all observations of the same gvkey that have dates that fall between 1 and 15 days before the current date. Since your data has time gaps, the number of such observations will vary from one case to the next.

    The last observations in the panels shouldn't be any particular problem. I'm not even sure why you ask about them. Is there some specific problem you anticipate?

    How to deal with the missing values and dates while calculating the average?
    You tell me. That depends on your specific problem and what you are doing.

    Comment


    • #3
      Hi Clyde, Thanks a lot for your answer. This almost solves my problem as I was wrongly applying the code. However, Can you please explain a little bit further that what happens in case of gaps? Does it skip the observation or count them as zero?

      Comment


      • #4
        When you are calculating a mean in any Stata command I can think of, a missing value is simply ignored. So if you have some observation where there are only 12 available values in the window from 15 days before to 1 day before the current date (whether because there simply is no observation at all, or because there are observations but the value of the variable being averaged is a missing value) you get the sum of those 12 available values, divided by 12.

        Comment


        • #5
          rangestat doesn't even have a concept of gaps. It does what you ask for with the data available in the intervals specified. It's indifferent to spacing of values within. No data at all in an interval and results will be missing, but that's generally true.

          Comment

          Working...
          X