Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to build a moving average, which only executes if at least 3 of the 5 observations needed are not missing values?


    Dear Stata list users:

    I need to create a moving-average over 5 years for a (ratio-)variable in my panel data time series for each company and each year.
    My panel data is sorted by year (time variable) and company_id (panel variable) and declared as a time series with xtset.

    At this, the rolling average over the preceding 5 years for each company in each year, should be only created if at least 3 out of the 5 needed observations are not missing.
    If less than 5 observations are available the average should be computed without the number of missing observations e.g. if 3 observations (x) are available: (x1+x2+x3)/3 (instead of dividing by 5)


    Here is a section of my dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year float(company_id I dummy I_mov_avg_5y I_ma_500 I_ma_5y)
    2003 1         . 1 .         .         .
    2004 1         . 1 .         .         .
    2005 1  .2720588 . .         .         .
    2006 1  .4888889 . .         .  .2720588
    2007 1 .13846155 . .         . .38047385
    2008 1         0 . .         .  .2998031
    2009 1         0 . .         .  .2248523
    2010 1         0 . . .17988186 .17988186
    2011 1         0 . . .12547009 .12547009
    2012 1         . 1 . .02769231 .02769231
    end
    format %ty year

    These are the commands, which I used trying to build the moving average.

    1. Generating of the "ratio variable" R&D Expenditures/Sales (I)

    gen I = RnD_expense/net_sales, after(company_name )

    2. "First try" using tssmooth

    tssmooth ma I_ma_5y = I, window(5 0 0)

    Problem here : moving average is always executed,even if 3 (or more) missing observations exist

    3. "Second try"

    by company_id: generate I_ma_500 = (L5.I + L4.I + L3.I + L2.I + L1.I)/ 5

    Problem here: This command only executes if exactly 5 observations in a sequence are available;

    if there is a break in the sequence (due to a missing observation) the moving average is not calculated
    (even if e.g. 4 remaining observations exist).

    Hence I tried to modify, this command by first creating a dummy which indicates a missing value of the variable I (shows1 for missing value)
    and then setting the condition, that if the sum of the preceding 5 values of the dummy variable isn't higher than 2 this means, that at least 3 observations are available;

    Generating of dummy:

    gen dummy =1 if missing(I)

    by company_id: generate I_mov_avg_5y = (L5.I + L4.I + L3.I + L2.I + L1.I)/ 5 ///
    if (L5.dummy + L4.dummy+ L3.dummy + L2.dummy + L1.dummy)<=2


    Problem with this command: all observations of the moving average took the value "."


    I also tried to use commands like mdesc or asrol, but couldn't quite figure out, how to
    maybe solve this problem with the help of these two commands.
    Thus i would appreciate, if some of you could help me in finding a solution.


    Thank you very much in advance,

    Tamer
    Last edited by Tamer von Steinsdorff; 25 Jul 2018, 19:24.

  • #2
    You can try asrol. Here is one example

    Code:
    * Installation
    
    ssc install asrol
    
    * Download example data
     
    webuse grunfeld, clear
     
    * A 5-period rolling mean for each company with the minimum of 3 observations 
    
      bys company: asrol invest, stat(mean) win(year 5) min(3)
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

    Comment


    • #3
      Hallo Attaullah,

      first of, thanks for your advice.
      But does this example dataset really apply to the problem in my dataset, as my data contains missing values for observations and 'webuse grunfeld' doesn't ?

      Furthermore the window of the asrol command doesn't seem modifiable to compute the preceding 5 years.

      As the help file states:
      "The second argument should be a number that specifies the length of the rolling window. For example, if our time variable is year and
      we want a rolling window of 5 observations, (that is, the current observation and previous 4 observations), then option window will look like: window(year 5)"

      Comment


      • #4
        Your dummy should be

        Code:
        gen dummy = missing(l)
        as it being 0 or 1 is essential for your sum to work.

        Comment


        • #5
          Or with asrol:
          Code:
          webuse grunfeld, clear
          replace invest=. if invest>400
          bys company (year): gen investlag1 = L1.invest
          bys company: asrol invest, stat(mean) win(year 5) min(3)
          bys company: asrol investlag1, stat(mean) win(year 5) min(3)
          And note i created some missings here. Look at the result and see if this is now what you intended

          Comment


          • #6
            Thank you all, for your inputs.

            I tried it with Jorrit's code and it worked like a charm:

            bys company_id (year): gen I_lag1 = L1.I
            bys company_id: asrol I_lag1, stat(mean) win(year 5) min(3)
            rename mean5_I_lag1 I_ma_500

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float company_id int year float(I I_lag1) double I_ma_500
            
            id  year      I         I_lag1           I_ma_500
            1 2003         .         .                   .
            1 2004         .         .                   .
            1 2005  .2720588         .                   .
            1 2006  .4888889  .2720588                   .
            1 2007 .13846155  .4888889                   .
            1 2008         0 .13846155   .2998030831416448
            1 2009         0         0   .2248523123562336
            1 2010         0         0  .17988184988498687
            1 2011         0         0   .1254700869321823
            1 2012         .         0 .027692309021949767
            end
            format %ty year

            Although I am still quite suprised, that the asrol command with a moving average, won't let you modify the window, in which the averages should be executed, like tssmooth ma does.

            Comment


            • #7
              The number of periods to be included in the window are modifiable. For example, for a 10-period rolling window, the command will look like:
              Code:
              bys company_id: asrol I_lag1, stat(mean) win(year 10) min(3)
              So I am not quite sure what are you referring to
              Although I am still quite suprised, that the asrol command with a moving average, won't let you modify the window,
              Last edited by Attaullah Shah; 26 Jul 2018, 10:48.
              Regards
              --------------------------------------------------
              Attaullah Shah, PhD.
              Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
              FinTechProfessor.com
              https://asdocx.com
              Check out my asdoc program, which sends outputs to MS Word.
              For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

              Comment


              • #8
                Hallo Mr Shah,

                I did not want to offend anybody.
                I would be lost without this great command.

                My remark was just intended to ask if it would be also possible that the asrol command could look like this:
                bys company_id: asrol varname, stat(mean) win(year 5 0 0) min(3)

                That is, defining the rolling average window inside of the win() brackets [like in the command tssmooth ma].
                If this would be possible, one wouldn't have to create an additional lagged variable to e.g. execute a rolling average over the preceding 5 years.

                Best Regards,
                Tamer von Steinsdorff

                Comment


                • #9
                  rangestat (from SSC) is a general tool for calculating statistics over a range of observations and can, as with tssmooth, calculate a rolling average using the previous 5 years (that is excluding the current observation). If you request a count of non-missing observations within the window, you can also exclude results below the desired minimum.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float company_id int year float I
                  1 2003         .
                  1 2004         .
                  1 2005  .2720588
                  1 2006  .4888889
                  1 2007 .13846155
                  1 2008         0
                  1 2009         0
                  1 2010         0
                  1 2011         0
                  1 2012         .
                  end
                  format %ty year
                  
                  rangestat (mean) I (count) I, interval(year -5 -1) by(company_id)
                  replace I_mean = . if I_count < 3
                  
                  list
                  and the results
                  Code:
                  . list
                  
                       +--------------------------------------------------+
                       | compan~d   year          I      I_mean   I_count |
                       |--------------------------------------------------|
                    1. |        1   2003          .           .         . |
                    2. |        1   2004          .           .         0 |
                    3. |        1   2005   .2720588           .         0 |
                    4. |        1   2006   .4888889           .         1 |
                    5. |        1   2007   .1384615           .         2 |
                       |--------------------------------------------------|
                    6. |        1   2008          0   .29980308         3 |
                    7. |        1   2009          0   .22485231         4 |
                    8. |        1   2010          0   .17988185         5 |
                    9. |        1   2011          0   .12547009         5 |
                   10. |        1   2012          .   .02769231         5 |
                       +--------------------------------------------------+

                  Comment


                  • #10
                    And since this is panel data, you can also get there with a one liner using tsegen (from SSC):

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input float company_id int year float I
                    1 2003         .
                    1 2004         .
                    1 2005  .2720588
                    1 2006  .4888889
                    1 2007 .13846155
                    1 2008         0
                    1 2009         0
                    1 2010         0
                    1 2011         0
                    1 2012         .
                    end
                    format %ty year
                    tsset company_id year
                    
                    tsegen Imean = rowmean(L(1/5).I, 3)
                    
                    list
                    and the results
                    Code:
                    . tsegen Imean = rowmean(L(1/5).I,3)
                    (5 missing values generated)
                    
                    . 
                    . list
                    
                         +---------------------------------------+
                         | compan~d   year          I      Imean |
                         |---------------------------------------|
                      1. |        1   2003          .          . |
                      2. |        1   2004          .          . |
                      3. |        1   2005   .2720588          . |
                      4. |        1   2006   .4888889          . |
                      5. |        1   2007   .1384615          . |
                         |---------------------------------------|
                      6. |        1   2008          0   .2998031 |
                      7. |        1   2009          0   .2248523 |
                      8. |        1   2010          0   .1798819 |
                      9. |        1   2011          0   .1254701 |
                     10. |        1   2012          .   .0276923 |
                         +---------------------------------------+

                    Comment


                    • #11
                      Dear Tamer von Steinsdorff Thanks for the additional details. So you wanted to exclude the focal observation. Yes, that can be done using the option xfocal of asrol, that would eliminate the need for creating a lagged variable. See:

                      Code:
                      bys company_id: asrol I, stat(mean) win(year 6) xfocal(focal) min(3) gen(mean5)
                      list
                      
                           +----------------------------------------+
                           | compan~d   year          I       mean5 |
                           |----------------------------------------|
                        1. |        1   2003          .           . |
                        2. |        1   2004          .           . |
                        3. |        1   2005   .2720588           . |
                        4. |        1   2006   .4888889           . |
                        5. |        1   2007   .1384615           . |
                           |----------------------------------------|
                        6. |        1   2008          0   .29980308 |
                        7. |        1   2009          0   .22485231 |
                        8. |        1   2010          0   .17988185 |
                        9. |        1   2011          0   .12547009 |
                       10. |        1   2012          .   .02769231 |
                           +----------------------------------------+
                      Regards
                      --------------------------------------------------
                      Attaullah Shah, PhD.
                      Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
                      FinTechProfessor.com
                      https://asdocx.com
                      Check out my asdoc program, which sends outputs to MS Word.
                      For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

                      Comment


                      • #12
                        The point is more than being able to exclude the focal observation. In some fields averages of previous values are standard and the window may extend to include the present observation. In other fields. it is common, if not standard, to centre windows on the present observation. For example, this is prevailing practice in climatology, hydrology and various other environmental sciences. Hanning (1/4 previous + 1/2 present + 1/4 next) was named to reflect its use circa 1900 by the climatologist Julius von Hann and similar moving averages go back two centuries in climatology. Naturally with this set-up you can't calculate for complete k-year windows until all the k values are in. There is a similar problem for backwards windows at the start of each series.

                        Whatever the window used, you can always shift the results using a lag or lead operation after the smoothing.

                        Comment

                        Working...
                        X