Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Command for Calculating the Average for 7 days periods in Panel Data

    Dear Stata Specialists,

    I have a daily panel data from 01/01/2012 to 12/31/2017 with 10 different regions.

    I would like to generate two variables that represent 7 days the average.

    For example below based on Excel,

    Cell D9 = average (C3:C9), this value is the average rate for 7 days from 01/02/2012 to 01/08/2012).

    Cell E9 = average (C2:C8), his value is the average rate for 7 days from 01/01/2012 to 01/07/2012).

    Since I have 10 different regions (Region 1-10), it is really difficult to calculate the average manually in Excel.

    I would like to generate those columns D and E in STATA based on STATA code.

    Could you please help me on the code?

    Thank you very much in advance.
    Click image for larger version

Name:	average example.jpg
Views:	1
Size:	147.0 KB
ID:	1473563




  • #2
    It is premature to ask for help with code when you have not even imported your data to Stata. The approach to coding solutions to a problem sometimes depends on details of the Stata data set that cannot be seen in the spreadsheet from which it comes, nor even from a screenshot of the Stata data set in the browser. This is all made clear in the Forum FAQ, which everybody is requested to read prior to posting.

    In the future, when showing data examples, please always use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Dear Clyde Schechter,

      Thank you for your response and sorry for your inconvenience.

      Based on your comment, I am uploading my data using "dataex" in Stata as follows:


      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long date str9 time byte region float(rate avg7_inc avg7_exc)
      20120101 "1-Jan-12"  1 .011    .    .
      20120102 "2-Jan-12"  1  .01    .    .
      20120103 "3-Jan-12"  1 .008    .    .
      20120104 "4-Jan-12"  1 .012    .    .
      20120105 "5-Jan-12"  1 .007    .    .
      20120106 "6-Jan-12"  1 .008    .    .
      20120107 "7-Jan-12"  1 .009    .    .
      20120108 "8-Jan-12"  1 .009 .009 .009
      20120109 "9-Jan-12"  1 .023 .011 .009
      20120110 "10-Jan-12" 1 .013 .012 .011
      20120111 "11-Jan-12" 1  .01 .011 .012
      20120112 "12-Jan-12" 1 .007 .011 .011
      20120113 "13-Jan-12" 1  .01 .012 .011
      20120114 "14-Jan-12" 1 .008 .011 .012
      20120115 "15-Jan-12" 1 .007 .011 .011
      20120116 "16-Jan-12" 1 .007 .009 .011
      20120117 "17-Jan-12" 1 .008 .008 .009
      20120118 "18-Jan-12" 1 .017 .009 .008
      20120119 "19-Jan-12" 1  .01  .01 .009
      20120120 "20-Jan-12" 1 .005 .009  .01
      20120121 "21-Jan-12" 1 .005 .008 .009
      20120122 "22-Jan-12" 1 .005 .008 .008
      20120123 "23-Jan-12" 1 .004 .008 .008
      20120124 "24-Jan-12" 1 .006 .007 .008
      20120125 "25-Jan-12" 1 .006 .006 .007
      20120126 "26-Jan-12" 1 .009 .006 .006
      20120127 "27-Jan-12" 1  .01 .006 .006
      20120128 "28-Jan-12" 1 .007 .007 .006
      end
      ------------------ copy up to and including the previous line ------------------

      Here, column "ave7_inc" is the average rate of the previous 7 days including the base date (I manually calculated in Excel).

      column "ave7_exc" is the average rate of the previous 7 days excluding the base date (I manually calculated in Excel).

      In STATA, I would like to duplicate the variables "ave7_inc" and "ave7_exc" based on STATA code across 10 different regions.

      Best,

      Comment


      • #4
        I can show you how to get ave7_inc. But I cannot understand what you are doing with ave7_exc: what is the "base date" that you are excluding? I tried excluding the first, or the last of the 7 dates that go into ave7_inc, but neither produces the results you are showing. I'm wondering if you made some mistakes in that calculation?

        Also in observation 7, for 7 Jan 2012 you show no result for avg7_inc. Why? Shouldn't it be the average of all of the first 7 observations?

        Anyway, take a look at this:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long date str9 time byte region float(rate avg7_inc avg7_exc)
        20120101 "1-Jan-12"  1 .011    .    .
        20120102 "2-Jan-12"  1  .01    .    .
        20120103 "3-Jan-12"  1 .008    .    .
        20120104 "4-Jan-12"  1 .012    .    .
        20120105 "5-Jan-12"  1 .007    .    .
        20120106 "6-Jan-12"  1 .008    .    .
        20120107 "7-Jan-12"  1 .009    .    .
        20120108 "8-Jan-12"  1 .009 .009 .009
        20120109 "9-Jan-12"  1 .023 .011 .009
        20120110 "10-Jan-12" 1 .013 .012 .011
        20120111 "11-Jan-12" 1  .01 .011 .012
        20120112 "12-Jan-12" 1 .007 .011 .011
        20120113 "13-Jan-12" 1  .01 .012 .011
        20120114 "14-Jan-12" 1 .008 .011 .012
        20120115 "15-Jan-12" 1 .007 .011 .011
        20120116 "16-Jan-12" 1 .007 .009 .011
        20120117 "17-Jan-12" 1 .008 .008 .009
        20120118 "18-Jan-12" 1 .017 .009 .008
        20120119 "19-Jan-12" 1  .01  .01 .009
        20120120 "20-Jan-12" 1 .005 .009  .01
        20120121 "21-Jan-12" 1 .005 .008 .009
        20120122 "22-Jan-12" 1 .005 .008 .008
        20120123 "23-Jan-12" 1 .004 .008 .008
        20120124 "24-Jan-12" 1 .006 .007 .008
        20120125 "25-Jan-12" 1 .006 .006 .007
        20120126 "26-Jan-12" 1 .009 .006 .006
        20120127 "27-Jan-12" 1  .01 .006 .006
        20120128 "28-Jan-12" 1 .007 .007 .006
        end
        
        rename avg7_inc wanted_inc
        rename avg7_exc wanted_exc
        
        //    CREATE A REAL STATA INTERNAL FORMAT DATE VARIABLE
        drop date
        gen date = daily(time, "DM20Y")
        assert missing(date) == missing(time)
        format date %td
        drop time
        order date, after(region)
        
        //    CALCULATE THE RUNNING RANGES
        rangestat (mean) avg_inc = rate (count) count_inc = rate, interval(date -6 0) by(region)
        replace avg_inc = . if count_inc < 7
        order avg_inc, after(wanted_inc)
        
        rangestat (mean) avg_exc = rate (count) count_exc = rate, interval(date -5 0) by(region)
        replace avg_exc = . if count_exc < 6
        order avg_exc, after(wanted_exc)
        format avg_* %5.3f
        This code calculates avg_exc by averaging the rate from days -5 through day 0 for each observation. That would correspond to removing the first of the 7 days that were used in avg_inc. But, as I say, I don't understand what you want for avg_exc, as I cannot reproduce your results using any of the inclusions that I understood as sensible here. You can rechceck your results, or you can tinker with the numbers in the -interval()- option to get what you want.

        You will need the -rangestat- command, written by Robert Picard, Nick Cox, and Roberto Ferrer to use this code. It is available from SSC.




        Comment


        • #5
          Dear Clyde Schechter,

          Thank you so much for your time and support.

          Your code is perfectly working for me.

          I am not sure how to explain what I actually wanted to do for ave_exc

          However, I got the result for ave_exc after I modified your code as follows: rangestat (mean) avg_exc = rate (count) count_exc = rate, interval (date -7 -1) by (region).

          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte region float(date rate wanted_inc) double avg_inc float wanted_exc double avg_exc
          1 18993 .011    .                    .    .                    .
          1 18994  .01    .                    .    .                    .
          1 18995 .008    .                    .    .                    .
          1 18996 .012    .                    .    .                    .
          1 18997 .007    .                    .    .                    .
          1 18998 .008    .                    .    .                    .
          1 18999 .009    .  .009285714344254561    .                    .
          1 19000 .009 .009  .009000000011708056 .009  .009285714344254561
          1 19001 .023 .011  .010857142907168185 .009  .009000000011708056
          1 19002 .013 .012  .011571428605488368 .011  .010857142907168185
          1 19003  .01 .011  .011285714272941862 .012  .011571428605488368
          1 19004 .007 .011  .011285714272941862 .011  .011285714272941862
          1 19005  .01 .012  .011571428472442287 .011  .011285714272941862
          1 19006 .008 .011  .011428571439215116 .012  .011571428472442287
          1 19007 .007 .011   .01114285723971469 .011  .011428571439215116
          1 19008 .007 .009  .008857142978480883 .011   .01114285723971469
          1 19009 .008 .008    .0081428572801607 .009  .008857142978480883
          1 19010 .017 .009  .009142857444073473 .008    .0081428572801607
          1 19011  .01  .01  .009571428809847151 .009  .009142857444073473
          1 19012 .005 .009  .008857143111526966  .01  .009571428809847151
          1 19013 .005 .008  .008428571612707205 .009  .008857143111526966
          1 19014 .005 .008    .0081428572801607 .008  .008428571612707205
          1 19015 .004 .008  .007714285847863981 .008    .0081428572801607
          1 19016 .006 .007  .007428571515317474 .008  .007714285847863981
          1 19017 .006 .006   .00585714281935777 .007  .007428571515317474
          1 19018 .009 .006 .0057142856530845165 .006   .00585714281935777
          1 19019  .01 .006  .006428571351404701 .006 .0057142856530845165
          1 19020 .007 .007  .006714285683951208 .006  .006428571351404701
          end
          format %td date
          ------------------ copy up to and including the previous line ------------------

          Thank you so much again.

          Comment


          • #6
            You can also use asrol (from SSC) program for rolling window statistics. So
            Code:
             ssc install asrol
            
            * Rolling window mean, requiring minimum 7 observation
            bys region: asrol rate, wind(date 7) stat(mean) min(7)
            
            * Rolling window mean excluding the focal observation 
            bys region: asrol rate, wind(date 7) stat(mean)  gen(mean_xf)  xf(focal)
            
            * If minimum of 7 observations are required, then
            bys region: asrol rate, wind(date 7) stat(count)  xf(focal)
            replace mean_xf = . if count7_rate <6
            Regards
            --------------------------------------------------
            Attaullah Shah, PhD.
            Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
            FinTechProfessor.com
            https://asdocx.com
            Check out my asdoc program, which sends outputs to MS Word.
            For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

            Comment

            Working...
            X