Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • creating 5 year averages

    Hi all,

    I am working with a data set that has event data at the district-month level where

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(uniqueid year month count)
    1001 1990  1  1
    1001 1990  2  1
    1001 1990  3  3
    1001 1990  4  4
    1001 1990  5  3
    1001 1990  6  5
    1001 1990  7  5
    1001 1990  8  2
    1001 1990  9  2
    1001 1990 10  1
    1001 1990 11  5
    1001 1990 12  2
    1001 1991  1  5
    1001 1991  2 12
    1001 1991  3  1
    1001 1991  4  3
    1001 1991  7  2
    1001 1991  8  6
    1001 1991  9  4
    1001 1991 10  4
    1001 1991 11  5
    1001 1991 12  2
    1001 1992  1  2
    1001 1992  2  3
    1001 1992  4  3
    1001 1992  5  3
    1001 1992  6  3
    1001 1992  7  2
    1001 1992  8  3
    1001 1992  9  1
    1001 1992 10  2
    1001 1992 11  2
    1001 1992 12  2
    1001 1993  1  6
    1001 1993  2  2
    1001 1993  3  5
    1001 1993  4  2
    1001 1993  5  1
    1001 1993  8  1
    1001 1993  9  2
    1001 1993 10  3
    1001 1993 11  3
    1001 1994  1  2
    1001 1994  2  2
    1001 1994  3  1
    1001 1994  5  2
    1001 1994  7  1
    1001 1994  8  1
    1001 1994  9  2
    1001 1994 10  3
    1001 1994 11  2
    1001 1994 12  1
    1001 1995  1  2
    1001 1995  2  3
    1001 1995  3  1
    1001 1995  4  1
    1001 1995  5  3
    1001 1995  6  3
    1001 1995  7  5
    1001 1995  8  2
    1001 1995  9  6
    1001 1995 10  1
    1001 1995 11  1
    1001 1996  1  2
    1001 1996  2  6
    1001 1996  3  1
    1001 1996  4  1
    1001 1996  7  5
    1001 1996  9  2
    1001 1996 10  3
    1001 1996 11  3
    1001 1996 12  3
    1001 1997  1  1
    1001 1997  2  2
    1001 1997  3  4
    1001 1997  4  1
    1001 1997  5  1
    1001 1997  7  1
    1001 1997  8  1
    1001 1997  9  4
    1001 1997 10  1
    1001 1997 12  3
    1001 1998  1  2
    1001 1998  2  1
    1001 1998  3  5
    1001 1998  4  5
    1001 1998  5  6
    1001 1998  6  1
    1001 1998  7  4
    1001 1998  8  2
    1001 1998  9  3
    1001 1998 10  1
    1001 1998 11  4
    1001 1998 12  3
    1001 1999  1  3
    1001 1999  2  1
    1001 1999  3  2
    1001 1999  5  1
    1001 1999  8  3
    1001 1999  9  4
    end
    uniqueid is the district identifier. I would like to create five year averages for each municipality- but I am not sure how to take into account the fact that missing months mean no event occurred. What is the proper way to approach this?

    Thank you.

  • #2
    The simplest way to resolve this is to supplement the data set with observations for the missing months, putting in 0 for the count variable. To do this, you will need to have a real Stata monthly date variable--which you will probably need any way for other purposes as well. (Separate year and month variables are usually not very useful.) Once you have that, the -tsfill- command brings you home.

    Code:
    //    CREATE A STATA INTERNAL FORMAT MONTHLY DATE VARIABLE
    gen mdate = ym(year, month)
    format mdate %tm
    assert missing(mdate) == missing(year, month)
    
    //    FILLIN MISSING MONTHS WITH ZERO COUNT
    xtset uniqueid mdate
    tsfill
    replace count = 0 if missing(count)
    Now, you can proceed with your five-year averages.

    Comment


    • #3
      Dear Clyde, Will the following code do?
      Code:
      rangestat count, interval(year -4 0) by(uniqueid)
      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)

      Comment


      • #4
        River Huang Please always remember the protocol to explain community-contributed commands: so, rangestat is from SSC.

        That code takes a mean of count across monthly observations in 5 year intervals. It is biased upwards by not including zeros. But

        Code:
         
         rangestat (sum) sum=count, interval(year -4 0) by(uniqueid)
        clearly doesn't need to know about any zeros, and it's possible that dividing that result by 5 or by 60 gives Adrienne Wold what she wants. If she wants averages for disjoint 5 year periods, that's different.

        I am with Clyde Schechter here: It's likely that filling in the gaps will be helpful for many purposes, and not obviously harmful for any purpose.

        Comment


        • #5
          Dear Nick, Thanks. I should have mentioned the -rangestat- command must be installed from SSC. Now, I see the point.
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)

          Comment


          • #6
            Thanks all for the replies. I tried Nick's most recent command posted and obtained:
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(uniqueid year month count) double sum
            1001 1990  1  1  34
            1001 1990  2  1  34
            1001 1990  3  3  34
            1001 1990  4  4  34
            1001 1990  5  3  34
            1001 1990  6  5  34
            1001 1990  7  5  34
            1001 1990  8  2  34
            1001 1990  9  2  34
            1001 1990 10  1  34
            1001 1990 11  5  34
            1001 1990 12  2  34
            1001 1991  1  5  78
            1001 1991  2 12  78
            1001 1991  3  1  78
            1001 1991  4  3  78
            1001    .  .  0   .
            1001    .  .  0   .
            1001 1991  7  2  78
            1001 1991  8  6  78
            1001 1991  9  4  78
            1001 1991 10  4  78
            1001 1991 11  5  78
            1001 1991 12  2  78
            1001 1992  1  2 104
            1001 1992  2  3 104
            1001    .  .  0   .
            1001 1992  4  3 104
            1001 1992  5  3 104
            1001 1992  6  3 104
            1001 1992  7  2 104
            1001 1992  8  3 104
            1001 1992  9  1 104
            1001 1992 10  2 104
            1001 1992 11  2 104
            1001 1992 12  2 104
            1001 1993  1  6 129
            1001 1993  2  2 129
            1001 1993  3  5 129
            1001 1993  4  2 129
            1001 1993  5  1 129
            1001    .  .  0   .
            1001    .  .  0   .
            1001 1993  8  1 129
            1001 1993  9  2 129
            1001 1993 10  3 129
            1001 1993 11  3 129
            1001    .  .  0   .
            1001 1994  1  2 146
            1001 1994  2  2 146
            1001 1994  3  1 146
            1001    .  .  0   .
            1001 1994  5  2 146
            1001    .  .  0   .
            1001 1994  7  1 146
            1001 1994  8  1 146
            1001 1994  9  2 146
            1001 1994 10  3 146
            1001 1994 11  2 146
            1001 1994 12  1 146
            1001 1995  1  2 140
            1001 1995  2  3 140
            1001 1995  3  1 140
            1001 1995  4  1 140
            1001 1995  5  3 140
            1001 1995  6  3 140
            1001 1995  7  5 140
            1001 1995  8  2 140
            1001 1995  9  6 140
            1001 1995 10  1 140
            1001 1995 11  1 140
            1001    .  .  0   .
            1001 1996  1  2 122
            1001 1996  2  6 122
            1001 1996  3  1 122
            1001 1996  4  1 122
            1001    .  .  0   .
            1001    .  .  0   .
            1001 1996  7  5 122
            1001    .  .  0   .
            1001 1996  9  2 122
            1001 1996 10  3 122
            1001 1996 11  3 122
            1001 1996 12  3 122
            1001 1997  1  1 115
            1001 1997  2  2 115
            1001 1997  3  4 115
            1001 1997  4  1 115
            1001 1997  5  1 115
            1001    .  .  0   .
            1001 1997  7  1 115
            1001 1997  8  1 115
            1001 1997  9  4 115
            1001 1997 10  1 115
            1001    .  .  0   .
            1001 1997 12  3 115
            1001 1998  1  2 127
            1001 1998  2  1 127
            1001 1998  3  5 127
            1001 1998  4  5 127
            end
            But, I'm not sure why the sum variable is not aggregated by 5 year period/ each year is summed dindividually?

            Comment


            • #7
              Using Clyde Schechter suggestion, would
              Code:
              gen period = 5 * floor(mdate/5)
              be the way to proceed with the averages?

              Comment


              • #8
                #6 Well, thanks I guess, but it seems that you ignored important parts of Clyde's advice in #2. rangestat gives results for moving windows: did you study its help?

                #7 Yes, if and only if you want 5 year periods like 1990-1994, 1995-1999, ....
                Last edited by Nick Cox; 06 May 2019, 04:20.

                Comment

                Working...
                X