Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Command for calculating the mean for 5 year periods in panel data

    Dear Stata specialists

    I am quite a rookie at Stata. I want to summarize my data into 5 year periods. I have a data set with over 120 countries and my time period is from 1980 to 2010. Now I want to create the averages for 5 year periods (1980-1984; 1985-1989; 1990-1994; ...).

    The problem is, that my panel is UNBALANCED, so there is data MISSING at some points. So for example when there are just 3 values avaiable in a 5 year periode I just want to have the mean of these 3 values.

    My data set looks like that:
    Country Year corruption law and order Information Index Account Index Transparency Index
    Albania 1985 4,00 4,00 25 11 19
    Albania 1986 4,00 4,00 26 11 .
    Albania 1987 4,00 4,00 26 15 21
    Albania 1988 4,00 4,00 26 11 .
    Albania 1989 4,00 4,00 26 11 19
    Albania 1990 4,00 . 27 22 25
    Albania 1991 4,00 2,83 27 31 29
    Albania 1992 . 2,08 29 42 34
    Albania 1993 4,00 3,33 31 37 34
    Albania 1994 4,00 4,67 39 33 36
    Albania 1995 3,83 4,67 44 41 42
    Albania 1996 2,75 4,00 47 41 44
    Albania 1997 2,25 2,83 48 49 48
    Albania 1998 3,00 3,00 52 60 55
    Albania 1999 2,33 . 50 . 51
    Albania 2000 2,00 2,00 48 46 47
    Albania 2001 2,00 2,00 55 49 52
    Albania 2002 2,00 2,00 57 59 .
    Albania 2003 . 2,00 59 58 .
    Albania 2004 . 2,00 61 61 61
    Albania 2005 1,83 2,21 63 66 64
    Albania 2006 1,00 2,50 69 57 62
    Albania 2007 1,00 2,50 71 55 63
    Albania 2008 1,50 2,50 71 60 66
    Albania 2009 1,50 2,50 69 57 63
    Albania 2010 1,83 2,50 64 58 61
    Algeria 1985 4,00 2,00 43 12 29
    Algeria 1986 4,00 2,00 43 12 29
    Algeria 1987 4,00 2,08 43 12 29
    Algeria 1988 4,00 2,75 43 12 30
    Algeria 1989 4,00 2,00 45 31 .
    Algeria 1990 3,92 2,17 44 34 40
    Algeria 1991 4,00 1,42 45 34 40
    Algeria 1992 3,50 1,42 . 14 27
    Algeria 1993 3,00 2,00 37 18 28
    Algeria 1994 3,00 3,00 . 27 32
    Algeria 1995 3,00 3,00 39 29 .
    Algeria 1996 . 3,00 38 37 38
    Algeria 1997 2,50 3,00 39 35 37
    Algeria 1998 2,00 2,08 39 27 33
    Algeria 1999 . 2,00 40 30 35
    Algeria 2000 2,00 2,00 41 36 39
    Algeria 2001 1,96 2,00 46 34 40
    Algeria 2002 1,50 2,00 44 35 39
    Algeria 2003 1,50 2,00 47 33 .
    Algeria 2004 1,50 2,42 48 33 42
    Algeria 2005 1,50 . 56 . 46
    Algeria 2006 1,50 3,00 58 38 48
    Algeria 2007 1,50 3,00 58 32 44
    Algeria 2008 1,50 . 58 30 42
    Algeria 2009 1,50 3,00 59 29 42
    Algeria 2010 1,83 3,00 59 32 44

    and in the end it has to look like that:

    country Periods mean_corruption mean_law and order mean transparancy
    Albania 1980-1984
    Albania 1985-1989
    Albania 1990-1994
    Albania 1995-1999
    Albania 2000-2004
    Albania 2005-2009
    Algeria 1980-1984
    Algeria 1985-1989
    Algeria 1990-1994
    Algeria 1995-1999
    Algeria 2000-2004
    Algeria 2005-2009
    Can you please advise me on the code I can use for this? I am writting on my bachelor thesis and it is really important.

    Thank you very much in advance
    Last edited by Rick Mueller; 18 Jan 2016, 06:29.

  • #2
    Please read (again, as it should be)

    http://www.statalist.org/forums/help#technical

    as you have a good question, just ask it. We don't need or want detailed explanations of how much you need help, how urgent it is for you, or how grateful you will be for attention.
    http://www.statalist.org/forums/help#realnames

    You are asked to post on Statalist using your full real name, including given name(s) and a family name, such as "Ronald Fisher" or "Gertrude M. Cox". Giving full names is one of the ways in which we show respect for others and is a long tradition on Statalist. It is also much easier to get to know people when real names are used.
    If you overlook this on first registration, it is easy to fix. Click on “Contact us” located at the bottom right-hand corner of every page.
    You can define your five-year periods by the years that start them:

    Code:
    gen period = 5 * floor(year/5)
    Proof of concept:

    Code:
    . di 5 * floor(1980/5)
    1980
    
    . di 5 * floor(1984/5)
    1980
    
    . di 5 * floor(1985/5)
    1985
    After that it's just an application of collapse, so read its help. collapse is smart about missing or omitted values. Here's a silly example

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . gen mpg5 = 5 *  floor(mpg/5)
    
    . tab mpg5
    
           mpg5 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             10 |          8       10.81       10.81
             15 |         27       36.49       47.30
             20 |         20       27.03       74.32
             25 |         12       16.22       90.54
             30 |          4        5.41       95.95
             35 |          2        2.70       98.65
             40 |          1        1.35      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    
    . l price if mpg >= 40
    
         +-------+
         | price |
         |-------|
     71. | 5,397 |
         +-------+
    
    . collapse price, by(mpg5)
    
    . l
    
         +----------------+
         | mpg5     price |
         |----------------|
      1. |   10    11,139 |
      2. |   15   5,970.7 |
      3. |   20     5,637 |
      4. |   25   5,269.3 |
      5. |   30     4,038 |
         |----------------|
      6. |   35   4,193.5 |
      7. |   40     5,397 |
         +----------------+
    There's only one car with mpg of 40 or above, but collapse uses the right denominator as well as the right numerator.

    Comment


    • #3
      Thanks for informing me about the traditions of Statalist ... I will change my name immediately.

      But I have still problems running the code.

      When I enter the following code:


      encode Country, gen(id)
      gen period = 5 * floor(1980/5)
      tab period
      collapse corrupt if id==1, by (period)



      the output for Albania (id=1) for 1980-1984 is:

      period corrupt
      1980 2.77
      but as you can see in my first post, it should be 4.

      Comment


      • #4
        I'd put a bet on a small error on your part rather than a bug in collapse.

        I suspect that the mapping between Country and id is not what you think. You haven't shown us that Albania is id 1; that's just what we guess from your example. In fact if we take your example literally there should not be an average for Albania 1980 at all.

        But you can just

        Code:
        collapse corrupt, by(Country period)

        Comment


        • #5
          when I just enter the following code

          gen period = 5 * floor(1980/5)
          collapse corrupt, by(Country period)

          here is the result:

          Country period corrupt
          Albania 1980 2.77
          Algeria 1980 2.60
          Angola 1980 2.47
          Argentina 1980 3.01
          Armenia 1980 1.65

          something went wrong. Because the averages are alle wrong and Armenia has an average although there are no observations for Armenia between 1980 and 1984.

          Comment


          • #6
            You typed

            Code:
            gen period = 5 * floor(1980/5)
            which just reduces everything to a single period.

            You need (see #2 again)

            Code:
            gen period = 5 * floor(Year/5)
            That was in your #3. Sorry I didn't spot it then.

            Comment


            • #7
              Now it works! Thank you very much for your help and patience.

              Comment

              Working...
              X