Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating Average Income for past 16 years by individual in Panel Data

    Hello,

    I'm working on a thesis data project where I need to calculate the average income of each individual over the past 16 years but for each of their recorded years in the panel data. For instance, an individual might have 25 years of income observations, but for each year, I want to calulate the average of the last 16 years. The reasoning behind this is that this data is for parents and will later be matched with their 16-year-old children to act as a control for average parental income during childhood.

    I assume that a loop is the best method here, however I cannot figure out how to write the code.

    Please see an example of the panel data (for one individual out of a million) I'm using:
    Individual ID code Year Income
    12 1989 20000
    12 1990 20100
    12 1991 20200
    12 1992 21000
    12 1993 0
    12 1994 22000
    12 1995 22200
    12 1996 23000
    12 1997 24000
    12 1998 25000
    12 1999 26000
    12 2000 30000
    12 2001 30000
    12 2002 30100
    12 2003 31000
    12 2004 32000
    12 2005 32000
    12 2006 33000
    12 2007 33100

    Any help and guidance would be greatly appreciated. I apologise if my explanation is unclear.

    Many thanks,
    Elsa

  • #2
    Code:
    clear all
    
    set obs 2
    g id = _n
    expand 20
    bys id: g year = _n
    g income = rgamma(10,5)
    
    xtset id year
    asreg income , window(year 16) by(id)
    drop _*R2
    rename _b_cons income16yr
    _Nobs tells you how many years were used if you want to restrict on that.

    Comment


    • #3
      Install rangestat from SSC. Below, if the sample period is less than 16 years, you get the mean over that sample period. If it is equal to 16 or exceeds 16, you get the mean over a 16-year period.

      Code:
      webuse grunfeld, clear
      xtset company year
      *ssc install rangestat, replace
      rangestat (mean) invest, interval(year -16 0) by(company)
      Res.:

      Code:
      . l company year invest*, sepby(company)
      
           +-------------------------------------+
           | company   year   invest   invest_~n |
           |-------------------------------------|
        1. |       1   1935    317.6   317.60001 |
        2. |       1   1936    391.8       354.7 |
        3. |       1   1937    410.6   373.33333 |
        4. |       1   1938    257.7     344.425 |
        5. |       1   1939    330.8       341.7 |
        6. |       1   1940    461.2   361.61667 |
        7. |       1   1941      512       383.1 |
        8. |       1   1942      448    391.2125 |
        9. |       1   1943    499.6   403.25556 |
       10. |       1   1944    547.5      417.68 |
       11. |       1   1945    561.2   430.72728 |
       12. |       1   1946    688.1     452.175 |
       13. |       1   1947    568.9   461.15385 |
       14. |       1   1948    529.2   466.01429 |
       15. |       1   1949    555.1   471.95333 |
       16. |       1   1950    642.9    482.6375 |
       17. |       1   1951    755.9   498.71177 |
       18. |       1   1952    891.2   532.45295 |
       19. |       1   1953   1304.4    586.1353 |
       20. |       1   1954   1486.7    649.4353 |
           |-------------------------------------|
       21. |       2   1935    209.9   209.89999 |
       22. |       2   1936    355.3   282.59999 |
       23. |       2   1937    469.9   345.03333 |
       24. |       2   1938    262.3   324.34999 |
       25. |       2   1939    230.4   305.55999 |
       26. |       2   1940    361.6   314.89999 |
       27. |       2   1941    472.8   337.45714 |
       28. |       2   1942    445.6   350.97499 |
       29. |       2   1943    361.6   352.15555 |
       30. |       2   1944    288.2      345.76 |
       31. |       2   1945    258.7   337.84545 |
       32. |       2   1946    420.3   344.71666 |
       33. |       2   1947    420.5   350.54615 |
       34. |       2   1948    494.5   360.82857 |
       35. |       2   1949    405.1      363.78 |
       36. |       2   1950    418.8   367.21875 |
       37. |       2   1951    588.2   380.21765 |
       38. |       2   1952    645.5   405.84118 |
       39. |       2   1953      641   422.64706 |
       40. |       2   1954    459.3   422.02353 |
           |-------------------------------------|
       41. |       3   1935     33.1   33.099998 |
       42. |       3   1936       45   39.049999 |
       43. |       3   1937     77.2   51.766665 |
       44. |       3   1938     44.6   49.974998 |
       45. |       3   1939     48.1   49.599998 |
       46. |       3   1940     74.4   53.733332 |
       47. |       3   1941      113   62.199999 |
       48. |       3   1942     91.9   65.912499 |
       49. |       3   1943     61.3   65.399999 |
       50. |       3   1944     56.8   64.539999 |
       51. |       3   1945     93.6   67.181817 |
       52. |       3   1946    159.9   74.908332 |
       53. |       3   1947    147.2   80.469229 |
       54. |       3   1948    146.3   85.171428 |
       55. |       3   1949     98.3   86.046666 |
       56. |       3   1950     93.5   86.512499 |
       57. |       3   1951    135.2    89.37647 |
       58. |       3   1952    157.3   96.682352 |
       59. |       3   1953    179.5   104.59412 |
       60. |       3   1954    189.6   111.20588 |
           |-------------------------------------|
       61. |       4   1935    40.29   40.290001 |
       62. |       4   1936    72.76   56.525002 |
       63. |       4   1937    66.26   59.770002 |
       64. |       4   1938     51.6   57.727501 |
       65. |       4   1939    52.41   56.664001 |
       66. |       4   1940    69.41   58.788335 |
       67. |       4   1941    68.35   60.154287 |
       68. |       4   1942     46.8   58.485001 |
       69. |       4   1943     47.4   57.253334 |
       70. |       4   1944    59.57   57.485001 |
       71. |       4   1945    88.78       60.33 |
       72. |       4   1946    74.12   61.479167 |
       73. |       4   1947    62.68   61.571539 |
       74. |       4   1948    89.36   63.556429 |
       75. |       4   1949    78.98   64.584667 |
       76. |       4   1950   100.66   66.839376 |
       77. |       4   1951   160.62   72.355883 |
       78. |       4   1952      145   78.515295 |
       79. |       4   1953   174.93   84.525294 |
       80. |       4   1954   172.49   90.774118 |
           |-------------------------------------|
       81. |       5   1935    39.68       39.68 |
       82. |       5   1936    50.73      45.205 |
       83. |       5   1937    74.24   54.883333 |
       84. |       5   1938    53.51   54.539999 |
       85. |       5   1939    42.65      52.162 |
       86. |       5   1940    46.48      51.215 |
       87. |       5   1941     61.4       52.67 |
       88. |       5   1942    39.67      51.045 |
       89. |       5   1943    62.24   52.288889 |
       90. |       5   1944    52.32      52.292 |
       91. |       5   1945    63.21   53.284545 |
       92. |       5   1946    59.37   53.791666 |
       93. |       5   1947    58.02   54.116923 |
       94. |       5   1948    70.34   55.275714 |
       95. |       5   1949    67.42   56.085333 |
       96. |       5   1950    55.74    56.06375 |
       97. |       5   1951     80.3   57.489412 |
       98. |       5   1952     85.4   60.178823 |
       99. |       5   1953     91.9   62.600588 |
      100. |       5   1954    81.43    63.02353 |
           |-------------------------------------|
      101. |       6   1935    20.36   20.360001 |
      102. |       6   1936    25.98       23.17 |
      103. |       6   1937    25.94   24.093334 |
      104. |       6   1938    27.53     24.9525 |
      105. |       6   1939     24.6      24.882 |
      106. |       6   1940    28.54   25.491667 |
      107. |       6   1941    43.41   28.051429 |
      108. |       6   1942    42.81    29.89625 |
      109. |       6   1943    27.84   29.667778 |
      110. |       6   1944     32.6      29.961 |
      111. |       6   1945    39.03   30.785455 |
      112. |       6   1946    50.17   32.400833 |
      113. |       6   1947    51.85   33.896923 |
      114. |       6   1948    64.03   36.049285 |
      115. |       6   1949    68.16       38.19 |
      116. |       6   1950    77.34   40.636875 |
      117. |       6   1951     95.3   43.852353 |
      118. |       6   1952    99.49   48.507059 |
      119. |       6   1953   127.52       54.48 |
      120. |       6   1954   135.72   60.937647 |
           |-------------------------------------|
      121. |       7   1935    24.43       24.43 |
      122. |       7   1936    23.21       23.82 |
      123. |       7   1937    32.78   26.806666 |
      124. |       7   1938    32.54       28.24 |
      125. |       7   1939    26.65      27.922 |
      126. |       7   1940    33.71   28.886666 |
      127. |       7   1941     43.5   30.974285 |
      128. |       7   1942    34.46       31.41 |
      129. |       7   1943    44.28       32.84 |
      130. |       7   1944     70.8      36.636 |
      131. |       7   1945    44.12   37.316363 |
      132. |       7   1946    48.98   38.288333 |
      133. |       7   1947    48.51   39.074615 |
      134. |       7   1948       50      39.855 |
      135. |       7   1949    50.59   40.570666 |
      136. |       7   1950    42.53   40.693125 |
      137. |       7   1951    64.77   42.109411 |
      138. |       7   1952    72.68   44.947647 |
      139. |       7   1953    73.86   47.927058 |
      140. |       7   1954    89.51   51.264117 |
           |-------------------------------------|
      141. |       8   1935    12.93       12.93 |
      142. |       8   1936     25.9      19.415 |
      143. |       8   1937    35.05   24.626666 |
      144. |       8   1938    22.89     24.1925 |
      145. |       8   1939    18.84      23.122 |
      146. |       8   1940    28.57       24.03 |
      147. |       8   1941    48.51   27.527142 |
      148. |       8   1942    43.34    29.50375 |
      149. |       8   1943    37.02   30.338889 |
      150. |       8   1944    37.81      31.086 |
      151. |       8   1945    39.27       31.83 |
      152. |       8   1946    53.46     33.6325 |
      153. |       8   1947    55.56   35.319231 |
      154. |       8   1948    49.56   36.336429 |
      155. |       8   1949    32.04       36.05 |
      156. |       8   1950    32.24   35.811875 |
      157. |       8   1951    54.38   36.904118 |
      158. |       8   1952    71.78   40.365883 |
      159. |       8   1953    90.08   44.141177 |
      160. |       8   1954     68.6   46.114706 |
           |-------------------------------------|
      161. |       9   1935    26.63   26.629999 |
      162. |       9   1936    23.39   25.009999 |
      163. |       9   1937    30.65   26.889999 |
      164. |       9   1938    20.89   25.389999 |
      165. |       9   1939    28.78      26.068 |
      166. |       9   1940    26.93   26.211666 |
      167. |       9   1941    32.08       27.05 |
      168. |       9   1942    32.21      27.695 |
      169. |       9   1943    35.69   28.583333 |
      170. |       9   1944    62.47      31.972 |
      171. |       9   1945    52.32   33.821818 |
      172. |       9   1946    56.95   35.749167 |
      173. |       9   1947    54.32   37.177692 |
      174. |       9   1948    40.53   37.417143 |
      175. |       9   1949    32.54      37.092 |
      176. |       9   1950    43.48    37.49125 |
      177. |       9   1951    56.49   38.608824 |
      178. |       9   1952    65.98    40.92353 |
      179. |       9   1953    66.11   43.436471 |
      180. |       9   1954    49.34   44.535883 |
           |-------------------------------------|
      181. |      10   1935     2.54        2.54 |
      182. |      10   1936        2        2.27 |
      183. |      10   1937     2.19   2.2433333 |
      184. |      10   1938     1.99        2.18 |
      185. |      10   1939     2.03        2.15 |
      186. |      10   1940     1.81   2.0933333 |
      187. |      10   1941     2.14         2.1 |
      188. |      10   1942     1.86        2.07 |
      189. |      10   1943      .93   1.9433333 |
      190. |      10   1944     1.18       1.867 |
      191. |      10   1945     1.36   1.8209091 |
      192. |      10   1946     2.24   1.8558333 |
      193. |      10   1947     3.81   2.0061538 |
      194. |      10   1948     5.66   2.2671428 |
      195. |      10   1949     4.21   2.3966667 |
      196. |      10   1950     3.42    2.460625 |
      197. |      10   1951     4.67   2.5905882 |
      198. |      10   1952        6   2.7941177 |
      199. |      10   1953     6.53   3.0605883 |
      200. |      10   1954     5.12   3.2329412 |
           +-------------------------------------

      Comment


      • #4
        yup, that works too.

        I think -16 should be -15, since 0 is included.

        If you want the observation count, you can switch out (mean) and use (obs). (if you do so, you'll see that that 17 years are included with -16).

        Comment


        • #5
          15 is correct, thanks George.

          Comment


          • #6
            Hi George & Andrew,

            Thank you very much for your quick replies, but I face an issue: I'm working from a virtual desktop that has internet access blocked due to the confidential nature of the data. Therefore, I cannot install asreg or rangestat (stata cannot connect to the host website for installation)

            Is there any other method you would propose for the code?

            Thank you!

            Comment


            • #7
              As a minor twist on very helpful answers, I will add that rangestat lets you calculate various summaries at the same time. Here it will usually be at least prudent and possibly essential to keep track of how many values went into the mean.

              Code:
              webuse grunfeld, clear
              xtset company year
              *ssc install rangestat, replace
              rangestat (count) invest (mean) invest, interval(year -15 0) by(company)
              
              list company year invest* in 1/20 
              
                   +------------------------------------------------+
                   | company   year   invest   invest~t   invest_~n |
                   |------------------------------------------------|
                1. |       1   1935    317.6          1   317.60001 |
                2. |       1   1936    391.8          2       354.7 |
                3. |       1   1937    410.6          3   373.33333 |
                4. |       1   1938    257.7          4     344.425 |
                5. |       1   1939    330.8          5       341.7 |
                   |------------------------------------------------|
                6. |       1   1940    461.2          6   361.61667 |
                7. |       1   1941      512          7       383.1 |
                8. |       1   1942      448          8    391.2125 |
                9. |       1   1943    499.6          9   403.25556 |
               10. |       1   1944    547.5         10      417.68 |
                   |------------------------------------------------|
               11. |       1   1945    561.2         11   430.72728 |
               12. |       1   1946    688.1         12     452.175 |
               13. |       1   1947    568.9         13   461.15385 |
               14. |       1   1948    529.2         14   466.01429 |
               15. |       1   1949    555.1         15   471.95333 |
                   |------------------------------------------------|
               16. |       1   1950    642.9         16    482.6375 |
               17. |       1   1951    755.9         16   510.03125 |
               18. |       1   1952    891.2         16   541.24376 |
               19. |       1   1953   1304.4         16   597.10626 |
               20. |       1   1954   1486.7         16   673.91875 |
                   +------------------------------------------------+
              
              .

              Comment


              • #8
                Here is an answer to #6. Note that this solution assumes one value each year, although it could be missing.

                rangestat is used too to show that with that proviso results are the same.

                Code:
                webuse grunfeld, clear
                xtset company year
                
                bysort company (year) : gen double cusum1 = sum(invest)
                by company : gen count1 = sum(invest < .)
                by company : gen double cusum2 = sum(invest[_n - 16])
                by company : gen count2 = sum(invest[_n-16] < .)
                
                gen count = count1 - count2 
                gen double mean = (cusum1 - cusum2) / count 
                
                rangestat (count) invest (mean) invest, int(year -15 0) by(company)
                
                list count mean invest_* in 1/20 
                
                     +------------------------------------------+
                     | count        mean   invest~t   invest_~n |
                     |------------------------------------------|
                  1. |     1   317.60001          1   317.60001 |
                  2. |     2       354.7          2       354.7 |
                  3. |     3   373.33333          3   373.33333 |
                  4. |     4     344.425          4     344.425 |
                  5. |     5       341.7          5       341.7 |
                     |------------------------------------------|
                  6. |     6   361.61667          6   361.61667 |
                  7. |     7       383.1          7       383.1 |
                  8. |     8    391.2125          8    391.2125 |
                  9. |     9   403.25556          9   403.25556 |
                 10. |    10      417.68         10      417.68 |
                     |------------------------------------------|
                 11. |    11   430.72728         11   430.72728 |
                 12. |    12     452.175         12     452.175 |
                 13. |    13   461.15385         13   461.15385 |
                 14. |    14   466.01429         14   466.01429 |
                 15. |    15   471.95333         15   471.95333 |
                     |------------------------------------------|
                 16. |    16    482.6375         16    482.6375 |
                 17. |    16   510.03125         16   510.03125 |
                 18. |    16   541.24376         16   541.24376 |
                 19. |    16   597.10626         16   597.10626 |
                 20. |    16   673.91875         16   673.91875 |
                     +------------------------------------------+

                Comment


                • #9
                  Code:
                  clear all
                  
                  set obs 2
                  g id = _n
                  expand 20
                  bys id: g year = _n
                  g income = rgamma(10,5)
                  
                  xtset id year
                  g meanincome = .
                  forv i = 1/20 {
                      capture drop zzz
                      local j = `i'-16
                      egen zzz = mean(cond(year>=`j' & year<=`i',income,.)), by(id)
                      replace meanincome = zzz if year==`i'
                  }
                  
                  * check it
                  rangestat (mean) income, interval(year -15 0) by(id)
                  rangestat (obs) income, interval(year -15 0) by(id)

                  Comment


                  • #10
                    Originally posted by Elsa Hume View Post
                    Hi George & Andrew,

                    Thank you very much for your quick replies, but I face an issue: I'm working from a virtual desktop that has internet access blocked due to the confidential nature of the data. Therefore, I cannot install asreg or rangestat (stata cannot connect to the host website for installation)

                    Is there any other method you would propose for the code?

                    Thank you!
                    The technique I illustrate (thanks to an old post by Nick Cox ) requires a balanced panel. I ensure this is the case using tsfill. It uses all official commands and functions.

                    Code:
                    webuse grunfeld, clear
                    xtset company year
                    *CREATE BALANCED PANEL
                    tsfill
                    *16 YEAR MEAN OF VARIABLE INVEST
                    bys company (year): gen meaninvest1= (sum(invest[_n])- sum(invest[_n-16]))/min(16,_n)
                    *COMPARE WITH RANGESTAT
                    rangestat (mean) invest, interval(year -15 0) by(company)
                    Res.:

                    Code:
                    . l in 1/40, sepby(company)
                    
                         +-------------------------------------------------------------------------+
                         | company   year   invest   mvalue   kstock   time   meanin~1   invest_~n |
                         |-------------------------------------------------------------------------|
                      1. |       1   1935    317.6   3078.5      2.8      1      317.6   317.60001 |
                      2. |       1   1936    391.8   4661.7     52.6      2      354.7       354.7 |
                      3. |       1   1937    410.6   5387.1    156.9      3   373.3333   373.33333 |
                      4. |       1   1938    257.7   2792.2    209.2      4    344.425     344.425 |
                      5. |       1   1939    330.8   4313.2    203.4      5      341.7       341.7 |
                      6. |       1   1940    461.2   4643.9    207.2      6   361.6167   361.61667 |
                      7. |       1   1941      512   4551.2    255.2      7      383.1       383.1 |
                      8. |       1   1942      448   3244.1    303.7      8   391.2125    391.2125 |
                      9. |       1   1943    499.6   4053.7    264.1      9   403.2556   403.25556 |
                     10. |       1   1944    547.5   4379.3    201.6     10     417.68      417.68 |
                     11. |       1   1945    561.2   4840.9      265     11   430.7273   430.72728 |
                     12. |       1   1946    688.1   4900.9    402.2     12    452.175     452.175 |
                     13. |       1   1947    568.9   3526.5    761.5     13   461.1538   461.15385 |
                     14. |       1   1948    529.2   3254.7    922.4     14   466.0143   466.01429 |
                     15. |       1   1949    555.1   3700.2   1020.1     15   471.9533   471.95333 |
                     16. |       1   1950    642.9   3755.6     1099     16   482.6375    482.6375 |
                     17. |       1   1951    755.9     4833   1207.7     17   510.0313   510.03125 |
                     18. |       1   1952    891.2   4924.9   1430.5     18   541.2438   541.24376 |
                     19. |       1   1953   1304.4   6241.7   1777.3     19   597.1063   597.10626 |
                     20. |       1   1954   1486.7   5593.6   2226.3     20   673.9188   673.91875 |
                         |-------------------------------------------------------------------------|
                     21. |       2   1935    209.9   1362.4     53.8      1      209.9   209.89999 |
                     22. |       2   1936    355.3   1807.1     50.5      2      282.6   282.59999 |
                     23. |       2   1937    469.9   2676.3    118.1      3   345.0333   345.03333 |
                     24. |       2   1938    262.3   1801.9    260.2      4     324.35   324.34999 |
                     25. |       2   1939    230.4   1957.3    312.7      5     305.56   305.55999 |
                     26. |       2   1940    361.6   2202.9    254.2      6      314.9   314.89999 |
                     27. |       2   1941    472.8   2380.5    261.4      7   337.4571   337.45714 |
                     28. |       2   1942    445.6   2168.6    298.7      8    350.975   350.97499 |
                     29. |       2   1943    361.6   1985.1    301.8      9   352.1555   352.15555 |
                     30. |       2   1944    288.2   1813.9    279.1     10     345.76      345.76 |
                     31. |       2   1945    258.7   1850.2    213.8     11   337.8455   337.84545 |
                     32. |       2   1946    420.3   2067.7    132.6     12   344.7167   344.71666 |
                     33. |       2   1947    420.5   1796.7    264.8     13   350.5461   350.54615 |
                     34. |       2   1948    494.5   1625.8    306.9     14   360.8286   360.82857 |
                     35. |       2   1949    405.1     1667    351.1     15     363.78      363.78 |
                     36. |       2   1950    418.8   1677.4    357.8     16   367.2188   367.21875 |
                     37. |       2   1951    588.2   2289.5    342.1     17   390.8625    390.8625 |
                     38. |       2   1952    645.5   2159.4    444.2     18        409         409 |
                     39. |       2   1953      641   2031.3    623.6     19   419.6938   419.69375 |
                     40. |       2   1954    459.3   2115.5    669.7     20   432.0063   432.00625 |
                         +-------------------------------------------------------------------------+

                    Comment


                    • #11
                      Actually, with a balanced panel created from an unbalanced panel, the missing values will be summed as zero values, biasing downwards the mean. So just skip the tsfill part of the code, noting that the mean is calculated from available observations and thus can be biased in the presence of missing values. How to handle missing values is a topic on its own which I won't get into.

                      Comment


                      • #12
                        So just skip the tsfill part of the code
                        I think that's wrong. In the unbalanced data, invest[_n-16] will sometimes refer to a year that is outside the 16 year lagging window, because it will refer to the 16th preceding observation, however far back in time that turns out to be.

                        I think the way to work around missing values is to keep the -tsfill- command to create a balanced data set and modify the calculation to account for the fact that some observations are missing.

                        Code:
                        bys company (year): gen valid_obs = sum(!missing(invest))
                        bys company (year): gen meaninvest1= (sum(invest[_n])- sum(invest[_n-16]))/(valid_obs - valid_obs[_n-min(_n, 16)])
                        


                        Of course, that doesn't eliminate the inherent bias attributable to the existence of missing values, but at least it gives a correct calculation of the mean observed invest during the 16 year window.

                        Comment


                        • #13
                          Thanks Clyde. Your reasoning here is better than mine.

                          Comment


                          • #14
                            Great, thank you very much everyone! Your help is really appreciated! Hope you enjoy the rest of your week!

                            Comment

                            Working...
                            X