Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Progressing age variable in a panel dataset

    I have a panel dataset with over 1mln person-month observations. I use Stata 18. Each monthly observation has time var yr_mo (yyyymm), starting from 201701 to 202304; Age var age ("yy.y" ex: "60.4" numeric format); unique identifier randid (numeric) and other variables. The age variable is coded at the time when the individual enters the panel (so not everyone starts at 201701, some join later, and others exit) and does not change going forward. For example, randid 12343 would join the panel in 201704 at the age of 61.4 and stay in the panel until 202304, with the same age for every subsequent month.

    How do I write a code that will take the initial age when the individual joins the panel and progress it forward until 202304 or when they exit?

  • #2
    this what the data looks like
    Click image for larger version

Name:	Screenshot 2023-06-07 at 6.51.17 PM.png
Views:	1
Size:	51.3 KB
ID:	1716391

    Comment


    • #3
      Code:
      gen year = floor(yr_mo)/100
      gen month = mod(yr_mo, 100)
      gen date = mdy(month, 1, year)
      format date %td
      
      by randid (date), sort: replace age = age[1] ///
          + daysinmonth(date[_n-1])/cond(mod(year[_n-1], 4), 365, 366) ///
          if _n > 1
      Notes:

      1. The code accounts for the difference in year length during leap years in a way that is valid for the range of years, 2017-2023 we are dealing with here. It does not extend to all years, however, because it does not cover the exceptional treatment of century years.

      2. It is not possible to resolve this problem exactly. When we are told that in a given month a person is 72.8 years old, we do not know at which date in the month that is exactly true. In this code we make the assumption that all of the dates refer to the first day of the month.

      The code is untested because example data was not provided in a usable form. It may contain typos or other errors. In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
      Last edited by Clyde Schechter; 07 Jun 2023, 21:46. Reason: Correct important errors in code.

      Comment


      • #4
        Clyde, thank you so much for your advice.

        Generating the date works great, but the second part of the code (aging the individual going forward) doesn't work as expected. basically the digit after the decimal is increasing every month, but the age in years is not. Here is a data example with dataex

        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long(yr_mo randid) float age
        201804 10014501 63.4
        201805 10014501 63.4
        201806 10014501 63.4
        201807 10014501 63.4
        201808 10014501 63.4
        201809 10014501 63.4
        201810 10014501 63.4
        201811 10014501 63.4
        201812 10014501 63.4
        201901 10014501 63.4
        201902 10014501 63.4
        201903 10014501 63.4
        201904 10014501 63.4
        201905 10014501 63.4
        201906 10014501 63.4
        201907 10014501 63.4
        201908 10014501 63.4
        201909 10014501 63.4
        201910 10014501 63.4
        201911 10014501 63.4
        201912 10014501 63.4
        202001 10014501 63.4
        202002 10014501 63.4
        202003 10014501 63.4
        202004 10014501 63.4
        202005 10014501 63.4
        202006 10014501 63.4
        202007 10014501 63.4
        202008 10014501 63.4
        202009 10014501 63.4
        202010 10014501 63.4
        202011 10014501 63.4
        202012 10014501 63.4
        202101 10014501 63.4
        202102 10014501 63.4
        202103 10014501 63.4
        202104 10014501 63.4
        202105 10014501 63.4
        202106 10014501 63.4
        202107 10014501 63.4
        202108 10014501 63.4
        202109 10014501 63.4
        202110 10014501 63.4
        202111 10014501 63.4
        202112 10014501 63.4
        202201 10014501 63.4
        202202 10014501 63.4
        202203 10014501 63.4
        202204 10014501 63.4
        202205 10014501 63.4
        202206 10014501 63.4
        202207 10014501 63.4
        202208 10014501 63.4
        202209 10014501 63.4
        202210 10014501 63.4
        202211 10014501 63.4
        202212 10014501 63.4
        202301 10014501 63.4
        202302 10014501 63.4
        202303 10014501 63.4
        202304 10014501 63.4
        201911 10026482 71.4
        201912 10026482 71.4
        202001 10026482 71.4
        202002 10026482 71.4
        202003 10026482 71.4
        202004 10026482 71.4
        202005 10026482 71.4
        202006 10026482 71.4
        202007 10026482 71.4
        202008 10026482 71.4
        202009 10026482 71.4
        202010 10026482 71.4
        202011 10026482 71.4
        202012 10026482 71.4
        202101 10026482 71.4
        202102 10026482 71.4
        202103 10026482 71.4
        202104 10026482 71.4
        202105 10026482 71.4
        202106 10026482 71.4
        202107 10026482 71.4
        202108 10026482 71.4
        202109 10026482 71.4
        202110 10026482 71.4
        202111 10026482 71.4
        202112 10026482 71.4
        202201 10026482 71.4
        202202 10026482 71.4
        202203 10026482 71.4
        202204 10026482 71.4
        202205 10026482 71.4
        202206 10026482 71.4
        202207 10026482 71.4
        202208 10026482 71.4
        202209 10026482 71.4
        202210 10026482 71.4
        202211 10026482 71.4
        202212 10026482 71.4
        202301 10026482 71.4
        end
        [/CODE]

        Comment


        • #5
          Sorry, my error. Thanks to the -dataex- example, I am able to see what went wrong and fix it:

          Code:
          by randid (date), sort: replace age = age[_n-1] ///
              + daysinmonth(date[_n-1])/cond(mod(year[_n-1], 4), 365, 366) ///
              if _n > 1

          Comment


          • #6
            This works! Thank you so much!
            If you could say a few words about the logic of how this works, that would really help. Other users might also find it easier to adjust the code to their data.
            Last edited by Alex Meyers; 08 Jun 2023, 12:33.

            Comment


            • #7
              We can also take advantage of the datediff_frac function built into Stata:

              Code:
              gen year = floor(yr_mo)/100
              gen month = mod(yr_mo, 100)
              gen date = mdy(month, 1, year)
              format date %td
              
              gen age2 = age
              
              by randid (date), sort: replace age2 = age2[_n-1] ///
                  + daysinmonth(date[_n-1])/cond(mod(year[_n-1], 4), 365, 366) ///
                  if _n > 1
              
              gen age3 = age
              
              by randid (date), sort: replace age3 = age3[_n-1] ///
                  + datediff_frac(date[_n-1], date, "year") if _n > 1
              For reasons that I haven't looked into yet, Clyde's result in age2 gives a somewhat different value than age3 which uses datediff_frac, for many observations.

              Code:
                +------------------------------------------------------------------------------+
                |  yr_mo     randid    age      year   month        date       age2       age3 |
                |------------------------------------------------------------------------------|
                | 201804   10014501   63.4   2018.04       4   01apr2018       63.4       63.4 |
                | 201805   10014501   63.4   2018.05       5   01may2018   63.48219   63.48219 |
                | 201806   10014501   63.4   2018.06       6   01jun2018   63.56712   63.56712 |
                | 201807   10014501   63.4   2018.07       7   01jul2018   63.64931   63.64931 |
                | 201808   10014501   63.4   2018.08       8   01aug2018   63.73425   63.73425 |
                | 201809   10014501   63.4   2018.09       9   01sep2018   63.81918   63.81918 |
                | 201810   10014501   63.4    2018.1      10   01oct2018   63.90137   63.90137 |
                | 201811   10014501   63.4   2018.11      11   01nov2018    63.9863    63.9863 |
                | 201812   10014501   63.4   2018.12      12   01dec2018   64.06849   64.06849 |
                | 201901   10014501   63.4   2019.01       1   01jan2019   64.15342   64.15342 |
                | 201902   10014501   63.4   2019.02       2   01feb2019   64.23835   64.23835 |
                | 201903   10014501   63.4   2019.03       3   01mar2019   64.31506   64.31506 |
                | 201904   10014501   63.4   2019.04       4   01apr2019   64.39999   64.39977 |
                | 201905   10014501   63.4   2019.05       5   01may2019   64.48219   64.48174 |
                | 201906   10014501   63.4   2019.06       6   01jun2019   64.56712   64.56644 |
                | 201907   10014501   63.4   2019.07       7   01jul2019   64.64931   64.64841 |
                | 201908   10014501   63.4   2019.08       8   01aug2019   64.73424   64.73311 |
                | 201909   10014501   63.4   2019.09       9   01sep2019   64.81917   64.81781 |
                | 201910   10014501   63.4    2019.1      10   01oct2019   64.90136   64.89978 |
                | 201911   10014501   63.4   2019.11      11   01nov2019   64.98629   64.98448 |
                | 201912   10014501   63.4   2019.12      12   01dec2019   65.06848   65.06645 |
                | 202001   10014501   63.4   2020.01       1   01jan2020   65.15341   65.15115 |
                | 202002   10014501   63.4   2020.02       2   01feb2020   65.23834   65.23586 |
                | 202003   10014501   63.4   2020.03       3   01mar2020   65.31779   65.31509 |
                | 202004   10014501   63.4   2020.04       4   01apr2020   65.40273   65.40002 |
                | 202005   10014501   63.4   2020.05       5   01may2020   65.48492   65.48221 |
                | 202006   10014501   63.4   2020.06       6   01jun2020   65.56985   65.56714 |
                | 202007   10014501   63.4   2020.07       7   01jul2020   65.65204   65.64933 |
                | 202008   10014501   63.4   2020.08       8   01aug2020   65.73697   65.73426 |
                | 202009   10014501   63.4   2020.09       9   01sep2020    65.8219   65.81919 |
                | 202010   10014501   63.4    2020.1      10   01oct2020   65.90409   65.90138 |
                | 202011   10014501   63.4   2020.11      11   01nov2020   65.98902   65.98631 |
                | 202012   10014501   63.4   2020.12      12   01dec2020   66.07121    66.0685 |
                | 202101   10014501   63.4   2021.01       1   01jan2021   66.15614   66.15343 |
                | 202102   10014501   63.4   2021.02       2   01feb2021   66.24107   66.23837 |
                | 202103   10014501   63.4   2021.03       3   01mar2021   66.31779   66.31508 |
                | 202104   10014501   63.4   2021.04       4   01apr2021   66.40272   66.40001 |
                | 202105   10014501   63.4   2021.05       5   01may2021   66.48491    66.4822 |
                | 202106   10014501   63.4   2021.06       6   01jun2021   66.56984   66.56713 |
                | 202107   10014501   63.4   2021.07       7   01jul2021   66.65203   66.64932 |
                | 202108   10014501   63.4   2021.08       8   01aug2021   66.73696   66.73425 |
                | 202109   10014501   63.4   2021.09       9   01sep2021   66.82189   66.81918 |
                | 202110   10014501   63.4    2021.1      10   01oct2021   66.90408   66.90137 |
                | 202111   10014501   63.4   2021.11      11   01nov2021   66.98901   66.98631 |
                | 202112   10014501   63.4   2021.12      12   01dec2021   67.07121    67.0685 |
                | 202201   10014501   63.4   2022.01       1   01jan2022   67.15614   67.15343 |
                | 202202   10014501   63.4   2022.02       2   01feb2022   67.24107   67.23836 |
                | 202203   10014501   63.4   2022.03       3   01mar2022   67.31778   67.31507 |
                | 202204   10014501   63.4   2022.04       4   01apr2022   67.40271       67.4 |
                | 202205   10014501   63.4   2022.05       5   01may2022    67.4849   67.48219 |
                | 202206   10014501   63.4   2022.06       6   01jun2022   67.56983   67.56712 |
                | 202207   10014501   63.4   2022.07       7   01jul2022   67.65202   67.64931 |
                | 202208   10014501   63.4   2022.08       8   01aug2022   67.73695   67.73425 |
                | 202209   10014501   63.4   2022.09       9   01sep2022   67.82188   67.81918 |
                | 202210   10014501   63.4    2022.1      10   01oct2022   67.90408   67.90137 |
                | 202211   10014501   63.4   2022.11      11   01nov2022   67.98901    67.9863 |
                | 202212   10014501   63.4   2022.12      12   01dec2022    68.0712   68.06849 |
                | 202301   10014501   63.4   2023.01       1   01jan2023   68.15613   68.15342 |
                | 202302   10014501   63.4   2023.02       2   01feb2023   68.24106   68.23835 |
                | 202303   10014501   63.4   2023.03       3   01mar2023   68.31777   68.31506 |
                | 202304   10014501   63.4   2023.04       4   01apr2023    68.4027   68.39977 |
                | 201911   10026482   71.4   2019.11      11   01nov2019       71.4       71.4 |
                | 201912   10026482   71.4   2019.12      12   01dec2019   71.48219   71.48197 |
                | 202001   10026482   71.4   2020.01       1   01jan2020   71.56712   71.56667 |
                | 202002   10026482   71.4   2020.02       2   01feb2020   71.65205   71.65137 |
                | 202003   10026482   71.4   2020.03       3   01mar2020   71.73151   71.73061 |
                | 202004   10026482   71.4   2020.04       4   01apr2020   71.81644   71.81554 |
                | 202005   10026482   71.4   2020.05       5   01may2020   71.89863   71.89773 |
                | 202006   10026482   71.4   2020.06       6   01jun2020   71.98356   71.98266 |
                | 202007   10026482   71.4   2020.07       7   01jul2020   72.06575   72.06485 |
                | 202008   10026482   71.4   2020.08       8   01aug2020   72.15068   72.14978 |
                | 202009   10026482   71.4   2020.09       9   01sep2020   72.23561   72.23471 |
                | 202010   10026482   71.4    2020.1      10   01oct2020    72.3178    72.3169 |
                | 202011   10026482   71.4   2020.11      11   01nov2020   72.40273   72.40183 |
                | 202012   10026482   71.4   2020.12      12   01dec2020   72.48492   72.48402 |
                | 202101   10026482   71.4   2021.01       1   01jan2021   72.56985   72.56895 |
                | 202102   10026482   71.4   2021.02       2   01feb2021   72.65479   72.65388 |
                | 202103   10026482   71.4   2021.03       3   01mar2021    72.7315    72.7306 |
                | 202104   10026482   71.4   2021.04       4   01apr2021   72.81643   72.81553 |
                | 202105   10026482   71.4   2021.05       5   01may2021   72.89862   72.89772 |
                | 202106   10026482   71.4   2021.06       6   01jun2021   72.98355   72.98265 |
                | 202107   10026482   71.4   2021.07       7   01jul2021   73.06574   73.06484 |
                | 202108   10026482   71.4   2021.08       8   01aug2021   73.15067   73.14977 |
                | 202109   10026482   71.4   2021.09       9   01sep2021    73.2356    73.2347 |
                | 202110   10026482   71.4    2021.1      10   01oct2021   73.31779   73.31689 |
                | 202111   10026482   71.4   2021.11      11   01nov2021   73.40273   73.40182 |
                | 202112   10026482   71.4   2021.12      12   01dec2021   73.48492   73.48402 |
                | 202201   10026482   71.4   2022.01       1   01jan2022   73.56985   73.56895 |
                | 202202   10026482   71.4   2022.02       2   01feb2022   73.65478   73.65388 |
                | 202203   10026482   71.4   2022.03       3   01mar2022   73.73149   73.73059 |
                | 202204   10026482   71.4   2022.04       4   01apr2022   73.81642   73.81552 |
                | 202205   10026482   71.4   2022.05       5   01may2022   73.89861   73.89771 |
                | 202206   10026482   71.4   2022.06       6   01jun2022   73.98354   73.98264 |
                | 202207   10026482   71.4   2022.07       7   01jul2022   74.06573   74.06483 |
                | 202208   10026482   71.4   2022.08       8   01aug2022   74.15067   74.14977 |
                | 202209   10026482   71.4   2022.09       9   01sep2022    74.2356    74.2347 |
                | 202210   10026482   71.4    2022.1      10   01oct2022   74.31779   74.31689 |
                | 202211   10026482   71.4   2022.11      11   01nov2022   74.40272   74.40182 |
                | 202212   10026482   71.4   2022.12      12   01dec2022   74.48491   74.48401 |
                | 202301   10026482   71.4   2023.01       1   01jan2023   74.56984   74.56894 |
                +------------------------------------------------------------------------------+

              Comment


              • #8
                If you could say a few words about the logic of how this works, that would really help.
                Code:
                by randid (date), sort: replace age = age[_n-1] ///
                      + daysinmonth(date[_n-1])/cond(mod(year[_n-1], 4), 365, 366) ///
                     if _n > 1
                First, the -by randid (date), sort:- prefix tells Stata that the command that follows is to be carried out in groups defined by randid, and the data should be sorted chronologicall within randid groups. It also tells Stata that references to _n are to be understood as referring to the order of the observation within the randid group.

                Next, looking at the very end of the code, the instructions will be carried out only in observations where _n > 1. In other words, only in the second through final observations of each randid group. The first observation in the group is left as is.

                What does the -replace- command do here? It replaces the value of the age variable by age[_n-1], which is the value of the age variable in the immediately preceding observation within the randid group, plus the value of this expression: daysinmonth(date[_n-1])/cond(mod(year[_n-1], 4), 365, 366). Let's unpack that expression. Take the denominator first. cond(mod(year[_n-1], 4, 365, 366)) means: take the value of year in the preceding observation, divided by 4 and take the remainder. If the remainder is 0, then the denominator will be 366, and if the remainder is other than 0, the denominator will be 365. So this denominator is simply the number of days in the year of the preceding observation, according to whether it is a leap year or not. (Note: this formula is not correct for century years, but there are no century years in this data.) The numerator is simply the number of days in the month of the preceding observation.

                So putting it all together, the data are grouped by randid and sorted chronologically within randid. Then moving down the observations of a randid from second to last, the value of age is updated to the preceding observation's value of age increased by the length of the month reflected by the preceding observation expressed as a fraction of the length of the year in the preceding observation.

                Comment


                • #9
                  Thank you! This is very helpful

                  Comment


                  • #10
                    friends can u suggest reference how to interpret result of analysis of long panel data ?

                    Comment


                    • #11
                      Sara, I provide an answer in your other thread. Please don't post to multiple threads like this.

                      Comment

                      Working...
                      X