Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Forward and backword fill of missing sequential variables

    Hello All,

    I'm encountering some difficulties with filling in missing age values in my dataset.

    I have variables name, year, and age. For the same name, some years have missing age values, while other years have existing values for the same name that I can use to fill in the missing ones. There are no gaps in the year variable. I’ve tried forward and backward filling to address the missing values. Forward filling seems to work well, but backward filling only iterates once. There are no two different individuals with the same name.

    Could anyone help me with refining this code?

    Thank you in advance!



    Code:
                        * Step 1: Forward fill with correct year increment
                        while inlist(1, missing(age)) {   // Continue until there are no more missing age values
                            bysort name (year): gen year_diff_forward = year - year[_n-1]
                            bysort name (year): replace age = age[_n-1] + year_diff_forward if missing(age) & !missing(age[_n-1])
                        }
    
                        * Step 2: Backward fill with correct year decrement
                        while inlist(1, missing(AGE)) {   // Continue until there are no more missing age values
                            bysort name (year): gen year_diff_backward = year[_n+1] - year
                            bysort name (year): replace age = age[_n+1] - year_diff_backward if missing(age) & !missing(age[_n+1])
                        }

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str50 name double(year age)
    ""                          1993  .
    ""                          1994  .
    ""                          1995  .
    ""                          1996  .
    ""                          1997  .
    "Maria"             2006 37
    "Maria"             2007 38
    "Maria"             2009 40
    "Maria"             2010 41
    "Maria"             2012 43
    "Elena"               2001  .
    "Elena"               2002  .
    "Elena"               2003 50
    "Elena"               2004 51
    "Elena"               2005 52
    "Elena"               2006 53
    "Elena"               2007 54
    "Elena"               2008 55
    "Elena"               2009 56
    "Elena"               2010 57
    "Elena"               2011 58
    "Elena"               2012 59
    "Elena"               2013 60
    "Elena"               2014 61
    "Elena"               2015 62
    "Elena"               2016 63
    "Luca"   2022 46
    "Marcos"        1997 57
    "Marcos"        1998 58
    "Marcos"        1999 59
    "Marcos"        2000 60
    "Marcos"        2001 61
    "Marcos"        2002 62
    "Marcos"        2003 63
    "Marcos"        2004 64
    "Marcos"        2005 65
    "Marcos"        2006 66
    "Marcos"        2007 67
    "Marcos"        2008 68
    "Marcos"        2009 69
    "Marcos"        2010 70
    "Marcos"        2011 71
    "Marcos"        2012 72
    "Marcos"        2013 73
    "Nico"          1995  .
    "Nico"          1996  .
    "Nico"          2001  .
    "Nico"          2002  .
    "Nico"          2003  .
    "Syu"           2006 59
    "Syu"           2008 61
    "Syu"           2009 62
    "Syu"           2010 63
    "Gerard"   1993 56
    "Gerard"   1994 57
    "Gerard"   1995 58
    "Gerard"   1996 .
    "Gerard"   1997 .
    "Gerard"   1998 61
    "Gerard"   1999 62
    "Gerard"   2000 63
    "Gerard"   2001 64
    "Gerard"   2002 .
    "Gerard"   2003 .
    "Gerard"   2019 49
    "Gerard"   2020 50
    "Gerard"   2021 51
    "Gerard"   2022 52
    "Amir"          2014 46
    "Amir"          2015 47
    "Amir"          2016 48
    "Amir"          2017 49
    "Amir"          2018 50
    "Lola" 2019 53
    "Lola" 2020 54
    "Lola" 2021 55
    "Lola" 2022 56
    "Ashton"             1994 60
    "Ashton"             1995 61
    "Ashton"             1998 64
    "Kotler"            2016 58
    "Kotler"            2017 59
    "Kotler"            2018 60
    "Kotler"            2019 61
    "Kotler"            2020 62
    "Kotler"            2021 63
    "Kotler"            2022 64
    "Kyle"         2022 48
    "Mola"         2020 46
    "Mola"         2021 47
    "Mola"         2022 48
    "Bart"            1997  .
    "Bart"            1998  .
    "Bart"            1999  .
    "Ana"            2021 51
    "D’Art"              2006 65
    "D’Art"              2007 66
    "D’Art"              2008 67
    "D’Art"              2009 68
    "D’Art"              2010 69
    end
    Last edited by Nick Baradar; 15 Sep 2024, 04:30.

  • #2
    Code:
    ssc install carryforward
    help carryforward

    Comment


    • #3
      Originally posted by Rich Goldstein View Post
      Code:
      ssc install carryforward
      help carryforward
      Hello Rich,

      Thank you for sharing the package info. I installed and went over the guide, however, I do not understand how it will solve my problem where I do not need the code to repeat existing values. Do you have any idea why my current code does not work?

      Comment


      • #4
        The principles are that copying downwards is a cascade, but copying upwards is not -- because by default Stata works with the existing order of observations. You could reverse time temporarily, but in your case the problem is one of interpolation for which ipolate is already a dedicated official command. However, your data example shows problems, e.g. two people both called Gerard who appear to be different.

        See e,g,

        SJ-23-3 dm0113 Speaking Stata: Replacing missing values: The easiest problems
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        3/23 SJ 23(3):884--896 (no commands)
        discusses replacing missing values; includes how to satisfy
        constraints that interpolation is confined to filling gaps
        between values known to be equal or to observations moderately
        close to a known value in time or in some other sequence or
        position variable


        SJ-4-4 dm0008 Stata tip 13: generate and replace use the current sort order
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. Newson
        Q4/04 SJ 4(4):484--485 (no commands)
        tips for using generate and replace, which use the
        current sort order

        FAQ . . . . . . . . . . . . . . . . . . . . . . . Replacing missing values
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        1/12 How can I replace missing values with previous or
        following nonmissing values or within sequences?
        http://www.stata.com/support/faqs/data-management/
        replacing-missing-values/



        Here is some technique, and you need more tricks, such as adding extra years for some people, e.g. by tsset and tsfill before you interpolate and extrapolate.

        Code:
        . gen id = sum(name != name[_n-1] | year != year[_n-1] + 1)
        
        .
        . ipolate age year, by(id) gen(age2) epolate
        (13 missing values generated)
        
        .
        . egen anymissing = total(missing(age)), by(id)
        
        .
        . list if anymissing, sepby(id)
        
             +--------------------------------------------+
             |   name   year   age   id   age2   anymis~g |
             |--------------------------------------------|
          1. |          1993     .    1      .          5 |
          2. |          1994     .    1      .          5 |
          3. |          1995     .    1      .          5 |
          4. |          1996     .    1      .          5 |
          5. |          1997     .    1      .          5 |
             |--------------------------------------------|
         11. |  Elena   2001     .    5     48          2 |
         12. |  Elena   2002     .    5     49          2 |
         13. |  Elena   2003    50    5     50          2 |
         14. |  Elena   2004    51    5     51          2 |
         15. |  Elena   2005    52    5     52          2 |
         16. |  Elena   2006    53    5     53          2 |
         17. |  Elena   2007    54    5     54          2 |
         18. |  Elena   2008    55    5     55          2 |
         19. |  Elena   2009    56    5     56          2 |
         20. |  Elena   2010    57    5     57          2 |
         21. |  Elena   2011    58    5     58          2 |
         22. |  Elena   2012    59    5     59          2 |
         23. |  Elena   2013    60    5     60          2 |
         24. |  Elena   2014    61    5     61          2 |
         25. |  Elena   2015    62    5     62          2 |
         26. |  Elena   2016    63    5     63          2 |
             |--------------------------------------------|
         45. |   Nico   1995     .    8      .          2 |
         46. |   Nico   1996     .    8      .          2 |
             |--------------------------------------------|
         47. |   Nico   2001     .    9      .          3 |
         48. |   Nico   2002     .    9      .          3 |
         49. |   Nico   2003     .    9      .          3 |
             |--------------------------------------------|
         54. | Gerard   1993    56   12     56          4 |
         55. | Gerard   1994    57   12     57          4 |
         56. | Gerard   1995    58   12     58          4 |
         57. | Gerard   1996     .   12     59          4 |
         58. | Gerard   1997     .   12     60          4 |
         59. | Gerard   1998    61   12     61          4 |
         60. | Gerard   1999    62   12     62          4 |
         61. | Gerard   2000    63   12     63          4 |
         62. | Gerard   2001    64   12     64          4 |
         63. | Gerard   2002     .   12     65          4 |
         64. | Gerard   2003     .   12     66          4 |
             |--------------------------------------------|
         92. |   Bart   1997     .   21      .          3 |
         93. |   Bart   1998     .   21      .          3 |
         94. |   Bart   1999     .   21      .          3 |
             +--------------------------------------------+
        
        .
        . list if !anymissing, sepby(id)
        
             +--------------------------------------------+
             |   name   year   age   id   age2   anymis~g |
             |--------------------------------------------|
          6. |  Maria   2006    37    2     37          0 |
          7. |  Maria   2007    38    2     38          0 |
             |--------------------------------------------|
          8. |  Maria   2009    40    3     40          0 |
          9. |  Maria   2010    41    3     41          0 |
             |--------------------------------------------|
         10. |  Maria   2012    43    4     43          0 |
             |--------------------------------------------|
         27. |   Luca   2022    46    6     46          0 |
             |--------------------------------------------|
         28. | Marcos   1997    57    7     57          0 |
         29. | Marcos   1998    58    7     58          0 |
         30. | Marcos   1999    59    7     59          0 |
         31. | Marcos   2000    60    7     60          0 |
         32. | Marcos   2001    61    7     61          0 |
         33. | Marcos   2002    62    7     62          0 |
         34. | Marcos   2003    63    7     63          0 |
         35. | Marcos   2004    64    7     64          0 |
         36. | Marcos   2005    65    7     65          0 |
         37. | Marcos   2006    66    7     66          0 |
         38. | Marcos   2007    67    7     67          0 |
         39. | Marcos   2008    68    7     68          0 |
         40. | Marcos   2009    69    7     69          0 |
         41. | Marcos   2010    70    7     70          0 |
         42. | Marcos   2011    71    7     71          0 |
         43. | Marcos   2012    72    7     72          0 |
         44. | Marcos   2013    73    7     73          0 |
             |--------------------------------------------|
         50. |    Syu   2006    59   10     59          0 |
             |--------------------------------------------|
         51. |    Syu   2008    61   11     61          0 |
         52. |    Syu   2009    62   11     62          0 |
         53. |    Syu   2010    63   11     63          0 |
             |--------------------------------------------|
         65. | Gerard   2019    49   13     49          0 |
         66. | Gerard   2020    50   13     50          0 |
         67. | Gerard   2021    51   13     51          0 |
         68. | Gerard   2022    52   13     52          0 |
             |--------------------------------------------|
         69. |   Amir   2014    46   14     46          0 |
         70. |   Amir   2015    47   14     47          0 |
         71. |   Amir   2016    48   14     48          0 |
         72. |   Amir   2017    49   14     49          0 |
         73. |   Amir   2018    50   14     50          0 |
             |--------------------------------------------|
         74. |   Lola   2019    53   15     53          0 |
         75. |   Lola   2020    54   15     54          0 |
         76. |   Lola   2021    55   15     55          0 |
         77. |   Lola   2022    56   15     56          0 |
             |--------------------------------------------|
         78. | Ashton   1994    60   16     60          0 |
         79. | Ashton   1995    61   16     61          0 |
             |--------------------------------------------|
         80. | Ashton   1998    64   17     64          0 |
             |--------------------------------------------|
         81. | Kotler   2016    58   18     58          0 |
         82. | Kotler   2017    59   18     59          0 |
         83. | Kotler   2018    60   18     60          0 |
         84. | Kotler   2019    61   18     61          0 |
         85. | Kotler   2020    62   18     62          0 |
         86. | Kotler   2021    63   18     63          0 |
         87. | Kotler   2022    64   18     64          0 |
             |--------------------------------------------|
         88. |   Kyle   2022    48   19     48          0 |
             |--------------------------------------------|
         89. |   Mola   2020    46   20     46          0 |
         90. |   Mola   2021    47   20     47          0 |
         91. |   Mola   2022    48   20     48          0 |
             |--------------------------------------------|
         95. |    Ana   2021    51   22     51          0 |
             |--------------------------------------------|
         96. |  D'Art   2006    65   23     65          0 |
         97. |  D'Art   2007    66   23     66          0 |
         98. |  D'Art   2008    67   23     67          0 |
         99. |  D'Art   2009    68   23     68          0 |
        100. |  D'Art   2010    69   23     69          0 |
             +--------------------------------------------+
        Last edited by Nick Cox; 16 Sep 2024, 01:31.

        Comment


        • #5
          Hi Nick Cox ! Thanks a lot for your detailed explanation. I understand the logic now and was able to apply the code.

          Comment

          Working...
          X