Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling missing values by id and time

    Dear Statalist

    I am using Stata 15 and I am trying to replace some missing values for a period of three years previous to a given year (this year is not fixed). For example, for the variable (x1) for id=1 the years 1990-1992 are missing and I need to replace it by the value in 1993. Again, I need to replace the missing in 1994-1996 by the value in 1997, and so on until 2013. I am using the following command:
    Code:
    gen q1 = .
    replace q1 = 1 if year==1990 | year==1991 | year==1992
    gen q2 = .
    replace q2 = 1 if year==1994 | year==1995 | year==1996
    gen q3 = .
    replace q3 = 1 if year==1998 | year==1999 | year==2000
    gen q4 = .
    replace q4 = 1 if year==2002 | year==2003 | year==2004
    gen q5 = .
    replace q5 = 1 if year==2006 | year==2007 | year==2008
    gen q6 = .
    replace q6 = 1 if year==2010 | year==2011 | year==2012
    
    global quarter " x1 x2 x3 x4  "
    global time " q1 q2 q3 q4 q5 q6   "
    
    
     foreach x in $quarter {
     foreach y in $time  {
    
     forvalues i = 1993(4)2013    {
     
         egen `x'_`y' = mean(`x') if year == `i', by(id)
         egen `x'_`y'_mode = mode( `x'_`y' ), by(id)
         replace `x' = `x'_`y'_mode if  `y' == 1
         drop `x'_`y'  `x'_`y'_mode
    }
    }
    }
    I am pretty sure that it might be an easier, shorter and more eficient way to do it. However, I do not know how to do it, and with the command I show you above I cannot see why is it failing.
    The first three years for the variable “x1” should have the value 1 (coming from 1993) but instead it put a “2”. It is clear that I am doing something wrong but I cannot figure out what it is.
    Here you have an example of the dataset:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(id year) byte x1 int(x2 x3 x4)
    1 1990 . . . .
    1 1991 . . . .
    1 1992 . . . .
    1 1993 1 5 1 1
    1 1994 . . . .
    1 1995 . . . .
    1 1996 . . . .
    1 1997 2 7 1 1
    1 1998 . . . .
    1 1999 . . . .
    1 2000 . . . .
    1 2001 2 3 3 1
    1 2002 . . . .
    1 2003 . . . .
    1 2004 . . . .
    1 2005 2 5 2 1
    1 2006 . . . .
    1 2007 . . . .
    1 2008 . . . .
    1 2009 2 2 3 1
    1 2010 . . . .
    1 2011 . . . .
    1 2012 . . . .
    1 2013 2 1 6 1
    1 2014 2 2 6 0
    1 2015 2 2 6 0
    2 1990 . . . .
    2 1991 . . . .
    2 1992 . . . .
    2 1993 2 0 2 0
    2 1994 . . . .
    2 1995 . . . .
    2 1996 . . . .
    2 1997 2 0 0 0
    2 1998 . . . .
    2 1999 . . . .
    2 2000 . . . .
    2 2001 1 0 1 3
    2 2002 . . . .
    2 2003 . . . .
    2 2004 . . . .
    2 2005 2 2 6 3
    2 2006 . . . .
    2 2007 . . . .
    2 2008 . . . .
    2 2009 . . . .
    2 2010 . . . .
    2 2011 . . . .
    2 2012 . . . .
    2 2013 . . . .
    2 2014 . . . .
    2 2015 . . . .
    3 1990 . . . .
    3 1991 . . . .
    3 1992 . . . .
    3 1993 2 6 1 4
    3 1994 . . . .
    3 1995 . . . .
    3 1996 . . . .
    3 1997 2 1 0 4
    3 1998 . . . .
    3 1999 . . . .
    3 2000 . . . .
    3 2001 2 1 0 2
    3 2002 . . . .
    3 2003 . . . .
    3 2004 . . . .
    3 2005 1 2 0 3
    3 2006 . . . .
    3 2007 . . . .
    3 2008 . . . .
    3 2009 1 2 0 3
    3 2010 . . . .
    3 2011 . . . .
    3 2012 . . . .
    3 2013 2 2 0 3
    3 2014 . . . .
    3 2015 . . . .
    4 1990 . . . .
    4 1991 . . . .
    4 1992 . . . .
    4 1993 2 0 0 0
    4 1994 . . . .
    4 1995 . . . .
    4 1996 . . . .
    4 1997 2 0 0 0
    4 1998 . . . .
    4 1999 . . . .
    4 2000 . . . .
    4 2001 . . . .
    4 2002 . . . .
    4 2003 . . . .
    4 2004 . . . .
    4 2005 . . . .
    4 2006 . . . .
    4 2007 . . . .
    4 2008 . . . .
    4 2009 . . . .
    4 2010 . . . .
    4 2011 . . . .
    4 2012 . . . .
    4 2013 . . . .
    4 2014 . . . .
    4 2015 . . . .
    end

    Any help will be much appreciated.

  • #2
    I haven't tried to follow your code. The rule seems to be: to copy each known quarterly value backwards to replace previous missing values. That is what mipolate (SSC) calls backward interpolation. There are many other ways to do it.

    Then you have quarterly data that cry out to be reshaped for just about any Stata purpose. Here is some code.

    Code:
    forval q = 1/4 { 
       mipolate x`q' year, by(id) backward gen(X`q')
    }
    reshape long X , i(id year) j(quarter) 
    gen qdate = yq(year, quarter) 
    format qdate %tq  
    sort id qdate
    Here are the results for the first identifier.


    Code:
    . list if id == 1 , sep(4)
    
         +------------------------------------------------------+
         | id   year   quarter   x1   x2   x3   x4   X    qdate |
         |------------------------------------------------------|
      1. |  1   1990         1    .    .    .    .   1   1990q1 |
      2. |  1   1990         2    .    .    .    .   5   1990q2 |
      3. |  1   1990         3    .    .    .    .   1   1990q3 |
      4. |  1   1990         4    .    .    .    .   1   1990q4 |
         |------------------------------------------------------|
      5. |  1   1991         1    .    .    .    .   1   1991q1 |
      6. |  1   1991         2    .    .    .    .   5   1991q2 |
      7. |  1   1991         3    .    .    .    .   1   1991q3 |
      8. |  1   1991         4    .    .    .    .   1   1991q4 |
         |------------------------------------------------------|
      9. |  1   1992         1    .    .    .    .   1   1992q1 |
     10. |  1   1992         2    .    .    .    .   5   1992q2 |
     11. |  1   1992         3    .    .    .    .   1   1992q3 |
     12. |  1   1992         4    .    .    .    .   1   1992q4 |
         |------------------------------------------------------|
     13. |  1   1993         1    1    5    1    1   1   1993q1 |
     14. |  1   1993         2    1    5    1    1   5   1993q2 |
     15. |  1   1993         3    1    5    1    1   1   1993q3 |
     16. |  1   1993         4    1    5    1    1   1   1993q4 |
         |------------------------------------------------------|
     17. |  1   1994         1    .    .    .    .   2   1994q1 |
     18. |  1   1994         2    .    .    .    .   7   1994q2 |
     19. |  1   1994         3    .    .    .    .   1   1994q3 |
     20. |  1   1994         4    .    .    .    .   1   1994q4 |
         |------------------------------------------------------|
     21. |  1   1995         1    .    .    .    .   2   1995q1 |
     22. |  1   1995         2    .    .    .    .   7   1995q2 |
     23. |  1   1995         3    .    .    .    .   1   1995q3 |
     24. |  1   1995         4    .    .    .    .   1   1995q4 |
         |------------------------------------------------------|
     25. |  1   1996         1    .    .    .    .   2   1996q1 |
     26. |  1   1996         2    .    .    .    .   7   1996q2 |
     27. |  1   1996         3    .    .    .    .   1   1996q3 |
     28. |  1   1996         4    .    .    .    .   1   1996q4 |
         |------------------------------------------------------|
     29. |  1   1997         1    2    7    1    1   2   1997q1 |
     30. |  1   1997         2    2    7    1    1   7   1997q2 |
     31. |  1   1997         3    2    7    1    1   1   1997q3 |
     32. |  1   1997         4    2    7    1    1   1   1997q4 |
         |------------------------------------------------------|
     33. |  1   1998         1    .    .    .    .   2   1998q1 |
     34. |  1   1998         2    .    .    .    .   3   1998q2 |
     35. |  1   1998         3    .    .    .    .   3   1998q3 |
     36. |  1   1998         4    .    .    .    .   1   1998q4 |
         |------------------------------------------------------|
     37. |  1   1999         1    .    .    .    .   2   1999q1 |
     38. |  1   1999         2    .    .    .    .   3   1999q2 |
     39. |  1   1999         3    .    .    .    .   3   1999q3 |
     40. |  1   1999         4    .    .    .    .   1   1999q4 |
         |------------------------------------------------------|
     41. |  1   2000         1    .    .    .    .   2   2000q1 |
     42. |  1   2000         2    .    .    .    .   3   2000q2 |
     43. |  1   2000         3    .    .    .    .   3   2000q3 |
     44. |  1   2000         4    .    .    .    .   1   2000q4 |
         |------------------------------------------------------|
     45. |  1   2001         1    2    3    3    1   2   2001q1 |
     46. |  1   2001         2    2    3    3    1   3   2001q2 |
     47. |  1   2001         3    2    3    3    1   3   2001q3 |
     48. |  1   2001         4    2    3    3    1   1   2001q4 |
         |------------------------------------------------------|
     49. |  1   2002         1    .    .    .    .   2   2002q1 |
     50. |  1   2002         2    .    .    .    .   5   2002q2 |
     51. |  1   2002         3    .    .    .    .   2   2002q3 |
     52. |  1   2002         4    .    .    .    .   1   2002q4 |
         |------------------------------------------------------|
     53. |  1   2003         1    .    .    .    .   2   2003q1 |
     54. |  1   2003         2    .    .    .    .   5   2003q2 |
     55. |  1   2003         3    .    .    .    .   2   2003q3 |
     56. |  1   2003         4    .    .    .    .   1   2003q4 |
         |------------------------------------------------------|
     57. |  1   2004         1    .    .    .    .   2   2004q1 |
     58. |  1   2004         2    .    .    .    .   5   2004q2 |
     59. |  1   2004         3    .    .    .    .   2   2004q3 |
     60. |  1   2004         4    .    .    .    .   1   2004q4 |
         |------------------------------------------------------|
     61. |  1   2005         1    2    5    2    1   2   2005q1 |
     62. |  1   2005         2    2    5    2    1   5   2005q2 |
     63. |  1   2005         3    2    5    2    1   2   2005q3 |
     64. |  1   2005         4    2    5    2    1   1   2005q4 |
         |------------------------------------------------------|
     65. |  1   2006         1    .    .    .    .   2   2006q1 |
     66. |  1   2006         2    .    .    .    .   2   2006q2 |
     67. |  1   2006         3    .    .    .    .   3   2006q3 |
     68. |  1   2006         4    .    .    .    .   1   2006q4 |
         |------------------------------------------------------|
     69. |  1   2007         1    .    .    .    .   2   2007q1 |
     70. |  1   2007         2    .    .    .    .   2   2007q2 |
     71. |  1   2007         3    .    .    .    .   3   2007q3 |
     72. |  1   2007         4    .    .    .    .   1   2007q4 |
         |------------------------------------------------------|
     73. |  1   2008         1    .    .    .    .   2   2008q1 |
     74. |  1   2008         2    .    .    .    .   2   2008q2 |
     75. |  1   2008         3    .    .    .    .   3   2008q3 |
     76. |  1   2008         4    .    .    .    .   1   2008q4 |
         |------------------------------------------------------|
     77. |  1   2009         1    2    2    3    1   2   2009q1 |
     78. |  1   2009         2    2    2    3    1   2   2009q2 |
     79. |  1   2009         3    2    2    3    1   3   2009q3 |
     80. |  1   2009         4    2    2    3    1   1   2009q4 |
         |------------------------------------------------------|
     81. |  1   2010         1    .    .    .    .   2   2010q1 |
     82. |  1   2010         2    .    .    .    .   1   2010q2 |
     83. |  1   2010         3    .    .    .    .   6   2010q3 |
     84. |  1   2010         4    .    .    .    .   1   2010q4 |
         |------------------------------------------------------|
     85. |  1   2011         1    .    .    .    .   2   2011q1 |
     86. |  1   2011         2    .    .    .    .   1   2011q2 |
     87. |  1   2011         3    .    .    .    .   6   2011q3 |
     88. |  1   2011         4    .    .    .    .   1   2011q4 |
         |------------------------------------------------------|
     89. |  1   2012         1    .    .    .    .   2   2012q1 |
     90. |  1   2012         2    .    .    .    .   1   2012q2 |
     91. |  1   2012         3    .    .    .    .   6   2012q3 |
     92. |  1   2012         4    .    .    .    .   1   2012q4 |
         |------------------------------------------------------|
     93. |  1   2013         1    2    1    6    1   2   2013q1 |
     94. |  1   2013         2    2    1    6    1   1   2013q2 |
     95. |  1   2013         3    2    1    6    1   6   2013q3 |
     96. |  1   2013         4    2    1    6    1   1   2013q4 |
         |------------------------------------------------------|
     97. |  1   2014         1    2    2    6    0   2   2014q1 |
     98. |  1   2014         2    2    2    6    0   2   2014q2 |
     99. |  1   2014         3    2    2    6    0   6   2014q3 |
    100. |  1   2014         4    2    2    6    0   0   2014q4 |
         |------------------------------------------------------|
    101. |  1   2015         1    2    2    6    0   2   2015q1 |
    102. |  1   2015         2    2    2    6    0   2   2015q2 |
    103. |  1   2015         3    2    2    6    0   6   2015q3 |
    104. |  1   2015         4    2    2    6    0   0   2015q4 |
         +------------------------------------------------------+
    If you only want the previous 3 years filled, and no more, the code might be

    Code:
    gsort id -year 
    by id : gen known = year if x1 < . 
    by id : replace known = known[_n-1] if missing(known) 
    
    forval q = 1/4 { 
       mipolate x`q' year, by(id) backward gen(X`q')
       replace X`q' = . if (known - year) > 3 
    }
    
    reshape long X , i(id year) j(quarter) 
    gen qdate = yq(year, quarter) 
    format qdate %tq  
    sort id qdate
    but that hinges on an assumption (true for your example) that either all of the x variables are known in a given year or none of them are.

    Comment


    • #3
      Dear Nick,

      Thanks a lot for your reply as well as for the code, in fact, I just needed the first part of the code
      Code:
       
       forval q = 1/4 {      mipolate x`q' year, by(id) backward gen(X`q') }
      since I am only interested in replacing the missing of the three first years using the value in the fourth year for the four variables (x1, x2, x3, x4) for each id; without building quarters for a given year.
      However, I realized that for those (id) without and observation in the previous 4-years period, the method will continue filling the missing until the next value or for the entire (id). So using your second suggestion
      Code:
      gsort id -year
      by id : gen known = year if x1 < .
      by id : replace known = known[_n-1] if missing(known)
      
      forval q = 1/4 {
         mipolate x`q' year, by(id) backward gen(X`q')
         replace X`q' = . if (known - year) > 3
      }
      is the perfect match to my problem. I was trying so hard by three days but I did not really knew this "mipolate" command.


      Again, thanks a lot for your time and help.

      Comment


      • #4
        Thanks for the closure.

        You don't really need mipolate for this: how to copy backwards has long been part of the folklore, as in https://www.stata.com/support/faqs/d...issing-values/ and whatever preceded it on Statalist. I can't believe that I invented any of the stuff documented there.

        But I mention mipolate largely as a small Swiss Army knife for interpolation.

        However, doing only it for the last # times is a little trickier.

        Comment

        Working...
        X