Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Mipolate backward imputation for missing values

    Hi, I need help with imputing missing values for my panel data for 100+ countries for 15 years (from 1990 to 2004). There are six variables. For some variables and countries, the data for the first few years are missing and start randomly within the first 10 years. For example, for country A, variable X might miss the data for 1990 and 1991 but start at 1992 while for country B the same variable X might miss data until 1997 and start in 1998. I need to fill in the missing values for the early years but only if the first non-missing value is within the first five years ahead (so, for 1990, up to 1995 and not beyond). For conceptual reasons, it would be incorrect to use a value that's more than 5 years ahead.

    I have posted an example below with just two countries, Albania and Algeria. For my purpose, the general command:

    mipolate x year, by(id) gen(x_new) backward

    would work for Albania because its first non-missing value starts within the first 5 years (in 1993). However, it wouldn't work for Algeria because its first non-missing value for this variable starts outside of this 5-year restriction (in 1998).


    year country x id

    1990 Albania . 1
    1991 Albania . 1
    1992 Albania . 1
    1993 Albania 5 1
    1994 Albania 5 1

    1995 Albania 5 1
    1996 Albania 6 1
    1997 Albania 5 1
    1998 Albania 5 1
    1999 Albania 5 1

    2000 Albania 5 1
    2001 Albania 5 1
    2002 Albania 7 1
    2003 Albania 7 1
    2004 Albania 7 1

    1990 Algeria . 2
    1991 Algeria . 2
    1992 Algeria . 2
    1993 Algeria . 2
    1994 Algeria . 2

    1995 Algeria . 2
    1996 Algeria . 2
    1997 Algeria . 2
    1998 Algeria -3 2
    1999 Algeria -3 2

    2000 Algeria -3 2
    2001 Algeria -3 2
    2002 Algeria -3 2
    2003 Algeria -3 2
    2004 Algeria 2 2

    Is there a command to restrict the first non-missing value picked to only until the first 5 years ahead? If mipolate does not have a ready command, maybe a set of commands can be constructed from first principles (I have seen Nick Cox talk about it often on this forum)? But I admit I do not know how to create that set of commands. Thanks for any help anyone can offer.

  • #2
    mipolate is from SSC and discussed (e.g.) at https://www.statalist.org/forums/for...-interpolation (compare FAQ Advice #12 on explaining community-contributed commands you mention).

    You can keep track of the date of the last known value and with a little more work of the date of the next known value. Then interpolation can be made conditional on the difference in dates being acceptable. This code could be condensed but spells out more than you need to illuminate not just this problem but also related problems.

    Please note the use of dataex to cut down on the minor surgery needed with your data example. (FAQ Advice #12 again).


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year str7 country byte(x id)
    1990 "Albania"  . 1
    1991 "Albania"  . 1
    1992 "Albania"  . 1
    1993 "Albania"  5 1
    1994 "Albania"  5 1
    1995 "Albania"  5 1
    1996 "Albania"  6 1
    1997 "Albania"  5 1
    1998 "Albania"  5 1
    1999 "Albania"  5 1
    2000 "Albania"  5 1
    2001 "Albania"  5 1
    2002 "Albania"  7 1
    2003 "Albania"  7 1
    2004 "Albania"  7 1
    1990 "Algeria"  . 2
    1991 "Algeria"  . 2
    1992 "Algeria"  . 2
    1993 "Algeria"  . 2
    1994 "Algeria"  . 2
    1995 "Algeria"  . 2
    1996 "Algeria"  . 2
    1997 "Algeria"  . 2
    1998 "Algeria" -3 2
    1999 "Algeria" -3 2
    2000 "Algeria" -3 2
    2001 "Algeria" -3 2
    2002 "Algeria" -3 2
    2003 "Algeria" -3 2
    2004 "Algeria"  2 2
    end
    
    gen negyear = -year
    gen next = year if !missing(x)
    bysort id (negyear) : replace next = next[_n-1] if missing(next)
    
    gen last = year if !missing(x)
    
    bysort id (year) : replace last = last[_n-1] if missing(last)
    
    mipolate x year, by(id) gen(wanted) backward
    gen wanted2 = cond((next - year) <= 5, wanted, .)
    
    list, sepby(id)
    
         +---------------------------------------------------------------------+
         | year   country    x   id   negyear   next   last   wanted   wanted2 |
         |---------------------------------------------------------------------|
      1. | 1990   Albania    .    1     -1990   1993      .        5         5 |
      2. | 1991   Albania    .    1     -1991   1993      .        5         5 |
      3. | 1992   Albania    .    1     -1992   1993      .        5         5 |
      4. | 1993   Albania    5    1     -1993   1993   1993        5         5 |
      5. | 1994   Albania    5    1     -1994   1994   1994        5         5 |
      6. | 1995   Albania    5    1     -1995   1995   1995        5         5 |
      7. | 1996   Albania    6    1     -1996   1996   1996        6         6 |
      8. | 1997   Albania    5    1     -1997   1997   1997        5         5 |
      9. | 1998   Albania    5    1     -1998   1998   1998        5         5 |
     10. | 1999   Albania    5    1     -1999   1999   1999        5         5 |
     11. | 2000   Albania    5    1     -2000   2000   2000        5         5 |
     12. | 2001   Albania    5    1     -2001   2001   2001        5         5 |
     13. | 2002   Albania    7    1     -2002   2002   2002        7         7 |
     14. | 2003   Albania    7    1     -2003   2003   2003        7         7 |
     15. | 2004   Albania    7    1     -2004   2004   2004        7         7 |
         |---------------------------------------------------------------------|
     16. | 1990   Algeria    .    2     -1990   1998      .       -3         . |
     17. | 1991   Algeria    .    2     -1991   1998      .       -3         . |
     18. | 1992   Algeria    .    2     -1992   1998      .       -3         . |
     19. | 1993   Algeria    .    2     -1993   1998      .       -3        -3 |
     20. | 1994   Algeria    .    2     -1994   1998      .       -3        -3 |
     21. | 1995   Algeria    .    2     -1995   1998      .       -3        -3 |
     22. | 1996   Algeria    .    2     -1996   1998      .       -3        -3 |
     23. | 1997   Algeria    .    2     -1997   1998      .       -3        -3 |
     24. | 1998   Algeria   -3    2     -1998   1998   1998       -3        -3 |
     25. | 1999   Algeria   -3    2     -1999   1999   1999       -3        -3 |
     26. | 2000   Algeria   -3    2     -2000   2000   2000       -3        -3 |
     27. | 2001   Algeria   -3    2     -2001   2001   2001       -3        -3 |
     28. | 2002   Algeria   -3    2     -2002   2002   2002       -3        -3 |
     29. | 2003   Algeria   -3    2     -2003   2003   2003       -3        -3 |
     30. | 2004   Algeria    2    2     -2004   2004   2004        2         2 |
         +---------------------------------------------------------------------+
    Reversing time is discussed at https://www.stata.com/support/faqs/d...issing-values/
    Last edited by Nick Cox; 16 Jul 2020, 03:18.

    Comment


    • #3
      Dear Nick, thank you for the very helpful commands and the links to additional resources (and the use of dataex). The commands worked to solve the issue I encountered with the missing data in my dataset. So, thank you, again.

      Comment

      Working...
      X