Hello All,
I'm encountering some difficulties with filling in missing age values in my dataset.
I have variables name, year, and age. For the same name, some years have missing age values, while other years have existing values for the same name that I can use to fill in the missing ones. There are no gaps in the year variable. I’ve tried forward and backward filling to address the missing values. Forward filling seems to work well, but backward filling only iterates once. There are no two different individuals with the same name.
Could anyone help me with refining this code?
Thank you in advance!
I'm encountering some difficulties with filling in missing age values in my dataset.
I have variables name, year, and age. For the same name, some years have missing age values, while other years have existing values for the same name that I can use to fill in the missing ones. There are no gaps in the year variable. I’ve tried forward and backward filling to address the missing values. Forward filling seems to work well, but backward filling only iterates once. There are no two different individuals with the same name.
Could anyone help me with refining this code?
Thank you in advance!
Code:
* Step 1: Forward fill with correct year increment while inlist(1, missing(age)) { // Continue until there are no more missing age values bysort name (year): gen year_diff_forward = year - year[_n-1] bysort name (year): replace age = age[_n-1] + year_diff_forward if missing(age) & !missing(age[_n-1]) } * Step 2: Backward fill with correct year decrement while inlist(1, missing(AGE)) { // Continue until there are no more missing age values bysort name (year): gen year_diff_backward = year[_n+1] - year bysort name (year): replace age = age[_n+1] - year_diff_backward if missing(age) & !missing(age[_n+1]) }
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str50 name double(year age) "" 1993 . "" 1994 . "" 1995 . "" 1996 . "" 1997 . "Maria" 2006 37 "Maria" 2007 38 "Maria" 2009 40 "Maria" 2010 41 "Maria" 2012 43 "Elena" 2001 . "Elena" 2002 . "Elena" 2003 50 "Elena" 2004 51 "Elena" 2005 52 "Elena" 2006 53 "Elena" 2007 54 "Elena" 2008 55 "Elena" 2009 56 "Elena" 2010 57 "Elena" 2011 58 "Elena" 2012 59 "Elena" 2013 60 "Elena" 2014 61 "Elena" 2015 62 "Elena" 2016 63 "Luca" 2022 46 "Marcos" 1997 57 "Marcos" 1998 58 "Marcos" 1999 59 "Marcos" 2000 60 "Marcos" 2001 61 "Marcos" 2002 62 "Marcos" 2003 63 "Marcos" 2004 64 "Marcos" 2005 65 "Marcos" 2006 66 "Marcos" 2007 67 "Marcos" 2008 68 "Marcos" 2009 69 "Marcos" 2010 70 "Marcos" 2011 71 "Marcos" 2012 72 "Marcos" 2013 73 "Nico" 1995 . "Nico" 1996 . "Nico" 2001 . "Nico" 2002 . "Nico" 2003 . "Syu" 2006 59 "Syu" 2008 61 "Syu" 2009 62 "Syu" 2010 63 "Gerard" 1993 56 "Gerard" 1994 57 "Gerard" 1995 58 "Gerard" 1996 . "Gerard" 1997 . "Gerard" 1998 61 "Gerard" 1999 62 "Gerard" 2000 63 "Gerard" 2001 64 "Gerard" 2002 . "Gerard" 2003 . "Gerard" 2019 49 "Gerard" 2020 50 "Gerard" 2021 51 "Gerard" 2022 52 "Amir" 2014 46 "Amir" 2015 47 "Amir" 2016 48 "Amir" 2017 49 "Amir" 2018 50 "Lola" 2019 53 "Lola" 2020 54 "Lola" 2021 55 "Lola" 2022 56 "Ashton" 1994 60 "Ashton" 1995 61 "Ashton" 1998 64 "Kotler" 2016 58 "Kotler" 2017 59 "Kotler" 2018 60 "Kotler" 2019 61 "Kotler" 2020 62 "Kotler" 2021 63 "Kotler" 2022 64 "Kyle" 2022 48 "Mola" 2020 46 "Mola" 2021 47 "Mola" 2022 48 "Bart" 1997 . "Bart" 1998 . "Bart" 1999 . "Ana" 2021 51 "D’Art" 2006 65 "D’Art" 2007 66 "D’Art" 2008 67 "D’Art" 2009 68 "D’Art" 2010 69 end
Comment