My dataset has many companies over a span of 7 years. I'm trying to fill in missings for "var" with the values in "newvar." I need to apply 2 main rules to fill in the missing observations for my dataset:
1. If there are existing values in the years before and after the missing, replace missing with the average of existing values closest in time to the missing year. In the below example, for years 2002 and 2003, I want to replace the missing with the average of 2001 and 2004 (hence, 45 in the newvar).
2. If there are existing values only in years after the missing, or only in years before the missing, replace the missing year with the existing value most recent in time. In the example, year 2000 is replaced with the value in 2001 and year 2006 is replaced with the value in 2005.
I'm having trouble with how to implement the first rule. Any help would be very appreciated, thanks!
1. If there are existing values in the years before and after the missing, replace missing with the average of existing values closest in time to the missing year. In the below example, for years 2002 and 2003, I want to replace the missing with the average of 2001 and 2004 (hence, 45 in the newvar).
2. If there are existing values only in years after the missing, or only in years before the missing, replace the missing year with the existing value most recent in time. In the example, year 2000 is replaced with the value in 2001 and year 2006 is replaced with the value in 2005.
Code:
ID year var newvar
A 2000 . 40
A 2001 40 40
A 2002 . 45
A 2003 . 45
A 2004 50 50
A 2005 50 50
A 2006 . 50
Comment