Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling in observations by group if year before and year after observations are the same

    Hi,
    I am trying to fill in a variable (state) over IDs IF the state is consistent for the year before and year after. My data for example looks like this:
    ID_child year state
    5032 1983 Tennessee
    5032 1984 Tennessee
    5032 1985 Tennessee
    5032 1986 Tennessee
    5032 1987 Tennessee
    5032 1988 Tennessee
    5032 1989 Tennessee
    5032 1990 Tennessee
    5032 1991 Tennessee
    5032 1992 Tennessee
    5032 1993 Tennessee
    5032 1994 Tennessee
    5032 1995 Tennessee
    5032 1996 Tennessee
    5032 1997 Tennessee
    5032 1998 .
    5032 1999 Tennessee
    5032 2000 .
    5032 2001 Tennessee
    5032 2002 .
    5032 2003 Tennessee
    5032 2004 .
    5032 2005 Tennessee
    5032 2008 .
    5032 2009 Illinois
    5032 2010

    I want it to look like this:

    ID_child year state
    5032 1983 Tennessee
    5032 1984 Tennessee
    5032 1985 Tennessee
    5032 1986 Tennessee
    5032 1987 Tennessee
    5032 1988 Tennessee
    5032 1989 Tennessee
    5032 1990 Tennessee
    5032 1991 Tennessee
    5032 1992 Tennessee
    5032 1993 Tennessee
    5032 1994 Tennessee
    5032 1995 Tennessee
    5032 1996 Tennessee
    5032 1997 Tennessee
    5032 1998 Tennessee
    5032 1999 Tennessee
    5032 2000 Tennessee
    5032 2001 Tennessee
    5032 2002 Tennessee
    5032 2003 Tennessee
    5032 2004 Tennessee
    5032 2005 Tennessee
    5032 2008 .
    5032 2009 Illinois
    5032 2010 Illinois

    For example, the state in year 2008 should be missing because we know that the child moved sometime between 2005 and 2009 but we do not know exactly when so we cannot assume that the state in year 2008 is Tennessee or Illinois. I have the code for filling in missing values based on state[_n+1] and state[_n-1] but when I run this code, it fills in 2008 as Illinois when it should really be missing. How do I go about replace the state as missing for those observations in which the state in the previous year that is not missing (for example in this it is 2005) and the year after this is not missing (for example 2009)?

    Thank you for your help and time.
    Surya


  • #2
    I'm not certain of what you intend or what you might have tried, but here's a very literal
    translation of what I'm guessing you want:
    Code:
    bysort ID_child: replace state = state[_n-1]  if ///
                     (state == ".") & (state[_n-1] != ".") & (state[_n+1] != ".") &  ///
                     (year == year[_n-1] + 1) & (year == year[_n+1] - 1)

    Note, by the way, that what you describe as missing values for state, as you show them, are not missing.
    To Stata, "." is not a missing value for a string variable.

    Regards, Mike
    Last edited by Mike Lacy; 17 Apr 2015, 07:35.

    Comment


    • #3
      There's another way to do it that is easy to think about.
      Code:
        bysort ID_child (year): gen f_state = state[_n-1] if missing(state)
        gsort ID_child -year
        by ID_child: gen b_state = state[_n-1] if missing(state)
        replace state = f_state if missing(state) & f_state == b_state
      So, it's
      1. Copy the -state- forward in time if -state- is missing.
      2. Copy the -state- backward in time if -state- is missing.
      3. If the two copies agree, -state- is the same on either side of the gap and you're going to use that interpolated value.
      Notice that this is more than you asked or than Mike's code will do. It fills gaps longer than 1 year and doesn't depend on equal spacing.






      Comment


      • #4
        Thanks Mike and Nick for the suggestions!
        Nick, your suggestion worked great and solved the problem!
        Surya


        Originally posted by Nick Cox View Post
        There's another way to do it that is easy to think about.
        Code:
        bysort ID_child (year): gen f_state = state[_n-1] if missing(state)
        gsort ID_child -year
        by ID_child: gen b_state = state[_n-1] if missing(state)
        replace state = f_state if missing(state) & f_state == b_state
        So, it's
        1. Copy the -state- forward in time if -state- is missing.
        2. Copy the -state- backward in time if -state- is missing.
        3. If the two copies agree, -state- is the same on either side of the gap and you're going to use that interpolated value.
        Notice that this is more than you asked or than Mike's code will do. It fills gaps longer than 1 year and doesn't depend on equal spacing.





        Comment

        Working...
        X