  • Missing observation between two statuses

    Hi all,

    I am trying to identify everyone who has two or more years of missing observations between reporting being single and reporting being married. The data are a panel covering 1984-2015. I tried the commands below (pid = personal ID; marstatus = 1 if the person is married):

    by pid (syear), sort: gen spell_num = sum((missing(marstatus) != missing(marstatus[_n-1])))
    by pid spell_num (syear), sort: gen spell_duration = _N
    by pid, sort: egen longest_missing_spell = max(cond(missing(marstatus), spell_duration, .))

    by pid, sort: egen ever_married = max(marstatus)
    by pid, sort: gen single_married = Single == 1 | Married==1

    Am I on the right track ?

    Thank you !

    Regards,
    Gabriella

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pid int syear byte marstatus
    101 1984 1
    101 1985 1
    101 1986 1
    101 1987 1
    101 1988 1
    101 1989 1
    102 1984 1
    102 1985 1
    102 1986 1
    102 1987 1
    102 1988 1
    102 1989 1
    103 1984 2
    103 1985 2
    103 1986 2
    103 1987 2
    201 1984 .
    201 1985 .
    201 1986 .
    201 1987 .
    201 1988 .
    201 1989 .
    201 1990 .
    201 1991 .
    201 1992 .
    201 1993 .
    201 1994 .
    201 1995 .
    201 1996 .
    201 1997 .
    201 1998 .
    201 1999 .
    201 2000 4
    201 2001 4
    201 2002 3
    201 2003 4
    201 2004 4
    201 2005 4
    201 2006 4
    201 2007 4
    201 2008 4
    202 1984 2
    202 1985 2
    202 1986 2
    202 1987 2
    203 1984 2
    203 1985 2
    203 1986 2
    203 1987 2
    203 1988 2
    203 1989 2
    203 1990 2
    203 1991 2
    203 1992 2
    203 1993 2
    203 1994 2
    203 1995 2
    203 1996 2
    203 1997 2
    203 1998 2
    203 1999 2
    203 2000 2
    203 2001 2
    203 2002 2
    203 2003 2
    203 2004 2
    203 2005 2
    203 2006 2
    203 2007 2
    203 2008 2
    204 1990 .
    204 1991 .
    204 1992 .
    204 1993 .
    204 1994 .
    301 1984 2
    301 1985 1
    301 1986 1
    301 1987 1
    301 1988 1
    301 1989 1
    301 1990 1
    301 1991 1
    301 1992 1
    301 1993 1
    302 1984 2
    302 1985 1
    302 1986 1
    302 1987 1
    302 1988 1
    302 1989 1
    302 1990 1
    302 1991 1
    302 1992 1
    302 1993 1
    303 1984 .
    303 1985 .
    303 1986 .
    303 1987 .
    303 1988 .
    end
    label values marstatus d11104
    label def d11104 1 "[1] Married        1", modify
    label def d11104 2 "[2] Single         2", modify
    label def d11104 3 "[3] Widowed        3", modify
    label def d11104 4 "[4] Divorced       4", modify

  • #2
    Although the data resemble those in a problem you posted recently, this problem is, I think, more complicated and requires a different approach. Perhaps somebody else will come up with a simpler, more Stata-ish solution, but the following will work:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pid int syear byte marstatus
    101 1984 1
    101 1985 1
    101 1986 1
    101 1987 1
    101 1988 1
    101 1989 1
    102 1984 1
    102 1985 1
    102 1986 1
    102 1987 1
    102 1988 1
    102 1989 1
    103 1984 2
    103 1985 2
    103 1986 2
    103 1987 2
    201 1984 .
    201 1985 .
    201 1986 .
    201 1987 .
    201 1988 .
    201 1989 .
    201 1990 .
    201 1991 .
    201 1992 .
    201 1993 .
    201 1994 .
    201 1995 .
    201 1996 .
    201 1997 .
    201 1998 .
    201 1999 .
    201 2000 4
    201 2001 4
    201 2002 3
    201 2003 4
    201 2004 4
    201 2005 4
    201 2006 4
    201 2007 4
    201 2008 4
    202 1984 2
    202 1985 2
    202 1986 2
    202 1987 2
    203 1984 2
    203 1985 2
    203 1986 2
    203 1987 2
    203 1988 2
    203 1989 2
    203 1990 2
    203 1991 2
    203 1992 2
    203 1993 2
    203 1994 2
    203 1995 2
    203 1996 2
    203 1997 2
    203 1998 2
    203 1999 2
    203 2000 2
    203 2001 2
    203 2002 2
    203 2003 2
    203 2004 2
    203 2005 2
    203 2006 2
    203 2007 2
    203 2008 2
    204 1990 .
    204 1991 .
    204 1992 .
    204 1993 .
    204 1994 .
    301 1984 2
    301 1985 1
    301 1986 1
    301 1987 1
    301 1988 1
    301 1989 1
    301 1990 1
    301 1991 1
    301 1992 1
    301 1993 1
    302 1984 2
    302 1985 .
    302 1986 .
    302 1987 1
    302 1988 1
    302 1989 1
    302 1990 1
    302 1991 1
    302 1992 1
    302 1993 1
    303 1984 .
    303 1985 .
    303 1986 .
    303 1987 .
    303 1988 .
    end
    label values marstatus d11104
    label def d11104 1 "[1] Married        1", modify
    label def d11104 2 "[2] Single         2", modify
    label def d11104 3 "[3] Widowed        3", modify
    label def d11104 4 "[4] Divorced       4", modify
    
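    capture program drop one_pid
    program define one_pid
        * Called by -runby- on one pid's observations at a time.
        * state tracks progress through Single -> missing -> missing ... -> Married:
        *   1 = Single observed
        *   2 = first missing year right after Single
        *   3 = second or later consecutive missing year
        *   4 = Married observed right after two or more missing years
        * After the first year, anything else leaves state missing, which restarts the search.
        sort syear
        gen state = (marstatus == 2) in 1
        forvalues i = 2/`=_N' {
            replace state = 1 if marstatus == 2 in `i'
            replace state = 2 in `i' if missing(marstatus[`i']) & state[`i'-1] == 1
            replace state = 3 in `i' if missing(marstatus[`i']) & inlist(state[`i'-1], 2, 3)
            replace state = 4 in `i' if state[`i'-1] == 3 & marstatus[`i'] == 1
        }
        * wanted = 1 on every observation of a pid that ever reaches state 4
        egen wanted = max(state == 4)
        exit
    end
    
    runby one_pid, by(pid)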
    Notes: I have changed your example data, because what you gave doesn't have any actual instances of the sequence of events you are looking for. The example data here does (pid 302).

    To use this code you must install -runby-, written by Robert Picard and me, available from SSC.
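
    For readers who would rather avoid the loop and -runby-, the same rule (Single, then two or more consecutive missing years, then Married) can probably also be expressed with by-group carry-forward logic. The following is only a sketch, not the method posted above. The variable names prev_status, gap, hit and wanted_alt are hypothetical, the data are a shortened excerpt of the example in this post (pids 301 and 302, 1984-1988), and it assumes each pid has exactly one row per year with no absent years.

    ```stata
    clear
    input long pid int syear byte marstatus
    301 1984 2
    301 1985 1
    301 1986 1
    301 1987 1
    301 1988 1
    302 1984 2
    302 1985 .
    302 1986 .
    302 1987 1
    302 1988 1
    end

    * prev_status (hypothetical name): most recent nonmissing status before the current year
    bysort pid (syear): gen prev_status = marstatus[_n-1]
    by pid: replace prev_status = prev_status[_n-1] if missing(prev_status) & _n > 1

    * gap (hypothetical name): number of consecutive missing rows immediately before this one
    by pid: gen gap = 0
    by pid: replace gap = gap[_n-1] + 1 if _n > 1 & missing(marstatus[_n-1])

    * Married now, Single just before the gap, and at least two missing rows in between
    gen byte hit = marstatus == 1 & prev_status == 2 & gap >= 2
    by pid: egen byte wanted_alt = max(hit)

    list pid syear marstatus gap wanted_alt, sepby(pid)
    ```

    On this excerpt wanted_alt is 1 for pid 302 and 0 for pid 301. If the real panel has years with no row at all, gap counts rows rather than calendar years, so the threshold would need to be based on syear instead.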



    • #3
      identify all people who have missing observations of two or more years between reporting of being single and being married
      1. Your description seems to focus on cases such as (i) 2 . . 1 (exactly two consecutive missings) or (ii) 2 . . . 1 (more than two consecutive missings). Since your dataset may be large, it is not clear how you want to treat cases such as (iii) 2 . 2 . 2 . 1 (more than one missing between the first 2 and the next 1, but not consecutive) or (iv) 2 . 1 . 3 . . 1 ("unclean" sequences). A clarification is therefore needed here.

      2. If you only need to identify case (i), and not case (ii) or the others, the line below could be an alternative to Clyde's code in #2. Again, you need to clarify further before an effective solution can be suggested.
      Code:
      * rename just to shorten the line
      ren marstatus m
      
      bys pid (syear): egen x = max(m==2 & m[_n+1]==. & m[_n+2]==. & m[_n+3]==1 & _n<_N-2)
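
      To see which of the four patterns in point 1 this kind of test picks up, it can be run on a small made-up dataset. The sketch below is hedged in two ways. First, pids 1 to 4 and their years are hypothetical and exist only to mirror patterns (i) to (iv). Second, because the help for -egen- cautions against explicit subscripting with _n and _N inside egen functions, the indicator is computed with -generate- first and only then collapsed with -egen-. The name hit is also hypothetical.

      ```stata
      clear
      input byte pid int syear byte m
      1 1984 2
      1 1985 .
      1 1986 .
      1 1987 1
      2 1984 2
      2 1985 .
      2 1986 .
      2 1987 .
      2 1988 1
      3 1984 2
      3 1985 .
      3 1986 2
      3 1987 .
      3 1988 2
      3 1989 .
      3 1990 1
      4 1984 2
      4 1985 .
      4 1986 1
      4 1987 .
      4 1988 3
      4 1989 .
      4 1990 .
      4 1991 1
      end

      * hit = 1 on a Single row followed by exactly two missing rows and then Married
      bysort pid (syear): gen byte hit = m == 2 & missing(m[_n+1]) & missing(m[_n+2]) & m[_n+3] == 1 & _n < _N - 2

      * spread the flag to every row of the pid
      by pid: egen byte x = max(hit)

      tabulate pid x
      ```

      Only pid 1, pattern (i), ends up with x equal to 1. Patterns (ii), (iii) and (iv) are not flagged, which is why the intended definition matters.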
      Last edited by Romalpa Akzo; 08 Feb 2019, 22:11.



      • #4
        Hi Clyde, Hi Romalpa,

        Thank you both so much! Both codes work.

        Have an awesome day!

        Regards,
        Gabriela



        • #5
          Honestly, I do not think that my code in #3 fits your description in #1. I posted it only to illustrate that you need to clarify what you want.

          Please check the results again.
          Last edited by Romalpa Akzo; 09 Feb 2019, 06:18.
