
  • update birth years

    Dear all,

    I'm currently simulating whether individuals/couples have another child in the periods t+1, t+2, ... after they stop participating in the survey I work with.

    I have a variable change_child (change_kid in the data example below) that indicates whether another child is born in t+1.
    To be able to do that for several years into the future, I have to keep updating the children's information so that I can use it as explanatory variables.

    For this information I need to assign the birth years of the children to their parents.

    My data looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id int year float number_kids int(kidbirth01 kidbirth02 kidbirth03) float change_kid
     10 2000 2 1990 1992 -2 0
     10 2001 2 1990 1992 -2 0
     10 2002 2 1990 1992 -2 1
     10 2003 3  1990 1992  .   .
     13 2000 0   -2   -2 -2 1
     13 2001 1  2001   -2 -2 0
     13 2002 1   2001   -2 -2 1
     13 2003 2   2001   . . .
     13 2004 .   . . . .
    end

    Therefore, I need kidbirth`y' to take the value of the year t if change_child[_n-1]==1.

    The problem is that every individual already has a certain number of children (n_kids; number_kids in the example), so the index `y' of the variable kidbirth`y' that should receive the birth year t has to be n_kids+1.

    So, for example, if an individual already has 2 kids (see ID 10 in the example) and change_child is 1 in period 2002, kidbirth03 has to contain the year 2003.

    Could someone help me with the code? I've tried several different ways, but nothing has really worked so far.

    Thank you in advance.

  • #2
    There are several obstacles here. First, the use of -2 as a code for missing values in your kidbirth variables gets in the way. (In general, in Stata it is a bad idea to use numeric codes for missing values. It is very difficult to avoid, at some point, mistakenly doing a calculation that treats those numbers at face value. Stata has system missing and extended missing values, which are better alternatives for denoting missing values.) Second, having a series of variables like kidbirth1, kidbirth2, etc. is often a source of difficulty in Stata. Long layout of that kind of information usually works better (although in this case, it would actually make matters even more confusing).
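
    To illustrate the first point, here is a minimal sketch on a couple of made-up rows (first_raw and first_ok are hypothetical variable names, not part of your data). It shows a calculation taking the -2 code at face value, and the same calculation after -mvdecode- has turned -2 into system missing:

```stata
clear
input int(kidbirth01 kidbirth02 kidbirth03)
1990 1992 -2
2001   -2 -2
end

* Taken at face value, the -2 code becomes the "earliest birth year".
egen first_raw = rowmin(kidbirth*)

* Recode -2 to system missing; rowmin() ignores missing values.
mvdecode kidbirth*, mv(-2)
egen first_ok = rowmin(kidbirth*)

list
```

    With the numeric code left in place, first_raw is -2 in both rows; after the recode, first_ok holds the actual earliest birth year.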

    I think this is best done by combining the initial kidbirth years for each id into a single string variable, and then tacking an additional year onto the end of that variable when a new child is born. Finally, if you really want a separate variable for each birth year (which I still think is a bad idea) you can split up the string:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id int year float number_kids int(kidbirth01 kidbirth02 kidbirth03) float change_kid
     10 2000 2 1990 1992 -2 0
     10 2001 2 1990 1992 -2 0
     10 2002 2 1990 1992 -2 1
     10 2003 3  1990 1992  .   .
     13 2000 0   -2   -2 -2 1
     13 2001 1  2001   -2 -2 0
     13 2002 1   2001   -2 -2 1
     13 2003 2   2001   . . .
     13 2004 .   . . . .
     end
     
    mvdecode kidbirth*, mv(-2)
    egen birthyears = concat(kidbirth*), punct(" ")
    drop kidbirth*
    by id (year), sort: replace birthyears = "" if _n > 1
    replace birthyears = trim(itrim(birthyears))
    replace birthyears = subinstr(birthyears, ".", "", .)
    by id (year), sort: replace birthyears = ///
        cond(number_kids > number_kids[_n-1] & !missing(number_kids), ///
        birthyears[_n-1] + " " + string(year), birthyears[_n-1]) if _n > 1
        
    //    AND IF YOU REALLY WANT EACH KID BIRTH YEAR IN A SEPARATE VARIABLE
    //    (WHICH IS PROBABLY NOT A GOOD IDEA), YOU CAN ADD THIS:
    split birthyears, destring gen(kidbirth)
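
    If you would rather stay with the wide numeric variables throughout, something along the following lines might also work. This is only a sketch and not the string approach shown above. It assumes at most three children (kidbirth01-kidbirth03), that change_kid in year t flags a birth in t+1, and that number_kids in the previous year is the count before that birth. The local macro v is just a helper name.

```stata
clear
input long id int year float number_kids int(kidbirth01 kidbirth02 kidbirth03) float change_kid
10 2000 2 1990 1992 -2 0
10 2001 2 1990 1992 -2 0
10 2002 2 1990 1992 -2 1
10 2003 3 1990 1992  . .
13 2000 0   -2   -2 -2 1
13 2001 1 2001   -2 -2 0
13 2002 1 2001   -2 -2 1
13 2003 2 2001    .  . .
13 2004 .    .    .  . .
end

mvdecode kidbirth*, mv(-2)

forvalues k = 1/3 {
    * Build the variable name kidbirth01, kidbirth02, kidbirth03.
    local v = "kidbirth" + string(`k', "%02.0f")
    * A birth flagged in the previous year fills slot (previous count + 1).
    by id (year), sort: replace `v' = year ///
        if _n > 1 & change_kid[_n-1] == 1 & number_kids[_n-1] + 1 == `k'
    * Carry known birth years forward to later observations.
    by id (year), sort: replace `v' = `v'[_n-1] if _n > 1 & missing(`v')
}

list, sepby(id)
```

    On the example data this puts 2003 into kidbirth03 for id 10 and into kidbirth02 for id 13, and carries the values forward into the 2004 observation of id 13.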



    • #3
      Thank you very much for your help, Clyde.

      I ran the code and it did almost exactly what I want. But there seems to be a problem: sometimes the string variable "birthyears" contains the same birth year twice (see the example below).

      Do you have an idea how to fix this?

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long id int year float(numbers_kids change_kid) str79 birthyears int(kidbirth1 kidbirth2 kidbirth3 kidbirth4 kidbirth5 kidbirth6 kidbirth7 kidbirth8 kidbirth9 kidbirth10 kidbirth11 kidbirth12 kidbirth13 kidbirth14 kidbirth15 kidbirth16)
      61 1984 0 0 "1990 1996      "           1990 1996    .    . . . . . . . . . . . . .
      61 1985 0 0 "1990 1996      "           1990 1996    .    . . . . . . . . . . . . .
      61 1986 0 0 "1990 1996      "           1990 1996    .    . . . . . . . . . . . . .
      61 1987 0 0 "1990 1996      "           1990 1996    .    . . . . . . . . . . . . .
      61 1988 0 0 "1990 1996      "           1990 1996    .    . . . . . . . . . . . . .
      61 1989 0 1 "1990 1996      "           1990 1996    .    . . . . . . . . . . . . .
      61 1990 1 0 "1990 1996       1990"      1990 1996 1990    . . . . . . . . . . . . .
      61 1991 1 0 "1990 1996       1990"      1990 1996 1990    . . . . . . . . . . . . .
      61 1992 1 0 "1990 1996       1990"      1990 1996 1990    . . . . . . . . . . . . .
      61 1993 1 0 "1990 1996       1990"      1990 1996 1990    . . . . . . . . . . . . .
      61 1994 1 0 "1990 1996       1990"      1990 1996 1990    . . . . . . . . . . . . .
      61 1995 1 1 "1990 1996       1990"      1990 1996 1990    . . . . . . . . . . . . .
      61 1996 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 1997 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 1998 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 1999 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2000 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2001 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2002 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2003 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2004 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2005 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2006 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2007 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2008 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2009 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2010 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2011 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2012 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2013 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2014 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2015 2 0 "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      61 2016 2 . "1990 1996       1990 1996" 1990 1996 1990 1996 . . . . . . . . . . . .
      end



      • #4
        Well, you don't show the starting data this time, but I can infer back to what it was, and I see the source of the problem. Let's look at the very first observation. It is individual 61 in year 1984, but you already show that there are children born in 1990 and 1996. Those are carried along, and then when 1990 and 1996 are reached, they get added in again. In your original data, there were never any references in the beginning observations to children not yet born. For example, in the original data example, for id 10, the first value of year was 2000, and the initial birth years were in the past. The fix is simple. (I have modified your -dataex- in two ways: I have eliminated the variable birthyears, because that variable is actually created by the code and is not an input. And I have renamed numbers_kids to number_kids to make it consistent with the earlier -dataex- and the earlier code.)

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long id int year float(number_kids change_kid) int(kidbirth1 kidbirth2 kidbirth3 kidbirth4 kidbirth5 kidbirth6 kidbirth7 kidbirth8 kidbirth9 kidbirth10 kidbirth11 kidbirth12 kidbirth13 kidbirth14 kidbirth15 kidbirth16)
        61 1984 0 0 1990 1996    .    . . . . . . . . . . . .
        61 1985 0 0 1990 1996    .    . . . . . . . . . . . .
        61 1986 0 0 1990 1996    .    . . . . . . . . . . . .
        61 1987 0 0 1990 1996    .    . . . . . . . . . . . .
        61 1988 0 0 1990 1996    .    . . . . . . . . . . . .
        61 1989 0 1 1990 1996    .    . . . . . . . . . . . .
        61 1990 1 0 1990 1996 1990    . . . . . . . . . . . .
        61 1991 1 0 1990 1996 1990    . . . . . . . . . . . .
        61 1992 1 0 1990 1996 1990    . . . . . . . . . . . .
        61 1993 1 0 1990 1996 1990    . . . . . . . . . . . .
        61 1994 1 0 1990 1996 1990    . . . . . . . . . . . .
        61 1995 1 1 1990 1996 1990    . . . . . . . . . . . .
        61 1996 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 1997 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 1998 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 1999 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2000 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2001 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2002 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2003 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2004 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2005 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2006 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2007 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2008 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2009 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2010 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2011 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2012 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2013 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2014 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2015 2 0 1990 1996 1990 1996 . . . . . . . . . . .
        61 2016 2 . 1990 1996 1990 1996 . . . . . . . . . . .
        end
        
        
        mvdecode kidbirth*, mv(-2)
        
        //    ELIMINATE BIRTHS THAT HAVE NOT YET OCCURRED
        foreach v of varlist kidbirth* {
            replace `v' = . if `v' > year
        }
        
        egen birthyears = concat(kidbirth*), punct(" ")
        drop kidbirth*
        by id (year), sort: replace birthyears = "" if _n > 1
        replace birthyears = trim(itrim(birthyears))
        replace birthyears = subinstr(birthyears, ".", "", .)
        by id (year), sort: replace birthyears = ///
            cond(number_kids > number_kids[_n-1] & !missing(number_kids), ///
            birthyears[_n-1] + " " + string(year), birthyears[_n-1]) if _n > 1
            
        //    AND IF YOU REALLY WANT EACH KID BIRTH YEAR IN A SEPARATE VARIABLE
        //    (WHICH IS PROBABLY NOT A GOOD IDEA), YOU CAN ADD THIS:
        split birthyears, destring gen(kidbirth)
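
        If you want to check the result for repeated years before splitting, a rough sketch like the following might help. It runs on two made-up rows, and n_years and has_dup are hypothetical variable names. Note that it would also flag genuine repeats, such as twins or two births in the same year.

```stata
clear
input long id str30 birthyears
61 "1990 1996 1990 1996"
10 "1990 1992 2003"
end

* Number of birth years stored in each string.
gen n_years = wordcount(birthyears)
summarize n_years, meanonly
local maxn = r(max)

* Compare every pair of words (i < j) and flag identical years.
gen byte has_dup = 0
forvalues i = 1/`maxn' {
    forvalues j = 1/`maxn' {
        if `i' < `j' {
            replace has_dup = 1 ///
                if word(birthyears, `i') == word(birthyears, `j') & `j' <= n_years
        }
    }
}

list id birthyears has_dup
```

        Here has_dup is 1 for id 61, whose string repeats 1990 and 1996, and 0 for id 10.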



        • #5
          Works perfectly. Thanks a lot
