  • duration terms

    Dear all,

    I'm trying to calculate duration terms, that is, the number of years that have passed since the birth of the youngest child.

    My data structure looks like this:

    pid year kidbirth1 kidbirth2 kidbirth3
    4 1999 2001 2005 -2
    4 2000 2001 2005 -2
    4 2001 2001 2005 -2
    4 2002 2001 2005 -2
    4 2003 2001 2005 -2
    4 2004 2001 2005 -2
    4 2005 2001 2005 -2
    4 2006 2001 2005 -2
    5 1999 1982 -2 -2
    5 2000 1982 -2 -2
    . .
    . .
    . .

    I tried different things, for example generating an auxiliary variable like:

    sort pid year
    egen min_kid_year=rowmax(kidbirth*)

    and then calculating a duration term:

    gen duration = 0
    sort pid year
    by pid: replace duration= year-min_kid_year if min_kid_year >0 & min_kid_year!=.


    What makes things difficult is that I need the auxiliary variable min_kid_year to be 2001 before the year 2005 (the birth of the second child) and 2005 from then on. The "-2" values also make things tricky.

    I also tried to use -ereplace- but nothing really worked for me.

    I would appreciate any help. Thanks in advance.

  • #2
    Welcome to Statalist.

    Perhaps this will start you in a useful direction.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte pid int(year kidbirth1 kidbirth2 kidbirth3)
    4 1999 2001 2005 -2
    4 2000 2001 2005 -2
    4 2001 2001 2005 -2
    4 2002 2001 2005 -2
    4 2003 2001 2005 -2
    4 2004 2001 2005 -2
    4 2005 2001 2005 -2
    4 2006 2001 2005 -2
    5 1999 1982   -2 -2
    5 2000 1982   -2 -2
    end
    mvdecode kidbirth*, mv(-2)
    forvalues k=1/3 {
        generate kb`k' = kidbirth`k' if kidbirth`k'<=year
        }
    egen min_kid_year=rowmax(kb*)
    list, clean noobs
    Code:
    . list, clean noobs
    
        pid   year   kidbir~1   kidbir~2   kidbir~3    kb1    kb2   kb3   min_ki~r  
          4   1999       2001       2005          .      .      .     .          .  
          4   2000       2001       2005          .      .      .     .          .  
          4   2001       2001       2005          .   2001      .     .       2001  
          4   2002       2001       2005          .   2001      .     .       2001  
          4   2003       2001       2005          .   2001      .     .       2001  
          4   2004       2001       2005          .   2001      .     .       2001  
          4   2005       2001       2005          .   2001   2005     .       2005  
          4   2006       2001       2005          .   2001   2005     .       2005  
          5   1999       1982          .          .   1982      .     .       1982  
          5   2000       1982          .          .   1982      .     .       1982
    One key thing: your "-2" values are apparently codes for "missing" - for example, no 2nd or 3rd child. In that case, you should start by transforming them - and similar numeric missing value codes - into Stata missing values, as I do here. See help missing for a longer explanation.
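Building on the listing above, the duration term from the original question can then be obtained by subtracting min_kid_year from year. What follows is a minimal sketch on a shortened copy of the example data. The variable name duration is taken from post #1, and leaving it missing before the first birth (rather than setting it to 0) is an assumption about what is wanted.

```stata
* Sketch: years since the most recent birth, on a shortened copy of the example data
clear
input byte pid int(year kidbirth1 kidbirth2 kidbirth3)
4 1999 2001 2005 -2
4 2004 2001 2005 -2
4 2005 2001 2005 -2
4 2006 2001 2005 -2
5 1999 1982   -2 -2
end

* Turn the -2 codes into Stata missing values
mvdecode kidbirth*, mv(-2)

* Keep only births that have already happened by the survey year
forvalues k = 1/3 {
    generate kb`k' = kidbirth`k' if kidbirth`k' <= year
}

* Most recent birth year so far (missing if no child has been born yet)
egen min_kid_year = rowmax(kb*)

* Duration is missing before the first birth (assumption: missing, not 0)
generate duration = year - min_kid_year

list pid year min_kid_year duration, clean noobs
```

For pid 4 the duration is 3 in 2004, resets to 0 in 2005 when the second child is born, and is 1 in 2006.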

    That you overlook the correct handling of missing values makes me think you're new to Stata. If so, some further advice.

    When I began using Stata in a serious way, I started - as others here did - by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and manual.

    Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.


    • #3
      Thank you, William, for your helpful reply and the useful suggestion to read my way through the Stata manual.

      The reason why I did not transform the "-2" into Stata missing values is that in the next step I want to impute missing birth years with the help of chained imputation. Therefore, I need the "-2" in order to tell Stata not to impute here. But I will use your solution and merge "min_kid_year" to my dataset with the help of the "preserve" and "restore" commands.

      Thanks again!


      • #4
        Thanks for the further explanation. This version leaves the "-2" unchanged.
        Code:
        forvalues k=1/3 {
            generate kb`k' = kidbirth`k' if kidbirth`k'<=year & kidbirth`k'!=-2
            }
        egen min_kid_year=rowmax(kb*)
        But with that said, if I wanted to distinguish the "-2" values from the missing values you hope to impute, I would explore replacing "-2" with one of the extended missing values .a through .z.
        Code:
        mvdecode kidbirth*, mv(-2=.a)
        My reading of help mi impute suggests that it only imputes system missing values "." but not extended missing values.
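To illustrate how the two kinds of missing value stay distinguishable after that recoding, here is a small sketch. The row for pid 6, with an unknown first birth year coded as ".", is hypothetical and added only for illustration. The sketch makes no claim about how any imputation command treats these values.

```stata
* Sketch: -2 ("no such child") becomes .a, while unknown birth years stay "."
clear
input byte pid int(year kidbirth1 kidbirth2)
4 2004 2001 2005
5 1999 1982   -2
6 2000    .   -2
end

mvdecode kidbirth*, mv(-2=.a)

* Both "." and ".a" compare as larger than any number, so the
* <= year condition drops them without an explicit check for -2
forvalues k = 1/2 {
    generate kb`k' = kidbirth`k' if kidbirth`k' <= year
}
egen min_kid_year = rowmax(kb*)

* The two kinds of missing can still be told apart
generate byte no_child2 = (kidbirth2 == .a)
generate byte unknown1  = (kidbirth1 == .)

list, clean noobs
```

Because the birth-year variables no longer hold a fake numeric value, the loop from post #2 works unchanged.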

        In general, it is rare that Stata requires using a fake numeric value rather than a missing value when data is not available.


        • #5
          That is an interesting hint. Thanks William!


          • #6
            I've got another question related to my previous one:

            I'm currently simulating out-of-sample information on whether individuals have another child in period t+1, t+2, ... after they are no longer interviewed.
            So I have a variable change_child that indicates whether another child is born in t+1.
            To be able to do that for a couple of years into the future, I always have to update the information on children in order to use it as explanatory variables.

            For example, to update the duration term (discussed above), I have to update the birth year information.
            Therefore, I need kidbirth`y' to take the value of year t if change_child[_n-1]==1.
            The problem is that every individual already has a certain number of children (n_kids), so the index `y' of the variable kidbirth`y' (which should now contain the birth year t) has to be n_kids+1.

            For example, if an individual already has 3 kids (n_kids = 3 in year t-1) and change_child is 1 in period t-1, then kidbirth4 has to contain the year t.

            Could someone help me with the code? I've tried several different approaches, but nothing has really worked so far.

            Thank you in advance.
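One possible way to express the rule described above is to loop over the child index and fill kidbirth`k' only where `k' equals the previous year's n_kids plus one. This is a rough sketch under several assumptions that are not in the thread: the data are in long format with one row per pid and year, "no such child" is stored as .a, at most four children are tracked (the local maxkids is hypothetical), and only the row for year t is updated.

```stata
* Sketch: write year t into kidbirth(n_kids+1) when a birth was simulated in t-1
clear
input byte pid int year byte(n_kids change_child) int(kidbirth1 kidbirth2 kidbirth3 kidbirth4)
7 2010 3 1 2001 2005 2008 .a
7 2011 3 . 2001 2005 2008 .a
8 2010 1 0 1982   .a   .a .a
8 2011 1 . 1982   .a   .a .a
end

* Hypothetical upper bound on the number of children tracked
local maxkids 4

* For each possible index k, fill kidbirth`k' in year t if a birth was
* flagged in t-1 and the person had k-1 children in t-1
forvalues k = 1/`maxkids' {
    bysort pid (year): replace kidbirth`k' = year if change_child[_n-1] == 1 & n_kids[_n-1] + 1 == `k'
}

* Bring the child count in year t up to date as well
bysort pid (year): replace n_kids = n_kids[_n-1] + 1 if change_child[_n-1] == 1

list, clean noobs
```

In this toy data, pid 7 gets kidbirth4 = 2011 and n_kids = 4 in 2011, while pid 8 is left unchanged. In a multi-year simulation the same two steps would presumably be repeated after each simulated period.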
