  • generating variable to identify siblings using longitudinal data

    My data set has 8 time points and includes households with one or more children (singletons, twins, or 2-3 siblings). Households are identified by hhid, children by childid, and age by ageinmonths. I would like to create a variable that classifies each child as a singleton, twin, or sibling. For siblings, I would like to identify which one is older and which is younger (or oldest, middle, and youngest when there are three). I believe this can be done by tagging hhids that have multiple childids assigned and then sorting to generate a variable for older/younger, but I am not sure exactly how to do this.

    Also, if I generate a new binary variable, for example "eligible", and classify each childid as eligible or not, how do I count the unique childids that are eligible? For example, it might say 737 observations are eligible, but many of these will be the same childid at multiple time points.

  • #2
    Jillian, the egen command is your friend in these cases, usually in the form bysort hhid: egen... This type of question is common on the forum (search for "household", "hhid", or "child"), but to provide more helpful advice, we really need an example of your data and an example of the outcome you want. Please read the FAQ, especially #12, which explains how to use dataex to provide an example dataset.
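
As a rough sketch of the egen idea (not based on Jillian's actual data): the made-up dataset below has hhid, childid, a hypothetical wave variable, and the "eligible" indicator from the question. egen's tag() function marks one observation per distinct group, so children can be counted once per household and each eligible child can be counted once across waves. It assumes eligibility does not change over time for a given child.

```stata
* Made-up example data: two waves, three children in two households
clear
input int hhid double childid byte wave byte eligible
1000 1000.1 1 1
1000 1000.1 2 1
1000 1000.2 1 0
1000 1000.2 2 0
1007 1007.1 1 1
1007 1007.1 2 1
end

* Number of distinct children in each household, repeated on every row
egen childtag = tag(hhid childid)
bysort hhid: egen byte nchildren = total(childtag)

* Tag each eligible child once, however many waves it appears in,
* then count the tagged observations
egen eligtag = tag(childid) if eligible == 1
count if eligtag == 1
```

With this made-up data, count reports 2 eligible children even though 4 observations are flagged as eligible.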
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1


    • #3
      I suspect you've told us less about your data than we need to know:
      • are there parent records or only children records?
      • do you want to identify children as singletons/twins/siblings within each household, or within each household and time point? That is, can a child be a singleton in periods 1-2 and a sibling in periods 3-8?
      Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how best to pose your question. It would be particularly helpful to post a small hand-made example, with just a few variables and observations, showing the data before the process and how you expect the new variables to look after the process. Please post it using dataex, as explained in the FAQ, so that readers can test their work on your data if they wish.

      And I second Carole's suggestion, which crossed mine in cyberspace.


      • #4
        Sorry, I thought I had provided enough information. There are household records identified by hhid and child records identified by childid, but no parent records. I just want to identify children as singletons/twins/siblings within each household for the entire time period, not at each time point. The variable I'm looking for at the end is a categorical variable for each childid, with the categories singleton, twin, oldest sib, middle sib, and youngest sib. Here is an example from my dataset. The childid variable contains the 4-digit hhid followed by .1 or .2 to identify different children (siblings) within the same household. I also provided the birthday and the child's age in months. Please let me know if I can provide any more information. Just to clarify, this is data from the first time point only; if the sibling variable holds constant, I suppose it doesn't matter that it's longitudinal data.

        Code:
        clear
        input int hhid double childid int consensusbday float ageinmonths
        1000 1000.1 19080 4.3039017
        1000 1000.2 18550 21.71663
        1007 1007.1 18942 8.837782
        1008 1008.1 18882 10.809035
        1009 1009.1 19024 6.143737
        1009 1009.2 18652 18.365503
        end
        format %tdnn/dd/CCYY consensusbday
        Last edited by Jillian Emerson; 01 May 2016, 10:07.


        • #5
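          This may start you in the right direction. It creates two variables: count is the number of children in the household who share a birthday (1 for a single birth, 2 for twins, 3 for triplets, etc.); rank gives birth order within the household, 1 for the eldest, etc., with children born on the same day sharing the highest rank in their group. These should be sufficient for you to create what you need.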
          Code:
          clear
          input byte wave int hhid double childid int bday
          1 1000 1000.1 19080 
          1 1000 1000.2 18550 
          1 1007 1007.1 18942 
          1 1008 1008.1 18882 
          1 1009 1009.1 19024 
          1 1009 1009.2 18652 
          2 1000 1000.1 19080 
          2 1000 1000.2 18550 
          2 1007 1007.1 18942 
          2 1007 1007.2 19942 
          2 1007 1007.3 19942 
          2 1008 1008.1 18882 
          2 1009 1009.1 19024 
          2 1009 1009.2 18652 
          end
          format %tdnn/dd/CCYY bday
          
          sort hhid bday
          egen c = tag(hhid childid)
          bysort hhid  bday:  egen byte count = total(c)
          bysort hhid (bday): generate byte r = sum(c)
          bysort hhid  bday:  egen byte rank = max(r)
          drop c r
          
          sort wave hhid childid
          list, sepby(wave hhid) noobs
          Code:
            +---------------------------------------------------+
            | wave   hhid   childid         bday   count   rank |
            |---------------------------------------------------|
            |    1   1000    1000.1    3/28/2012       1      2 |
            |    1   1000    1000.2   10/15/2010       1      1 |
            |---------------------------------------------------|
            |    1   1007    1007.1   11/11/2011       1      1 |
            |---------------------------------------------------|
            |    1   1008    1008.1    9/12/2011       1      1 |
            |---------------------------------------------------|
            |    1   1009    1009.1     2/1/2012       1      2 |
            |    1   1009    1009.2    1/25/2011       1      1 |
            |---------------------------------------------------|
            |    2   1000    1000.1    3/28/2012       1      2 |
            |    2   1000    1000.2   10/15/2010       1      1 |
            |---------------------------------------------------|
            |    2   1007    1007.1   11/11/2011       1      1 |
            |    2   1007    1007.2     8/7/2014       2      3 |
            |    2   1007    1007.3     8/7/2014       2      3 |
            |---------------------------------------------------|
            |    2   1008    1008.1    9/12/2011       1      1 |
            |---------------------------------------------------|
            |    2   1009    1009.1     2/1/2012       1      2 |
            |    2   1009    1009.2    1/25/2011       1      1 |
            +---------------------------------------------------+


          • #6
            Thanks very much for taking a look at this! I'm just having trouble figuring out what the variable "wave" is and how it was generated.


            • #7
              I'm sorry, the data I input was your sample data, modified to make it suitable for testing the code. In particular I made it longitudinal data to demonstrate what happens with twins, and what happens when additional children are added in a later wave.

              I took the 6 lines of your original data, deleted ageinmonths which was not needed and added wave=1 (I suppose I should have named it time or year or something, but the longitudinal data I deal with is panel data from surveys given in multiple "waves"). I then copied that data, changed it to wave=2, and added a pair of twins to hhid 1007 born subsequent to the first child in that household.

              On looking at this, I realize you also need to know the total number of children ever in each household, so that you can tell in the first wave of hhid 1007 that the one child at that time will not always be a singleton. Here's some updated code and output, same sample data.

              Code:
              sort hhid bday
              egen c = tag(hhid childid)
              bysort hhid: egen famsize = total(c)
              bysort hhid  bday:  egen byte count = total(c)
              bysort hhid (bday): generate byte r = sum(c)
              bysort hhid  bday:  egen byte rank = max(r)
              drop c r
              
              sort wave hhid childid
              list, sepby(wave hhid) noobs
              Code:
                +-------------------------------------------------------------+
                | wave   hhid   childid         bday   famsize   count   rank |
                |-------------------------------------------------------------|
                |    1   1000    1000.1    3/28/2012         2       1      2 |
                |    1   1000    1000.2   10/15/2010         2       1      1 |
                |-------------------------------------------------------------|
                |    1   1007    1007.1   11/11/2011         3       1      1 |
                |-------------------------------------------------------------|
                |    1   1008    1008.1    9/12/2011         1       1      1 |
                |-------------------------------------------------------------|
                |    1   1009    1009.1     2/1/2012         2       1      2 |
                |    1   1009    1009.2    1/25/2011         2       1      1 |
                |-------------------------------------------------------------|
                |    2   1000    1000.1    3/28/2012         2       1      2 |
                |    2   1000    1000.2   10/15/2010         2       1      1 |
                |-------------------------------------------------------------|
                |    2   1007    1007.1   11/11/2011         3       1      1 |
                |    2   1007    1007.2     8/7/2014         3       2      3 |
                |    2   1007    1007.3     8/7/2014         3       2      3 |
                |-------------------------------------------------------------|
                |    2   1008    1008.1    9/12/2011         1       1      1 |
                |-------------------------------------------------------------|
                |    2   1009    1009.1     2/1/2012         2       1      2 |
                |    2   1009    1009.2    1/25/2011         2       1      1 |
                +-------------------------------------------------------------+
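
One possible way to turn famsize, count, and rank into the single categorical variable requested in #4, offered as a sketch rather than a tested solution for the full data. The variable name sibtype, its numeric codes, and its value labels are made up for illustration, and the rows below are typed in from the wave 2 listing above rather than recomputed. It assumes that children who share a birthday within a household are twins, and that twins are labeled "twin" regardless of their birth order.

```stata
* A few rows copied from the wave 2 listing (famsize, count, rank as computed above)
clear
input int hhid double childid byte famsize byte count byte rank
1000 1000.1 2 1 2
1000 1000.2 2 1 1
1007 1007.1 3 1 1
1007 1007.2 3 2 3
1007 1007.3 3 2 3
1008 1008.1 1 1 1
end

* sibtype is a hypothetical name; codes 1-5 are arbitrary
generate byte sibtype = .
* only child ever observed in the household
replace sibtype = 1 if famsize == 1
* shares a birthday with another child in the household (assumed to be a twin)
replace sibtype = 2 if count > 1 & famsize > 1
* single births in households with more than one child, split by birth order
replace sibtype = 3 if count == 1 & famsize > 1 & rank == 1
replace sibtype = 4 if count == 1 & famsize > 1 & rank > 1 & rank < famsize
replace sibtype = 5 if count == 1 & famsize > 1 & rank == famsize

label define sibtype 1 "singleton" 2 "twin" 3 "oldest sib" 4 "middle sib" 5 "youngest sib"
label values sibtype sibtype

list, sepby(hhid) noobs
```

Because rank for a pair of twins is the highest position in their group, a single-birth child has rank 1 only when it is the eldest and rank equal to famsize only when it is the youngest, which is what the last three replace statements rely on.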
