generating variable to identify siblings using longitudinal data

Jillian Emerson

#1

30 Apr 2016, 13:53

My data set has 8 time points and includes households with multiple children (singletons, twins, or 2-3 siblings). Households are identified by hhid, children by childid, and age by ageinmonths. I would like to create a variable that identifies each child as a singleton, twin, or sibling. For siblings, I would like to identify which one is older and which one is younger (and, in the case of three, which is the oldest, middle, and youngest). I believe this can be done by tagging hhids that have multiple childids assigned and then sorting to generate a variable for older/younger, but I am not sure exactly how to do this.

Also, if I generate a new binary variable, for example "eligible", and classify each childid as eligible or not, how do I count the unique childids that are eligible? For example, Stata might report that 737 observations are eligible, but many of these will be the same childid observed at multiple time points.
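A minimal sketch of one common approach, assuming a child should count as eligible if `eligible` equals 1 in at least one wave (if eligibility never changes within a child, this gives the same answer). The small dataset and the variable names `first` and `ever_eligible` are hypothetical. The idea is to flag exactly one observation per child with `egen, tag()` so that each child is counted once rather than once per time point.

```stata
clear
* hypothetical example: 3 children, each observed in 2 waves
input byte wave double childid byte eligible
1 1000.1 1
2 1000.1 1
1 1000.2 0
2 1000.2 1
1 1007.1 0
2 1007.1 0
end

* first = 1 on exactly one observation per child, 0 on the others
egen byte first = tag(childid)

* assumption: a child is eligible if eligible == 1 in any wave
bysort childid: egen byte ever_eligible = max(eligible)

* for comparison: eligible child-wave observations (3 here)
count if eligible == 1

* distinct eligible children
count if first & ever_eligible
```

If eligibility should instead be judged at a particular wave, replace the `max()` step with a condition on that wave.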
Carole J. Wilson

#2

30 Apr 2016, 14:20

Jillian, the egen command is your friend in these cases, usually bysort hhid: egen ... This type of question is common on the forum (search for "household", "hhid", or "child"), but to provide more helpful advice, we really need an example of your data and an example of the outcome you want. Please read the FAQ, especially #12, which explains how to use dataex to provide an example dataset.

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
William Lisowski

#3

30 Apr 2016, 14:26

I suspect you've told us less about your data than we need to know:
Are there parent records, or only child records?

do you want to identify children as singletons/twins/siblings within each household, or within each household and time point? That is, can a child be a singleton in periods 1-2 and a sibling in periods 3-8?

Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how best to pose your question. It would be particularly helpful to post a small hand-made example, with just a few variables and observations, showing the data before the process and how you expect the new variables to look after the process. It would be helpful to post that using dataex as explained in the FAQ, so that readers would be able to test their work on your data if they so desire.

And I second Carole's suggestion, which crossed mine in cyberspace.
Jillian Emerson

#4

01 May 2016, 10:02

Sorry, I thought I had provided enough information. There are household records identified by hhid and child records identified by childid, but no parent records. I just want to identify children as singletons/twins/siblings within each household for the entire time period, not at each time point. The variable I'm looking for in the end is a categorical variable for each childid, with the categories singleton, twin, oldest sib, middle sib, and youngest sib.

Here is an example from my dataset. The childid variable contains the 4-digit hhid followed by .1 or .2 to identify different children (siblings) within the same household. I also provided the birthday and the child's age in months. To clarify, these data are from the first time point only; if the sibling variable is constant over time, I suppose it doesn't matter that the data are longitudinal. Please let me know if I can provide any more information.

Code:

clear
input int hhid double childid int consensusbday float ageinmonths
1000 1000.1 19080 4.3039017
1000 1000.2 18550 21.71663
1007 1007.1 18942 8.837782
1008 1008.1 18882 10.809035
1009 1009.1 19024 6.143737
1009 1009.2 18652 18.365503
end
format %tdnn/dd/CCYY consensusbday

Last edited by Jillian Emerson; 01 May 2016, 10:07.
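Because childid is described as the household's hhid followed by .1, .2, and so on, a quick consistency check along these lines may be worth running before relying on that structure. This is a sketch on the example data above; `childnum` is a hypothetical variable name, and the `round()` step assumes a single digit after the decimal point (so at most 9 children per household).

```stata
clear
input int hhid double childid int consensusbday float ageinmonths
1000 1000.1 19080 4.3039017
1000 1000.2 18550 21.71663
1007 1007.1 18942 8.837782
1008 1008.1 18882 10.809035
1009 1009.1 19024 6.143737
1009 1009.2 18652 18.365503
end
format %tdnn/dd/CCYY consensusbday

* the integer part of childid should reproduce hhid
assert floor(childid) == hhid

* child number within household; round() absorbs floating-point error
* (assumes one decimal digit, i.e. at most 9 children per household)
generate byte childnum = round(10 * (childid - hhid))

* each child number should appear only once per household
isid hhid childnum
```

Both `assert` and `isid` stop with an error if the data do not follow the described pattern, which makes problems visible early.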

William Lisowski


01 May 2016, 11:18

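This may start you in the right direction. It creates two categorical variables: count is the number of children in the household born on that date (1 for a single birth, 2 for twins, 3 for triplets, etc.); rank gives birth order, 1 for the eldest, etc., with children who share a birth date, such as twins, sharing a rank. These should be sufficient for you to create what you need.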

Code:

clear
input byte wave int hhid double childid int bday
1 1000 1000.1 19080 
1 1000 1000.2 18550 
1 1007 1007.1 18942 
1 1008 1008.1 18882 
1 1009 1009.1 19024 
1 1009 1009.2 18652 
2 1000 1000.1 19080 
2 1000 1000.2 18550 
2 1007 1007.1 18942 
2 1007 1007.2 19942 
2 1007 1007.3 19942 
2 1008 1008.1 18882 
2 1009 1009.1 19024 
2 1009 1009.2 18652 
end
format %tdnn/dd/CCYY bday

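sort hhid bday
* c = 1 on one observation per child, so each child is counted once
egen c = tag(hhid childid)
* count: distinct children sharing this birth date (1 = single birth, 2 = twins, ...)
bysort hhid  bday:  egen byte count = total(c)
* r: running count of distinct children in birth-date order within household
bysort hhid (bday): generate byte r = sum(c)
* rank: birth order; children born on the same date share the highest value of r
bysort hhid  bday:  egen byte rank = max(r)
drop c r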

sort wave hhid childid
list, sepby(wave hhid) noobs

Code:

  +---------------------------------------------------+
  | wave   hhid   childid         bday   count   rank |
  |---------------------------------------------------|
  |    1   1000    1000.1    3/28/2012       1      2 |
  |    1   1000    1000.2   10/15/2010       1      1 |
  |---------------------------------------------------|
  |    1   1007    1007.1   11/11/2011       1      1 |
  |---------------------------------------------------|
  |    1   1008    1008.1    9/12/2011       1      1 |
  |---------------------------------------------------|
  |    1   1009    1009.1     2/1/2012       1      2 |
  |    1   1009    1009.2    1/25/2011       1      1 |
  |---------------------------------------------------|
  |    2   1000    1000.1    3/28/2012       1      2 |
  |    2   1000    1000.2   10/15/2010       1      1 |
  |---------------------------------------------------|
  |    2   1007    1007.1   11/11/2011       1      1 |
  |    2   1007    1007.2     8/7/2014       2      3 |
  |    2   1007    1007.3     8/7/2014       2      3 |
  |---------------------------------------------------|
  |    2   1008    1008.1    9/12/2011       1      1 |
  |---------------------------------------------------|
  |    2   1009    1009.1     2/1/2012       1      2 |
  |    2   1009    1009.2    1/25/2011       1      1 |
  +---------------------------------------------------+


Jillian Emerson

#6

02 May 2016, 14:15

Thanks very much for taking a look at this! I'm just having trouble figuring out what the variable "wave" is and how it was generated.

William Lisowski


02 May 2016, 18:20

I'm sorry: the data I input were your sample data, modified to make them suitable for testing the code. In particular, I made them longitudinal to demonstrate what happens with twins and what happens when additional children are added in a later wave.

I took the 6 lines of your original data, deleted ageinmonths, which was not needed, renamed consensusbday to bday, and added wave=1 (I suppose I should have named it time or year or something, but the longitudinal data I deal with are panel data from surveys given in multiple "waves"). I then copied those data, changed them to wave=2, and added a pair of twins to hhid 1007, born after the first child in that household.

On looking at this, I realize you also need to know the total number of children ever recorded in each household, so that you can tell in the first wave of hhid 1007 that the one child present at that time will not always be a singleton. Here's some updated code and output, using the same sample data.

Code:

sort hhid bday
egen c = tag(hhid childid)
* famsize: distinct children ever recorded in the household, across all waves
bysort hhid: egen famsize = total(c)
bysort hhid  bday:  egen byte count = total(c)
bysort hhid (bday): generate byte r = sum(c)
bysort hhid  bday:  egen byte rank = max(r)
drop c r

sort wave hhid childid
list, sepby(wave hhid) noobs

Code:

  +-------------------------------------------------------------+
  | wave   hhid   childid         bday   famsize   count   rank |
  |-------------------------------------------------------------|
  |    1   1000    1000.1    3/28/2012         2       1      2 |
  |    1   1000    1000.2   10/15/2010         2       1      1 |
  |-------------------------------------------------------------|
  |    1   1007    1007.1   11/11/2011         3       1      1 |
  |-------------------------------------------------------------|
  |    1   1008    1008.1    9/12/2011         1       1      1 |
  |-------------------------------------------------------------|
  |    1   1009    1009.1     2/1/2012         2       1      2 |
  |    1   1009    1009.2    1/25/2011         2       1      1 |
  |-------------------------------------------------------------|
  |    2   1000    1000.1    3/28/2012         2       1      2 |
  |    2   1000    1000.2   10/15/2010         2       1      1 |
  |-------------------------------------------------------------|
  |    2   1007    1007.1   11/11/2011         3       1      1 |
  |    2   1007    1007.2     8/7/2014         3       2      3 |
  |    2   1007    1007.3     8/7/2014         3       2      3 |
  |-------------------------------------------------------------|
  |    2   1008    1008.1    9/12/2011         1       1      1 |
  |-------------------------------------------------------------|
  |    2   1009    1009.1     2/1/2012         2       1      2 |
  |    2   1009    1009.2    1/25/2011         2       1      1 |
  +-------------------------------------------------------------+
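The categorical variable requested in #4 can then be built from famsize, count and rank. What follows is a sketch, not part of the code above: the variable name `sibtype`, its value labels, and the coding rules are assumptions. Namely, an only child is a singleton; twins (or higher multiples) are coded as twins regardless of other siblings; and the youngest sibling is the last-born single birth, whose rank then equals famsize. It repeats the sample data and code so that it runs on its own.

```stata
clear
input byte wave int hhid double childid int bday
1 1000 1000.1 19080
1 1000 1000.2 18550
1 1007 1007.1 18942
1 1008 1008.1 18882
1 1009 1009.1 19024
1 1009 1009.2 18652
2 1000 1000.1 19080
2 1000 1000.2 18550
2 1007 1007.1 18942
2 1007 1007.2 19942
2 1007 1007.3 19942
2 1008 1008.1 18882
2 1009 1009.1 19024
2 1009 1009.2 18652
end
format %tdnn/dd/CCYY bday

* famsize, count and rank as in the code above
egen c = tag(hhid childid)
bysort hhid: egen famsize = total(c)
bysort hhid  bday:  egen byte count = total(c)
bysort hhid (bday): generate byte r = sum(c)
bysort hhid  bday:  egen byte rank = max(r)
drop c r

* hypothetical coding; the order of the replace statements matters
generate byte sibtype = .
replace sibtype = 1 if famsize == 1                        // only child
replace sibtype = 2 if count >= 2                          // twin or higher multiple
replace sibtype = 3 if missing(sibtype) & rank == 1        // first-born sibling
replace sibtype = 5 if missing(sibtype) & rank == famsize  // last-born sibling
replace sibtype = 4 if missing(sibtype)                    // sibling born in between
label define sibtype 1 "singleton" 2 "twin" 3 "oldest sib" 4 "middle sib" 5 "youngest sib"
label values sibtype sibtype

sort wave hhid childid
list wave hhid childid bday famsize count rank sibtype, sepby(wave hhid) noobs
```

In the sample data no child falls into the middle category, and 1007.1 is coded as oldest sib even in wave 1, because famsize counts every child ever recorded in the household.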
