Inconsistencies in age and year using PSID data

Surya Singh

Join Date: Sep 2014

Posts: 54
#1


27 Jan 2015, 12:14

I'm using PSID data, focusing only on women. I have computed the year in which each woman was breastfeeding, but for some women the values are inconsistent across survey waves, because the main PSID file does not give the child's exact birth year. For example, for this woman:
ID   year  childage_months  childage_yrs  year_breastfeedingR
579  1995
579  1996
579  1997  88.9             7.408333      1990
579  1998
579  1999
579  2000
579  2001
579  2002  155.28           12.94         1989
579  2003
579  2004
579  2005
579  2006
579  2007  219.12           18.26         1989
579  2008
579  2009
579  2010
I computed childage_yrs = childage_months/12 and year_breastfeedingR = round(year - childage_yrs, 1).
I'm not sure what to do here. If the child's age in 1997 is 7.4, it should be around 12.4 in 2002 and 17.4 in 2007, so year_breastfeedingR should be 1990 throughout rather than 1989. Is there a way to fix this by using the child's age in 1997 as my reference point and then filling in the other years from it? Help on how to do this would be appreciated as well. One complication: some women were not interviewed in the 1997 Child Development Supplement (which has the questions about child age in years and breastfeeding), so their child's age appears for the first time in 2002 or 2007.
Any help would be greatly appreciated! Thanks.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

27 Jan 2015, 15:50

What is PSID? Statalist is an international list, so such an acronym will not be familiar to many. I know it used to mean "Panel Study of Income Dynamics", but it appears to have a broader scope now.

I'm confused about your questions, so I'll start out with basics.

• Exactly what is this childage_months variable and how was it computed? Be exact. I notice it is in fractional months, which suggests it was recorded in finer units. What does it have to do with breastfeeding? Go back to the original questionnaire and tell us the questions on which it was based.

• What is your goal: to get an accurate age at subsequent interviews, or something else?

Last edited by Steve Samuels; 27 Jan 2015, 16:14.

Steve Samuels
Statistical Consulting

Stata 14.2
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#3

27 Jan 2015, 18:13

Hi Surya, as Steve points out, you should check how the childage_months variable is computed. You may get different values of childage_yrs because childage_months is recorded at different points in different years, e.g. January 1997, May 2002, December 2007. However, since you choose 1997 as your base year, the following code should give you a variable holding childage_yrs in 2002 and 2007 as of the same point in the year at which the 1997 survey took place, and a common value of year_breastfeedingR taken from 1997. I label these variables childage_base_yr and year_breastfeedingR_base_yr.

Code:

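* Assumes one row per ID per year, so [_n-5] is the observation five years earlier
sort ID year
* Child age as recorded in the 1997 wave
gen childage_base_yr = childage_yrs if year == 1997
* Carry the 1997 value forward to 2002, and from there to 2007
by ID: replace childage_base_yr = childage_base_yr[_n-5] if missing(childage_base_yr)
* Add the years elapsed since 1997
replace childage_base_yr = childage_base_yr + year - 1997
* Same carry-forward for the breastfeeding year
gen year_breastfeedingR_base_yr = year_breastfeedingR if year == 1997
by ID: replace year_breastfeedingR_base_yr = year_breastfeedingR_base_yr[_n-5] if missing(year_breastfeedingR_base_yr)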
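
For women who were not interviewed in 1997, one possible variation is to anchor on the first wave in which a child age is recorded rather than on 1997. The following is only a sketch: it assumes one row per ID per year, one child per woman, and the variable names used in this thread; first_obs_year, ref_age, childage_ref and birthyear_ref are names made up for illustration.

```stata
* Sketch only: anchor on each woman's first recorded child age.
* Assumes one row per ID-year and a single child per woman.

* First survey year with a non-missing child age
bysort ID: egen first_obs_year = min(cond(!missing(childage_yrs), year, .))

* Child age recorded in that year, copied to all of the woman's rows
bysort ID: egen ref_age = max(cond(year == first_obs_year, childage_yrs, .))

* Implied child age in every survey year (negative before the birth)
gen childage_ref = ref_age + (year - first_obs_year)

* A single birth-year estimate per woman
gen birthyear_ref = round(first_obs_year - ref_age)
```

For ID 579 above, the reference is 1997 with age 7.408333, so childage_ref is 12.408333 in 2002 and birthyear_ref is round(1989.591667) = 1990 in every row.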

Last edited by Andrew Musau; 27 Jan 2015, 18:23.
Surya Singh

Join Date: Sep 2014

Posts: 54
#4

29 Jan 2015, 04:10

Hi Steve,

Apologies for not making it clear: the PSID is the Panel Study of Income Dynamics, conducted in the USA.
My goal is to compute variables (such as labor income, wages, family income, employment status) for the period when the woman was breastfeeding. I therefore need the year in which she was breastfeeding, which I am computing from the child's age in months at the time of the interview. According to the codebook, that age seems to have been computed from the interview date and the "preloaded birth date of child". I want to compare the woman's labor income, for example, before and after she was breastfeeding.

I have actually solved the original question that I posted, but I am now trying to figure out how to construct a loop over observations that creates variables for the period when the mother was breastfeeding, such as laborincome_breastfeeding, using year_breastfeeding. Is there a way to do this?

Thank you for your help!


Last edited by Surya Singh; 29 Jan 2015, 04:13.
Surya Singh

Join Date: Sep 2014

Posts: 54
#5

29 Jan 2015, 04:13

Hi Andrew,

Thank you for the code, it did what I wanted it to do! You're right about why there are inconsistencies in the child age in the survey, but hopefully using 1997 as the base year will be okay.
I am now trying to figure out how to construct a loop over observations that creates variables for the period when the mother was breastfeeding, such as laborincome_breastfeeding, using the year in which she was breastfeeding. I want to compare the woman's labor income, for example, before and after she was breastfeeding. Do you have any tips on this?

Thank you for your help!

Andrew Musau

Join Date: Oct 2014

Posts: 10213
#6

30 Jan 2015, 09:59

Hello again, Surya,

All you need to do is to have all your variables in one dataset and then have an indicator variable (dummy 0/1) that takes a value of 1 for years after a mother starts breast feeding and a value of 0 otherwise. This variable will distinguish between observations that relate to the pre period and the post period, and you can do various forms of comparisons and analysis.

Therefore, if you have variables in different datasets, you will need to merge the datasets. The following command will guide you on how to do this

Code:

help merge
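
As a rough sketch of the two steps above (merge, then build the indicator), something like the following might work. The file names psid_main and cds_child, the variable laborincome, and the names bf_year and post_bf are assumptions for illustration, and year_breastfeedingR_base_yr is the variable created in #3. No explicit loop over observations is needed, since generate and egen already operate on every row.

```stata
* Hypothetical files: psid_main.dta (income, employment) and
* cds_child.dta (child age, breastfeeding year), one row per ID-year each.
use psid_main, clear
merge 1:1 ID year using cds_child
drop if _merge == 2          // rows that appear only in the child file
drop _merge

* Spread the woman's breastfeeding start year to all of her rows
bysort ID: egen bf_year = min(year_breastfeedingR_base_yr)

* 1 from the start year onward, 0 before, missing if the start year is unknown
gen byte post_bf = year >= bf_year if !missing(bf_year)

* Compare mean labor income before and after
tabstat laborincome, by(post_bf) statistics(mean n)
```

Whether the start year itself counts as "after" is a choice; replace >= with > to put it in the pre period.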

Apart from this detail, as a referee, I would anticipate the following issues prior to undertaking the kind of analysis that you suggest:

"I want to compare woman's labor income, for example, before she was breastfeeding and after she was breastfeeding."

It is very easy to confound effects that are due to "breastfeeding" with those due to "having a child". I presume that you are interested in the effects that are due to a mother's decision to breastfeed. If you are familiar with earlier empirical models of female labor supply in Labor Economics, the main goal was to estimate income and substitution effects. The reason why these studies used women and not men is that over the life cycle, there is not much variation in the income of men whereas there is a lot of variation in women's income. A primary factor that affects a woman's income is the decision to have a child (children), others being her level of education, her previous labor market experience, her age, her non-labor income, her husband's income, etc.

Therefore, let us assume that we observe that for the average woman, her income is lower post breastfeeding compared to pre breastfeeding. Should we attribute this to her decision to breastfeed? (1) It may be that she decides to work part time after having a child (assuming that she worked full time prior to having a child), so that she can spend more time raising the kid. So one factor to control for is labor market status: employed full time/ employed part time/ unemployed. (2) Another possibility is that her husband decided to work extra and thus the wife need not worry about her individual income because the family income has increased. Thus we need to control for changes in aggregate family income, the age of the kid, and so on. Remember that all these reasons are due to having a child and not breastfeeding per se.

To escape the above confound, you may want to restrict your sample to women who have a child and distinguish between those who breast feed and those who use the bottle (formula feed), so that the issue is not having a kid. As before, you have to control for factors such as education, e.g., it may be the case that highly educated women are more likely to formula feed and these women on average have higher incomes. To achieve this distinction, however, you will need a variable that indicates whether a woman breast feeds versus formula feeds.
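
To make the comparison above concrete, a regression along the following lines is one possibility. This is a sketch under assumptions, not a recommended specification: breastfed (1 = breastfed, 0 = formula fed), post_birth (1 = survey years after the birth), has_child, educ_years, age, emp_status and family_income are all hypothetical variable names that would have to be built from the PSID files.

```stata
* Sketch only: mothers who breastfed vs. mothers who formula fed,
* before and after the birth, with the controls discussed above.
* All right-hand-side variable names are hypothetical.
regress laborincome i.breastfed##i.post_birth educ_years age ///
    i.emp_status family_income if has_child == 1, vce(cluster ID)
```

The coefficient on the breastfed#post_birth interaction compares the before/after change in labor income between the two groups of mothers. Standard errors are clustered on ID because each woman contributes several years. It remains a descriptive comparison unless the choice to breastfeed is unrelated to the unobserved determinants of income.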

You may be aware of the following paper by Mroz which uses the PSID - it is an excellent reference for those earlier studies.

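Thomas A. Mroz, "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions," Econometrica, Vol. 55, No. 4, July 1987, pp. 765-799.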
http://eml.berkeley.edu/~cle/e250a_f14/mroz-paper.pdf

Last edited by Andrew Musau; 30 Jan 2015, 10:15.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

31 Jan 2015, 17:55

I agree with Andrew's excellent suggestions.

According to the PSID FAQ (https://psidonline.isr.umich.edu/Guide/FAQ.aspx?Type=9), the month and day of interview are in the data set, so you have a date for every interview. From that you can estimate the child's birth date to within a few days: multiply the fractional age in months by 30.4375 (the average number of days in a month, 365.25/12) to get the age in days, and subtract that from the interview date. The date of birth can then be treated as the start of breastfeeding.

In your post you refer to a variable "year_breastfeeding", but why a single year? Women can breastfeed the same child during more than one calendar year. You must know the ending year and month to unambiguously define the "after" for your analysis.
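
A sketch of the date arithmetic, in case it helps. The names int_month, int_day and bf_months (reported duration of breastfeeding in months) are assumptions; the actual PSID variable names will differ. The factor 30.4375 is the average number of days in a month (365.25/12).

```stata
* Hypothetical names: int_month and int_day hold the interview month and day;
* bf_months is the reported duration of breastfeeding in months.
gen interview_date = mdy(int_month, int_day, year)

* Age in days, subtracted from the interview date (defined only in
* waves where childage_months is recorded)
gen birth_date = interview_date - round(childage_months * 30.4375)

* Estimated end of breastfeeding
gen bf_end_date = birth_date + round(bf_months * 30.4375)
format interview_date birth_date bf_end_date %td

* Calendar years in which breastfeeding started and ended
gen bf_start_year = year(birth_date)
gen bf_end_year   = year(bf_end_date)
```

With a start and an end year, "before" can be defined as year < bf_start_year and "after" as year > bf_end_year, leaving the breastfeeding years themselves as a separate period.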

Surya Singh

Join Date: Sep 2014

Posts: 54
#8

10 Feb 2015, 07:41

Hi Andrew,

Thank you for your helpful suggestions! I have created the indicator variable. The paper is also very informative and I will also take that into account for my study.

Best wishes,
Surya

Surya Singh

Join Date: Sep 2014

Posts: 54
#9

10 Feb 2015, 07:47

Hi Steve,

Thanks for the reference for the date of the interview, I must have missed it when looking for variables!

Best wishes,
Surya
