  • Inconsistencies in age and year using PSID data

    I'm using PSID data, focusing only on women. I have computed the year in which each woman was breastfeeding, but it is inconsistent across waves for some women because the main PSID file does not give the child's exact birth year. For example, for this woman:
    ID   year  childage_months  childage_yrs  year_breastfeedingR
    579  1995  .                .             .
    579  1996  .                .             .
    579  1997  88.9             7.408333      1990
    579  1998  .                .             .
    579  1999  .                .             .
    579  2000  .                .             .
    579  2001  .                .             .
    579  2002  155.28           12.94         1989
    579  2003  .                .             .
    579  2004  .                .             .
    579  2005  .                .             .
    579  2006  .                .             .
    579  2007  219.12           18.26         1989
    579  2008  .                .             .
    579  2009  .                .             .
    579  2010  .                .             .
    I computed childage_yrs = childage_months/12 and then year_breastfeedingR = round(year - childage_yrs, 1).
    I'm not sure what to do here. If the child's age in 1997 is 7.4, it should be about 12.4 in 2002 and 17.4 in 2007, so year_breastfeedingR should be 1990 in every wave rather than 1989. Is there a way to fix this by using the child's age in 1997 as the reference point and filling in the other years from it? Help on how to do this would be appreciated as well. One complication: some women were not interviewed in the 1997 Child Development Supplement (which asks about the child's age and about breastfeeding), so their child's age is first recorded in 2002 or 2007.
    Any help would be greatly appreciated! Thanks.

  • #2
    What is PSID? Statalist is an international list, so such an acronym will not be familiar to many. I know it used to mean "Panel Study of Income Dynamics", but it appears to have a broader scope now.

    I'm confused about your questions, so I'll start out with basics.

    • Exactly what is this childage_months variable and how was it computed? Be exact. I notice it is in fractional months, which suggests it was recorded in finer units. What does it have to do with breastfeeding? Go back to the original questionnaire and tell us the questions on which it was based.


    • What is your goal: to get an accurate age at subsequent interviews, or something else?
    Last edited by Steve Samuels; 27 Jan 2015, 16:14.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Hi Surya, as Steve points out, you should check how the childage_months variable is computed. You may be getting different implied values because childage_months is recorded at different points within each survey year, e.g. January 1997, May 2002, December 2007. However, since you choose to fix your base year at 1997, the following code should give you (1) the child's age in 2002 and 2007 measured at the same point in the year as the 1997 interview, and (2) a single breastfeeding year, taken from the 1997 wave, for those years. I label these variables childage_base_yr and year_breastfeedingR_base_yr.
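      A quick check on the three ages listed for ID 579 is consistent with this explanation: the recorded ages are not exactly 60 months apart, so rounding year minus age can land on different calendar years. A minimal sketch using only the values from the first post (gap_months and birth_exact are names introduced here for illustration):

      ```stata
      clear
      input year childage_months
      1997 88.9
      2002 155.28
      2007 219.12
      end
      * Months elapsed between consecutive recorded ages (first row is missing)
      gen gap_months = childage_months - childage_months[_n-1]
      * Implied birth year before and after rounding, as in the original post
      gen birth_exact = year - childage_months/12
      gen year_breastfeedingR = round(birth_exact, 1)
      list, noobs
      * The gaps are about 66.4 and 63.8 months rather than 60,
      * and the rounded years come out as 1990, 1989, 1989.
      ```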



      Code:
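      * Assumes one row per ID per calendar year with no gaps,
      * so that [_n-5] refers to the observation five years earlier
      sort ID year
      * Child's age in years as recorded in the 1997 wave
      gen childage_base_yr = childage_yrs if year == 1997
      * Carry the 1997 value forward in five-year steps (2002, 2007)
      by ID: replace childage_base_yr = childage_base_yr[_n-5] if missing(childage_base_yr)
      * Add the years elapsed since 1997 so that age advances one year per calendar year
      replace childage_base_yr = childage_base_yr + year - 1997
      * Same carry-forward for the breastfeeding year implied by the 1997 wave
      gen year_breastfeedingR_base_yr = year_breastfeedingR if year == 1997
      by ID: replace year_breastfeedingR_base_yr = year_breastfeedingR_base_yr[_n-5] if missing(year_breastfeedingR_base_yr)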
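      For women who were not interviewed in 1997, one possible variation is to use each woman's first wave with a recorded child age as her base year. This is only a sketch under assumptions: the data values are made up, it assumes one child per woman, and first_wave, base_age, childage_base and birthyear_base are hypothetical names.

      ```stata
      clear
      * Made-up example: ID 1 is first observed in 1997, ID 2 in 2002
      input ID year childage_yrs
      1 1997 7.4
      1 2002 12.9
      1 2007 18.3
      2 1997 .
      2 2002 3.2
      2 2007 8.6
      end
      * Earliest year with a non-missing child age, spread to all rows of the woman
      bysort ID: egen first_wave = min(cond(missing(childage_yrs), ., year))
      * Age recorded in that first wave, spread to all rows of the woman
      bysort ID: egen base_age = max(cond(year == first_wave, childage_yrs, .))
      * Age advances exactly one year per calendar year from the base wave
      * (values are negative for years before the birth)
      gen childage_base = base_age + (year - first_wave)
      * Implied birth year, constant within woman
      gen birthyear_base = round(first_wave - base_age, 1)
      list, sepby(ID) noobs
      ```

      Unlike the [_n-5] approach, this does not depend on the rows being exactly one year apart.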
      Last edited by Andrew Musau; 27 Jan 2015, 18:23.



      • #4
        Hi Steve,

        Apologies for not making it clear, the PSID is the Panel Study of Income Dynamics used in the USA.
        My goal is to compute variables (such as labor income, wages, family income, employment status) for the period when the woman was breastfeeding. I therefore need the year in which she was breastfeeding, which I am computing from the child's age in months at the time of the interview. According to the codebook, that age was computed from the interview date and the "preloaded birth date of child". I want to compare, for example, the woman's labor income before and after she was breastfeeding.

        I have actually solved the problem in my original post, but I am now trying to figure out how to construct a loop over observations that creates variables for the period when the mother was breastfeeding, such as laborincome_breastfeeding, using year_breastfeeding. Is there a way to do this?

        Thank you for your help!

        Last edited by Surya Singh; 29 Jan 2015, 04:13.



        • #5
          Hi Andrew,

          Thank you for the code, it did what I wanted it to do! You're right about why there are inconsistencies in the child's age in the survey, but hopefully using 1997 as the base year will be okay.
          I am now trying to figure out how to construct a loop over observations that creates variables for the period when the mother was breastfeeding, such as laborincome_breastfeeding, using the year in which she was breastfeeding. I want to compare, for example, the woman's labor income before and after she was breastfeeding. Do you have any tips on this?

          Thank you for your help!



          • #6
            Hello again, Surya,

            All you need to do is to have all your variables in one dataset and then have an indicator variable (dummy 0/1) that takes a value of 1 for years after a mother starts breast feeding and a value of 0 otherwise. This variable will distinguish between observations that relate to the pre period and the post period, and you can do various forms of comparisons and analysis.

            Therefore, if you have variables in different datasets, you will need to merge the datasets. The help file for merge explains how to do this:

            Code:
            help merge
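            A minimal sketch of such an indicator, using made-up values. The names bf_start (year breastfeeding started), post_bf and laborincome are hypothetical, and the start year itself is coded as "post" here, which is a choice rather than a rule:

            ```stata
            clear
            * Made-up panel: ID 1 starts breastfeeding in 1990, ID 2 has no start year recorded
            input ID year laborincome bf_start
            1 1988 20000 1990
            1 1989 21000 1990
            1 1990 15000 1990
            1 1991 12000 1990
            2 1988 30000 .
            2 1989 31000 .
            end
            * 1 from the start year onward, 0 before, missing if the start year is unknown
            gen byte post_bf = (year >= bf_start) if !missing(bf_start)
            * Mean labor income and number of observations in the pre and post periods
            tabstat laborincome, by(post_bf) statistics(mean n)
            ```

            No loop over observations is needed, because gen evaluates the condition row by row.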
            Apart from this detail, as a referee, I would anticipate the following issues prior to undertaking the kind of analysis that you suggest:

            "I want to compare woman's labor income, for example, before she was breastfeeding and after she was breastfeeding."
            It is very easy to confound effects that are due to breastfeeding with those that are due to having a child. I presume that you are interested in the effects of a mother's decision to breastfeed. If you are familiar with earlier empirical models of female labor supply in labor economics, the main goal was to estimate income and substitution effects. These studies used women rather than men because, over the life cycle, there is not much variation in men's income whereas there is a lot of variation in women's income. A primary factor that affects a woman's income is the decision to have a child (or children); others are her level of education, her previous labor market experience, her age, her non-labor income, her husband's income, etc.

            So let us assume we observe that, for the average woman, income is lower after breastfeeding than before. Should we attribute this to her decision to breastfeed? (1) It may be that she decides to work part time after having a child (assuming that she worked full time before), so that she can spend more time raising the child. So one factor to control for is labor market status: employed full time, employed part time, or unemployed. (2) Another possibility is that her husband decided to work extra, so the wife need not worry about her individual income because family income has increased. Thus we need to control for changes in aggregate family income, the age of the child, and so on. Remember that all these reasons are due to having a child and not to breastfeeding per se.

            To escape this confound, you may want to restrict your sample to women who have a child and distinguish between those who breastfeed and those who formula feed, so that having a child is no longer the issue. As before, you have to control for factors such as education; for example, it may be that highly educated women are more likely to formula feed and that these women have higher incomes on average. To make this distinction, however, you will need a variable that indicates whether a woman breastfeeds or formula feeds.

            You may be aware of the following paper by Mroz which uses the PSID - it is an excellent reference for those earlier studies.

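            Thomas A. Mroz (1987), "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions", Econometrica, Vol. 55, No. 4, pp. 765-799.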
            http://eml.berkeley.edu/~cle/e250a_f14/mroz-paper.pdf




            Last edited by Andrew Musau; 30 Jan 2015, 10:15.



            • #7
              I agree with Andrew's excellent suggestions.

              According to this page, https://psidonline.isr.umich.edu/Guide/FAQ.aspx?Type=9, the month and day of interview are in the data set, so you have an interview date for every wave. From that you can estimate the child's birth date to within a few days: multiply the fractional age in months by 30.4375 (the average number of days per month, 365.25/12) and subtract the result from the interview date. The date of birth can then be taken as the start of breastfeeding.
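              A minimal sketch of that calculation, converting the age in months to days with 30.4375 = 365.25/12. The interview date below is invented for illustration (only the age of 88.9 months comes from the first post), and iyear, imonth, iday, int_date, birth_date and birth_year are hypothetical names:

              ```stata
              clear
              * Invented interview date paired with the 1997 age from the first post
              input iyear imonth iday childage_months
              1997 3 15 88.9
              end
              * Interview date as a Stata daily date
              gen int_date = mdy(imonth, iday, iyear)
              * Subtract the child's age in days to get an estimated birth date
              gen birth_date = round(int_date - childage_months*30.4375)
              format int_date birth_date %td
              * Calendar year of the estimated birth date
              gen birth_year = year(birth_date)
              list int_date childage_months birth_date birth_year, noobs
              ```

              Because the birth date is anchored to the actual interview date, it should agree across waves up to rounding in the recorded ages.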

              In your post you refer to a variable "year_breastfeeding", but why a single year? Women can breastfeed the same child during more than one calendar year. You must know the ending year and month to unambiguously define the "after" for your analysis.
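              If an end date were available, the before, during and after periods could be coded on a monthly scale. A sketch under assumptions: all values are made up, and bf_months (months of breastfeeding) is a hypothetical variable that may or may not exist in the data.

              ```stata
              clear
              * Made-up rows: one woman observed in June of three consecutive years
              input ID obs_year obs_month birth_year birth_month bf_months
              1 1989 6 1990 1 8
              1 1990 6 1990 1 8
              1 1991 6 1990 1 8
              end
              * Monthly dates for the observation and for the start and end of breastfeeding
              gen obs_m   = ym(obs_year, obs_month)
              gen start_m = ym(birth_year, birth_month)
              gen end_m   = start_m + bf_months
              format obs_m start_m end_m %tm
              * 0 = before, 1 = during, 2 = after; left missing if any date is missing
              gen byte period = cond(obs_m < start_m, 0, cond(obs_m < end_m, 1, 2)) ///
                  if !missing(obs_m, start_m, end_m)
              label define period_lbl 0 "before" 1 "during" 2 "after"
              label values period period_lbl
              list ID obs_m start_m end_m period, noobs
              ```

              In this toy example the three rows are coded before, during and after, respectively.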
              Steve Samuels


              • #8
                Hi Andrew,

                Thank you for your helpful suggestions! I have created the indicator variable. The paper is also very informative and I will also take that into account for my study.

                Best wishes,
                Surya



                • #9
                  Hi Steve,

                  Thanks for the reference for the date of the interview, I must have missed it when looking for variables!

                  Best wishes,
                  Surya

