  • How to replace missing day/month with a "generic" date (i.e. 01jan) for a string date variable

    Hello,
    I am working with a string variable called "YearofBirth" that includes 443 observations of participant birth year. 423 observations have ONLY birth year, but 20 observations have birth day, month and year. I'm wondering if there is a way to generate a new numeric variable for complete birth date (day, month, year) that gives a "generic" day/month (01jan) for the 423 observations that only have the birth year and are missing their actual birth day/month. I am using Stata version 16.

    Please let me know if I need to be more clear. I'm relatively new to Stata so I may have some terminology wrong. Thanks so much for your help!

    Here is what my data looks like:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 YearofBirth
    "1988"     
    "31dec1988"
    "1987"     
    "1997"     
    "1998"     
    "1989"     
    "1987"     
    "1985"     
    "1992"     
    "1985"     
    "23jul1998"
    "23jul1998"
    "1982"     
    "06oct1999"
    "1987"     
    "16nov1999"
    "1991"     
    "1999"     
    "29jan2001"
    "1985"     
    "04apr1996"
    "10apr1997"
    "05oct1995"
    "1993"     
    "30nov1998"
    "2001"     
    "01jul1983"
    "02feb1982"
    "01oct1985"
    "01oct1986"
    "1998"     
    "1982"     
    "1999"     
    "1996"     
    "1994"     
    "17apr1984"
    "05jun1981"
    "1996"     
    "1998"     
    "1993"     
    "1992"     
    "10feb1999"
    "1983"     
    "1988"     
    "1986"     
    "1987"     
    "1988"     
    "1987"     
    "1991"     
    "1982"     
    "1998"     
    "1981"     
    "1995"     
    "1980"     
    "1994"     
    "1982"     
    "1994"     
    "1989"     
    "25feb1979"
    "1996"     
    "1998"     
    "1979"     
    "1991"     
    "1989"     
    "1982"     
    "1994"     
    "1995"     
    "1980"     
    "1994"     
    "1997"     
    "1984"     
    "2000"     
    "1990"     
    "1986"     
    "2000"     
    "1981"     
    "1998"     
    "1996"     
    "1997"     
    "1992"     
    "1992"     
    "2000"     
    "1986"     
    "1992"     
    "1985"     
    "1986"     
    "1995"     
    "1994"     
    "1993"     
    "1998"     
    "1993"     
    "1990"     
    "1989"     
    "1998"     
    "1992"     
    "1985"     
    "1987"     
    "1981"     
    "1982"     
    "1991"     
    end

  • #2
    My advice would be to not construct a meaningless daily date when only 20 of your observations have an accurate value. Instead, I would create a yearly date and use that for analysis. The fact that 20 observations will be less accurate than they might be isn't important, because the alternative is to make the other 423 observations appear much more accurate than they are.
    Code:
    . generate BirthYear = real(substr(YearofBirth,-4,.))
    
    . list in 11/20, clean abbreviate(16)
    
           YearofBirth   BirthYear  
     11.     23jul1998        1998  
     12.     23jul1998        1998  
     13.          1982        1982  
     14.     06oct1999        1999  
     15.          1987        1987  
     16.     16nov1999        1999  
     17.          1991        1991  
     18.          1999        1999  
     19.     29jan2001        2001  
     20.          1985        1985
    And now your date variable is numeric rather than a string, an important condition for use in analysis.
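
    If it helps, here is a minimal sketch of how one might check that result on a few rows of the example data. The three-row subset is only an illustration, and whether to apply the %ty display format is a matter of taste; it only affects how the year is displayed.
    Code:
    clear
    input str12 YearofBirth
    "1988"
    "31dec1988"
    "23jul1998"
    end

    * the last four characters are the year in both layouts
    generate BirthYear = real(substr(YearofBirth,-4,.))

    * stop with an error if any value failed to convert
    assert !missing(BirthYear)

    * optional: mark the variable as a Stata yearly date
    format BirthYear %ty
    list, clean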

    Let me offer some advice about dealing with dates and times in Stata, since you are new to Stata.

    Stata's "date and time" variables are complicated and there is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

    All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

    • #3
      Welcome to Statalist. And thank you for using -dataex- on your very first post.

      Code:
      gen quasi_dob = daily(YearofBirth, "DMY")
      replace quasi_dob = mdy(1, 1, real(YearofBirth)) if missing(quasi_dob) & !missing(real(YearofBirth))
      format quasi_dob %td
      should do what you request.

      FWIW, if this were my data, I'd be inclined to pick 30 Jun or 1 Jul (middle of the year) as the "generic day/month." That way calculations involving the date of birth will at least be correct on average. If you like that idea, you just have to change the -mdy()- arguments in the code.
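
      As a rough sketch of that variant, assuming 1 July as the generic day/month and using a small subset of the example data: the indicator variable dob_imputed is not part of the original request, just a hypothetical name for keeping track of which dates are real.
      Code:
      clear
      input str12 YearofBirth
      "1988"
      "31dec1988"
      "23jul1998"
      "1982"
      end

      * full dates convert; year-only strings come back missing
      gen quasi_dob = daily(YearofBirth, "DMY")

      * hypothetical flag: 1 if day/month had to be filled in
      gen byte dob_imputed = missing(quasi_dob)

      * fill year-only observations with 1 July of that year
      replace quasi_dob = mdy(7, 1, real(YearofBirth)) if missing(quasi_dob) & !missing(real(YearofBirth))
      format quasi_dob %td
      list, clean

      Keeping the flag makes it easy to restrict later analyses to observations with a true birth date, or to check how sensitive results are to the imputed ones.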

      Added: Crossed with #2.

      • #4
        Thank you both for your timely and helpful advice!
