  • Working with dates - assigning day and month to year of birth

    Hi All

    I have a variable birthyear that specifies the year of birth for all subjects in my dataset (any one year between 1981 and 2006). I would like to add the same day and month (1st Oct) to the year of birth for all subjects. I need to do this so that I have a full date variable (i.e. day, month and year) to be able to calculate, for example, age. My remaining date variables, such as date of diagnosis or date of visit, have day, month and year.

    Currently, birthyear has storage type 'int' and display format '%8.0g'.

    Code:
    tab birthyear in 1/8
    1981
    1983
    1987
    1981
    1994
    2006
    2001
    1996


    What I would like to create is:

    Code:
    tab birthyear_new in 1/5
    1oct1981
    1oct1983
    1oct1987
    1oct1981
    1oct1994

    I started with generating two variables to specify day and month:

    Code:
    gen bd=01
    tostring bd, gen(bday)
    
    gen bm=10
    tostring bm, gen(bmon)
    I now have to add the above bday and bmon to the birthyear variable. I'm not sure how to do this. Any advice most welcome.

    Thanks

    /Amal

  • #2
    Try this:

    Code:
    clear*
    input int birthyear
    1981
    1983
    1987
    1981
    1994
    2006
    2001
    1996
    end
    
    gen long birth_date = mdy(10, 1, birthyear)
    format birth_date %td
    
    list, noobs clean


    • #3
      Clyde has solved the problem directly.

      I'll add a couple of comments on the other ideas. Note that

      Code:
      gen bd = "01"
      gen bm = "10"
      would be the direct way to get string variables with those values (and the leading character "0" would be hard to get otherwise if you really needed it). There is no need to invoke tostring at all.

      However, such string variables would not help us much here.

      The year variable is numeric and the daily date variable we want is numeric, which means generating string variables is going in the wrong direction. You could reverse that with

      Code:
      gen long birth_date = mdy(real(bm), real(bd), birthyear)
      but as Clyde has shown there is a simpler way.


      • #4
        Thanks Clyde and Nick - always great to learn of more than one way to deal with such issues. Following from the above - I tried to play around with another date variable - year of diagnosis (diagyear) which indicates the year of diagnosis without the day or month. I do not want to assign a random day or month to this variable as I did above. My idea is to calculate age at diagnosis by subtracting diagyear from birthdate: gen agediag = ((diagyear-birthdate)/365.25) if birthdate!=. However, diagyear needs to be formatted correctly first (storage type - int & display - %8) and I'm struggling with this. gen diagyear2 = real(diagyear) gen diagyear3 = date(diagyear2, "Y") format diagyear3 %d tab diagyear3 in 1/5 01jan1991 01jan2005 01jan2006 01jan2007 01jan1998 The above assigns 1 jan to all years of diagnosis, which is probably ok (not perfect) in calculating my diagage variable. However, is there a better way to do this? Just format the variable correctly to have only the year of diagnosis from which one can subtract the birthdate (though it perhaps better to have the day and month included for the subtraction?) Thanks /Amal (APOLOGIES - I'm unable to wrap/indicate stata codes or preview messages before posting for some reason).


        • #5
          Ok - trying again! Seems like I had a technical problem before:

          Thanks Clyde and Nick - always great to learn of more than one way to deal with such issues.

          Following from the above - I tried to play around with another date variable - year of diagnosis (diagyear) which indicates the year of diagnosis without the day or month. I do not want to assign a random day or month to this variable as I did above. My idea is to calculate age at diagnosis by subtracting diagyear from birthdate:

          Code:
          gen agediag = ((diagyear-birthdate)/365.25) if birthdate!=.
          However, diagyear needs to be formatted correctly first (storage type - int & display - %8) and I'm struggling with this.

          Code:
          gen diagyear2 = real(diagyear)
          gen diagyear3 = date(diagyear2, "Y")
          format diagyear3 %d
           
          tab diagyear3 in 1/5
          01jan1991
          01jan2005
          01jan2006
          01jan2007
          01jan1998

          The above assigns 1 jan to all years of diagnosis, which is probably ok (not perfect) in calculating my diagage variable. However, is there a better way to do this? Just format the variable correctly to have only the year of diagnosis from which one can subtract the birthdate (though perhaps it's better to have the day and month included for the subtraction if Stata requires this?)

          Thanks

          /Amal


          • #6
            The only obvious reason for your thinking that you cannot show code properly is that you haven't selected the Advanced Editor. But even if you don't select that you should still be able to type CODE tags.

            Either way, here is your post re-formatted before I reply. In my experience breaking text into digestible short paragraphs helps greatly with this kind of question.

            % begin edit

            I tried to play around with another date variable - year of diagnosis (diagyear) which indicates the year of diagnosis without the day or month.

            I do not want to assign a random day or month to this variable as I did above.

            My idea is to calculate age at diagnosis by subtracting diagyear from birthdate:

            Code:
            gen agediag = ((diagyear-birthdate)/365.25) if birthdate!=.
            However, diagyear needs to be formatted correctly first (storage type - int & display - %8) and I'm struggling with this.

            Code:
             
            gen diagyear2 = real(diagyear)
            gen diagyear3 = date(diagyear2, "Y")
            format diagyear3 %d
            tab diagyear3 in 1/5
            01jan1991
            01jan2005
            01jan2006
            01jan2007
            01jan1998
            The above assigns 1 jan to all years of diagnosis, which is probably ok (not perfect) in calculating my diagage [??? typo for agediag] variable. However, is there a better way to do this? Just format the variable correctly to have only the year of diagnosis from which one can subtract the birthdate (though it perhaps better to have the day and month included for the subtraction?)

            % end edit

            You don't show examples here but various comments are possible.

            1. If indeed diagyear is int, the conversion from string to real using real() is both impossible and unnecessary.

            2. But as date() worked diagyear must have been string all along, or converted to string somehow. (Changing the type of a variable is not at all the same as changing its display format.)

            3. %8 is not a legal display format. However, the display format of diagyear is irrelevant as such. Display formats control what is shown, and do not affect what is stored.

            4. The conversion from years to daily dates does indeed produce daily dates that are always 1 January. That's an arbitrary choice by Stata, but one supported by its convention on date origin.
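
            Points 3 and 4 can be checked with a small sketch. It assumes, as in point 2, that diagyear is a string variable; the two example values and the new variable name diagdate are invented for illustration.

            ```stata
            clear
            input str4 diagyear
            "1991"
            "2005"
            end

            * date() with mask "Y" returns the daily date for 1 January of that year
            gen long diagdate = date(diagyear, "Y")
            assert diagdate == mdy(1, 1, real(diagyear))

            * changing the display format changes what is shown, not what is stored
            format diagdate %td
            list, noobs clean
            format diagdate %tdCY
            list, noobs clean
            assert diagdate == mdy(1, 1, real(diagyear))

            * daily dates count days from 1 January 1960, which is stored as 0
            display mdy(1, 1, 1960)
            ```

            The two list commands show the same stored values in different ways: first as full daily dates, then as years only.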

            I don't understand your closing questions, if only because you seem to be confusing a change of display formats with other changes of data. (I know many people use terms like "reformatting" in an all-purpose way to refer to some large fraction of data management, but it helps no-one else to be so vague. In a Stata forum only terms used by Stata in the way that Stata uses them should be considered transparent.)

            More general advice which may apply here:

            If all you know is a calendar year for both variables you want to compare, you should work in years.

            If you know a daily date for one variable and just a calendar year in another the most obvious consistent choices are 1 January and 1 July. If you are imagining yet other possibilities, we need to see them.

            Showing examples of the data you have and the data you want would probably have shortened this reply. If your real dataset is too big or too sensitive to post, realistic invented examples are fine by us too.
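
            As a rough sketch of the two options above: the data are invented, diagnosis_year is assumed to be a numeric year variable, and 1 July is just one of the two conventions mentioned.

            ```stata
            clear
            input int(birthyear diagnosis_year)
            1981 1991
            1983 2005
            1987 1987
            end

            * both known only to the year: work in years
            gen int agediag_years = diagnosis_year - birthyear

            * daily birth date, diagnosis known only to the year:
            * pick one convention for the diagnosis date (here 1 July)
            gen long birth_date = mdy(10, 1, birthyear)
            gen long diag_date = mdy(7, 1, diagnosis_year)
            format birth_date diag_date %td
            gen double agediag = (diag_date - birth_date) / 365.25

            * note: the third subject, born and diagnosed in 1987, gets a
            * negative age under this convention
            list, noobs clean
            ```

            Whichever convention is chosen, it needs to be applied consistently and checked against cases where birth and diagnosis fall in the same year.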


            • #7
              Thanks Nick - it's not that. I was able to re-post my previous message properly with code once I logged onto Statalist using Google Chrome (my earlier problem occurs if I use Firefox, which I will now avoid).

              I made a mistake above while typing out the code. I did convert diagyear to string in the beginning:

              Code:
              tostring diagnosis_year, gen(diagyear)
              gen diagyear2=date(diagyear, "Y")
              format diagyear2 %d
              Code:
              tab diagyear2 in 1/5
              01jan1991
              01jan2005
              01jan2006
              01jan2007
              01jan1998

              I have two questions:

              1. What could I do so that the above diagyear2 variable instead looks like:

              1991
              2005
              2006
              2007
              1998

              (i.e. no 01jan)

              2. I agree with what you wrote earlier - it is better to use variables as just calendar years if that's what's available from the beginning. The reason I chose to assign a day and month to the birthdate variable is that I have several patients (500+) who were born and diagnosed in the same year, which somewhat complicates matters.

              How would I change the 01 Jan in the above diagyear2 variable to say 01 Dec? (I plan to do this for those patients born and diagnosed in the same year but have a diagnosis date that occurs before the birthdate within the same calendar year). The suffix to the code would be:

              Code:
              if diagyear2<birthyear & birthyear!=.
              Thanks

              /Amal


              • #8
                If the forum software doesn't work with Firefox, it is surprising that that doesn't seem well publicised, as it's a popular browser.

                Dates that are 1 December could be produced in various ways, e.g.

                Code:
                gen wanted1 = mdy(10, 1, diagnosis_year)
                
                gen wanted2 = daily("1 Dec" + diagyear, "DMY")
                You can format your daily date variable to show only year with %tdCY

                Code:
                . di %tdCY mdy(11,9,2015)
                2015
                This follows from information on standard date functions documented in help dates
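
                A small sketch of the same idea applied to a variable rather than a single value. The data and the names diag_date and diag_yr are invented here; diagnosis_year is assumed to be numeric.

                ```stata
                clear
                input int diagnosis_year
                1991
                2005
                end

                * daily date for 1 December of each year, displayed as year only
                gen long diag_date = mdy(12, 1, diagnosis_year)
                format diag_date %tdCY
                list, noobs clean

                * the stored values are still daily dates;
                * year() extracts the calendar year as a number
                gen int diag_yr = year(diag_date)
                assert diag_yr == diagnosis_year
                ```

                The %tdCY format only hides the day and month, so subtraction between two such variables still works in days.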


                • #9
                  I think that for December 1 dates, Nick means

                  Code:
                   
                   gen wanted1 = mdy(12, 1, diagnosis_year)
                  The month of December was at one time the tenth month of the Roman calendar, and took its name from the Latin word for ten. However, two additional months were created subsequently, namely January and February, and placed at the beginning of the calendar, breaking the relationship between the numerical month and the names of the month at the end of the calendar.


                  • #10
                    Dixit Clyde. Mea culpa, mea decima culpa. Eheu!

                    Code:
                    * sorry, my mistake
                    gen wanted1 = mdy(12, 1, diagnosis_year)
