Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • STATA Calculating dates

    Hello,

    I am having problems with the following
    a) Converting my data from a long dataset to a wide. I understand the command function that I need, i.e. reshape wide varlist, i( ) and j( ). What I really need next is to determine my j variable.
    b) In order to get my j variable - suppose I have two dates: 1. a regular date#1 (e.g. November 20, 2010) and 2. I want to determine the months after that date given another regular date#2 (e.g. March 15, 2011). How would I go about doing that?

    Also, if my second set of regular dates#2 contains dates that come before regular date#1 how do I specify that as well?

    The end result is I basically want my data to go from long to wide and for the variable headings to contain measurements for each month after the regular date#1.

    Hope that makes sense, thanks.

  • #2
    If I understand you right (I'm not sure that I do):

    You have a dataset with a date variable. But in some observations this variable represents a first date, in other observations it represents a second date. You want to reshape to a wide format in order to calculate the time intervals between corresponding first and second dates.

    You ask where to find your j() variable. Well, if you don't have it, it means that you cannot identify in your primary dataset whether an observation includes information about a first or a second event. In that case you are in trouble. Hopefully I got it wrong.

    Comment


    • #3
      - and please note the strong preference on this list that participants step forward with their real first and last names. Hit "Contact Us" in the lower right corner of the screen and re-register with your real name.

      Comment


      • #4
        Sorry, it was late last night. So to be more clear, here is my situation.

        Right now I possess 2 dates (call them date1 and date2) both of which are in HRF (i.e. 9/23/2010 for example). What I want to do is I want to figure out the elapsed time between these two points. The date2 variable contains some dates that come before date1, but I want to figure out the elapsed time between the dates in the date2 variable which come after the dates in the date1 variable.

        The hope is that once I have established what should be a new generated variable, that I can then use it to switch from a long to a wide dataset.

        The end result should look something like this going from left to right:

        ID1 Measurement taken Month (e.g. January) Measurement taken Month (e.g. March) Measurement taken Month (e.g. June)
        ID2 Measurement taken...
        ID3 ...
        ID4 ...

        Comment


        • #5
          [IGNORE, BUT FEEL FREE TO ADAPT ANY USEFUL CODE. THE FACT THAT YOUR DATA IS LONG HAS ME CONFUSED, SO MY RESPONSE MAY BE LARGELY IRRELEVANT]

          See blow and see if it gets you partway there. Your talk of reshaping confuses me, but this should got you differences in months.

          Code:
          *======convert to days since January 1, 1960
          gen date_date1=date(date1, "MDY")
          gen date_date2=date(date2, "MDY")
          
          *=====difference in months
          gen month1=ym(year(date_date1),month(date_date1))
          gen month2=ym(year(date_date2),month(date_date2))
          
          gen monthdiff=month2-month1
          replace monthdiff=. if monthdiff<0
          Last edited by ben earnhart; 29 Nov 2014, 20:52.

          Comment


          • #6
            mad_titan: Post #4 is a start: at least we know what you want to end up with. But it's still very unclear from all of your posts what you are starting out with. Please provide an example of that. Ben Earnhart's code works find if your data are already in wide format and there are exactly 2 dates per observation. But it's by no means clear that's what you've got. You really need to show us what the data you're working with look like. Why don't you -list- some representative lines, enough for us to see the lay of the land, and paste the output into a code block. (To create a code block click on the underlined A button, and then on the # button. Two code-block delimiters will appear. Paste the output between them.)

            Comment


            • #7
              Let me try restating my problem in a different way. I will tell you what I have and what I want and perhaps you could inform me of what I should do.

              My current data is in the long format (i.e. multiple IDs as shown above). For each ID I have multiple measurements recorded at different dates (A). As well, for each ID there is another date (B) associated with them that remains constant throughout.

              Now, what I want is I want to be able to determine which IDs satisfy some particular value (doesn't matter what) from the period between date A and B (excluding any A dates that come before B dates - because there are some).

              How would I do that? I am assuming that somewhere in the process I need to convert from a long to a wide dataset.
              Last edited by CEdward; 30 Nov 2014, 00:06.

              Comment


              • #8
                I don't think you would benefit from reshaping to wide format. In the following example I calculate the number of days (positive or negative) between date2 and date1. I guess it is not too difficult to proceed from there.

                Code:
                clear
                input id date1 date2
                101 14000 13912
                101 14101 .
                101 13807 .
                102 13009 12900
                102 12817 .
                end
                
                bysort id: egen date2a=max(date2)
                format date* %td
                gen days= date1-date2a
                
                . list, clean
                        id       date1       date2      date2a   days 
                  1.   101   01may1998   02feb1998   02feb1998     88 
                  2.   101   10aug1998           .   02feb1998    189 
                  3.   101   20oct1997           .   02feb1998   -105 
                  4.   102   14aug1995   27apr1995   27apr1995    109 
                  5.   102   03feb1995           .   27apr1995    -83

                Comment


                • #9
                  Originally posted by ben earnhart View Post
                  [IGNORE, BUT FEEL FREE TO ADAPT ANY USEFUL CODE. THE FACT THAT YOUR DATA IS LONG HAS ME CONFUSED, SO MY RESPONSE MAY BE LARGELY IRRELEVANT]

                  See blow and see if it gets you partway there. Your talk of reshaping confuses me, but this should got you differences in months.

                  Code:
                  *======convert to days since January 1, 1960
                  gen date_date1=date(date1, "MDY")
                  gen date_date2=date(date2, "MDY")
                  
                  *=====difference in months
                  gen month1=ym(year(date_date1),month(date_date1))
                  gen month2=ym(year(date_date2),month(date_date2))
                  
                  gen monthdiff=month2-month1
                  replace monthdiff=. if monthdiff<0

                  I am gonna try something similar to this and see if it gets me what I want. But, just out of curiosity why does it say "days since January 1, 1960? Suppose you have March 27, 2014 and August 3, 2014 and you want to find the difference in days between them. What does 1960 have to do with it?

                  Comment


                  • #10
                    Every date system needs a zero point. Choice of that date is kinda arbitrary, but for most purposes, you'd waste a lot of digits counting miliseconds since CE 1, day 1. They picked 1/1/1960.

                    Comment


                    • #11
                      Ben is correct in underlining the origin of Stata's date system.

                      889 is correct if his or her implication is that the difference between dates does not depend on the origin of the date system.

                      889: As underlined by Svend in #3 we strongly prefer full real names here. Please use the "Contact Us" button at the bottom right to contact the system administrators to effect a change of identifier.

                      Comment

                      Working...
                      X