Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keeping dates as string variables

    I read a post http://www.stata.com/statalist/archi.../msg00541.html that keeping date variables as strings can be problematic.

    Could someone explain to me a bit more specifically why that could be the case?
    Says in the past post that I have to worry about precision and potential hiatuses?
    But I couldn't really understand what that meant..

    The date variables I use are
    year and yearmonth.
    Both are string variables.
    year takes on 2009, 2010, 2011, 2012....and on
    yearmonth takes on 200901, 200902, 200903....and on.

    I have not encountered any problems in running my data as of yet but wanted to prevent problems from occurring in the future.

    Thanks!

  • #2
    How you are going to work with such strings? is one answer. For example, "200912" was succeeded by "201001". You have no problem in seeing that they are one month apart, but Stata needs to be told that.

    More generally, you often need to do calculations with dates. 2012 - 2009 is easy but "2012" - "2009" is not, as string manipulation any way.

    Comment


    • #3
      One of the most vexing problems with having date variables stored as strings is the sort order. Now, in your case, you have coded them as 200901, 200902, etc. so you don't have this problem. But often when people have dates stored as strings they look like Jan 2009, Feb 2009, Mar 2009, etc., and you can see that if you sort those in alphabetical order they will not be in chronological order. Numerical date variables necessarily sort properly.

      Now, here's a problem you do have. Suppose at some point you need to calculate an interval between dates, say 200910 vs 200806. If you just subtract, you will get the wrong answer. That's because when you go from 200812 to 200901 you jump by 89 instead of by 1.

      Also if you need to use date as a variable in an analysis, such as a regression, you can't do that with strings.

      And if you want to set your data up as panel data using the -xt- commands, you can't -xtset- a string variable.

      Given what you have, it's very simple to convert yearmonth to a Stata numeric monthly variable:

      Code:
      gen numeric_yearmonth = mofd(mdy(real(substr(yearmonth, 5, 2)), 1, real(substr(yearmonth, 1, 4))))
      format numeric_yearmonth %tm
      With a numeric variable having a %tm format you get the best of both worlds: you can do all date calculations correctly and you can also look at the values with your own eyes and know what they mean.





      Comment


      • #4
        Hmm I see.
        Since now I am doing only yearly groupings, the only calculations that I needed to do were creating averages by the year terms.
        For instance, in creating two years' worth of average returns, I used the code

        Code:
        forval yr=1975(2)2013 {
            local yr_2= `yr'+1
            bys LPERMNO: egen quintile_real`yr'=min(quintile) if inrange(year, "`yr'", "`yr_2'")
            bys DataD quintile_real`yr':egen average_twoyrs`yr'=mean(MonthlyTot) if inrange(year, "`yr'", "`yr_2'")
        The last line of the code generates a variable called average_twoyrs`yr' which is an average of all the monthly total returns over the span of two years, say from 1975 to 1976.

        If I do use the date variables, would it be possible for me to denote that 1yr will range from say, April 2009 to April 2010 instead of from January09 to January10?

        Comment


        • #5
          By the way, as for issues of precision, that only arises with date-time variables, not with pure dates. For date-time variables, Stata represents them by the number of milliseconds elapsed since midnight January 1, 1960. Those numbers get large, and in order to have sufficient precision they need to be generated and stored as double. But for just dates, the default float type has enough precision--these are represented as the number of days since Jan 1, 1960 and the numbers are not all that huge.

          Comment


          • #6
            Nick and Clyde, thank you so much for the input.

            Clyde, I was trying to solve that problem.

            In constructing yearly portfolios, I wanted move around the portfolio formation date.

            Currently, the formation date is in January of each year. So I had no problem constructing 1yr, 2yr , and 3yr portfolios.
            But if I move the formation date to, say, April of each year, then there was a problem because I had some trouble setting up the range for those years.

            Thank you so much!

            Comment


            • #7
              So to go in two-year blocks starting in April it would be something like this:

              Code:
              forval yr=1975(2)2013 {
                  local start_month = mofd(mdy(4, 1, `yr'))
                  local end_month = mofd(mdy, 3, 1, `yr'+2)) // OR = `start_month' + 23
                  bys LPERMNO: egen quintile_real`yr'=min(quintile) if inrange(numeric_yearmonth, `start_month', `end_month'
                  bys DataD quintile_real`yr':egen average_twoyrs`yr'=mean(MonthlyTot) if inrange(numeric_yearmonth,`start_month', `end_month')
              }

              Comment


              • #8
                On the specific problem of numeric monthly dates from string monthly dates: the monthly() function exists to do this, although when last I looked it needed coaxing with run-together strings. ym() works fine with two real arguments. In Jun Yoon's case with a variable, see the last example below for another way to do it.

                Code:
                 
                . di monthly("200901", "YM")
                .
                
                . di monthly("2009 01", "YM")
                588
                
                . di monthly(substr("200901", 1, 4) + substr("200901", 5, 2),  "YM")
                .
                
                . di monthly(substr("200901", 1, 4) + "  " + substr("200901", 5, 2),  "YM")
                588
                
                . di ym(real(substr("200901", 1, 4)),  real(substr("200901", 5, 2)))
                588
                
                gen mdate = monthly(substr(yearmonth,1,4) + " " + substr(yearmonth, 5, 2), "YM")

                Comment

                Working...
                X