Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • -numdate- available on SSC: program to generate numeric date variables

    Thanks to Kit Baum as usual, a new package numdate is available from
    SSC. Use

    Code:
    ssc inst numdate 
    to install.

    Stata version 12 is required (but see below for a note for any people on
    version 10 or 11 who may be interested).

    numdate is for generating numeric date-time variables. Even if you are
    already fluent with functions for Stata dates and times, it offers some
    convenient features. If you are a new Stata user, or an old Stata user
    who only uses Stata dates and times occasionally, it may offer a simpler
    introduction to them. For whatever reasons, Stata dates and times are
    often found puzzling by users, as witness many questions in this forum.
    (The puzzlement does diminish if you follow the old-fashioned advice of
    reading the documentation slowly and carefully.)

    Use of numdate does requires some minimal understanding of how Stata
    holds dates and times. Here's the essence if you need it; otherwise skip
    the next paragraph.

    With the exception of calendar years, Stata records dates and date-times
    with origin 0 as the start of 1960. For example, for daily dates 0 is 1
    January 1960 and 42 is 12 February 1960; for monthly dates 0 is January
    1960 and 42 is July 1963; for quarterly dates 0 is the first quarter of
    1960 and 42 is the third quarter of 1970. Using numeric variables to
    hold dates makes it very easy to sort observations in date order and to
    calculate differences between dates. Using what is admittedly an
    arbitrary convention is not a real problem for tables or graphs or other
    output, as dates can and should be assigned display formats that make
    sense. But given Stata's convention of origin at the start of 1960, it
    is often necessary to map variables containing date or time data in
    other form to Stata's numeric date-time variables.

    The syntax of numdate is

    numdate datetimetype ndatevar = varlist [if] [in],
    pattern(pattern) [ format(format) dryrun topyear(topyear) ]

    Examples might be

    Code:
    numdate daily mydate = strdate, pattern(MDY) 
      
    numdate q mydate = year quarter, pattern(YQ) 
      
    numdate tc mydate = datetime, pattern(YMD hms)
    First, you have to specify which datetimetype you require; that must
    be one of

    clock
    Clock
    daily
    date (a synonym for daily)
    weekly
    monthly
    quarterly
    halfyearly or
    yearly

    (where any abbreviation is allowed, down to single letters)

    or alternatively one of

    tc
    tC
    td
    tw
    tm
    tq
    th or
    ty.

    Then you specify the name of a new variable to hold the date-time
    information in Stata form, followed by an equals sign, followed by a
    varlist, one or more existing variables that contain the date-time
    information in some form.

    numdate ties together in one command the task of creating a Stata
    date-time variable from a single variable and the task of creating one
    such from several variables. Also, as specified to numdate, the varlist
    may be string or numeric. For example, it could be a single string
    variable containing values indicating daily dates like "2015Mar28" or
    quarterly dates like "2015Q2". Or, it could be a single numeric variable
    containing integers such as 20150328 which are to be parsed (in this
    case) as daily dates. Or, it could be two or more variables, say three
    variables indicating day, month, and year for daily dates.

    You must specify a pattern indicating the order of date-time information
    to the pattern() option. numdate can't figure out otherwise whether
    20150706 is 6 July or 7 June 2015 or indeed something quite different:
    if you have dates like that, you must spell out whether you have YMD or
    YDM (or ...). (Nor can numdate help you directly if your dates have
    inconsistent patterns in different observations!)

    The syntax of pattern() will be familiar to users of functions like
    clock() or daily() or monthly() or quarterly(), because it is precisely
    that syntax. But it is extended so that it applies to numeric as well as
    string variables, and to multiple variables specifying date-time
    components as well as to single variables. If you have three variables
    indicating day, month and year, then your pattern is DMY, just as if you
    had a string variable with values like "28032015" or a numeric variable
    with values like 28032015.

    Other features of numdate are:

    * By default it applies a minimal appropriate display format. Hitherto
    you needed to follow the generation of (say) a daily date with a call to
    the format command. The default format at least gets you started: daily
    date values shown like 28mar2015 mean something, whereas such values
    shown like 20175 mean little, unless you have exceptional powers of
    mental arithmetic. A format() option lets you go beyond the default.

    * It automatically generates tc (clock) and tC (Clock) variables as
    double, as is necessary to maintain precision. Since version 10, the
    help files have shown stern and repeated warnings to specify double when
    generating date-times, warnings that many users have not read or not
    understood or not believed or not remembered, and in any case have just
    ignored. numdate is smart on your behalf despite your best inclinations
    to be dumb. Given the command-subcommand syntax it knows precisely when
    you want a clock or Clock variable and so specifies double in either
    instance.

    * It also allows users a dry run. The option dryrun indicates that
    results of the conversion should be shown without generating a new
    variable. Results are listed to show at most no more than 5 non-missing
    values of the implied date variable, and no more than 20 missing values,
    depending on which condition is satisfied first. This dry run should
    allow the user to check assumptions about the structure of values of
    varlist and/or to see the results of a particular format, whether
    default or specified.

    (So what about users of version 10 or 11? The code works fine in those
    versions. The problem is that the help files are organised differently
    from those since version 12. Further, some SMCL directives that work in
    Stata 12 up will not work in 10 or 11. Anyone who downloaded the files
    from SSC, edited the version statement in the ado file and edited the
    help files would get a serviceable variant on numdate if they did that
    correctly, but that's your responsibility.)
    Last edited by Nick Cox; 29 Aug 2015, 16:39.

  • #2
    I've fixed a small bug that seemingly bit me first. Thanks to Kit Baum, the fixed version is now up on SSC. Anybody who installed it should want to update:

    Code:
    ssc install numdate, replace

    Comment


    • #3
      This is really useful! (and I got the syntax right from the first try, which is rare)

      In light of this thread (http://www.statalist.org/forums/foru...able-to-a-time), perhaps one additional feature could be a builtin "," to "." conversion. This would be especially useful to French-cultured countries, which often use the comma as separator. That said, I don't know if it would cause issues elsewhere, in which case I'd argue it's not worth it.

      Comment


      • #4
        Thanks.

        That sounds like a good idea. I can't think of any downsides. I will put it on the to do list.

        Comment


        • #5
          Thanks as always to Kit Baum, I can now report a substantial update to the numdate package on SSC. Download with

          Code:
          ssc inst numdate 
          or use adoupdate if desired.

          The aim of the package is convenience. Each command is designed to bundle together various function calls, without being overwhelmingly complicated.

          There are now three commands:

          1. the original numdate, which has been extended, which is to create numeric date variables from raw materials in one or more other variables

          2. a new convdate, which converts one date-time variable to another of different kind

          3. a new extrdate, which extracts components from date-time variables. A definition of a component is that it is just part of the information. Today is Monday as I write, but that doesn't pin the date down uniquely. Day of the week is one component.

          Stata's date and time handling is based on functions. There are lots of them and people often get confused about which functions to use for handling dates, although some of that confusion is a matter of not reading the documentation carefully enough (or at all...). Similarly date-time display formats are often misunderstood.

          This package does what I can to make things easier, in so far as the package provides

          * Automated assignment of display format. You say you want a daily date variable; then %td is assigned by default. You say you want a quarterly date variable; then %tq is assigned by default as display format. Naturally, there is a format() option for you to specify something more detailed.

          * Automated assignment of variable labels. The default variable label documents what a new variable is and what it's based on. Similarly, there is a varlabel() option for other preferences.

          * A dryrun option. You can always try out a calculation to check out that you have the right idea.

          * Date-times are automatically created as doubles. Although it is clearly and repeatedly documented that you need doubles to hold date-times, this is often missed. numdate and convdate do what functions cannot do, create one when it is needed.

          The package also has some detailed features that might help.

          * As I write, functions like monthly() and quarterly() aren't smart about runtogether dates:

          Code:
          . di monthly("201709", "YM")
          .
          
          . di quarterly("20173", "YQ")
          .
          As of this update, numdate now tries harder on your behalf, by splitting such strings so that the elements (e.g. year and month) are separated on the fly.

          * convdate has an option to generate the last possible date in converting from coarse dates to fine dates. For example, by default Stata functions give you (e.g.) the first day (fine) of a month (coarse) or the first month (fine) of a quarter (coarse):

          Code:
          . di %td dofm(ym(2017, 9))
          01sep2017
          When generating a variable convdate with the last option would yield values that look like 30sep2017

          * convdate works out for you whether you need to call a single function or to nest function calls. Thus quarterly to monthly sounds simple but you need qofd(dofm()) with existing functions.

          Let's mention some downsides of this package:


          * It is not comprehensive, and doesn't include everything you might need to do with dates. It won't do scalar calculations or displays.

          * So, more commands and more help files to read. That's great. numdate announced in 2015 did attract a few users, but not many, so far as I can tell. Perhaps there is no better solution for those determined not to read the help.

          PS: In coming here to announce this, I realise that I had forgotten about Jesse's suggestion in #2 of this thread. Perhaps next time!

          Comment


          • #6
            Dear Nick,

            I have installed updated (*! 1.5.2 NJC 31 August 2017) package -numdate- and it is generating the variable but giving following error.

            . numdate td dov = formdate, pattern(DM20Y hm)
            program error: matching close brace not found
            r(198);

            I have added close brace "}" after line # 185 to remove this error and is working fine. Kindly review it upload the corrected version.

            Thanks.

            Comment


            • #7
              I confirm this bug, now fixed in my version. Thanks! I've sent the revised version to Kit Baum.

              Comment


              • #8
                Just to confirm that the revised files are up.

                Comment


                • #9
                  Dear Nick,
                  I try to use this programme and find it quite handy for data management but I found an issue with the version:
                  . which numdate
                  C:\data\ado\plus\n\numdate.ado
                  *! 1.5.3 NJC 12 September 2017
                  *! 1.5.2 NJC 31 August 2017

                  I do have the following code:
                  Code:
                  numdate clock call_end_c = CallClosureDateandTime, pattern(DM20Yhms) varlabel("Time of Call closure")
                  numdate daily call_end_d = CallClosureDateandTime, pattern(DM20Yhms) varlabel("Date of Call closure")
                  numdate monthly call_end_m = CallClosureDateandTime, pattern(DM20Yhms) varlabel("Month/Year of Call closure")
                  numdate yearly call_end_y = CallClosureDateandTime, pattern(DM20Yhms) varlabel("Year of Call closure")
                  The first two lines work perfectly. In lines 3 and 4 the command does not work and generates variables with all missing data.
                  I am puzzled...

                  Comment


                  • #10

                    Thanks for your interest. I can't see your data (no example given) but I believe you've found a limitation (bug if you wish) of numdate.

                    The ambition is that numdate should, handle all imaginable (or reasonable) date generations, but here for example your request is equivalent to trying to do things like

                    Code:
                    di monthly("20-09-17 11:22:33", "DM20Yhms")
                    which just yields missing as monthly() is not that smart.

                    A positive here is that convdate can be used in conjunction. So, I would do this

                    Code:
                    clear 
                    set obs 1 
                    gen sandbox = "20-09-17 11:22:33"
                    
                    numdate clock call_end_c = sandbox, pattern(DM20Yhms) varlabel("Time of Call closure")
                    convdate daily call_end_d = call_end_c, varlabel("Date of Call closure")
                    convdate monthly call_end_m = call_end_c, varlabel("Month/Year of Call closure")
                    convdate yearly call_end_y = call_end_c, varlabel("Year of Call closure")
                    
                    . d
                    
                    Contains data
                      obs:             1                          
                     vars:             5                          
                     size:            37                          
                    -------------------------------------------------------------------------------------------------------------
                                  storage   display    value
                    variable name   type    format     label      variable label
                    -------------------------------------------------------------------------------------------------------------
                    sandbox         str17   %17s                  
                    call_end_c      double  %tc                   Time of Call closure
                    call_end_d      float   %td                   Date of Call closure
                    call_end_m      float   %tm                   Month/Year of Call closure
                    call_end_y      float   %ty                   Year of Call closure
                    -------------------------------------------------------------------------------------------------------------
                    Sorted by: 
                         Note: Dataset has changed since last saved.
                    
                    . l
                    
                         +--------------------------------------------------------------------------+
                         |           sandbox           call_end_c   call_en~d   call_e~m   call_e~y |
                         |--------------------------------------------------------------------------|
                      1. | 20-09-17 11:22:33   20sep2017 11:22:33   20sep2017     2017m9       2017 |
                         +--------------------------------------------------------------------------+
                    So, I need to hit the code or the help or both. Thanks again for the example.

                    Comment


                    • #11
                      Hi Nick, thanks for the alternative code with -convdate-. I think it would be very helpful if you provide a clarification on this issue in the helpfile. As I regard it as an inherent limitation if I understand your argument correctly. Switching commands is fine - although I have nothing against a more elaborated version of -numdate- ;-)

                      My data looks like:
                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input double id str18 CallClosureDateandTime
                       1 "27-Sep-17 17:00:00"
                       2 "27-Sep-17 17:00:00"
                       3 "17-Oct-17 17:00:00"
                       4 "17-Oct-17 17:00:00"
                       5 "17-Oct-17 17:00:00"
                       6 "14-Sep-17 17:00:00"
                       7 "14-Sep-17 17:00:00"
                       8 "14-Sep-17 17:00:00"
                       9 "14-Sep-17 17:00:00"
                      10 "14-Sep-17 17:00:00"
                      11 "14-Sep-17 17:00:00"
                      12 "14-Sep-17 17:00:00"
                      13 "31-Aug-17 17:00:00"
                      14 "31-Aug-17 17:00:00"
                      15 "24-Aug-17 17:00:00"
                      16 "21-Sep-17 17:00:00"
                      17 "25-Apr-17 17:00:00"
                      18 "25-Apr-17 17:00:00"
                      19 "25-Apr-17 17:00:00"
                      20 "11-Apr-17 17:00:00"
                      end

                      Comment


                      • #12
                        Thanks for this. More discussion in the help is easy enough. Feedback on how commonly users of numdate get bitten is also a guide to how much effort I want to spend on it, given other projects too.

                        It's arguable that the problem is that some of Stata's own functions are not smart enough. I've pointed out several times, here, on Stack Overflow, and directly to StataCorp, that functions like monthly() are not smart enough to cope with users' expectations. But it's also possible that programmers are surprised that people might want to get out a monthly date out of a date-time, or would argue that the problem is trivially a piped call to two functions.

                        Comment


                        • #13
                          Originally posted by Nick Cox View Post
                          But it's also possible that programmers are surprised that people might want to get out a monthly date out of a date-time, or would argue that the problem is trivially a piped call to two functions.
                          I think there are several use cases, but in my case I rather work with monthly dates than with daily dates because most of the time daily dates provide an ignorable level of precision.
                          Of course, to some degree it is trivial to use this but it takes me some time again and again to understand what I should do to get the result I want (especially in cases when I re-do this years later). Anyway, the -help datetime- is not the easiest read.

                          Comment


                          • #14
                            Thanks as always to Kit Baum, an update to this package, consisting of a revised help file for numdate, is now available on SSC.

                            This text has been added

                            More remarks: What if numdate disappoints?

                            When numdate works as you expect, you can be happy. When it does not, then you
                            need to know more about how it should work and what else you can do.

                            The main idea of numdate is to serve as a wrapper for whichever function out of

                            clock()
                            Clock()
                            daily()
                            weekly()
                            monthly()
                            quarterly()
                            halfyearly() or
                            yearly()

                            is appropriate for generating a new date variable. However, these functions are
                            not equally smart. Thus, note that for example

                            . di monthly("20-09-17 11:22:33", "DM20Yhms")

                            just yields missing, as monthly() is not smart enough to ignore irrelevant
                            detail. Short of adding work-arounds, this can be considered a limitation of
                            numdate.

                            Experience is that this bites most often with generation of coarse dates from
                            fine information. The positive advice to to use convdate, part of the same
                            package. The following examples illustrates some technique.

                            . clear
                            . set obs 1
                            . gen sandbox = "20-09-17 11:22:33"
                            . numdate clock call_end_c = sandbox, pattern(DM20Yhms) varlabel("Time of Call
                            closure")
                            . convdate daily call_end_d = call_end_c, varlabel("Date of Call closure")
                            . convdate monthly call_end_m = call_end_c, varlabel("Month/Year of Call
                            closure")
                            . convdate yearly call_end_y = call_end_c, varlabel("Year of Call closure")

                            In short, consider using numdate to generate the first date variable and then
                            convdate to convert that to other dates.
                            together with an acknowledgment of Marc's example and discussion.

                            I've just spotted a typo there (should be "example illustrates"), and that will be fixed in the next (substantive) update.
                            Last edited by Nick Cox; 26 Sep 2017, 03:48.

                            Comment


                            • #15
                              Two quality of life suggestions
                              - perhaps it would be an interesting idea to order the newly generated date variable before the first old date variable?
                              - in a similar vein, one rarely needs the old date variables afterwards (except to check numdate worked as expected), so a [dropold] option might be appreciated by some users

                              Both can of course trivially be done by the user afterwards, but having them incorporated in the command removes some "technical" lines from users' code, which in my opinion helps readability. Also, I am lazy.

                              That said, they are just suggestions and not very critical ones at that.

                              Comment

                              Working...
                              X