Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting a numeric (double) variable type to Stata Internal Date

    I am attempting to convert a variable <date> obtained from a registry database, which displays using list as: 17oct1940 00:00:00.

    Codebook identifies this as a numeric (double) variable type, with range: [-8.795e+11,9.938e+11] .

    I have used the following 3 approaches to attempt to convert this to a Stata Internal Date ( with error message following "//"):

    gen newborn=date(string(zanzip_dateborn), "DMY") // generates empty var
    gen newborn2=substr(string(zanzip_dateborn), 1,9) // generates odd numbers such as -772000000000
    numdate newborn3 = zanzip_dateborn, pattern(DMYhms) // Unrecognized kind of date-time

    I'm familiar with using -date- and -todate- from simple strings or numeric variables, but this one has me stumped.
    I'm not able to simulate an example, as the following code returns error "cannot be read as number":

    Code:
    clear
    input date
    17jul1935 00:00:00
    end
    Any advice would be very much appreciated.
    Michael

  • #2
    Codebook identifies this as a numeric (double) variable type, with range: [-8.795e+11,9.938e+11] .
    This suggests that the variable is already a Stata internal format date-time (clock) variable. Since it displays as 17oct1940 00:00:00, it is already has a %tc format. So if what you need is to extract the date from it and eliminate the 00:00:00 time part, there are two approaches you can use.

    1. If all of the values of this variable have 00:00:00 as their time component, so that the time is, in fact, uninformative, you might just change the display format to %tcCCYYMonDD. That won't change the actual values of the variable in any way, just how it looks when you -display- or -list-, etc. If you are using it in calculations, you need to make sure that anything you calculate from it gets stored as a -double- or you will lose precision.

    2. Alternatively, if the time part is informative and you need a separate variable that contains only the date, you can get that with:
    Code:
    gen date_only = dofc(zanzip_dateborn)
    format date_only %td
    This will create a new variable, date_only, that contains only the date part of the variable. It does not require -double- storage type and is, for most purposes, easier to use than the original clock variable.

    Concerning your unsuccessful attempts:

    Applying the string() function to a numeric variable that has a date-time display format will not get you a string that looks like a date time. It will get you a string that looks like the numeric or scientific notation display of the number it contains, which in this case will be some gigantic floating point double-precision number (# of milliseconds since 1 jan 1960). By contrast, the date() function expects its first argument to look like "9aug2016" or something like that. When it doesn't find that kind of string, it returns missing value. When you apply the substr() function you are just selecting certain digits out of that number, hence the "odd" results. Not being myself a user of -numdate-, a user-written command from SSC, I can't give you an explanation of what happened there, but I suspect it is the same thing: it was expecting something that looked like a string that appears like a date to human eyes and instead found some huge number and therefore threw an error message your way.

    Comment


    • #3
      Clyde is right.

      I can comment on numdate. See http://www.statalist.org/forums/foru...date-variables for the original story.

      The problem with using numdate (SSC, as you are asked to explain) is exactly the same problem as with other commands. The data you have are already converted so numdate can't recognise them as dates in DMYhms form.

      Considering this example

      Code:
      . di %13.0f  clock("10 August 2016 08:20:00", "DMY hms")
      1786436400000
      the dialogue is tacitly of the following form

      Michael: My data like 1786436400000 are in the form DMYhms and should be converted to date-times.

      Stata: I can't see that to be true.

      In fact, Michael was presenting just the first so many digits, but that won't help at all.

      The response Michael wants is

      Stata: Not necessary, as you have already have data in the right form.

      But it's hard, and in the long run a very bad idea, to try to build checks for misunderstanding into syntax checkers.

      Clearly the example you tried isn't numeric, but as suggested in the FAQ you could have used dataex (SSC) to give examples of your data.

      The bottom line is this: if you have numeric variables with a date or date-time display format and the dates look good, there is nothing to do. Conversely numeric variables with date or date-time display format looking nothing like dates or date-time in raw numeric terms (the only exception, not relevant here, being calendar years such as 2016), but that isn't a problem in itself.

      Comment


      • #4
        Fabulous, Clyde and Nick. The explanations are very meaningful.

        Comment

        Working...
        X