  • Convert string YYYY-MM-DD to numeric but generate strange values

    Hello!

    As per the title of the question, I have some data imported from Excel with dates stored as strings in YYYY-MM-DD format (DisclosureDate), and I want to convert them to numeric format (create a new variable disclosedate) so that I can work with the data. I tried
    Code:
    gen disclosedate = daily(DisclosureDate, "YMD")
    but then something strange occurred
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 DisclosureDate float disclosedate
    "2010-03-12" 18333
    "2011-02-25" 18683
    "2012-03-09" 19061
    "2013-03-08" 19425
    "2014-03-07" 19789
    "2015-03-13" 20160
    "2016-03-10" 20523
    "2017-03-17" 20895
    "2018-03-15" 21258
    "2019-03-07" 21615
    "2020-02-14" 21959
    "2010-03-02" 18323
    "2011-03-08" 18694
    "2018-03-27" 21270
    "2019-03-26" 21634
    "2020-03-18" 21992
    "2010-04-17" 18369
    "2011-04-23" 18740
    "2012-04-21" 19104
    "2013-04-20" 19468
    "2014-04-22" 19835
    "2015-04-30" 20208
    "2016-04-30" 20574
    "2017-04-11" 20920
    end
    It seems disclosedate is float rather than numeric, so I wonder how to convert the variable into a format that Stata can work with. Thanks a lot for any help!

  • #2
    Byte, int, float, and double are all particular types of numeric variables in Stata. So your new variable disclosedate is, indeed, numeric, and it is exactly what Stata needs to work with. The values are not strange either: a Stata daily date is the number of days since 01jan1960, so 18333 is 12 March 2010. If you would like to be able to read and understand the dates represented by disclosedate yourself, you can beautify it by applying a date format, e.g. -format disclosedate %td-. If you do that, when you -browse- or -list- the data, it will look to your eye like a date. But within the computer's memory it is still a number like 20574 or 19104, which is exactly what Stata needs to use it for calculations.
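
    To illustrate the point about display formats, here is a minimal sketch that reuses two rows from the -dataex- listing in #1. The variable disclose_year is a hypothetical name added only to show that calculations work on the underlying number.

    ```stata
    * Two rows taken from the -dataex- example in #1
    clear
    input str10 DisclosureDate
    "2010-03-12"
    "2016-04-30"
    end

    gen disclosedate = daily(DisclosureDate, "YMD")

    * Without a format, the values display as days since 01jan1960
    list, noobs

    * -format- changes only how the values are displayed
    format disclosedate %td
    list, noobs

    * The stored numbers are unchanged, so date functions work directly
    assert disclosedate == 18333 in 1
    assert disclosedate == mdy(4, 30, 2016) in 2
    gen int disclose_year = year(disclosedate)   // hypothetical helper variable
    ```

    The second -list- should show the dates in a readable day-month-year form while the -assert- lines confirm that the numeric values are the same as before.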

    These storage types, along with long, differ in the amount of storage they use, and therefore in the range and number of digits they can hold. Byte is too small to hold the values that are needed for dates. Int, long, float, and double all are big enough. Whenever you use the -generate- command to create a numeric variable and don't specify otherwise, by default, Stata will create it as a float. That works for most purposes. Sometimes you have to be careful, though. For example, a variable that holds both date and time (e.g. created with the -clock()- function) counts milliseconds since 01jan1960, and only double has enough precision to store those values exactly. If you forget to specify -gen double new_variable = clock(whatever)-, the values will be rounded and the results will be incorrect and unusable. On the other end, with a simple date variable, if you are working with a very large data set and are pushing the limits of available memory, an int is large enough and you could have created the variable as -gen int disclosedate = daily(DisclosureDate, "YMD")-. And, if you belatedly discover that you need more space after having originally created it as a float, you can squeeze it down to an int by running -compress disclosedate-.
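
    The storage-type choices above can be sketched as follows. This is only an illustration: the 09:30:00 time of day is a made-up value, and disclosetime, badtime, and disclosedate2 are hypothetical variable names that do not appear in the original data.

    ```stata
    clear
    input str10 DisclosureDate
    "2010-03-12"
    "2020-03-18"
    end

    * A daily date fits comfortably in an int
    gen int disclosedate = daily(DisclosureDate, "YMD")
    format disclosedate %td

    * A date-time is milliseconds since 01jan1960 00:00:00 and needs double
    * (the 09:30:00 time is invented for this example)
    gen double disclosetime = clock(DisclosureDate + " 09:30:00", "YMDhms")
    format disclosetime %tc

    * The same expression stored as float is rounded, so it no longer matches
    gen float badtime = clock(DisclosureDate + " 09:30:00", "YMDhms")
    assert badtime != disclosetime

    * A date created with the default float type can be shrunk afterwards
    gen disclosedate2 = daily(DisclosureDate, "YMD")
    compress disclosedate2
    describe disclosedate disclosedate2
    ```

    After -compress-, -describe- should report disclosedate2 as an int, with the same values it held as a float.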



    • #3
      Hi Clyde,

      Thanks for your detailed reply, it solves my confusion!
