Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting string dates

    Dear statalist,

    I have been trying to convert string dates but I got inconsistent results.

    My data look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 Date
    "13jan2015"
    "24may1976"
    "12jun2012"
    "20mar1962"
    "07mar1877"
    "23dec1974"
    "17sep2014"
    "21dec1962"
    "05mar2008"
    "11nov2015"
    end
    Since I am only interested to the years, to destring Date I type:
    Code:
     encode Date, generate(Date2)
    generate Date3=year(Date2)
    However, what I got is:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float Date3
    1962
    1964
    1961
    1963
    1961
    1963
    1962
    1963
    1960
    1961
    end
    As you see, I obtained strange results.

    Thank you in advance for your support.

    Best,
    Stefano

    Last edited by Stefano Amato; 04 Sep 2020, 10:20.

  • #2
    Stata dates are SIF values, not strings. When you encode, the strings are sorted and assigned values starting from 1. As it happens, 2nd Jan 1960 is 1, 3rd Jan 1960, 2, and so on. See -help datetime-

    Code:
    . di date("01jan1960", "DMY")
    0
    
    . di date("02jan1960", "DMY")
    1
    
    . di date("03jan1960", "DMY")
    2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 Date
    "13jan2015"
    "24may1976"
    "12jun2012"
    "20mar1962"
    "07mar1877"
    "23dec1974"
    "17sep2014"
    "21dec1962"
    "05mar2008"
    "11nov2015"
    end
    
    gen date= date(Date, "DMY")
    format date %td
    gen year= year(date)
    Res.:

    Code:
    . l
    
         +------------------------------+
         |      Date        date   year |
         |------------------------------|
      1. | 13jan2015   13jan2015   2015 |
      2. | 24may1976   24may1976   1976 |
      3. | 12jun2012   12jun2012   2012 |
      4. | 20mar1962   20mar1962   1962 |
      5. | 07mar1877   07mar1877   1877 |
         |------------------------------|
      6. | 23dec1974   23dec1974   1974 |
      7. | 17sep2014   17sep2014   2014 |
      8. | 21dec1962   21dec1962   1962 |
      9. | 05mar2008   05mar2008   2008 |
     10. | 11nov2015   11nov2015   2015 |
         +------------------------------+

    Comment


    • #3
      Not puzzling at all... but naturally not what you want.

      encode with nothing else said maps distinct strings, sorted alphanumerically, to distinct integers 1 up. So, on that ground alone, it could never be faithful to dates before 2 January 1960,

      year() with nothing else said treats its argument as a daily date with origin 1 January 1960. So, for this data example alone the results are all in 1960.

      Note that the output you cite is for a larger dataset and is not reproducible from your data example.

      The fact that the value labels which you see look right is a distraction. Underneath the value labels, the numeric dates are not even in the right order.

      I show two ways here to do it right.

      Code:
      . clear
      
      . input str10 Date
      
                 Date
        1. "13jan2015"
        2. "24may1976"
        3. "12jun2012"
        4. "20mar1962"
        5. "07mar1877"
        6. "23dec1974"
        7. "17sep2014"
        8. "21dec1962"
        9. "05mar2008"
       10. "11nov2015"
       11. end
      
      .
      . encode Date, generate(Date2)
      
      . generate Date3=year(Date2)
      
      . sort Date
      
      .
      . list
      
           +-------------------------------+
           |      Date       Date2   Date3 |
           |-------------------------------|
        1. | 05mar2008   05mar2008    1960 |
        2. | 07mar1877   07mar1877    1960 |
        3. | 11nov2015   11nov2015    1960 |
        4. | 12jun2012   12jun2012    1960 |
        5. | 13jan2015   13jan2015    1960 |
           |-------------------------------|
        6. | 17sep2014   17sep2014    1960 |
        7. | 20mar1962   20mar1962    1960 |
        8. | 21dec1962   21dec1962    1960 |
        9. | 23dec1974   23dec1974    1960 |
       10. | 24may1976   24may1976    1960 |
           +-------------------------------+
      
      .
      . list, nolabel  
      
           +---------------------------+
           |      Date   Date2   Date3 |
           |---------------------------|
        1. | 05mar2008       1    1960 |
        2. | 07mar1877       2    1960 |
        3. | 11nov2015       3    1960 |
        4. | 12jun2012       4    1960 |
        5. | 13jan2015       5    1960 |
           |---------------------------|
        6. | 17sep2014       6    1960 |
        7. | 20mar1962       7    1960 |
        8. | 21dec1962       8    1960 |
        9. | 23dec1974       9    1960 |
       10. | 24may1976      10    1960 |
           +---------------------------+
      
      .
      . gen year = real(substr(Date, -4, 4))
      
      .
      . gen dailydate = daily(Date, "DMY")
      
      .
      . gen year2 = year(dailydate)
      
      .
      . list, nolabel
      
           +-----------------------------------------------------+
           |      Date   Date2   Date3   year   dailyd~e   year2 |
           |-----------------------------------------------------|
        1. | 05mar2008       1    1960   2008      17596    2008 |
        2. | 07mar1877       2    1960   1877     -30249    1877 |
        3. | 11nov2015       3    1960   2015      20403    2015 |
        4. | 12jun2012       4    1960   2012      19156    2012 |
        5. | 13jan2015       5    1960   2015      20101    2015 |
           |-----------------------------------------------------|
        6. | 17sep2014       6    1960   2014      19983    2014 |
        7. | 20mar1962       7    1960   1962        809    1962 |
        8. | 21dec1962       8    1960   1962       1085    1962 |
        9. | 23dec1974       9    1960   1974       5470    1974 |
       10. | 24may1976      10    1960   1976       5988    1976 |
           +-----------------------------------------------------+
      For documentation of this pitfall see e.g.


      https://journals.sagepub.com/doi/abs...867X1801800413
      Last edited by Nick Cox; 04 Sep 2020, 10:59.

      Comment


      • #4
        Hi,

        I have a similar issue. I have dates in my data file as a string variable listed in this format: 12/20/2004 00:00:00, I would like to convert this into a numeric variable and also drop the time format. Any ideas on where to start with syntax on this. I'm guessing the conversation may be a multi-step process.

        ESG

        Comment


        • #5
          Code:
          help datetime

          Comment


          • #6
            Andrew and Nick, thak you for your valuable suggestions!
            Best,
            Stefano

            Comment

            Working...
            X