Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • extract year, month, and day from a string variable

    Hi, I have a string variable with the date in the following format

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long date
    20170927
    20170928
    20170929
    20171002
    20171003
    20171004
    20171005
    20171006
    20171007
    20171009
    20171010
    20171011
    20171012
    26032018
    26042018
    26052018
    26062018
    26082019
    26092019
    27012018
    27022018
    27022019
    27032018
    27042018
    27062018
    20122018
    21022018
    21022019
    11032018
    1062018
    1082019
    1102019
    2012018
    2022018
    2022019
    end
    I intend to extract the month, year, and day from it. Though I successfully extracted the year information using the following command, however, I still can't seem to figure out how to extract the day and month.

    moss date, regex match(^"(201[4-9]+)") max(1)


  • #2
    You need to get clarity from your data provider. There is no consistency in the order of your month, day and year (MDY). Secondly, you have "21022019" which is DMY. "2022019" can imply either 20th February 2019 or 2nd February 2019. Which one is it?

    Comment


    • #3
      As Andrew Musau pointed out, the information inside the variable doesn't seem to follow any principle. For example, the year appears in the beginning in the first observations, but changes to the end inf the final observations.

      Shall you have consistent data, this is a way to produce what you wish:

      Code:
      . set obs 3
       
      . input str10 mydate1
      
              mydate1
        1. 20171026
        2. 20180613
        3. 20150319
      
      . gen mydate2 = date(mydate1, "YMD")
      
      . clonevar mydate3 = mydate2
      
      . format mydate3 %td
      
      . gen myyear = year(mydate3)
      
      . gen mymonth = month(mydate3)
      
      . gen myday = day(mydate3)
      
      . list
      
           +-----------------------------------------------------------+
           |  mydate1   mydate2     mydate3   myyear   mymonth   myday |
           |-----------------------------------------------------------|
        1. | 20171026     21118   26oct2017     2017        10      26 |
        2. | 20180613     21348   13jun2018     2018         6      13 |
        3. | 20150319     20166   19mar2015     2015         3      19 |
           +-----------------------------------------------------------+

      Hopefully that helps.
      Best regards,

      Marcos

      Comment


      • #4
        Thank you Marcos Almeida and Andrew Musau for the working above.

        But, yeah, my time is inconsistent, which is why I wanted to ask if there is a possibility of a solution.
        The data was collected through mobile phone whose date time format may vary, and I unfortunately do not have access to the source database to retrieve it in a consistent format.

        Comment


        • #5
          Andrew Musau and Marcos Almeida are right. I gather from your data example that all dates fall within 2017 to 2019. If that's so, you can make progress.


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long date
          20170927
          20170928
          20170929
          20171002
          20171003
          20171004
          20171005
          20171006
          20171007
          20171009
          20171010
          20171011
          20171012
          26032018
          26042018
          26052018
          26062018
          26082019
          26092019
          27012018
          27022018
          27022019
          27032018
          27042018
          27062018
          20122018
          21022018
          21022019
          11032018
          1062018
          1082019
          1102019
          2012018
          2022018
          2022019
          end
          
          gen work = strofreal(date, "%8.0f")
          
          gen dailydate = daily(work, "YMD") if inrange(substr(work, 1, 4), "2017", "2019")
          
          replace dailydate = daily(work, "DMY") if inrange(substr(work, -4, 4), "2017", "2019") & missing(daily)
          
          format dailydate %td
          
          list
          
              +---------------------------------+
               |     date       work   dailydate |
               |---------------------------------|
            1. | 20170927   20170927   27sep2017 |
            2. | 20170928   20170928   28sep2017 |
            3. | 20170929   20170929   29sep2017 |
            4. | 20171002   20171002   02oct2017 |
            5. | 20171003   20171003   03oct2017 |
               |---------------------------------|
            6. | 20171004   20171004   04oct2017 |
            7. | 20171005   20171005   05oct2017 |
            8. | 20171006   20171006   06oct2017 |
            9. | 20171007   20171007   07oct2017 |
           10. | 20171009   20171009   09oct2017 |
               |---------------------------------|
           11. | 20171010   20171010   10oct2017 |
           12. | 20171011   20171011   11oct2017 |
           13. | 20171012   20171012   12oct2017 |
           14. | 26032018   26032018   26mar2018 |
           15. | 26042018   26042018   26apr2018 |
               |---------------------------------|
           16. | 26052018   26052018   26may2018 |
           17. | 26062018   26062018   26jun2018 |
           18. | 26082019   26082019   26aug2019 |
           19. | 26092019   26092019   26sep2019 |
           20. | 27012018   27012018   27jan2018 |
               |---------------------------------|
           21. | 27022018   27022018   27feb2018 |
           22. | 27022019   27022019   27feb2019 |
           23. | 27032018   27032018   27mar2018 |
           24. | 27042018   27042018   27apr2018 |
           25. | 27062018   27062018   27jun2018 |
               |---------------------------------|
           26. | 20122018   20122018   20dec2018 |
           27. | 21022018   21022018   21feb2018 |
           28. | 21022019   21022019   21feb2019 |
           29. | 11032018   11032018   11mar2018 |
           30. |  1062018    1062018           . |
               |---------------------------------|
           31. |  1082019    1082019           . |
           32. |  1102019    1102019           . |
           33. |  2012018    2012018           . |
           34. |  2022018    2022018           . |
           35. |  2022019    2022019           . |
               +---------------------------------+
          
          .
          Perhaps the last block really are DMY, but MDY does not seem impossible either.
          Last edited by Nick Cox; 05 Nov 2019, 05:57.

          Comment

          Working...
          X