Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to convert this date format to Stata date format

    Hi Everyone,

    I've been dealing with this somewhat special type of data format and wanted to convert it to Stata date format.
    Below is the sub-samples of the variable.
    In many cases, the date formats are represented as yyyymmdd (numerical).
    I could always search how to convert those types easily.
    But I couldn't find any good solutions to deal with the type in data example.
    I tried combining several commands by myself but it kept giving me missing values, guessing that those commands were completely wrong.
    Could anyone please help me out how to convert this date variable into Stata date format?
    (This variable is a web-scraped variable and I could get year and month but not day values)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str13 join_date
    "2016September"
    "2016January"  
    "2015November"
    "2015October"  
    "2016October"  
    "2016November"
    "2016December"
    "2016February"
    "2017January"  
    "2017February"
    "2017March"    
    "2017April"    
    "2017May"      
    "2016March"    
    "2017June"    
    "2017July"    
    "2018April"    
    "2018February"
    "2017November"
    "2015December"
    "2018January"  
    "2016April"    
    "2017August"  
    "2017September"
    "2017December"
    "2017October"  
    "2018May"      
    "2018March"    
    "2016May"      
    "2018June"    
    "2018July"    
    "2018August"  
    "2016June"    
    "2018September"
    "2018October"  
    "2018November"
    "2018December"
    "2019January"  
    "2019February"
    "2016July"    
    "2019March"    
    "2019April"    
    "2019May"      
    "2019June"    
    "2019July"    
    "2016August"  
    "2019August"  
    "2019September"
    "2019October"  
    "2019November"
    "2019December"
    "2020January"  
    "2020February"
    "2020March"    
    "2020April"    
    "2020May"      
    end
    Last edited by Ryan Kim; 11 Oct 2020, 15:13.

  • #2
    Code:
    gen date = monthly(join_date, "YM")
    format date %tm
    You can use a more elaborate date format if you wish, of course.

    Comment


    • #3
      Code:
      gen date=date(join_date,"YM")
      Stata's date function can handle almost any date formats. You just have to tell Stata, where it has to look for what.

      Comment


      • #4
        Note the differences between the code in #3 and that in #2. They do different things.

        #2 creates a Stata monthly date variable, that requires formatting with a monthly date format (in the %tm family) and encodes only information about the month and year. If you want to add or subtract months to the values, you can do that using the number of months difference. If you are working with time series or panel data, these will be recognized as monthly dates and treated as evenly spaced. Working with time-series operators, 2020April will be considered the first lagged value of 2020May.

        #3 creates a Stata daily date variable, requiring formatting with a daily date format (in the %tm family), and imputing the first day of the month. If you want to add or subtract months to the values, basically, you can't because different months are different numbers of days. (Well, you can do it but it's much more complicated.) If you are working with time series or panel data, these daily dates will not be treated as evenly spaced, again because different months have different numbers of days. And if you try to use time series operators, the first lag of 2020May will be April 30, 2020--which does not occur in your data.

        Which of these is better for your purpose depends on what you're going to do with this variable. If you are going to be merging this data with other data sets that have actual daily-level information, then #3 is better. If everything you are working with, however, is monthly data, #2 is better for you.

        Comment


        • #5
          For future reference, Stata's "date and time" variables are complicated and there is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

          All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

          Comment


          • #6
            Nick Cox's -numdate- is also useful, because -date- is very flexible and understands pretty much anything, but -monthly- is a bit dumber. In this particular example -monthly- understood, but if the date was 202010, -monthly- would not have understood, and would have generated missings. -numdate- also automatically formats the output, so saves one line of code.

            Code:
            . gen m = monthly(join,"YM")
            
            . format m %tm
            
            . numdate monthly m2 = join, pattern(YM)
            
            . list in 1/5, sep(0)
            
                 +-----------------------------------+
                 |     join_date         m        m2 |
                 |-----------------------------------|
              1. | 2016September    2016m9    2016m9 |
              2. |   2016January    2016m1    2016m1 |
              3. |  2015November   2015m11   2015m11 |
              4. |   2015October   2015m10   2015m10 |
              5. |   2016October   2016m10   2016m10 |
                 +-----------------------------------+
            Here is -monthly- being dumb:

            Code:
            . dis monthly("202010", "YM")
            .
            It return a missing, whereas the outcome should have been:

            Code:
            . dis monthly("2020m10", "YM")
            729
            Last edited by Joro Kolev; 12 Oct 2020, 03:47.

            Comment


            • #7
              Thank you so much, everyone.
              It perfectly worked!

              Comment

              Working...
              X