Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Date Conversion

    Hello,

    I am trying to expend one data set with another data set by using the command "append using, keep (...)". However, I have realised that the "date" variable in both data sets do not have the same format. In the master data set the date is formatted as " 5 August 2020" while in the other data set it is formatted as "Sunday, 17th November 2019". I have realised that there seems a problem in expanding variable observations if they are not in the same format. It looks like the data entries for the dates are ignored when expanding the master data set. Can someone help with formatting "Sunday, 17th November 2019" into a proper date format for appending?

    Thank you.

  • #2
    I take it that expend and expand are typos for append: that's more than trivial because there is an expand process in Stata which is covered by the expand command.

    The word format is overloaded in computing. It can mean file format, data format (structure or layout), display format, .... Here I guess you mean something else, namely that your dates are string variables with weekday as preamble in one place but not in another. If so, that's soluble. Appending wouldn't work to put values in the same variable if the original variables were of quite different types.


    Code:
    clear
    input str26 problemdate
    "5 August 2020"
    "Sunday, 17th November 2019"
    end
    
    gen betterdate = daily(problemdate, "DMY")
    
    gen problemdate2 = substr(problemdate, strpos(problemdate, ",") + 1, .) if missing(betterdate)
    
    
    foreach el in st nd rd th {
      replace problemdate2 = subinstr(problemdate2, "`el'", "", .)
    }
    
    replace betterdate = daily(problemdate2, "DMY") if missing(betterdate)
    format betterdate %td
    order problemdate problemdate2 betterdate
    
    list
    
         +------------------------------------------------------------+
         |                problemdate        problemdate2   betterd~e |
         |------------------------------------------------------------|
      1. |              5 August 2020                       05aug2020 |
      2. | Sunday, 17th November 2019    17 November 2019   17nov2019 |
         +------------------------------------------------------------+
    Otherwise put, string dates such as "5 August 2020" are easy. But if your strings contain other elements you may need to strip those other elements out.

    If this answer doesn't make sense, or other problems remain, then you please need to do as we ask in https://www.statalist.org/forums/help#stata (emphasis added on date variables).


    12.2 What to say about your data



    We can understand your dataset only to the extent that you explain it clearly.

    The best way to explain it is to show an example. The community-contributed command dataex makes it easy to give simple example datasets in postings. It was written to support Statalist and its use is strongly recommended. Usually a copy of 20 or so observations from your dataset is enough to show your problem. See help dataex for details.

    As from Stata 15.1 (and 14.2 from 19 December 2017), dataex is included with the official Stata distribution. Users of Stata 15 (or 14) must update to benefit from this. Users of Stata 16 or 17 need not do anything: dataex is already part of your installation.

    Users of earlier versions of Stata must install dataex from SSC before they can use it. Type ssc install dataex in your Stata.

    The merits of dataex are that we see your data as you do in your Stata. We see whether variables are numeric or string, whether you have value labels defined and what is a consequence of a particular display format. This is especially important if you have date variables. We can copy and paste easily into our own Stata to work with your data.

    If your dataset is confidential, then provide a fake example instead.

    The second best way to explain your situation is to use one of Stata's own datasets and adapt it to your problem. Examples are the auto data and the Grunfeld data (a simple panel dataset). That may be more work for you and you may not find an analog of your problem with such a dataset.

    The worst way to explain your situation is to describe your data vaguely without a concrete example. Note that it doesn't help us much even to be given your variable names. Often that leaves unclear both your data structure and whether variables are numeric or string or their exact contents. If you explain only vaguely, quick answers to your question, or even any answers at all, are less likely.


    Last edited by Nick Cox; 16 Oct 2021, 05:18.

    Comment


    • #3
      Hello Nick,

      thank you for your help. I am quite new to the world of Stata, hence, apologies for the misleading use of the words.

      Your solutions help me a lot. However, my problematic dates e.g.
      Code:
      Sunday, 17th November 2019
      are already stored as numeric variables. So before I can use
      Code:
      gen problemdate2 = substr(problemdate, strpos(problemdate, ",") + 1, .)
      I need to convert problemdate into a string variable. I have already tried out
      Code:
      tostring
      and also
      Code:
      gen st_problemvariable=string(problem variable,"%8.0g")
      , but they both seem to be incorrect.

      Comment


      • #4
        I am quite sure that Nick would not advise you to convert these numeric date variables back to strings. Rather, if these are both numeric variables, but they appear to human eyes in the Browser or -list- output the way you describe in #1, then they simply have different display formats. In that case, they are perfectly compatible for purposes of -merge- or -append-. I'm not sure which format will be kept after you combine the data sets. But either way, no information will be lost in the process. In short, what you describe is a non-problem.

        It is important to remember that in Stata what you see is not necessarily what Stata understands. Display formatting is just that: it does not affect the underlying information that Stata is working with, it just affects the way Stata dresses it up for human eyes when it shows it to you.

        If this problem you are working on is not just a one-time engagement with date variables in Stata, you are well advised to read the Datetime chapter of the Data Management [D] volume of the PDF manuals that come installed with Stata. You can reach that material through the Help dropdown menu. It is a long read, and the material is complicated. And you won't remember all of it--some of it you may never even use again. But it will help you cement in your mind the way Stata deals with dates and times, and show you the basics of creating and working with such variables. When, in application, your memory fails you on the details, the help files will then generally be enough to get you through. The time you invest in this reading will be amply and quickly repaid.

        Comment


        • #5
          Hello Clyde,

          I have already tried to append the master data set with the data set that contains the date in the form of:
          Code:
           Sunday, 17th November 2019
          .
          Unfortunately, it does not work at all.

          In both data sets the dates are stored as byte type and %8.0g format. However, as mentioned above, in the master data set the date is displayed as:
          Code:
          17th November 2019
          . I have looked up the date formats and I couldn't find an official date format in STATA with weekday included (also with by reading the date time chapter). Hence, I think there is the need to convert the weekday display format somehow.

          This is so strange.

          Last edited by Penelope Smart; 16 Oct 2021, 14:50.

          Comment


          • #6
            In both data sets the dates are stored as byte type and %8.0g format.
            Oh, that's terrible! Then what you actually have are value labeled consecutive integers. These are neither date variables nor string variables, and of all the ways to represent dates it is the least useful. This is yet another way in which what you see is not what Stata perceives--a way that is useful in many contexts, but is dreadful for dates. If somebody else gave you these data sets, that person is not your friend!

            Before going on to tell you how to fix all this, let me again point out that had you used -dataex- to show example data either originally, or at some other point along the way, this would have been immediately clear to Nick or me and you would have gotten correct advice right away. So please, please, always show example data and always use -dataex- to do so. Anything else is just wasting both your time and that of those who want to help you. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

            Now, the way out of this mess is a bit convoluted, but not too bad. First, you actually do have to take them back to string variables. Assume for illustration that the name of this variable you have is messed_up_date. You do

            Code:
            // CONVERT TO STRING VARIABLE FIRST
            decode messed_up_date, gen(string_date)
            
            // CLEAN UP THE STRING VARIABLE
            replace string_date = substr(string_date, strpos(string_date, ",") +1, .)
            foreach el in st nd rd th {
                replace string_date= subinstr(string_date, "`el'", "", .)
            }
            
            // CONVERT STRING DATE TO STATA DATE VARIABLE
            gen stata_date = daily(string_date, "DMY")
            format stata_date %td
            To the reading list I suggested above, do add the -label- chapter of the [D] volume. It's pretty short and not all that complicated. But you need to understand what value labeled integer variables are, how to recognize them, and how to use them.


            Comment


            • #7
              Hello Clyde,

              thank you so much! Thanks to your code it finally worked!

              I am sorry to you and Nick for the inconvenience caused due to limited knowledge in using Stata/such a forum. For the next time, I will take your suggestions into consideration.

              PS.: The data was provided by an institute.

              Comment

              Working...
              X