Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Changing Date Format from Long to Byte

    Hello,
    I have a dataset with a date variable that is in long format. It's a numeric variable, but Stata is still reading it as a string variable. I have already tried the conversion from string to numeric option, but it says that it is already in numeric format so it cannot convert it. When I try to merge this dataset with another dataset and match on ID and date, it tells me that this date variable is in string format while the other is in numeric format so it cannot merge. I also tried encoding it, but it still reads it as string. Does anyone know how to work with dates in stata and have them be read as numeric variables (byte or double format)? My date variable is currently in long format (yyyymmdd).

    Any help is much appreciated!

  • #2
    It might be a Stata14 bug.
    Code:
    update all
    solve it.

    Code:
    set obs 1
    gen a="1"
    destring a, replace
    tostring a, replace
    destring a, replace
    check whether it break when destring in the second time

    Comment


    • #3
      Dates are somewhat complicated in Stata. In addition, with dates as with other variables, you have to distinguish between the way they look to your eye when you -list- them or view them in the Browser/Editor, and the way they are actually stored in the data set. Finally, -destring- and -encode- do very different things and are used for different purposes; interestingly, neither of them is applicable to date variables! From your narrative, it appears you are confused about all of these things.

      But I can't help you out with your specific data without seeing examples from both data sets. And the only way you can show that data with all of the information, including metadata, that is necessary, is to use the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      If you post back using -dataex- to show examples of both data sets, I'm sure that somebody will give you a timely and helpful response.

      With regard to your final question, dates have too much information to fit within the confines of a byte. If they are just dates without time of day, they can fit into an int.* Float is often used just because it is the default storage type for variables created with the -generate- command. If they include time of day, then the amount of information requires a double.

      Edit: Crossed with #2. I seriously doubt that there is a bug with -destring-: that command is very commonly used and a bug would almost surely have turned up and been fixed during the life of version 14. Certainly a bug that could be detected with the simple example in #2 would have long been found and fixed.

      Even if there were a bug, the -destring- command actually plays no role in the management of Stata dates anyway. String variables that read as dates to human eyes can be converted to dates, not with the -destring- command, but with the -daily()- or -date()- functions.

      Added: * Dates through 21 Aug 2049 can be stored in ints. It is rare to need to refer to dates later than that, but if you do, then they must be stored as long or float.
      Last edited by Clyde Schechter; 11 Sep 2018, 22:53.

      Comment


      • #4
        Clyde gives excellent advice in #3 as usual.

        I will just add that format is not, in Stata, another word for storage or variable type. It's an over-worked term in computing (file format, display format, storage format, ...) but Stata's meaning is the one to follow here.

        You can make sure that you're following Stata terminology by echoing error messages literally -- and by giving data examples which show us Stata's view of your data, not your rewording.

        Comment


        • #5
          Thank you for all your advice. Below is the -dataex- output for my date variable in the dataset which I'm trying to merge with the master dataset.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str9 date
          "19971007"
          "19970805"
          "19970828"
          "19970906"
          "19970805"
          "19970923"
          "19970821"
          "19970820"
          "19970814"
          "19970827"
          "19970830"
          "19970827"
          "19970807"
          "19970830"
          "19970507"
          "19970501"
          "19971009"
          "19970715"
          "19970501"
          "19970507"
          "19970812"
          "19970607"
          "19970520"
          "19970612"
          "19970504"
          "19970513"
          "19970520"
          "19970610"
          "19970605"
          "19970516"
          "19970607"
          "19970603"
          "19990224"
          "19990120"
          "19990225"
          "19990217"
          "19990217"
          "19990513"
          "19990223"
          "19990203"
          "19990203"
          "19990211"
          "19990302"
          "19990224"
          "19980423"
          "19970628"
          "19970620"
          "19970620"
          "19990205"
          "19990205"
          end


          This is the -dataex- output of the date variable that is from my master dataset.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long date
          19970805
          19970828
          19970906
          19970805
          19970923
          19970821
          19970820
          19970814
          19970827
          19970830
          19970827
          19970807
          19970830
          19970507
          19970501
          19971018
          19971009
          19970715
          19970501
          19970507
          19970812
          19970607
          19970520
          19970612
          19970604
          19970513
          19970520
          19970610
          19970605
          19970516
          19970607
          19970603
          20000304
          20000301
          20000316
          19990224
          19990120
          19990225
          19990217
          19990217
          19990513
          19990223
          19990203
          19990203
          19990211
          19990302
          19990224
          19980423
          19970628
          19970620
          end
          Once again, I appreciate your help.

          Comment


          • #6
            Let's put together an example to show technique. Here all data are in one dataset: you'll need to work in each separately first.

            1. Run-together dates like "19971007" or 19971007 make sense to people but are useless for serious Stata analysis. To see this, you need only consider the jump from 19971231 to 19980101 which people can see is 1 day but is a meaningless difference to Stata if values are string and a numeric difference of 8870 days(???) otherwise.

            2. The daily() function of Stata (also known as date()) can help, but it needs a string argument and quite possibly an explicit format.

            3. You need to apply a date display format of choice too.

            4. Once you have dates, merge should be easy, or at least easier.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str9 date1 
            "19971007"    
            "19970805"
            "19970828"
            "19970906"
            "19970805"
            "19970923"
            "19970821"
            "19970820"
            "19970814"
            "19970827"
            "19970830"
            "19970827"
            "19970807"
            "19970830"
            "19970507"
            "19970501"
            "19971009"
            "19970715"
            end 
            input long date2
            19970805
            19970828
            19970906
            19970805
            19970923
            19970821
            19970820
            19970814
            19970827
            19970830
            19970827
            19970807
            19970830
            19970507
            19970501
            19971018
            19971009
            19970715
            gen Date1 = daily(date1, "YMD")
            gen Date2 = daily(string(date2, "%9.0f"), "YMD")
            format Date? %td
            l
            
                 +---------------------------------------------+
                 |    date1      date2       Date1       Date2 |
                 |---------------------------------------------|
              1. | 19971007   19970805   07oct1997   05aug1997 |
              2. | 19970805   19970828   05aug1997   28aug1997 |
              3. | 19970828   19970906   28aug1997   06sep1997 |
              4. | 19970906   19970805   06sep1997   05aug1997 |
              5. | 19970805   19970923   05aug1997   23sep1997 |
                 |---------------------------------------------|
              6. | 19970923   19970821   23sep1997   21aug1997 |
              7. | 19970821   19970820   21aug1997   20aug1997 |
              8. | 19970820   19970814   20aug1997   14aug1997 |
              9. | 19970814   19970827   14aug1997   27aug1997 |
             10. | 19970827   19970830   27aug1997   30aug1997 |
                 |---------------------------------------------|
             11. | 19970830   19970827   30aug1997   27aug1997 |
             12. | 19970827   19970807   27aug1997   07aug1997 |
             13. | 19970807   19970830   07aug1997   30aug1997 |
             14. | 19970830   19970507   30aug1997   07may1997 |
             15. | 19970507   19970501   07may1997   01may1997 |
                 |---------------------------------------------|
             16. | 19970501   19971018   01may1997   18oct1997 |
             17. | 19971009   19971009   09oct1997   09oct1997 |
             18. | 19970715   19970715   15jul1997   15jul1997 |
                 +---------------------------------------------+

            Comment


            • #7
              I successfully generated the new variables in each dataset and applied the suggested format to both, however it still does not let me merge the datasets. It gives the following error message:

              Code:
              variable date is long in master but str9 in using data
                  You could specify merge's force option to ignore this numeric/string mismatch.  The
                  using variable would then be treated as if it contained numeric missing value.
              r(106);

              Comment


              • #8
                As Stata is telling you, you must merge on variables of the same kind. That was established in #1 of this thread.

                If you're starting with string variable date in one and numeric variable date in the other you would be well advised to map both to a numeric daily date with a different name from date but the same variable name in each dataset.

                As Jorrit points out below, there is no point in trying again with the original variables. They were not, and are not, suitable.

                Comment


                • #9
                  Possibly because you have successfully created the new variables, but are merging using the old variables.

                  Comment

                  Working...
                  X