Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help: Stata imports Excel dates with different formats

    Hi all,

    hope you can help me.

    I have an Excel file with some date variables, listed in the format: dd-mm-yyyy. (ex. 01-07-2014 = 1st July 2014)

    When I import the excel file into Stata (12.1), the formats look differently.

    I started formatting the date variable like this:

    gen new_date= date(old_date, "ddmmyyyy")
    format new_date %td

    after running this command, only some of the output looks like what I want (01July2014), but some of the variables are still listed as: 01-07-2014, and some are listed with a mm-dd-yyyy format, seperated with "/": 01/07/2014

    At first, I was wondering if I had spaces infront of the dates in excel, so I tried to trim the data. However, this still doenst work.

    Could it be that Stata automaticlly converts the date to american formats (mm/dd/yyyy).

    Much appreciated with all your help!

    Thanks in advance,
    Malinna J.



  • #2
    This is a silly work-around that you shouldn't have to do, but have you tried formatting them in Excel as text, then running the import, then converting back to dates? If it's only a few variables, this may be your best option: between the USA vs. The World formatting, and the other issues, this might be simplest. As an extreme, depending on the size of your dataset) might try importing all as strings. Which you shouldn't have to do, but Excel and the Excel import function are quirky. Some people even save-as .csv from Excel, then import the .csv to avoid unpredictable quirks.

    Comment


    • #3
      I'm skeptical that you're getting the output you say you're getting after running the date command and formatting the resulting variables as %td.

      If you run:
      Code:
      gen new_date= date(old_date, "ddmmyyyy")
      format new_date %td
      new_date will be missing for any values of old_date that don't make sense in the ddmmyyyy form. You'll never get a value like "01-07-2014" after running date unless you change from the default %td format.

      Maybe what you're saying is that after the import from excel some of your variables are formatted as sensible dates and some are not. In that case you just need to run the appropriate date command on any of the variables that are imported as string in a date form other than ddmmyyyy.

      At any rate for us to give useful suggestions you need to be clearer about exactly what you have after the import statement and exactly what commands you're running. I do a LOT of importing data from excel. It is very rare that I run into an issue with dates that requires pre-processing in excel. Most things that are recognizable dates Stata automatically imports as dates in the %td format. Any other variables can usually be easily transformed into Stata dates with a single date command. If you have a variable that has a mix of strings with the date elements in different orders it requires a few lines of code but is still easily doable. If you describe in detail what you have I expect someone here can help you figure out how to clean the data.

      Comment


      • #4
        Hi again,

        Let me be more clearer.

        I have a variable, named "responsedate" in Excel, where the date is written like this: 01-12-2014 (equal 1st December 2014). The whole coloumn is formatted to date-format in excel before importing it to Stata.

        When I import the excel file to Stata, codebook of this responsdate is:

        ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

        type: string (str10)

        unique values: 250 missing "": 11793/18002

        examples: ""
        ""
        ""
        "8/15/2014"

        warning: variable has leading blanks
        __________________________________________________ __________________________________________________ ______________


        sorting the variable, and listing them, showed me these different values:

        1) 04-10-2014, which is a dd-mm-yyyy format (the study was not started in april, so it is not april)
        2) 11/13/2014, which is a mm/dd/yyyy format


        So, my question is, how do I format all the values to one same date-format? (so that I can count how many days from an invitation to a response for the participants?)

        I have tried the command:

        gen new_responsedate= date(responsdate, "ddmmyyyy")
        format new_responsedate %td

        But this give me only missing values in the new variable.

        Since all the input values are not in "ddmmyyyy" format, I guess that is why I get missing values? Do I have to convert all the values to ddmmyyyy in Excel before importing to Stata?

        Or what should I do??? Hope you can help me with this.



        Comment


        • #5
          Hi again,

          Let me be more clearer.

          I have a variable, named "responsedate" in Excel, where the date is written like this: 01-12-2014 (equal 1st December 2014). The whole coloumn is formatted to date-format in excel before importing it to Stata.

          When I import the excel file to Stata, codebook of this responsdate is:

          ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

          type: string (str10)

          unique values: 250 missing "": 11793/18002

          examples: ""
          ""
          ""
          "8/15/2014"

          warning: variable has leading blanks
          __________________________________________________ __________________________________________________ ______________


          sorting the variable, and listing them, showed me these different values:

          1) 04-10-2014, which is a dd-mm-yyyy format (the study was not started in april, so it is not april)
          2) 11/13/2014, which is a mm/dd/yyyy format


          So, my question is, how do I format all the values to one same date-format? (so that I can count how many days from an invitation to a response for the participants?)

          I have tried the command:

          gen new_responsedate= date(responsdate, "ddmmyyyy")
          format new_responsedate %td

          But this give me only missing values in the new variable.

          Since all the input values are not in "ddmmyyyy" format, I guess that is why I get missing values? Do I have to convert all the values to ddmmyyyy in Excel before importing to Stata?

          Or what should I do??? Hope you can help me with this.

          Comment


          • #6
            There are several problems here.

            First, you cannot use the "ddmmyyyy" mask to extract a date value from a string date; it is (since Stata 10):
            Code:
            gen new_date= date(old_date, "DMY")
            Second, when you state that
            after running this command, only some of the output looks like what I want (01July2014), but some of the variables are still listed as: 01-07-2014, and some are listed with a mm-dd-yyyy format, seperated with "/": 01/07/2014
            I assume that it means that different variables are displayed differently, not that one variable is displayed differently in different observations - which is possible in Excel, but not in Stata. From your description I get the impression (but I am not sure) that you issue a single generate command, but expect it to generate several new variables.

            Use the describe command to see the variable types. Next. use the date() function to generate Stata date variables from the string variables.

            Be aware that a Stata date value (the number of days since 01jan1960) is independent of the display format, so if two date values are displayed with different formats, it does not affect the possibility to calculate a time interval. If one of them is string, it is another story.

            Extra warning: I have experienced Excel to behave strangely with respect to dates, leading to quite a different date after translation to Stata. Take a look at the Excel formula window (the one displaying the formula generating the content of a cell). Reformatting to another Excel format may (or may not - look carefully!!) solve the problem.

            Comment


            • #7
              Svend gives excellent advice.

              You have more problems than you say as a mix of MDY and DMY dates carries a risk of confusion between e.g. 6/3 meaning 6 March or June 3. But

              Code:
              gen date = cond(strpos(responsedate, "/"), date(responsedate, "MDY"), date(responsedate, "DMY"))
              format date %td
              would expect MDY pattern if it finds a forward slash and DMY otherwise. Look carefully for whether you have more missings than 11793.
              Last edited by Nick Cox; 03 Dec 2014, 09:17.

              Comment


              • #8
                Dear Svend and Nick.

                Yes, what I meant was different variables are displayed differently, and not that one variable is displayed differently in different observations.

                I tried the code Nick suggested, and it helped! Thanks a lot for your help! Now I can continue cleaning my data

                Comment


                • #9
                  Hi all, I encounter a problem slightly different from Janni.
                  I have formatted all dates in Excel into the format "Dates", for ex: 14/03/2012.
                  When I import from Excel to Stata, dates appear in different formats as follows:
                  9/12/2023 for Sept 12 2023
                  23279 for Sep 26 2023
                  In Stata, they are all converted to "string" because I need them in same data type for appending.
                  I want them to have similar format as expected in Excel as 12/9/2023. I have tried different ways, but I could not make it.
                  I would appreciate any advice on my case.

                  Thanks in advance.

                  Comment

                  Working...
                  X