Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issue importing time

    Hi,

    I've started using Stata17 and never experienced the following problem with past version:
    I am importing an excel sheet which contains a column with dates (in a "Date" format in excel) in the following form:
    "2020-10-14 11:11:10 AM"
    "2020-10-14 4:15:30 PM"
    ...

    Only the date (and not the time) is displayed in excel. When importing the file, Stata automatically imports the column in a %td format, though when tabulating the data, the table contains a different row per observation, as if it imported the data in %tC format. Moreover, when converting the column to %tC format (or %tc format), I get the following column instead:
    "01jan1960 00:00:22"
    "01jan1960 00:00:22"
    ...
    Again, I get a different row per observation when tabulating.

    In the past, date variables would be imported in strings - it would then be easy to convert them into %td and %tC variables. Unsure if it's do to the Stata version or if there's an issue with my Excel file. I was able to import the same Excel sheet without an issue with Stata15.

    Thanks

  • #2
    Well, the number 22000 is how Stata internally represents the date 26 Mar 2020 and also how it internally represents the date-time (clock) 01jan21960 00:00:22. So I would expect that dates in the year 2020 when reformatted as %tc would look roughly like the datetimes you show. No mystery there.

    More of a mystery is why the output from tabulating looks different from the date variable you are importing. I think to troubleshoot that it will be necessary to see example data that demonstrates the problem (use the -dataex- command for this; anything else is just a waste of everyone's time). And also show the exact -tab- command that gave the unexpected output, and show the output itself.

    If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    Comment


    • #3
      Originally posted by Nicolas Orgeira View Post
      I am importing an excel sheet which contains a column with dates (in a "Date" format in excel) . . . Only the date (and not the time) is displayed in excel. When importing the file, Stata automatically imports the column in a %td format, though when tabulating the data, the table contains a different row per observation, as if it imported the data in %tC format. Moreover, when converting the column to %tC format (or %tc format), I get the following
      I can confirm this behavior in Stata Release 17.0 updated to two days ago. (I no longer have Release 15 and so cannot confirm that version's behavior.)

      It is unexpected and illogical behavior and so I'd call it a bug; it seems to result from Stata's naïvely believing what Microsoft is telling it. If the identical data are placed in an adjacent worksheet where the cells are formatted with a custom format that displays the time as well, then the date-time values are imported correctly. The bug occurs with Excel 2010 and whatever version Excel is with Office 365. Importing the data via ODBC circumvents the problem.

      Files are attached that reproduce (.do and .xlsx) and report (.smcl) the problem.

      I recommend contacting Stata Technical Support.
      Attached Files

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        More of a mystery is why the output from tabulating looks different from the date variable you are importing.
        Yeah, that was kind of puzzling to me, too, and so I went back to explore things further. The answer is that the time information is actually hiding there in the imported date values, which in retrospect makes sense. See below. (It uses the same Excel 2010 workbook attached above yesterday.)

        When a value is formatted in the Excel workbook as a date, Stata correctly imports that value as a date (days from 1960-01-01) and formats it as such. Nevertheless, if there is time information in the value stored in the worksheet cell, then Stata does retain that information, too, during importation. It's just not visible in any %td format. It's trivial to get at that time information by converting the date value to a date-time value using the cofd() function.

        .ÿ
        .ÿversionÿ17.0

        .ÿ
        .ÿclearÿ*

        .ÿ
        .ÿlocalÿline_sizeÿ`c(linesize)'

        .ÿsetÿlinesizeÿ80

        .ÿ
        .ÿ*
        .ÿ*ÿNaiveÿimportationÿofÿdataÿas-formattedÿinÿExcelÿworksheet
        .ÿ*
        .ÿimportÿexcelÿDateProbe2010.xlsx,ÿsheet(TestDates)ÿfirstrow
        (1ÿvar,ÿ2ÿobs)

        .ÿ
        .ÿ//ÿRecognizedÿcorrectlyÿasÿaÿdate
        .ÿdescribeÿ*_dt

        VariableÿÿÿÿÿÿStorageÿÿÿDisplayÿÿÿÿValue
        ÿÿÿÿnameÿÿÿÿÿÿÿÿÿtypeÿÿÿÿformatÿÿÿÿlabelÿÿÿÿÿÿVariableÿlabel
        --------------------------------------------------------------------------------
        test_dtÿÿÿÿÿÿÿÿÿdoubleÿÿ%td..ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿtest_dt

        .ÿformatÿ*_dtÿ%tdCY-N-D

        .ÿlist,ÿnoobs

        ÿÿ+------------+
        ÿÿ|ÿÿÿÿtest_dtÿ|
        ÿÿ|------------|
        ÿÿ|ÿ2020-10-14ÿ|
        ÿÿ|ÿ2020-10-14ÿ|
        ÿÿ+------------+

        .ÿ
        .ÿ//ÿNevertheless,ÿdoesÿcontainÿtimeÿinformation
        .ÿtabulateÿtest_dt

        ÿÿÿÿtest_dtÿ|ÿÿÿÿÿÿFreq.ÿÿÿÿÿPercentÿÿÿÿÿÿÿÿCum.
        ------------+-----------------------------------
        ÿÿ14oct2020ÿ|ÿÿÿÿÿÿÿÿÿÿ1ÿÿÿÿÿÿÿ50.00ÿÿÿÿÿÿÿ50.00
        ÿÿ14oct2020ÿ|ÿÿÿÿÿÿÿÿÿÿ1ÿÿÿÿÿÿÿ50.00ÿÿÿÿÿÿ100.00
        ------------+-----------------------------------
        ÿÿÿÿÿÿTotalÿ|ÿÿÿÿÿÿÿÿÿÿ2ÿÿÿÿÿÿ100.00

        .ÿformatÿ*_dtÿ%10.0g

        .ÿlist,ÿnoobs

        ÿÿ+-----------+
        ÿÿ|ÿÿÿtest_dtÿ|
        ÿÿ|-----------|
        ÿÿ|ÿ22202.466ÿ|
        ÿÿ|ÿ22202.677ÿ|
        ÿÿ+-----------+

        .ÿ
        .ÿ*
        .ÿ*ÿHere'sÿhowÿtoÿgetÿwhatÿyou'reÿafter
        .ÿ*
        .ÿquietlyÿreplaceÿtest_dtÿ=ÿcofd(test_dt)

        .ÿ//ÿAndÿnowÿyouÿcanÿseeÿwhatÿyou'veÿbeenÿexpectingÿtoÿsee
        .ÿformatÿ*_dtÿ%tcCCYY-NN-DD_HH:MM:SS

        .ÿlist,ÿnoobs

        ÿÿ+---------------------+
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿtest_dtÿ|
        ÿÿ|---------------------|
        ÿÿ|ÿ2020-10-14ÿ11:11:10ÿ|
        ÿÿ|ÿ2020-10-14ÿ16:15:30ÿ|
        ÿÿ+---------------------+

        .ÿ
        .ÿsetÿlinesizeÿ`line_size'

        .ÿ
        .ÿexit

        endÿofÿdo-file


        .


        Stata's behavior is unexpected (that is, a surprise), but not illogical, and so it was inappropriate of me to label it as a bug.

        Comment


        • #5
          Hi Joseph Coveney ,

          Thank you so much for your help and your detailed explanation. Problem solved! Much appreciated

          Comment

          Working...
          X