Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Date formatting

    Hello Statalist,

    I am working with dates and I am encountering a problem I have not encountered before. I have survey data, imported into Stata from an .xlsx file. See example below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double start
    22604.393076203716
    22604.414221446757
    22604.437764884256
    22604.459414791672
    22604.472943287037
    22604.490354120368
     22604.50474366898
     22604.51664253472
     22604.53772903935
    22604.548425034718
    22604.560529351853
    22604.572168333332
    22604.594228993054
     22604.60194003472
     22604.61860224537
    22604.411773032407
    22604.430122534726
     22604.44816359954
    22604.459407881943
    22604.470708449073
    end
    format %td start
    If we check this variable we see it is stored as double and displayed as %td:
    Code:
    desc start
                  storage   display    value
    variable name   type    format     label      variable label
    ---------------------------------------------------------------------------
    start           double  %td                   start
    However, when we tabulate, it shows multiple values of the same date:
    Code:
    tab start
    
          start |      Freq.     Percent        Cum.
    ------------+-----------------------------------
      20nov2021 |          1        5.00        5.00
      20nov2021 |          1        5.00       10.00
      20nov2021 |          1        5.00       15.00
      20nov2021 |          1        5.00       20.00
      20nov2021 |          1        5.00       25.00
      20nov2021 |          1        5.00       30.00
      20nov2021 |          1        5.00       35.00
      20nov2021 |          1        5.00       40.00
      20nov2021 |          1        5.00       45.00
      20nov2021 |          1        5.00       50.00
      20nov2021 |          1        5.00       55.00
      20nov2021 |          1        5.00       60.00
      20nov2021 |          1        5.00       65.00
      20nov2021 |          1        5.00       70.00
      20nov2021 |          1        5.00       75.00
      20nov2021 |          1        5.00       80.00
      20nov2021 |          1        5.00       85.00
      20nov2021 |          1        5.00       90.00
      20nov2021 |          1        5.00       95.00
      20nov2021 |          1        5.00      100.00
    ------------+-----------------------------------
          Total |         20      100.00
    If I change the format to %tc, it changes the displayed date to 1960:

    Code:
    format start %tc
    
    desc start
    
                  storage   display    value
    variable name   type    format     label      variable label
    ---------------------------------------------------------------------------
    start           double  %tc                   start
    
    list start in 1/20
    
         +--------------------+
         |              start |
         |--------------------|
      1. | 01jan1960 00:00:22 |
      2. | 01jan1960 00:00:22 |
      3. | 01jan1960 00:00:22 |
      4. | 01jan1960 00:00:22 |
      5. | 01jan1960 00:00:22 |
         |--------------------|
      6. | 01jan1960 00:00:22 |
      7. | 01jan1960 00:00:22 |
      8. | 01jan1960 00:00:22 |
      9. | 01jan1960 00:00:22 |
     10. | 01jan1960 00:00:22 |
         |--------------------|
     11. | 01jan1960 00:00:22 |
     12. | 01jan1960 00:00:22 |
     13. | 01jan1960 00:00:22 |
     14. | 01jan1960 00:00:22 |
     15. | 01jan1960 00:00:22 |
         |--------------------|
     16. | 01jan1960 00:00:22 |
     17. | 01jan1960 00:00:22 |
     18. | 01jan1960 00:00:22 |
     19. | 01jan1960 00:00:22 |
     20. | 01jan1960 00:00:22 |
    I would like to know how can I change the format and keep the correct date td(20nov2021).

    Thanks in advance for your time.

  • #2
    I'm not sure I understand what you want to do. But if what you are hoping for is a numeric variable that, when formatted %tc reads like 20nov2021 00:00:22, it cannot be done simply by changing the display format. You must calculate a new datetime variable:

    Code:
    gen double wanted = cofd(td(20nov2021)) + start
    format wanted %tc

    Comment


    • #3
      Thanks for your reply Clyde.

      I want my variable start from the .xlsx file (same as the example from dataex in #1) to be in %tc format. But after:

      Code:
      import excel using ...
      format start %tc
      The date shown is 01jan1960 00:00:22, instead of the date shown when the format is %td (20nov2021)

      Comment


      • #4
        I don't understand your response in #3. Of course the date shown when you format the variable %tc is different from the date shown when you format the variable %td. And if you were to reformat it to, say, %tw you would get something completely different yet again. The numerical encoding that Stata uses for date and clock (and month and week) variables are completely different because they are denominated in different units. So the date you will see will depend on the display format you choose, because when you select a display format you are implicitly telling Stata to interpret the actual numeric value as number of milliseconds since midnight on 1Jan1960 (%tc) vs number of days since 1Jan1960 (%td).

        What does the variable start look like when it is in Excel? If you highlight one of the cells containing a value of start in the Excel file and apply Format Cells, what does that show you? What do you want it to look like after you import it to Stata?
        Last edited by Clyde Schechter; 04 Dec 2021, 12:43.

        Comment


        • #5
          The variable in Excel is in this format: "11/20/2021 9:26:02 AM".

          Importing into Stata using
          Code:
          import excel using filepath.xlsx, replace firstrow
          formats the start variable as %td:
          Code:
          desc start
                        storage   display    value
          variable name   type    format     label      variable label
          ---------------------------------------------------------------------------
          start           double  %td                   start
          
          list start in 1/5
               +-----------+
               |     start |
               |-----------|
            1. | 20nov2021 |
            2. | 20nov2021 |
            3. | 20nov2021 |
            4. | 20nov2021 |
            5. | 20nov2021 |
               +-----------+
          I want the display format of start to be %tc, instead of %td, so I can see the time (hh:mm:ss), not only the date.

          Comment


          • #6
            I believe the problem arises in the interface between Excel and Stata in the -import excel- command. I think that Excel, notwithstanding the appearance in the cell, is telling Stata that this is just a date variable, and Stata is importing it accordingly. If I am right, you have to fix this in Excel. Highlight the column where this variable is found in Excel, and open the Format menu, then select Format Cells. Go to the Number tab. Under Category, select Custom, and under Type select m/d/yyyy h:mm. Save the spreadsheet and re-import to Stata. I believe it will come in this time as a clock variable, formatted double, and showing the both the date and time.

            Comment


            • #7
              I cannot reproduce your problem, using a workbook created in Excel 16.55 for Mac and Stata 17.0 for Mac.

              Here is my workbook, with the start variable appearing as shown, followed by the import into Stata.
              Click image for larger version

Name:	datetime.png
Views:	1
Size:	346.2 KB
ID:	1639445


              Code:
              . import excel datetime.xlsx, firstrow
              (1 var, 1 obs)
              
              . describe
              
              Contains data
               Observations:             1                  
                  Variables:             1                  
              ------------------------------------------------------------------------------------------------
              Variable      Storage   Display    Value
                  name         type    format    label      Variable label
              ------------------------------------------------------------------------------------------------
              start           double  %tc                   start
              ------------------------------------------------------------------------------------------------
              Sorted by:
                   Note: Dataset has changed since last saved.
              
              . list, noobs clean
              
                               start  
                  20nov2021 09:26:02  
              
              .
              Last edited by William Lisowski; 04 Dec 2021, 14:56.

              Comment


              • #8
                Thanks Clyde and William for your responses.

                As you have rightly pointed out, formating the variable in Excel as "yyyy-mm-dd hh:mm" and importing into Stata solves the problem, importing the variable in the correct time format:
                Code:
                import          excel using "date.xlsx", clear firstrow
                * start variable is formated as "yyyy-mm-dd hh:mm" in the .xlsx file
                
                describe        start
                
                              storage   display    value
                variable name   type    format     label      variable label
                --------------------------------------------------------------------------------
                start           double  %tc                   start
                
                . list start in 1/4, noobs clean
                
                                 start  
                    20nov2021 09:26:01  
                    20nov2021 09:56:28  
                    20nov2021 10:30:22  
                    20nov2021 11:01:33

                Comment

                Working...
                X