Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • STATA automatically changing my numeric values to dates

    I have a variable collected with numeric variables rounded to 2 decimal places such as 7.45, 8.40, 10.30, 5.15 etc. When I upload this to Stata, the column is automatically translated into dates rather than the numeric values. I want to keep the values the same. I don't even know where to start with a code to reverse this process.

    May

  • #2
    How did you import your data? I've never seen this happen.

    Comment


    • #3
      Where are these data coming from? Are you importing them from a spreadsheet? If so, it may be that in the spreadsheet, they are formatted as date variables. Open the spreadsheet and highlight the variables involved and use Excel's Format --> Cells to see. If Format --> Cells shows that these are Dates, just change that to General and that should fix the problem.

      If this is not the issue, use the -dataex- command to show the Stata data set you are getting and perhaps a way to fix it within Stata can be found. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Note: It is particularly important in this case that you show the example using -dataex-. Only -dataex- can provide the metadata without which your problem cannot even be approached. Don't waste your time with a screenshot, or -list- output or anything else. This is one case where nothing but -dataex- will suffice.

      Comment


      • #4
        Hi Nick and Clyde,

        This has never happened to me either. It is very odd. Below is what the excel document looks like. I have added what the dataex looks like. It does show as a time variable, not a date (my mistake).
        Click image for larger version

Name:	Screen Shot 2022-06-09 at 10.00.46 AM.png
Views:	1
Size:	85.6 KB
ID:	1668513


        ----------------------- copy starting from the next line -----------------------
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int(Day ID) double Interval
          1 100      -1.8934176e+12
          2 100 -1893430200000.0002
          3 100      -1893429420000
          4 100      -1893435540000
          5 100      -1.8934272e+12
          6 100      -1.8934281e+12
          7 100      -1.8934326e+12
          8 100      -1.8934299e+12
          9 100      -1.8934236e+12
         10 100      -1.8934236e+12
         11 100      -1.8934236e+12
         12 100      -1.8934209e+12
         13 100      -1.8934254e+12
         14 100      -1.8934341e+12
         15 100      -1.8934182e+12
         16 100      -1.8934272e+12
         17 100      -1.8934263e+12
         18 100      -1.8934254e+12
         19 100      -1.8934176e+12
         20 100      -1.8934164e+12
         21 100      -1.8934233e+12
         22 100      -1.8934206e+12
         23 100      -1.8934203e+12
         24 100 -1893432900000.0002
         25 100      -1.8934281e+12
         26 100       -1.893429e+12
         27 100      -1.8934281e+12
         28 100      -1.8934317e+12
         29 100      -1.8934272e+12
         30 100      -1.8934371e+12
         31 100      -1.8934344e+12
         32 100      -1.8934344e+12
         33 100      -1.8934425e+12
         34 100      -1.8934299e+12
         35 100      -1.8934344e+12
         36 100      -1.8934335e+12
         37 100      -1.8934308e+12
         38 100      -1.8934353e+12
         39 100      -1.8934317e+12
         40 100      -1.8934245e+12
         41 100      -1.8934326e+12
         42 100      -1.8934236e+12
         43 100      -1.8934335e+12
         44 100      -1.8934344e+12
         45 100       -1.893429e+12
         46 100      -1.8934362e+12
         47 100      -1.8934416e+12
         48 100      -1.8934335e+12
         49 100      -1.8934308e+12
         50 100      -1.8934371e+12
         51 100      -1.8934362e+12
         52 100      -1.8934362e+12
         53 100      -1.8934326e+12
         54 100      -1.8934326e+12
         55 100      -1.8934326e+12
         56 100      -1.8934272e+12
         57 100      -1.8934308e+12
         58 100      -1.8934299e+12
         59 100      -1.8934308e+12
         60 100      -1.8934371e+12
         61 100       -1.893435e+12
         62 100      -1.8934389e+12
         63 100      -1.8934335e+12
         64 100      -1.8934281e+12
         65 100      -1.8934461e+12
         66 100      -1.8934371e+12
         67 100      -1.8934254e+12
         68 100      -1.8934371e+12
         69 100 -1893433199999.9998
         70 100      -1.8934326e+12
         71 100      -1.8934296e+12
         72 100      -1.8934227e+12
         73 100      -1.8934299e+12
         74 100      -1.8934443e+12
         75 100      -1.8934317e+12
         76 100      -1.8934497e+12
         77 100      -1.8934236e+12
         78 100       -1.893429e+12
         79 100      -1.8934272e+12
         80 100      -1.8934344e+12
         81 100      -1.8934236e+12
         82 100       -1.893447e+12
         83 100       -1.893435e+12
         84 100      -1.8934443e+12
         85 100      -1.8934344e+12
         86 100      -1.8934281e+12
         87 100       -1.893429e+12
         88 100      -1.8934281e+12
         89 100      -1.8934317e+12
         90 100      -1.8934272e+12
         91 100      -1.8934371e+12
         92 100      -1.8934344e+12
         93 100      -1.8934344e+12
         94 100      -1.8934425e+12
         95 100      -1.8934299e+12
         96 100      -1.8934344e+12
         97 100      -1.8934335e+12
         98 100      -1.8934308e+12
         99 100      -1.8934353e+12
        100 100      -1.8934317e+12
        end
        format %tcHH:MM:SS Interval
        ------------------ copy up to and including the previous line ------------------


        Screen Shot 2022-06-09 at 10.01.58 AM.png



        It looks very odd and cannot be used for analysis.

        Comment


        • #5
          This gets you close -- but not quite there. The best solution is to undo the date-time formatting in MS Excel.


          Code:
          gen wanted1 = 100 * hhC(Interval) + mmC(Interval) 
          gen wanted2 = wanted1 / 100 
          format wanted2 %3.2f 
          
          list Interval wanted*

          Comment


          • #6
            I agree with what Nick says in #5. But I can't help wondering, given that the name of the variable is Interval, whether perhaps the original Excel (and, thereafter Stata) actually have it right. That is, it is quite odd to have an ordinary (i.e. not a time) numeric variable where the decimal part never equals or exceeds .60. Assuming the decimal part of the correct variable is uniformly distributed, the probability that all 31 of the values you show in your screenshot would be in this restricted range is about 1.3 x 10-7.

            Perhaps this variable is really meant to be a time in units of hours, minutes (and seconds), and converting, say, 05:15:00 to 5.15 would be inappropriate.

            Comment


            • #7
              Conversely, if it was something else -- say time elapsed since an origin -- and converted in MS Excel to a time -- meaning clock time -- then you'd never see anything outside hh:00 to hh:59 either.

              Other way round to May Blake: it seems that you have to tell us what the variable really is, or really should be, which should be documented somewhere.

              Comment


              • #8
                Hi Nick and Clyde,

                The original variable is calculated as number of hours between two dates. So for example, what is the number of hours between 12: 00 PM and 10:40 PM = 10.40 hours. In the MS excel, when I try to change the hours to a general number, it converts it totally different to two decimal places such as 0.44. This makes no sense at all.

                Nick's suggestion gets me exactly what I want. Thank you both very much.

                Comment


                • #9
                  You may want to round first to the nearest millisecond: This gets better results than what i was trying before -- in terms of reproducing your screen shot (apart from dividing by 100).


                  Code:
                  . replace Interval = round(Interval)
                  (3 real changes made)
                  
                  .
                  . gen wanted = 100 * hh(Interval) + mm(Interval)
                  
                  .
                  . list if mod(wanted, 5)
                  
                       +---------------------------------+
                       | Day    ID     Interval   wanted |
                       |---------------------------------|
                    3. |   3   100   -1.893e+12      723 |
                    4. |   4   100   -1.893e+12      541 |
                       +---------------------------------+
                  But -- if your results should be hours and minutes -- then either work in minutes (my recommendation) or in hours. You can't take clock format as a decimal.

                  (Sorry; I count as a long-term occasional but barely competent user of MS Excel and can't advise on what went wrong there.)
                  Last edited by Nick Cox; 10 Jun 2022, 08:23.

                  Comment

                  Working...
                  X