
  • convert date and time to string

    Hello,

    In my dataset I have a date/time variable (e.g. "08jul2009 08:10:00"). Unfortunately, a lot of the date information was entered incorrectly during data entry. I therefore have to extract the time information and use it together with another date variable that stores the accurate date. To do this, I wanted to use tostring to convert my date/time variable to a string and later convert the time information back to a numeric variable in the correct format. This does not work, because Stata stores date/time information as elapsed time. Is there an alternative way to separate the time information from a combined date/time variable in Stata? Thank you in advance for your hints.

    Kind regards,

    Michael

  • #2
    So let's call your problematic datetime variable datetime1. Then you have some other date variable, call it date, that you want to combine with just the time component of datetime1 to create a new variable, call it datetime_corrected.

    Code:
    gen double time_only = datetime1 - cofd(dofc(datetime1))
    gen double datetime_corrected = cofd(date) + time_only
    See -help cofd()- and -help dofc()-. These are extremely useful functions and worth familiarizing yourself with.

    Here's another way that some might consider simpler (though I view it as more complicated):

    Code:
    gen double datetime_corrected = dhms(date, hh(datetime1), mm(datetime1), ss(datetime1))
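    To see that the two approaches agree, here is a minimal sketch on a single made-up observation. The string variables dt_str and d_str, the sample values, and the name datetime_check are assumptions for illustration only; datetime1 and date stand in for your own variables.

    ```stata
    * Hypothetical one-row example: a wrong date with the right time,
    * plus a separate variable holding the correct date.
    clear
    input str18 dt_str str9 d_str
    "08jul2009 08:10:00" "15aug2009"
    end

    * Build a %tc datetime (milliseconds since 01jan1960) and a %td date.
    gen double datetime1 = clock(dt_str, "DMYhms")
    gen long date = date(d_str, "DMY")
    format datetime1 %tc
    format date %td

    * Approach 1: strip the date part, then add the time to the correct date.
    gen double time_only = datetime1 - cofd(dofc(datetime1))
    gen double datetime_corrected = cofd(date) + time_only

    * Approach 2: rebuild the datetime from its components.
    gen double datetime_check = dhms(date, hh(datetime1), mm(datetime1), ss(datetime1))

    format datetime_corrected datetime_check %tc

    * Both approaches should give 15aug2009 08:10:00.
    assert datetime_corrected == datetime_check
    assert datetime_corrected == clock("15aug2009 08:10:00", "DMYhms")
    list datetime1 date datetime_corrected datetime_check
    ```

    Note the use of double throughout for the datetime variables: %tc values are too large to be held exactly in the default float storage type.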



    • #3
      Thank you very much for your hints. I am still struggling to wrap my head around how Stata deals with dates and times.

      I guess I could use the variable time_only above (e.g. assuming that it is life expectancy at birth) as the dependent variable in a linear regression model (e.g. regress time_only gender). Are the slope and intercept parameters given in milliseconds? Is there a straightforward way to change the units of the parameters to something more meaningful (e.g. years or days)?

      While adding a constant (e.g. 5000 milliseconds) to all individual values of the variable time_only should not change the slope parameter of the regression model above, it would change the intercept, if I am not mistaken. Is there an easy way to transform time_only so that the intercept can be interpreted as the mean life expectancy of the baseline group of gender (e.g. female)?

      Thanks a lot for your hints in advance.

      Best,

      Michael



      • #4
        The variable time_only is the time of day represented in milliseconds, so it ranges from 0 to just under 86,400,000. If you regress it as a dependent variable against a dimensionless variable such as gender, then the coefficient will also be denominated in milliseconds. If you want to express it in days, just divide it by 86,400,000. If you want to express it in years, divide it by 86,400,000*365.25. But it might be more sensible to create a new variable: -gen double time_only_in_days = time_only/86400000- and then just regress that against gender. I think re-scaling the variables to units that are appropriate to whatever you're studying makes more sense than using strange variables and then correcting the coefficients. Also, the numerical algorithms that estimate regression coefficients work better when the variables are all of the same general order of magnitude. A gender variable is 0/1, and the time_only variable is of order 10^7. So that's another argument for creating a time_only_in_days variable.
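        As a rough sketch of the rescaling, the following uses simulated data rather than your dataset: the seed, the sample size, the 0/1 coding of gender, and the uniformly distributed time_only are all assumptions made only so the example runs on its own.

        ```stata
        * Simulated stand-in data (assumption: not the real dataset).
        clear
        set seed 12345
        set obs 200
        gen byte gender = runiform() < 0.5
        gen double time_only = floor(runiform() * 86400000)

        * Regression in milliseconds: slope and intercept are in ms.
        regress time_only gender
        scalar b_ms = _b[gender]

        * Rescale the outcome to days, then regress again.
        gen double time_only_in_days = time_only / 86400000
        regress time_only_in_days gender

        * The slope in days equals the slope in ms divided by 86,400,000.
        display "slope in days (rescaled outcome): " _b[gender]
        display "slope in ms / 86400000:           " b_ms / 86400000
        assert reldif(_b[gender], b_ms / 86400000) < 1e-8
        ```

        Dividing by 3600000 instead would give hours, which may be the more natural unit for a time-of-day outcome.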

        If you add a constant to all individual values of the variable time_only, as you note, the slope does not change. The intercept will increase by exactly that constant. You can pick that constant to be anything you want. But I don't understand using mean life expectancy of females (which is 80+ years) for this purpose when you are trying to estimate a time outcome that varies between 0 and 1 days (because time_only cuts off the date, so it never gets beyond 23 hours 59 minutes 59 seconds 999 milliseconds). Again, keeping things roughly proportionate to each other seems a better way to go from both the comprehensibility and the computational stability perspective.
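        The effect of adding a constant can be checked directly. This is only a sketch on simulated data: the seed, the sample size, the coding 0 = female / 1 = male, and the variable name time_shifted are assumptions. It also illustrates that with a single 0/1 regressor the intercept is the sample mean of the outcome in the group coded 0, so no extra transformation is needed to read it as the baseline-group mean.

        ```stata
        * Simulated stand-in data (assumption: 0 = female, 1 = male).
        clear
        set seed 2024
        set obs 200
        gen byte gender = runiform() < 0.5
        gen double time_only = floor(runiform() * 86400000)

        * Baseline regression; keep the estimates for comparison.
        regress time_only gender
        scalar cons0  = _b[_cons]
        scalar slope0 = _b[gender]

        * With one 0/1 regressor, the intercept is the mean of the group coded 0.
        summarize time_only if gender == 0, meanonly
        display "mean of baseline group: " r(mean)
        display "intercept:              " cons0

        * Add 5000 ms to every observation and refit.
        gen double time_shifted = time_only + 5000
        regress time_shifted gender

        * Up to floating-point rounding, the intercept moves by 5000
        * and the slope is unchanged.
        display "change in intercept: " _b[_cons] - cons0
        display "change in slope:     " _b[gender] - slope0
        ```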
