Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating dummy variables using dates

    Hello,

    I been trying to create a dummy variable using the Dateofcrash variabel, where 1 is assigned to the dates after 9/4/2019 00:00, and 0 before that. I used the codes below but kept running into errors"
    gen Dateofcrash_stata = clock(Dateofcrash, "MDY HM")
    gen MandateDate = mdy(1, 15, 2019)
    gen MandateIndicator = (Dateofcrash_stata >= MandateDate) & (Dateofcrash_stata != .)

    this is my first time working with dates like and would need help with this



    Example generated by -dataex-. For more info, type help dataex
    clear
    input str4 Year str8 Month str11 Time str17 Dateofcrash
    "2017" "January" "12:03:00 PM" " 7/10/2018 00:00"
    "2017" "february" "10:05:00 AM" " 7/16/2017 00:00"
    "2017" "march" "10:06:00 PM" "11/10/2019 00:00"
    "2018" "april" "10:13:00 PM" " 8/25/2021 00:00"
    "2018" "December" "10:17:00 PM" " 7/4/2021 00:00"
    "2018" "November" "10:24:00 AM" " 9/16/2020 00:00"
    "2018" "January" "10:25:00 AM" "12/17/2021 00:00"
    "2019" "february" "10:41:00 AM" " 9/8/2021 00:00"
    "2019" "march" "10:44:00 AM" " 5/15/2020 00:00"
    "2019" "april" "10:45:00 AM" " 5/26/2021 00:00"
    "2019" "December" "10:46:59 PM" " 7/31/2021 00:00"
    "2020" "November" "10:52:00 PM" " 7/13/2017 00:00"
    "2020" "January" "10:54:00 AM" " 8/10/2021 00:00"
    "2020" "february" "11:00:00 PM" " 9/3/2021 00:00"
    "2017" "march" "11:02:00 PM" " 5/8/2020 00:00"
    "2017" "april" "11:10:00 AM" " 4/14/2021 00:00"
    "2017" "December" "11:10:59 AM" " 6/27/2018 00:00"
    "2017" "November" "11:15:00 AM" " 7/10/2021 00:00"
    "2018" "January" "11:17:00 AM" " 7/11/2021 00:00"
    "2018" "february" "11:22:00 PM" " 9/4/2019 00:00"
    "2018" "march" "11:23:00 AM" "11/26/2021 00:00"
    "2018" "april" "11:23:00 PM" "10/31/2019 00:00"
    "2019" "December" "11:26:00 PM" " 8/3/2021 00:00"
    "2019" "November" "11:29:00 AM" " 5/13/2021 00:00"
    "2019" "January" "11:30:00 AM" " 8/9/2021 00:00"
    "2019" "february" "11:33:00 AM" " 5/30/2021 00:00"
    "2020" "march" "11:35:00 PM" " 5/29/2021 00:00"

  • #2
    There is no road to understanding date-times in Stata that doesn't include a reading of help datetime. You won't need, and you won't grasp, all of it of once, but you need to keep revisiting.

    Various errors here.

    First off, always use a double to hold a numeric date-time variable. We can keep that one at bay because surely we can ignore 00:00 as uninformative.

    Second, you're subtracting a daily date in days from a datetime in milliseconds, and that's an uneven fight.

    mdy(1, 15, 2019) is 21564 (days from the beginning of 1960) whereas datetimes in your example are in trillions (of milliseconds since the start of 1960). The units of time matter. and Stata won't try to work out what you mean.

    It seems to me that time of day in your data example is immaterial. So, I suggest just subtracting the constant mdy(1, 15, 2019) from a daily date variable.

    Your needed (0, 1) indicator is then dailydateofcrash > mdy(1,15,2019) (with as in your code a catch for missing values).

    Code:
    clear
    input str4 Year str8 Month str11 Time str17 Dateofcrash
    "2017" "January" "12:03:00 PM" " 7/10/2018 00:00"
    "2017" "february" "10:05:00 AM" " 7/16/2017 00:00"
    "2017" "march" "10:06:00 PM" "11/10/2019 00:00"
    "2018" "april" "10:13:00 PM" " 8/25/2021 00:00"
    "2018" "December" "10:17:00 PM" " 7/4/2021 00:00"
    "2018" "November" "10:24:00 AM" " 9/16/2020 00:00"
    "2018" "January" "10:25:00 AM" "12/17/2021 00:00"
    "2019" "february" "10:41:00 AM" " 9/8/2021 00:00"
    "2019" "march" "10:44:00 AM" " 5/15/2020 00:00"
    "2019" "april" "10:45:00 AM" " 5/26/2021 00:00"
    "2019" "December" "10:46:59 PM" " 7/31/2021 00:00"
    "2020" "November" "10:52:00 PM" " 7/13/2017 00:00"
    "2020" "January" "10:54:00 AM" " 8/10/2021 00:00"
    "2020" "february" "11:00:00 PM" " 9/3/2021 00:00"
    "2017" "march" "11:02:00 PM" " 5/8/2020 00:00"
    "2017" "april" "11:10:00 AM" " 4/14/2021 00:00"
    "2017" "December" "11:10:59 AM" " 6/27/2018 00:00"
    "2017" "November" "11:15:00 AM" " 7/10/2021 00:00"
    "2018" "January" "11:17:00 AM" " 7/11/2021 00:00"
    "2018" "february" "11:22:00 PM" " 9/4/2019 00:00"
    "2018" "march" "11:23:00 AM" "11/26/2021 00:00"
    "2018" "april" "11:23:00 PM" "10/31/2019 00:00"
    "2019" "December" "11:26:00 PM" " 8/3/2021 00:00"
    "2019" "November" "11:29:00 AM" " 5/13/2021 00:00"
    "2019" "January" "11:30:00 AM" " 8/9/2021 00:00"
    "2019" "february" "11:33:00 AM" " 5/30/2021 00:00"
    "2020" "march" "11:35:00 PM" " 5/29/2021 00:00"
    end
    
    gen dailydateofcrash = daily(word(Dateofcrash, 1), "MDY")
    format dailydateofcrash %td
    gen wanted = dailydateofcrash - mdy(1, 15, 2019)
    sort dailydateofcrash
    
    list dailydateofcrash wanted
    
         | dailyda~h   wanted |
         |--------------------|
      1. | 13jul2017     -551 |
      2. | 16jul2017     -548 |
      3. | 27jun2018     -202 |
      4. | 10jul2018     -189 |
      5. | 04sep2019      232 |
         |--------------------|
      6. | 31oct2019      289 |
      7. | 10nov2019      299 |
      8. | 08may2020      479 |
      9. | 15may2020      486 |
     10. | 16sep2020      610 |
         |--------------------|
     11. | 14apr2021      820 |
     12. | 13may2021      849 |
     13. | 26may2021      862 |
     14. | 29may2021      865 |
     15. | 30may2021      866 |
         |--------------------|
     16. | 04jul2021      901 |
     17. | 10jul2021      907 |
     18. | 11jul2021      908 |
     19. | 31jul2021      928 |
     20. | 03aug2021      931 |
         |--------------------|
     21. | 09aug2021      937 |
     22. | 10aug2021      938 |
     23. | 25aug2021      953 |
     24. | 03sep2021      962 |
     25. | 08sep2021      967 |
         |--------------------|
     26. | 26nov2021     1046 |
     27. | 17dec2021     1067 |
         +--------------------+
    What your first three variables have to do with the question you're asking isn't clear to me.
    Last edited by Nick Cox; 04 Sep 2023, 14:08.

    Comment


    • #3
      this is my first time working with dates like and would need help with this
      Well, in a sense this is not really true as you do not have dates in your data. Your data set has string variables that represent years, months, and clocks, but no dates, in Stata terms. And the paradoxical results you are getting are the result of trying to compare a Stata date Mandate_date to a Stata clock variable, Dateofcrash_stata. This never works correctly. Stata offers many types of date and time variables: clocks, daily dates, half-yearly dates, quarterly dates, monthly dates, and weekly dates--the last of these are seldom used. Only two variables having the same type of date can be directly compared with correct results. If you need to make a comparison between two different types, you must apply some function(s) to transform them to a common type.

      This type of mistake happens often, and the use of date as part of a variable name when the variable is actually something other than a daily date is a setup for making it. So here's how I would approach your data. First, I would get rid of all the string variables that represent dates or times and replace them with suitable Stata internal format substitutes. Next, at least for the purposes you describe, there is no need to create a variable for MandateDate--you can just use a daily date constant without creating a "variable" that doesn't vary. Then I would do the comparison:
      Code:
      gen double crash_dttm = clock(Dateofcrash, "MDYhm")
      assert missing(crash_dttm) == missing(Dateofcrash)
      format crash_dttm %tc
      drop Dateofcrash
      
      gen mdate = monthly(Month+Year, "MY")
      assert missing(mdate) == missing(Month, Year)
      format mdate %tm
      drop Month Year
      
      gen double time = clock(Time, "hms")
      assert missing(time) == missing(Time)
      format time %tcHh:MM:SS_AM
      drop Time
      
      gen mandate_indicator = dofc(crash_dttm) > mdy(1, 15, 2019) ///
          & !missing(crash_dttm)
      Note that I use _dttm to name the clock variable crash_dttm. That way when I'm working with it, I don't need to remember, or check, whether it's a date or a clock: the name tells me it's date and time, hence clock. Similarly, the monthly date gets a name mdate, the m signalling that it is monthly. The name time speaks for itself.

      In the final command that creates mandate_indicator, note espeically the use of the -dofc()- function and the -mdy()- function. -dofc()- calculates the date part of a clock variable. -mdy()- calculates the Stata internal format date value of the date whose month, day, and year are the three arguments specified.

      I suggest that you invest some time reading the sections of the PDF documentation that came with your Stata installation about datetime variables, datetime functions, and datetime display formats. If you will be working with this kind of data on a regular basis, this is indispensable knowledge. It is a lot to digest. And you won't remember it all after just one reading. In fact, you probably will never remember it all. But you will know and understand the logic behind Stata's datetime system, and you will quickly learn to use the parts that come up most in your workflow without having to go back to reference the documentation. There will probably always remain parts of that system that you seldom use, and hence will need to refer to the help files or documentation for. But overall, the investment of your time in going through this will be amply repaid in short order.

      Added: Crossed with #2.

      Comment


      • #4
        Hi Nick Cox ,

        Thank you very much. The other variables are part of the data that I am working with, that's why i added them. This is my first time working with data of this nature, I really appreciate the help.

        Comment

        Working...
        X