  • Clock to deal with patent issuance

    Hi All

    A basic question that I would appreciate help with:

    I have data that shows the patent id and the issuance date and the exact time. The patent identifier is "patent" and the variable "issuance" includes both the date and the time of issuance.

    First, how do I separate the time (e.g., 19:54:00) from the date (e.g., 15feb2004) into two different variables?
    Second, how do I create a dummy equal to 1 when the issuance time is not between 10:30 am and 7:00 pm, and 0 otherwise?

    I think I should use the function clock() but am not sure how it can be done here. Thanks.

    The patent data looks like this below:
    Code:
    patent      issuance
    10001     15feb2004 00:00:00
    10001     18may2004 00:00:00
    10001     17may2005 15:13:00
    10001     27sep2005 16:55:00
    10001     14nov2005 19:54:00
    10002     12apr2005 17:18:00
    10002     23jul2005 18:27:00
    10002     12oct2005 18:54:00
    10002     20jan2006 18:58:00
    10002     21apr2006 18:20:00

  • #2
    Is issuance a string variable or a numeric variable with a datetime display format?

    Please show the full output of dataex if you used it to produce your data example, or else the output of
    Code:
    describe issuance


    • #3
      Thanks!
      I provide the requested information below.

      Code:
                    storage   display    value
      variable name   type    format     label      variable label
      -------------------------------------------------------------
      issuance        double  %tc
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(patent issuance)
      10001  9.216288e+11
      10001  9.313056e+11
      10001  9.359712e+11
      10001  9.416736e+11
      10001  9.500544e+11
      10001  9.579168e+11
      10001  9.675072e+11
      10001  9.764928e+11
      10001  9.813312e+11
      end
      format %tc issuance


      • #4
        Your second example is nothing at all like the first, but this may help. Note that clock() doesn't help; it was presumably already used to create your datetime variable, but the problem you pose is now pulling out components. Starting from the original date-time information you don't show might have been easier.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double(patent issuance)
        10001  9.216288e+11
        10001  9.313056e+11
        10001  9.359712e+11
        10001  9.416736e+11
        10001  9.500544e+11
        10001  9.579168e+11
        10001  9.675072e+11
        10001  9.764928e+11
        10001  9.813312e+11
        end
        format %tc issuance
        
        gen ddate = dofc(issuance)
        format ddate %td 
        
        gen double time = mod(issuance, 24 * 60 * 60000)
        format time %tcHH:MM 
        
        scalar ms_in_hour = 60 * 60000
        gen wanted = !inrange(time, 7.5 * ms_in_hour, 19 * ms_in_hour)
        
        list, sep(0)
        
             +----------------------------------------------------------+
             | patent             issuance       ddate    time   wanted |
             |----------------------------------------------------------|
          1. |  10001   16mar1989 00:00:00   16mar1989   00:00        1 |
          2. |  10001   06jul1989 00:00:00   06jul1989   00:00        1 |
          3. |  10001   29aug1989 00:00:00   29aug1989   00:00        1 |
          4. |  10001   03nov1989 00:00:00   03nov1989   00:00        1 |
          5. |  10001   08feb1990 00:00:00   08feb1990   00:00        1 |
          6. |  10001   10may1990 00:00:00   10may1990   00:00        1 |
          7. |  10001   29aug1990 00:00:00   29aug1990   00:00        1 |
          8. |  10001   11dec1990 00:00:00   11dec1990   00:00        1 |
          9. |  10001   05feb1991 00:00:00   05feb1991   00:00        1 |
             +----------------------------------------------------------+
        A check with your first example shows that the principle is right.


        Code:
        clear 
        input patent  str42 Issuance
        10001     "17may2005 15:13:00"
        10001     "27sep2005 16:55:00"
        end 
        
        * datetime values need double precision; a float would shift 16:55 by about a minute
        gen double issuance = clock(Issuance, "DMY hms")
        
        gen double time = mod(issuance, 24 * 60 * 60000)
        format time %tcHH:MM 
        
        scalar ms_in_hour = 60 * 60000
        gen wanted = !inrange(time, 7.5 * ms_in_hour, 19 * ms_in_hour)
        
        list, sep(0)
        
             +----------------------------------------------------------+
             | patent             Issuance    issuance    time   wanted |
             |----------------------------------------------------------|
          1. |  10001   17may2005 15:13:00   1.432e+12   15:13        0 |
          2. |  10001   27sep2005 16:55:00   1.443e+12   16:55        0 |
             +----------------------------------------------------------+
        There is no way to understand dates and times in Stata that doesn't include thorough study of

        Code:
        help datetime 
        In this problem, I found it easier just to focus on the units of measurement being milliseconds, so that 07:30 for example is 7.5 * 60 * 60000 ms from midnight.


        • #5
          Consider the following (I am making up an 11 digit number for each observation of issuance because your data example shows the variable in scientific notation and results in an identical time of 00:00 for all observations, if I use it as-is.)

          Code:
          clear
          input double(patent issuance)
          10001  92162880234
          10001  93130564281
          10001  93597121234
          10001  94167365678
          10001  95005449101
          10001  95791681213
          10001  96750721415
          10001  97649281617
          10001  98133121819
          end
          format %tc issuance
          
          gen issuance_date = dofc(issuance)
          format issuance_date %td
          gen double issuance_time = hms(hh(issuance), mm(issuance), ss(issuance))
          format issuance_time %tcHH:MM:SS
          
          gen byte wanted = cond(!missing(issuance), !inrange(issuance_time, hms(10,30,00), hms(19,00,00)), .)
          which produces:

          Code:
          . list, noobs sep(0) abbrev(20)
          
            +----------------------------------------------------------------------+
            | patent             issuance   issuance_date   issuance_time   wanted |
            |----------------------------------------------------------------------|
            |  10001   02dec1962 16:48:00       02dec1962        16:48:00        0 |
            |  10001   13dec1962 21:36:04       13dec1962        21:36:04        1 |
            |  10001   19dec1962 07:12:01       19dec1962        07:12:01        1 |
            |  10001   25dec1962 21:36:05       25dec1962        21:36:05        1 |
            |  10001   04jan1963 14:24:09       04jan1963        14:24:09        0 |
            |  10001   13jan1963 16:48:01       13jan1963        16:48:01        0 |
            |  10001   24jan1963 19:12:01       24jan1963        19:12:01        1 |
            |  10001   04feb1963 04:48:01       04feb1963        04:48:01        1 |
            |  10001   09feb1963 19:12:01       09feb1963        19:12:01        1 |
            +----------------------------------------------------------------------+
          Edit: cross-posted with #4
          Last edited by Hemanshu Kumar; 20 Jun 2023, 02:12.
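
          As a cross-check, here is a minimal sketch showing that building the time of day with hms(hh(), mm(), ss()) and taking mod() of the datetime by the milliseconds in a day should agree. The timestamps are copied from #1; the variable names stamp, time_mod and time_hms are made up for illustration.

          ```stata
          clear
          input str18 stamp
          "17may2005 15:13:00"
          "27sep2005 16:55:00"
          "14nov2005 19:54:00"
          end

          * datetimes are milliseconds since 01jan1960 00:00:00, so store them as double
          gen double issuance = clock(stamp, "DMY hms")
          format issuance %tc

          * route 1: remainder after dividing by the milliseconds in a day
          gen double time_mod = mod(issuance, 24 * 60 * 60000)

          * route 2: rebuild the time of day from its hour, minute and second
          gen double time_hms = hms(hh(issuance), mm(issuance), ss(issuance))

          format time_mod time_hms %tcHH:MM:SS
          list, sep(0)
          ```

          Either variable can then be compared with thresholds such as hms(10, 30, 0).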


          • #6
            Thanks all.
            As a follow-up on Nick's post #4:

            As the issuance variable is hh:mm:ss, and all seconds are 00, would it be easier to just drop the seconds at the start and then code the indicator?

            A minor issue: I think you meant
            Code:
             
             gen wanted = !inrange(time, 10.5 * ms_in_hour, 19 * ms_in_hour)
            I corrected 7.5 to 10.5 as per my original post #1

            Last point: if I want to create a dummy for issuance time before 10:30 am on the same day, I use:
            Code:
            gen ind= 0
            replace ind=1 if time< 10.5 * ms_in_hour
            And for 7 pm and after on the same day, I use:
            Code:
            gen indAfter= 0
            replace indAfter=1 if time>=19 * ms_in_hour
            Is there a more professional way to do the last two time slots' indicators?


            • #7
              Sorry, I sent my post before adding the main time slot I am struggling with, which is an indicator equal to 1 for the time from 18:30 on day t to 07:00 the next day.


              • #8
                I have drafted a detailed answer to #6 and #7 but a local power cut will delay my posting it here.


                • #9
                  You're correct that 7.5 should be 10.5 in #4. Sorry about that. I don't know where that came from.

                  If you're using Stata's date-time machinery, the units are milliseconds, and there is no sense in which you can "drop" seconds. If times are specified to the nearest minute, then all date-time values are multiples of 60000, which is not a problem so long as you are using an appropriate storage type. (I set aside leap seconds, not an issue here.)
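
                  To illustrate the storage type point, here is a minimal sketch that holds one timestamp from #1 both as a float and as a double. The variable names stamp, as_float and as_double are made up for illustration. A float cannot hold every multiple of 60000 at this magnitude, so the two copies need not agree.

                  ```stata
                  clear
                  input str18 stamp
                  "27sep2005 16:55:00"
                  end

                  * same conversion, two storage types
                  gen float  as_float  = clock(stamp, "DMY hms")
                  gen double as_double = clock(stamp, "DMY hms")
                  format as_float as_double %tc

                  * the double copy is an exact multiple of 60000 ms (a whole minute)
                  display mod(as_double[1], 60000)

                  * the float copy has been rounded to the nearest value a float can hold
                  display %15.0f as_double[1] - as_float[1]

                  list, sep(0)
                  ```

                  Typing double in every generate statement that creates a datetime avoids the problem.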

                  You can by all means step outside that machinery and work in units of minutes, which just implies integer calculations, but that is a different story.

                  See e.g. https://www.stata.com/support/faqs/d...rue-and-false/ https://www.stata-journal.com/articl...article=dm0087 https://journals.sagepub.com/doi/pdf...36867X19830921 on how to get indicator (so-called dummy) variables in one step.
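
                  As a minimal sketch of the one-step idea, assuming a hypothetical variable time_min holding minutes after midnight: a true-or-false expression evaluates to 1 or 0, so generate alone is enough. The if qualifier matters because Stata treats missing as larger than any number, so without it a missing time would count as "19:00 or later".

                  ```stata
                  clear
                  input double time_min
                   540
                   630
                  1050
                  1140
                     .
                  end

                  * 1 if before 10:30, 0 otherwise, missing if the time is missing
                  gen byte before_1030 = time_min < 10 * 60 + 30 if !missing(time_min)

                  * 1 if 19:00 or later, 0 otherwise, missing if the time is missing
                  gen byte from_1900 = time_min >= 19 * 60 if !missing(time_min)

                  list, sep(0)
                  ```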

                  I will mix a modification of Hemanshu Kumar's example and my code for convenience, a polite word here for my laziness. Any time after 18:30 is before 07:00 the next day, so here is some technique.

                  Code:
                  clear
                  input double(patent issuance)
                  10001  92162880234
                  10001  93130564280 
                  10001  93597121234
                  10001  94167365678
                  10001  95005449101
                  10001  95791681213
                  10001  96750721415
                  10001  97649281617
                  10001  98133121819
                  end
                  
                  replace issuance = round(issuance, 60000)
                  
                  format %tc issuance
                  
                  gen issuance_date = dofc(issuance)
                  
                  gen double mytime_ms = mod(issuance, 24 * 60 * 60000)
                  format mytime_ms %tcHH:MM 
                  
                  gen mytime_min = mytime_ms / 60000 
                  
                  gen late_in_day = mytime_min >= 18.5 * 60 
                  gen early_in_day = mytime_min <= 7 * 60 
                  gen early_or_late = early_in_day | late_in_day
                  
                  list, sep(0)
                  
                       +-----------------------------------------------------------------------------------------------+
                       | patent             issuance   issua~te   mytime~s   mytime~n   late_i~y   early_~y   early_~e |
                       |-----------------------------------------------------------------------------------------------|
                    1. |  10001   02dec1962 16:48:00       1066      16:48       1008          0          0          0 |
                    2. |  10001   13dec1962 21:36:00       1077      21:36       1296          1          0          1 |
                    3. |  10001   19dec1962 07:12:00       1083      07:12        432          0          0          0 |
                    4. |  10001   25dec1962 21:36:00       1089      21:36       1296          1          0          1 |
                    5. |  10001   04jan1963 14:24:00       1099      14:24        864          0          0          0 |
                    6. |  10001   13jan1963 16:48:00       1108      16:48       1008          0          0          0 |
                    7. |  10001   24jan1963 19:12:00       1119      19:12       1152          1          0          1 |
                    8. |  10001   04feb1963 04:48:00       1130      04:48        288          0          1          1 |
                    9. |  10001   09feb1963 19:12:00       1135      19:12       1152          1          0          1 |
                       +-----------------------------------------------------------------------------------------------+
                  I can't see a gain in stepping outside Stata's date-time framework here, but it can be done.

                  Detail: Need to consider whether you want > or >= or < or <= in each case.
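
                  A minimal sketch of why the choice matters, using made-up times one minute either side of 07:00 (420 minutes) and 18:30 (1110 minutes). The two indicator names are hypothetical. Only the observation at exactly 07:00 is classified differently.

                  ```stata
                  clear
                  input double mytime_min
                   419
                   420
                   421
                  1109
                  1110
                  1111
                  end

                  * closed window: 18:30 and 07:00 themselves both count as overnight
                  gen byte overnight_closed = mytime_min >= 18.5 * 60 | mytime_min <= 7 * 60

                  * half-open window: 18:30 counts as overnight, 07:00 does not
                  gen byte overnight_halfopen = mytime_min >= 18.5 * 60 | mytime_min < 7 * 60

                  list, sep(0)
                  ```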


                  • #10
                    Thanks a lot. I think the code proposed in #4 is simpler, so I will use that one.

                    Sorry, I am a bit confused about the time here. If 7.5 represents 7:30 am, how would I write something like 7:59 am or 7:58 am?


                    • #11
                      We agree. You asked whether you can drop seconds, and my answer is only by stepping outside Stata date-time machinery, which here is not easier.

                      Otherwise the use of 7.5 * 60 for minutes after midnight represented by 7:30 is just a short cut because we all know that 30 minutes is exactly half an hour.

                      Times like 7:58 can be worked with in any convenient way, such as 7 * 60 + 58. Again there are quite possibly some relevant Stata functions, but sometimes it is easier to fall back on what you've known since childhood.
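
                      For the record, a minimal sketch of a few equivalent ways to write 07:58. The functions hms(), msofhours() and msofminutes() are documented under help datetime; which one reads best is a matter of taste.

                      ```stata
                      * 07:58 as minutes after midnight
                      display 7 * 60 + 58

                      * 07:58 as milliseconds after midnight, by direct arithmetic
                      display %12.0f (7 * 60 + 58) * 60000

                      * the same value from hms(hours, minutes, seconds)
                      display %12.0f hms(7, 58, 0)

                      * the same value from the unit conversion functions
                      display %12.0f msofhours(7) + msofminutes(58)
                      ```

                      With a time-of-day variable in milliseconds, any of the last three expressions can serve as the threshold.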
