  • How to check whether a variable is correctly coded for a range of times?

    Dear all,

    I would like to know whether my variable -ser_blue_tariffs- is coded correctly. For each individual who has taken more than one trip, the variable should take the value 0 if the arrival time of the previous trip was between 8 PM and 9 PM and the departure time of the current trip is also between 8 PM and 9 PM.

    Here is what I coded, but I am not at all sure of it:

    Code:
    * if individual lives in a given barrio (ser_regulado_free), parking costs = 0 if 8pm-9pm
    gen eight_pm = clock("20:00:00", "hms")
    gen nine_pm = clock("21:00:00", "hms")
    format %tC eight_pm nine_pm
    
    
    by individ_ID (start_time), sort: gen actual_start_time = start_time if _n > 1
    format %tC actual_start_time
    
    forval i = 2/`=_N' {
        if actual_start_time[`i'] < nine_pm & end_time[`i'-1] > eight_pm {
            replace ser_blue_tariffs = 0 if !missing(blue_slots[`i'-1]) in `i'
            
        }
    }
    
    
    drop eight_pm actual_start_time nine_pm

    I am using -browse- to check it "manually". Surely there are better ways to do that, and I would like to know how, please.
    Here's what I've coded to check it. However, it takes a long time to check the inconsistencies visually.

    Code:
    browse if (inrange(end_hour[_n-1],20,21) & inrange(start_hour,20,21) & ID_VIAJE >1)
    Here is a small -dataex- example:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 individ_ID byte ID_VIAJE double(start_time end_time) float interlude double ser_blue_tariffs
    "1000154_1" 1 59400000 60300000   . .
    "1000154_1" 2 79200000 81000000 315 .
    "1000154_2" 1 64800000 66600000   . .
    "1000154_2" 2 79200000 81000000 210 .
    "1000289_1" 1 28800000 30600000   . .
    end
    format %tC start_time
    format %tC end_time
    Suggestions would be most welcome.
    Thank you.

    Best,
    Michael
    Last edited by Michael Duarte Goncalves; 25 Oct 2023, 09:27.

  • #2
    You should not use %tC formats with clock() variables. %tC variables are for use with Clock() variables--which are datetime variables that are adjusted for leap seconds. (Similarly, you should not use %tc formats with Clock() variables.) I don't know whether your variables are clock() or Clock(), so I don't know which "flavor" you should be using, but I do know you should not mix the two together.
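    As a small illustration (a sketch using only the tc() and tC() pseudofunctions): times of day stored without a date, as in the example data (milliseconds after midnight of 01jan1960), involve no leap seconds, so the two encodings give identical numbers. Full datetimes after 1972 do not, which is why mixing the two families starts to matter once dates are attached.
    ```stata
    * Time of day only: no leap seconds have occurred, so both encodings agree
    display tc(08:00PM) - tC(08:00PM)    // 0
    * A full datetime in 2023: Clock() values also count the leap seconds
    * inserted since 1972, so the tC() value is the larger one
    display tC(30oct2023 08:00:00) - tc(30oct2023 08:00:00)
    * Whichever family you choose, keep its pieces together:
    *   clock() / tc() / %tc / hh()   versus   Clock() / tC() / %tC / hhC()
    ```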

    I don't quite know what to make of your proposed code. It involves a variable blue_slots which is not mentioned in your description. I don't know if your description is wrong or the code has gone wrong by introducing an irrelevant criterion.

    The variable actual_start_time serves no apparent purpose: as you are not making modifications to start_time, I see no reason not to use start_time itself in the code modifying ser_blue_tariffs. In any case, if you do need such a variable, perhaps for something else, you must create it as a double. Just -gen newvarname = some_date_time_variable- will, by default, create it as a float. And floats do not have enough bits to hold the information in a datetime variable.

    Looping over observations, i.e. -forvalues i = 2/`=_N'- is occasionally necessary in Stata, but I don't think this is one of those circumstances. When it is not absolutely necessary it should be avoided because it will be far slower than other approaches. And the use of variables to hold constants, though legal, is not good programming style. Better to store such values in scalars or locals.

    Here's how I would implement the criterion in your description. I'm assuming all the times are clock(), not Clock(), and I assume that arrival time refers to the end_time variable and departure time refers to the start_time variable.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 individ_ID byte ID_VIAJE double(start_time end_time) float interlude double ser_blue_tariffs
    "1000154_1" 1 59400000 60300000   . .
    "1000154_1" 2 79200000 81000000 315 .
    "1000154_2" 1 64800000 66600000   . .
    "1000154_2" 2 79200000 81000000 210 .
    "1000289_1" 1 28800000 30600000   . .
    end
    format %tc start_time
    format %tc end_time
    
    local 8pm = tc(08:00PM)
    local 9pm = tc(09:00PM)
    by individ_ID (start_time), sort: replace ser_blue_tariffs = 0 ///
        if inrange(end_time[_n-1], `8pm', `9pm') & inrange(start_time, `8pm', `9pm')
    By the way, your example data contains no instances of anything happening between 8:00 PM and 9:00 PM.
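    Since the posted example has no trips in that window, here is a sketch on a hypothetical toy data set (ids and times invented for illustration) that does. It holds the constants in scalars rather than variables, and it shows why -by- matters: end_time[_n-1] is evaluated within each individual, so B's first trip is never compared with A's last trip, which a loop over `i'-1 across the whole data set would do.
    ```stata
    clear
    input str1 id double(start_time end_time)
    "A" 64800000 66600000
    "A" 72000000 73800000
    "A" 74400000 75000000
    "B" 72900000 74700000
    end
    format %tc start_time end_time

    * Constants held in scalars, not in variables
    scalar eight_pm = tc(08:00PM)
    scalar nine_pm  = tc(09:00PM)

    * Within each id, compare the previous trip's arrival with this departure
    gen double ser_blue_tariffs = .
    by id (start_time), sort: replace ser_blue_tariffs = 0 ///
        if inrange(end_time[_n-1], scalar(eight_pm), scalar(nine_pm)) ///
         & inrange(start_time, scalar(eight_pm), scalar(nine_pm))
    list, sepby(id) noobs
    ```
    Only A's 8:40 PM trip is set to 0. B's 8:15 PM trip is left alone even though the observation just above it (A's trip ending at 8:50 PM) falls in the window.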

    Now, as for verifying whether a variable meets some criterion, browsing is really only adequate in very small data sets. Even then, it leaves no record showing that you made the verification. Do read -help assert- for a better way. Use it liberally.
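    A minimal sketch of that idea, on hypothetical toy data: restate the rule independently of the code that built ser_blue_tariffs, then let -assert- check every observation. If a check fails, Stata stops with "assertion is false" (return code 9), and the do-file itself records what was verified.
    ```stata
    clear
    input str1 id double(start_time end_time ser_blue_tariffs)
    "A" 64800000 66600000 .
    "A" 72000000 73800000 .
    "A" 74400000 75000000 0
    "B" 72900000 74700000 .
    end
    format %tc start_time end_time

    * The rule, restated: previous arrival and current departure both fall
    * in [8:00 PM, 9:00 PM), comparing trips of the same individual only
    by id (start_time), sort: gen byte rule_free = ///
        hh(end_time[_n-1]) == 20 & hh(start_time) == 20

    * Trips the rule marks as free must have a zero tariff ...
    assert ser_blue_tariffs == 0 if rule_free
    * ... and, assuming no other step sets zeros, no other trip may have one
    assert ser_blue_tariffs != 0 if !rule_free
    ```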

    Added: It dawns on me that the code I have shown above sets ser_blue_tariffs = 0 when the times in question are between 8PM and 9PM inclusive at both ends. If what you really mean is on or after 8PM but before, and not including, 9:00 PM, it is actually a bit simpler:
    Code:
    by individ_ID (start_time), sort: replace ser_blue_tariffs = 0 ///
        if hh(end_time[_n-1]) == 20 & hh(start_time) == 20
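    The two versions differ only at the boundary. A quick check with a departure at exactly 9:00 PM (a sketch; if the variables turn out to be %tC, the matching hour function is hhC() rather than hh()):
    ```stata
    * A departure at exactly 9:00 PM
    display inrange(tc(09:00PM), tc(08:00PM), tc(09:00PM))          // closed [8 PM, 9 PM]: 1
    display hh(tc(09:00PM)) == 20                                    // half-open [8 PM, 9 PM): 0
    * The same half-open test written out, which works for any cutoffs
    display tc(09:00PM) >= tc(08:00PM) & tc(09:00PM) < tc(09:00PM)  // 0
    * %tC (Clock) version of the hour test
    display hhC(tC(08:30PM)) == 20                                   // 1
    ```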
    Last edited by Clyde Schechter; 25 Oct 2023, 11:41.

    • #3
      Hi Clyde Schechter,

      Thank you very much for your detailed explanations and feedback; they are really appreciated.

      My start_time and end_time variables correspond exactly to what you assumed, i.e. the departure time and the arrival time. However, they are in %tC format, so I'm adapting your code accordingly.

      Finally, I'd like to thank you warmly for your help and your thorough explanations. I really enjoy learning to program, especially in Stata, and learning from people like you who excel in Stata is just fantastic. Thank you very much.
      Have a great rest of your evening/day.

      What I really meant was the following, which I quote from #2:

      "what you really mean is on or after 8PM but before, and not including, 9:00 PM, it is actually a bit simpler[...]"
      Thanks for providing me with both forms of code.
      Best regards,

      Michael
      Last edited by Michael Duarte Goncalves; 26 Oct 2023, 01:26.

      • #4
        Perhaps just one more question, please:

        I have done similar things in other parts of the code above, but this one is a bit more complex.
        • I need to calculate a difference between two variables. Suppose the previous journey ends at 7:50 PM, so the person parks the car at that time, and the next journey starts at 9:30 PM. Then they do not pay for 100 minutes (7:50 to 9:30 PM) of parking, but only for the time from 7:50 to 9:00 PM, because the car parks are free from 9 PM until 9 AM.

          My interlude variable measures the total time in minutes (float type), so in this example it holds the total time: 100 minutes.

          Now I'd like to deduct from interlude the 30 minutes that the individual does not pay for, so that the tariff is calculated on 70 minutes rather than 100.

          I've coded it similarly to #1, but I imagine better solutions exist based on #2.

          Any suggestions, please?
        Code:
        *****************************
        * Journeys sometimes finish before 9pm, and the next one leaves between 9pm and 9am. It's worth noting that between 9pm and 9am, car parks are free in the SER blues zones.
        * So only pay for the time up to 9pm, or from 9am if the car is parked.
        *****************************
        gen interlude_time_extra_blue = .
        local 9pm = tC(09:00PM)
        format %tC nine_pm
        
        
        
        by individ_ID (start_time), sort: gen actual_start_time = start_time if _n > 1
        format %tC actual_start_time
        
        forval i = 2/`=_N' {
            if actual_start_time[`i'] > `9pm' & end_time[`i'-1] < `9pm' {
                replace interlude_time_extra_blue = clockdiff(`9pm', actual_start_time [`i'], "m") if !missing(blue_slots[`i'-1]) in `i'
                
            }
        }
        
        
        forval i = 2/`=_N' {
            if (actual_start_time[`i'] > `9pm' & end_time[`i'-1] < `9pm') & !missing(blue_slots[`i'-1]) {
                replace interlude = interlude[`i'] - interlude_time_extra_blue[`i'] in `i'
                
            }
        }

        Thank you in advance.
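        The arithmetic in the example can be checked without any loop: the paid part runs from the arrival to whichever comes first, the next departure or 9:00 PM. A sketch with the times hard-coded as locals, assuming the next departure is on the same evening, and using Clockdiff(), Stata's difference function for %tC values:
        ```stata
        local parked    = tC(07:50PM)   // previous trip ends, car is parked
        local departs   = tC(09:30PM)   // next trip starts
        local free_from = tC(09:00PM)   // blue-zone parking is free from here on
        display Clockdiff(`parked', `departs', "m")                     // whole interlude: 100
        display Clockdiff(`parked', min(`departs', `free_from'), "m")   // paid part: 70
        ```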

        • #5
          This is tricky because of the possibility of a change of date between one observation and the next. Does your data not have a date variable as well as the times? If so, please post a new data example that includes the date variable. If there is no date variable, please post a new example that does include some parking that extends into the 9PM-next day 9AM range, and also includes some parking that begins before 9PM and continues past 9AM the next day. I will try to work it out.

          • #6
            Hi Clyde Schechter:

            Yes, it includes a date variable as well as the time variables:

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str9 individ_ID byte ID_VIAJE double(start_time end_time) float(date interlude) byte(start_hour start_minute end_hour end_minute) double(ser_blue_tariffs ser_green_tariffs)
            "1000154_1" 1 59400000 60300000 21328   . 16 30 16 45 . .
            "1000154_1" 2 79200000 81000000 21328 315 22  0 22 30 . 0
            "1000154_2" 1 64800000 66600000 21328   . 18  0 18 30 . .
            "1000154_2" 2 79200000 81000000 21328 210 22  0 22 30 . .
            "1000289_1" 1 28800000 30600000 21325   .  8  0  8 30 . .
            end
            format %tC start_time
            format %tC end_time
            format %td date

            Thank you for your help Clyde.
            Best,

            Michael
            Last edited by Michael Duarte Goncalves; 27 Oct 2023, 04:06.

            • #7
              OK, this is, at least, a start:
              Code:
              gen double start_dttm = Cofd(date) + start_time
              gen double end_dttm = Cofd(date) + end_time
              replace end_dttm = end_dttm + msofhours(24) if end_time < start_time // DATE FLIP
              gen end_date = dofC(end_dttm)
              format *_dttm %tC
              format end_date %td
              
              by individ_ID (date start_time), sort: assert start_dttm <= Cdhms(date[_n-1]+1, 21, 0, 0) if _n > 1
              
              by individ_ID (date start_time), sort: gen double free_interlude_begins = ///
                  max(end_dttm[_n-1], Cdhms(date[_n-1], 21, 0, 0))
              by individ_ID (date start_time): gen double free_interlude_ends = ///
                  min(start_dttm, Cdhms(date[_n-1]+1, 9, 0, 0))
              format free_interlude_* %tC
              gen double free_interlude = max(Clockdiff(free_interlude_begins, free_interlude_ends, "m"), 0)
              gen paid_interlude = interlude - free_interlude
              To be clear, this code will not work correctly if the interlude from the end of one trip to the beginning of the next extends over more than one day. So, it's fine if the person parks on Friday night and then takes the car out sometime before Saturday 9PM. But if the car is left parking past Saturday 9PM it is now entering a second free period: the code does not handle that situation. It will only count the first 9PM-9AM period as free--anything after that is paid. It would be pretty complicated to make the code handle this multi-day situation, so I'm hoping that that's never necessary in your data. The -assert- command that appears in the middle of the code checks that. If it finds any multi-day parking, it will halt execution and give an "assertion is false" error message.
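              A possible extension, sketched under assumptions: in the code above the free window always begins at 9 PM on the date the previous trip started, so if a trip starts and ends in the early morning (say it ends at 7:00 AM) and the next trip starts at 10:00 AM, the 7:00-9:00 AM stretch is counted as paid. One way to cover both windows is to measure the overlap of the parking spell with the free window ending on the morning of the parking date and with the one starting that evening, and add the two. The variables a (car parked) and b (car taken out) are hypothetical full %tC datetimes, playing the roles of end_dttm[_n-1] and start_dttm; like the code above, the sketch assumes no spell reaches the following evening's window.
              ```stata
              * Hypothetical parking spells: a = car parked, b = car taken out (%tC)
              clear
              set obs 3
              gen double a = .
              gen double b = .
              replace a = tC(24may2018 19:50:00) in 1
              replace b = tC(24may2018 21:30:00) in 1
              replace a = tC(25may2018 07:00:00) in 2
              replace b = tC(25may2018 10:00:00) in 2
              replace a = tC(24may2018 22:00:00) in 3
              replace b = tC(25may2018 10:30:00) in 3
              format a b %tC

              * The two free windows touching the date on which the car was parked
              gen int park_date = dofC(a)
              gen double w0_start = Cdhms(park_date - 1, 21, 0, 0)  // previous evening
              gen double w0_end   = Cdhms(park_date,      9, 0, 0)  // this morning
              gen double w1_start = Cdhms(park_date,     21, 0, 0)  // this evening
              gen double w1_end   = Cdhms(park_date + 1,  9, 0, 0)  // next morning
              format w0_* w1_* %tC

              * The sketch only handles spells ending before the next evening's window
              assert b <= Cdhms(park_date + 1, 21, 0, 0)

              * Overlap of [a, b] with each window; a negative difference means none
              gen double free_min = max(Clockdiff(max(a, w0_start), min(b, w0_end), "m"), 0) ///
                                  + max(Clockdiff(max(a, w1_start), min(b, w1_end), "m"), 0)
              gen double total_min = Clockdiff(a, b, "m")
              gen double paid_min  = total_min - free_min
              list a b free_min paid_min, noobs
              ```
              For these three spells the free minutes come out as 30, 120 and 660, and the paid minutes as 70, 60 and 90.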

              • #8
                Hi Clyde Schechter,

                Thank you very much for your post in #7!

                I shouldn't have any of the problematic cases you mention, but a double-check never hurts.
                In any case, thank you very much for your time and energy in helping me write this code!


                All the best,

                Michael
                Last edited by Michael Duarte Goncalves; 30 Oct 2023, 01:57.

                • #9
                  Hi again,

                  I have just a quick question about the code posted in #7:
                  • Why do we have to generate "doubles" specifically?
                  Michael

                  • #10
                    Because clock variables, which represent time in milliseconds from midnight 1 Jan 1960, are very large numbers with too many digits to represent in any smaller data storage type. If you just -gen new_datetime_variable = ...- without specifying -double-, you will get a float. Because a float cannot hold the required number of digits, the low order digits will be chopped off, with the remainder rounded to the nearest number that can fit inside a float, and as a result, some times will be stored as incorrect numbers. For example:
                    Code:
                    . clear
                    
                    . set obs 1
                    Number of observations (_N) was 0, now 1.
                    
                    . gen test = clock("30Oct2023 08:59:00AM", "DMYhms")
                    
                    . format test %tc
                    
                    . list
                    
                         +--------------------+
                         |               test |
                         |--------------------|
                      1. | 30oct2023 08:59:27 |
                         +--------------------+
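                    The size of the error can be worked out. A float carries a 24-bit significand, so for a value of about 2.0e12 (milliseconds from 1960 to 2023) neighboring floats are 2^17 = 131,072 ms apart, and a stored time can be off by up to about 65 seconds. Time-of-day values on their own stay below 2^27 ms (about 37 hours), where the spacing is at most 8 ms; since whole-second values are multiples of 1000, and hence of 8, they happen to survive as floats. It is adding the date, as in #7, that breaks them. A sketch that checks this with float(), which rounds its argument to float precision:
                    ```stata
                    clear
                    set obs 1
                    gen double tod  = tc(20:59:00)              // time of day only
                    gen double full = tc(30oct2023 08:59:00)    // date and time
                    format tod full %tc
                    * float(x) == x tells whether x would survive storage as a float
                    display float(tod) == tod                   // 1
                    display float(full) == full                 // 0
                    display (float(full) - full)/1000           // error, in seconds
                    ```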

                    • #11
                      Hi Clyde Schechter,

                      Nice! Thank you very much for #10.

                      So now I better understand why I sometimes got strange seconds that had nothing to do with the 'real' seconds in my data.

                      Thanks for these nice explanations.
                      Have a nice day,

                      Michael
