  • Making data look like a panel


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pump_id double timestamp float pumped
    110020 1935171979000  1.47
    110020 1935172277000  1.93
    110020 1935172537000  2.61
    110020 1935172779000  3.13
    110020 1935173008000  3.44
    110020 1935173298000  4.17
    110020 1935173572000  4.63
    110020 1935173797000  5.15
    110020 1935173974000  5.63
    110020 1935174245000  6.08
    110020 1935174621000  6.81
    110020 1935174986000  7.71
    110020 1935175253000  8.27
    110020 1935175495000  8.68
    110020 1935175689000   9.1
    110020 1935175892000  9.51
    110020 1935176169000 10.03
    110020 1935176418000 10.62
    110020 1935176657000 11.14
    110020 1935176871000 11.66
    110020 1935177113000 12.01
    110020 1935177362000 12.53
    110020 1935177598000 13.05
    110020 1935177918000 13.67
    110020 1935178142000 14.19
    110020 1935178341000  14.6
    110020 1935178569000 15.12
    110020 1935178749000 15.54
    110020 1935178943000 15.85
    110020 1935179127000 16.27
    110020 1935179381000 16.87
    110020 1935179622000 17.31
    110020 1935179988000 18.14
    110020 1935180348000 18.86
    end
    format %tc timestamp

    I have three variables: pump_id, timestamp, and a cumulative continuous variable, pumped. The observations arrive at uneven intervals, and my ultimate goal is a panel dataset with a time variable marking 10-minute intervals. The issue is dividing the amount pumped between two overlapping intervals. For example, from 19:59:39 to 20:03:28 the incremental amount pumped is 0.31. Of that span, 21 seconds belong to the 19:50:00 interval and 3 minutes and 28 seconds belong to the 20:00:00 interval. I want to divide the 0.31 between the two intervals using linear weights based on the time pumped in each interval. Is there any syntax to make this happen quickly?

    Code:
    pump_id    timestamp     pumped
    110020    27apr2021 19:59:39    3.13
    110020    27apr2021 20:03:28    3.44
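
    To make the intended split concrete, here is a small worked example (a sketch; the numbers come from the two rows above): the span from 19:59:39 to 20:03:28 lasts 229 seconds, of which 21 fall in the 19:50:00 interval and 208 in the 20:00:00 interval.

    Code:
    * Worked example of the proportional split for the 0.31 increment
    display 0.31 * 21/229     // share for the 19:50:00 interval, ~0.0284
    display 0.31 * 208/229    // share for the 20:00:00 interval, ~0.2816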

  • #2
    I've worked on this a while and encountered an obstacle I can't overcome without additional information. While it is clear, for example, that between 27apr2021 19:59:39 and 20:03:28 an incremental amount of 0.31 was pumped, it is unclear how to handle the first observation for the pump. Looking there, we see that by 19:46:19, 1.47 units had already been pumped. But when did that pumping start? Should I just assume that it was all accomplished after 19:40:00 (which would be the start of the 10-minute interval in which 19:46:19 falls)?

    Added: If the answer to my question is yes (that is, we assume it was all accomplished after 19:40:00), then the following code will work:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pump_id double timestamp float pumped
    110020 1935171979000  1.47
    110020 1935172277000  1.93
    110020 1935172537000  2.61
    110020 1935172779000  3.13
    110020 1935173008000  3.44
    110020 1935173298000  4.17
    110020 1935173572000  4.63
    110020 1935173797000  5.15
    110020 1935173974000  5.63
    110020 1935174245000  6.08
    110020 1935174621000  6.81
    110020 1935174986000  7.71
    110020 1935175253000  8.27
    110020 1935175495000  8.68
    110020 1935175689000   9.1
    110020 1935175892000  9.51
    110020 1935176169000 10.03
    110020 1935176418000 10.62
    110020 1935176657000 11.14
    110020 1935176871000 11.66
    110020 1935177113000 12.01
    110020 1935177362000 12.53
    110020 1935177598000 13.05
    110020 1935177918000 13.67
    110020 1935178142000 14.19
    110020 1935178341000  14.6
    110020 1935178569000 15.12
    110020 1935178749000 15.54
    110020 1935178943000 15.85
    110020 1935179127000 16.27
    110020 1935179381000 16.87
    110020 1935179622000 17.31
    110020 1935179988000 18.14
    110020 1935180348000 18.86
    end
    format %tc timestamp
    
    //  DEFINE 10 MINUTES
    local 10min = msofminutes(10)
    
    //  GET RANGE OF TIME VALUES FOR EACH PUMP_ID
    by pump_id (timestamp), sort: gen double start = timestamp[1]
    by pump_id (timestamp): gen double end = timestamp[_N]
    format start end %tc
    
    //  CREATE OBSERVATIONS CORRESPONDING TO INTERVALS BETWEEN THE
    //  ORIGINAL OBSERVATIONS
    by pump_id (timestamp): gen expander = cond(_n == 1, 2, 1)
    expand expander
    drop expander
    by pump_id (timestamp), sort: replace timestamp ///
        = `10min'*floor(timestamp/`10min') if _n == 1
    by pump_id (timestamp), sort: replace pumped = 0 if _n == 1
    by pump_id (timestamp), sort: gen delta = pumped[_n+1] - pumped
    by pump_id (timestamp): gen double timestamp2 = timestamp[_n+1]
    by pump_id (timestamp): drop if _n == _N
    format timestamp2 %tc
    gen `c(obs_t)' obs_no = _n
    tempfile holding
    save `holding'
    
    //  CREATE TEN MINUTE INTERVALS
    keep pump_id start end
    duplicates drop
    replace start = `10min'*floor(start/`10min')
    replace end = `10min'*ceil(end/`10min')
    expand `=(end-start)/`10min'' + 1
    by pump_id (start), sort: gen double from = start if _n == 1
    by pump_id (start): replace from = from[_n-1] + `10min' if _n > 1
    format from %tc
    by pump_id (from), sort: gen double to = from[_n+1]
    format to %tc
    by pump_id (from): drop if _n == _N
    drop start end
    
    //  COMBINE WITH PUMPING DATA; RETAIN ONLY OVERLAPS
    joinby pump_id using `holding'
    keep if min(timestamp2, to) >= max(timestamp, from)
    gen double overlap_time = min(timestamp2, to) - max(timestamp, from)
    
    //  WEIGHT EACH ORIGINAL OBSERVATION'S CONTRIBUTION TO THE 10 MINUTE INTERVAL
    //  IN PROPORTION TO THE TIME OVERLAPPING THAT INTERVAL
    gen weight = overlap_time/(timestamp2-timestamp)
    gen contribution = weight*delta
    by obs_no, sort: egen checksum = total(weight)  // sanity check: should equal 1 for every original observation
    
    //  AGGREGATE UP TO 10 MINUTE INTERVAL LEVEL
    collapse (sum) pumped = contribution, by(pump_id from to)
    Last edited by Clyde Schechter; 16 Apr 2023, 14:33.
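
    A quick consistency check (a sketch, not part of the code above; it assumes it is run immediately after the collapse): because the synthetic first observation starts from pumped = 0, the interval amounts for each pump should sum back to that pump's final cumulative reading (18.86 for pump 110020 in this example).

    Code:
    * Sanity check: the sum of interval amounts for each pump should equal
    * the pump's last cumulative reading (18.86 for pump 110020)
    tabstat pumped, by(pump_id) statistics(sum)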



    • #3
      Dear Clyde,

      Thank you for your help. I really appreciate it.

