Hello,
I am trying to create a running count variable of preceding observations, not necessarily restricted to the observation immediately prior, and given certain conditions. My data include 900,000 observations.
I want a running count (X) of OCCUPIED (1 or 0), grouped by SITE (A, B, C...H), if ARRIVAL date occurs before DEPARTURE date of a preceding observation. For example, the value of X for Obs 3 is 1 because its ARRIVAL occurs before the DEPARTURE for Obs 2. Likewise, for Obs 7 and 8, X is 1 and 2, respectively, because ARRIVALS occurred before DEPARTURE for Obs 6, i.e., Obs 6 had not departed Site A when Obs 7 and 8 arrived.
I tried the following groups of commands but (1) my ARRIVAL order gets mixed up and (2) the X values are not what they should be:
Does anyone have any suggestions?
Thank you.
Emily
I am trying to create a running count variable of preceding observations, not necessarily restricted to the observation immediately prior, and given certain conditions. My data include 900,000 observations.
Code:
clear all input str2 SITE str18(ARRIVAL DEPARTURE) byte OCCUPIED X "A" "23sep2011 15:38" "23sep2011 18:49" 1 . "A" "06oct2011 18:17" "06oct2011 23:06" 1 0 "A" "06oct2011 20:35" "07oct2011 01:45" 1 1 "A" "15oct2011 17:19" "15oct2011 19:27" 1 0 "A" "04nov2011 11:50" "04nov2011 17:19" 1 0 "B" "07aug2014 22:55" "08aug2014 17:41" 1 . "B" "07aug2014 23:05" "08aug2014 03:23" 1 1 "B" "07aug2014 23:55" "08aug2014 12:25" 1 2 "B" "06dec2014 13:31" "09dec2014 12:59" 0 0 "B" "06dec2014 15:03" "06dec2014 16:23" 1 0 "C" "21mar2015 08:04" "21mar2015 13:28" 0 . "C" "21mar2015 11:21" "21mar2015 17:27" 0 1 end list
I tried the following groups of commands but (1) my ARRIVAL order gets mixed up and (2) the X values are not what they should be:
Code:
sort SITE ARRIVAL by SITE: gen long order = _n bysort SITE (order): gen Y = sum(OCCUPIED[_n-1]) if ARRIVAL<DEPARTURE[_n-1] & OCCUPIED==1.
Code:
bysort SITE: gen long order = _n bysort SITE (order): gen Y = sum(OCCUPIED[_n-1]) if ARRIVAL<DEPARTURE[_n-1] & OCCUPIED==1.
Code:
bysort SITE ARRIVAL: gen long order = _n bysort SITE ARRIVAL (order): gen Y = sum(OCCUPIED[_n-1]) if ARRIVAL<DEPARTURE[_n-1] & OCCUPIED==1.
Thank you.
Emily
Comment