Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating Daily Population of Facilities

    Hello experts,

    I have a dataset with 10,181 observations where each observation represents an individual held at a facility. Each element has information about when the individual was "booked into" (i.e., admitted) custody, when the person was "booked out" (i.e., released) from custody, and the facility where the individual was held. There are 113 unique facilities. The "book in" variable ranges from 09mar2012 to 01mar2018. The "book out" variable ranges from 18may2013 to 01jun2018. There is no missingness.

    How do I compute the daily population for each facility? How do I count the number of individuals in custody at each facility for everyday from 09mar2012 to 01jun2018?

    I have included a simple data example below. I am using Stata 15.1. Any help is great appreciated--thank you for your time.

    Code:
    *Example generated by -dataex-...to install: ssc install dataex
    clear
    input int(bookin bookout) long facility
    19061 19541 75
    19113 19536 36
    19143 19661 49
    19199 19675 72
    19222 19670 36
    19278 19766 57
    19279 19556 36
    19311 19666 85
    19334 19542 49
    19337 19612 15
    19340 19682 34
    19347 19606 87
    19385 19517 33
    19418 19535 57
    19419 19661 30
    19423 19606 36
    19426 19600 1
    19428 19536 36
    19430 19677 72
    19436 19561 112
    19443 19666 85
    19444 19694 55
    19444 19564 112
    19447 19675 74
    19449 19575 49
    end
    
    format %tdnn/dd/CCYY bookin
    format %tdnn/dd/CCYY bookout
    label values facility facility_destrung
    label def facility_destrung 1 "FAC1", modify
    label def facility_destrung 15 "FAC2", modify
    label def facility_destrung 30 "FAC3", modify
    label def facility_destrung 33 "FAC4", modify
    label def facility_destrung 34 "FAC5", modify
    label def facility_destrung 36 "FAC6", modify
    label def facility_destrung 49 "FAC7", modify
    label def facility_destrung 55 "FAC7", modify
    label def facility_destrung 57 "FAC8", modify
    label def facility_destrung 72 "FAC9", modify
    label def facility_destrung 74 "FAC10", modify
    label def facility_destrung 75 "FAC11", modify
    label def facility_destrung 85 "FAC12", modify
    label def facility_destrung 87 "FAC13", modify
    label def facility_destrung 112 "FAC14", modify
    label var bookin "Book-in Date"
    label var bookout "Book-out Date"
    label var facility "Facility Number"

  • #2
    How do I compute the daily population for each facility?
    You can't, because you do not know what the population of the census was just before your data set begins. But if you are willing to assume that the facilities were empty before the start of your data, this will do it:
    Code:
    gen long obs_no = _n
    reshape long book, i(obs_no) j(movement) string
    gen change = cond(movement == "in", 1, -1)
    gsort facility book -change
    by facility , sort: gen census = sum(change)
    Thank you for giving a clear explanation of your problem, and using -dataex- on your very first post. Welcome to Statalist.

    Comment


    • #3
      Clyde,

      Thank you for your very timely response and posted solution (it is quite helpful). Your assistance is sincerely appreciated! The point about assuming that the facilities were empty prior to the data set beginning is a safe assumption for my purposes.

      Is it possible to extend your solution to compute the average daily population across a year? I reckon that I need to sum the census for each facility across a year long period and then divide by the number of days in a year, but I haven't been able to successfully implement this.

      Comment


      • #4
        OK. This is somewhat complicated to do, because your dates can have gaps that straddle the turn of a year, so the census has to be allocated to both years. Also, we have to account for the 0 census the exists prior to the start of your data.

        In the code below, I have also slightly changed the calculation of the daily census, because the original code could potentially have more than one observation per date if somebody was booked in and somebody booked out on the same date. So in this code, there is only one observation per day, representing the census at the end of that day.

        Code:
        *Example generated by -dataex-...to install: ssc install dataex
        clear
        input int(bookin bookout) long facility
        19061 19541 75
        19113 19536 36
        19143 19661 49
        19199 19675 72
        19222 19670 36
        19278 19766 57
        19279 19556 36
        19311 19666 85
        19334 19542 49
        19337 19612 15
        19340 19682 34
        19347 19606 87
        19385 19517 33
        19418 19535 57
        19419 19661 30
        19423 19606 36
        19426 19600 1
        19428 19536 36
        19430 19677 72
        19436 19561 112
        19443 19666 85
        19444 19694 55
        19444 19564 112
        19447 19675 74
        19449 19575 49
        end
        
        format %tdnn/dd/CCYY bookin
        format %tdnn/dd/CCYY bookout
        label values facility facility_destrung
        label def facility_destrung 1 "FAC1", modify
        label def facility_destrung 15 "FAC2", modify
        label def facility_destrung 30 "FAC3", modify
        label def facility_destrung 33 "FAC4", modify
        label def facility_destrung 34 "FAC5", modify
        label def facility_destrung 36 "FAC6", modify
        label def facility_destrung 49 "FAC7", modify
        label def facility_destrung 55 "FAC7", modify
        label def facility_destrung 57 "FAC8", modify
        label def facility_destrung 72 "FAC9", modify
        label def facility_destrung 74 "FAC10", modify
        label def facility_destrung 75 "FAC11", modify
        label def facility_destrung 85 "FAC12", modify
        label def facility_destrung 87 "FAC13", modify
        label def facility_destrung 112 "FAC14", modify
        label var bookin "Book-in Date"
        label var bookout "Book-out Date"
        label var facility "Facility Number"
        
        gen long obs_no = _n
        reshape long book, i(obs_no) j(movement) string
        gen change = cond(movement == "in", 1, -1)
        rename book date
        collapse (sum) change, by(facility date)
        by facility (date), sort: gen census = sum(change)
        //    IDENTIFY YEARLY AVERAGES
        gen year = yofd(date)
        //    ADD A 1/1 TO THE BEGINNING AND A 12/31 TO THE END OF EACH YEAR
        gen expander = 1
        by facility year (date), sort: replace expander = expander + 1 if _n == 1 & doy(date) != 1
        by facility year (date): replace expander = expander + 1 if _n == _N & date != mdy(12, 31, year)
        expand expander
        by facility year (date), sort: replace date = mdy(1, 1, year) if _n == 1 & expander > 1
        by facility year (date), sort: replace date = mdy(12, 31, year) if _n == _N & expander > 1
        by facility year (date), sort: assert date[_N] == mdy(12, 31, year) & date[1] == mdy(1, 1, year)
        //    SET CENSUS AS OF 1/1 TO CENSUS AT END OF PRECEDING YEAR, UNLESS THERE WAS AN ACTUAL
        //    CENSUS CALCULATION FOR THAT DATE
        by facility (date), sort: replace census = census[_n-1] if date == mdy(1, 1, year) & expander > 1 & _n > 1
        //    SET CENSUS AS OF 1/1 TO 0 IF THERE WASS NO ACTUAL CENSUS CALCULATION FOR THAT DATE
        //    AND THERE IS NO EARLIER YEAR'S DATA
        by facility (date): replace census = 0 if date == mdy(1, 1, year) & expander > 1 & _n == 1
        
        //    NOW WE CAN CALCULATE YEARLY AVERAGES
        by facility year (date), sort: gen interval = date[_n+1] - date
        by facility year (date): replace interval = 1 if date == mdy(12, 31, year)
        by facility year: egen person_days = total(interval*census)
        by facility year: egen days_reckoned = total(interval)
        gen yearly_average_census = person_days/days_reckoned

        Comment


        • #5
          Clyde,

          This is wonderful--thank you! This solution works well and I've learned so much from how you've approached this problem.

          Comment

          Working...
          X