Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating gaps between spells with overlapping dates

    Hi Statalist community,

    I am trying to calculate gaps between employment for young adults. Since young adults are just starting to enter the workforce, they may have irregular work histories or may juggle multiple jobs at the same time. I am trying calculate the number days between employment. You could also see it as the numbers of days of being unemployed between jobs. I have sample data for the period from January 1st, 2011 to December 31st, 2013.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id float( start_date end_date)
    1  18628 19723
    1  18628 19723
    2  19034 19600
    3  18628 19723
    4  18628 19175
    4  19176 19723
    5  18861 18971
    5  19015 19132
    5  19225 19335
    5  19386 19503
    5  19596 19706
    6  18868 18979
    6  19262 19343
    6  19232 19357
    end
    format %td start_date
    format %td end_date
    Below is my attempt at finding the gaps between employment.

    Code:
    *Drop entries where someone had multiple jobs that had the same start and end date.
    duplicates drop id start_date end_date, force
    *Calculate gap between employment by subtracting the start date of a new job with the end date of the most recent job
    bys id (start_date): gen gap_between_employment = start_date[_n+1] - end_date-1
    • Individual 1 had 2 jobs and was continuously employed and did not have any gaps between employment.
    • Individual 2 had 1 job during this period and was not reemployed; thus he also didn't have a gap since he did not find another job within this period. He would be considered right censored.
    • Individual 3 had 1 job and was continuous employed during this period. He had no gaps between employment.
    • Individual 4 had 2 jobs was continuous employed and did not have any gaps between employment
    • Individual 5 had 5 jobs and had several gaps between employment. The largest gap between employment was 92.
    • Individual 6 had 3 jobs and some overlapping dates
    I am not sure how to fix the overlapping dates. Help would be appreciated. Thank you.
    Last edited by James Lee; 08 May 2023, 11:03.

  • #2
    See https://journals.sagepub.com/doi/pdf...867X1301300116 for one device. Here's some sample code following your helpful data example.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id float( start_date end_date)
    1  18628 19723
    1  18628 19723
    2  19034 19600
    3  18628 19723
    4  18628 19175
    4  19176 19723
    5  18861 18971
    5  19015 19132
    5  19225 19335
    5  19386 19503
    5  19596 19706
    6  18868 18979
    6  19262 19343
    6  19232 19357
    end
    format %td start_date
    format %td end_date
    
    gen long obsno = _n 
    expand 2 
    bysort obsno : gen end = _n == 2 
    by obsno : gen date = cond(_n == 1, start_date, end_date)
    format date %td 
    bysort id (date end) : gen gap = cond(end[_n-1] & !end & _n > 1, date - date[_n-1], 0)
    
    list, sepby(id)
    
    
         +------------------------------------------------------------+
         | id   start_d~e    end_date   obsno   end        date   gap |
         |------------------------------------------------------------|
      1. |  1   01jan2011   31dec2013       1     0   01jan2011     0 |
      2. |  1   01jan2011   31dec2013       2     0   01jan2011     0 |
      3. |  1   01jan2011   31dec2013       1     1   31dec2013     0 |
      4. |  1   01jan2011   31dec2013       2     1   31dec2013     0 |
         |------------------------------------------------------------|
      5. |  2   11feb2012   30aug2013       3     0   11feb2012     0 |
      6. |  2   11feb2012   30aug2013       3     1   30aug2013     0 |
         |------------------------------------------------------------|
      7. |  3   01jan2011   31dec2013       4     0   01jan2011     0 |
      8. |  3   01jan2011   31dec2013       4     1   31dec2013     0 |
         |------------------------------------------------------------|
      9. |  4   01jan2011   01jul2012       5     0   01jan2011     0 |
     10. |  4   01jan2011   01jul2012       5     1   01jul2012     0 |
     11. |  4   02jul2012   31dec2013       6     0   02jul2012     1 |
     12. |  4   02jul2012   31dec2013       6     1   31dec2013     0 |
         |------------------------------------------------------------|
     13. |  5   22aug2011   10dec2011       7     0   22aug2011     0 |
     14. |  5   22aug2011   10dec2011       7     1   10dec2011     0 |
     15. |  5   23jan2012   19may2012       8     0   23jan2012    44 |
     16. |  5   23jan2012   19may2012       8     1   19may2012     0 |
     17. |  5   20aug2012   08dec2012       9     0   20aug2012    93 |
     18. |  5   20aug2012   08dec2012       9     1   08dec2012     0 |
     19. |  5   28jan2013   25may2013      10     0   28jan2013    51 |
     20. |  5   28jan2013   25may2013      10     1   25may2013     0 |
     21. |  5   26aug2013   14dec2013      11     0   26aug2013    93 |
     22. |  5   26aug2013   14dec2013      11     1   14dec2013     0 |
         |------------------------------------------------------------|
     23. |  6   29aug2011   18dec2011      12     0   29aug2011     0 |
     24. |  6   29aug2011   18dec2011      12     1   18dec2011     0 |
     25. |  6   27aug2012   30dec2012      14     0   27aug2012   253 |
     26. |  6   26sep2012   16dec2012      13     0   26sep2012     0 |
     27. |  6   26sep2012   16dec2012      13     1   16dec2012     0 |
     28. |  6   27aug2012   30dec2012      14     1   30dec2012     0 |
         +------------------------------------------------------------+

    Comment


    • #3
      Nick Cox Thank you so much for the Stata Journal entry and thank you for the code. You are so gracious with your time. It works great and I was hoping to ask you what the following code that you provided meant. Sorry if the code is intuitive. I struggled to figure it out.
      Code:
       
       bysort id (date end) : gen gap = cond(end[_n-1] & !end & _n > 1, date - date[_n-1], 0)
      I want to be sure I understand the mechanics behind your code because my ultimate goal is to count the number of non-duplicates days that a person was employed. I will use the gaps between employment to estimate how many days someone was employed. Below is code to illustrate my logic.

      Code:
      *Days employed
      *make the gap variable invariant
      bysort id (start_date): egen gap_max = max(gap)
      
      *find the earliest date of employment ever by id
      by id: egen earliest_date =min(start_date)
      format earliest_date %td
      
      *find the latest date of employment ever by id
      by id: egen latest_date = max(end_date)
      format latest_date %td
      
      *calculate days employed
      gen days_range = (latest_date- earliest_date)- gap_max

      Comment


      • #4
        First note that the prefix

        Code:
        by id (date end) :
        ensures separate calculations for each individual, sorted first by date and then by end.

        The condition

        Code:
        end[_n-1] & !end
        is short for

        Code:
        end[_n-1] == 1 & end == 0
        given that the definition of end as binary, 1 if an observation holds details on an end date and 0 otherwise. The extra condition

        Code:
        _n > 1
        rules out the first observation for each individual, which can't be for an end date any way but for which end[_n-1] would return missing, which would just be a complication.

        If observations satisfy that condition, we calculate the difference which will be (this start date MINUS the previous end date). In fact because the difference between 1 July as an end and 2 July as a start (say) is arguably not a gap at all, you should work with


        Code:
         
         date - date[_n-1] - 1 
        so that (for example) that code returns 92 not 93 as the longest gap, and so matches your calculations. If observations do not satisfy that condition, we return 0. For more on true or false see e.g. https://www.stata.com/support/faqs/d...rue-and-false/ For more on cond() see https://journals.sagepub.com/doi/pdf...867X0500500310 Putting a version of your code together with mine I get this
        Code:
         * Example generated by -dataex-. For more info, type help dataex clear input byte id float( start_date end_date) 1  18628 19723 1  18628 19723 2  19034 19600 3  18628 19723 4  18628 19175 4  19176 19723 5  18861 18971 5  19015 19132 5  19225 19335 5  19386 19503 5  19596 19706 6  18868 18979 6  19262 19343 6  19232 19357 end format %td start_date format %td end_date  save SAFECOPY , replace   gen long obsno = _n  expand 2  bysort obsno : gen end = _n == 2  by obsno : gen date = cond(_n == 1, start_date, end_date) format date %td  bysort id (date end) : gen gap = cond(end[_n-1] & !end & _n > 1, date - date[_n-1] - 1, 0) collapse (sum) gap, by(id)  merge 1:m id using SAFECOPY   bysort id (start_date) : gen first = start_date[1] bysort id (end_date) : gen last = end_date[_N]  gen wanted = last - first + 1 - gap   tabdisp id, c(first last gap wanted)  ----------------------------------------------------------        id |      first        last   (sum) gap      wanted ----------+-----------------------------------------------         1 |      18628       19723           0        1096         2 |      19034       19600           0         567         3 |      18628       19723           0        1096         4 |      18628       19723           0        1096         5 |      18861       19706         277         569         6 |      18868       19357         252         238 ----------------------------------------------------------
        Notes: Other way round being employed on 1 July and 2 July counts as 2 days of employment which is why we add 1. Your code finds the maximum gap, which is fine if and only if there is only one gap. In general, you want the total over all gaps. egen will be needed to get first and last dates overall if there are any missing dates; otherwise the method in thc code is more direct.

        Comment


        • #5
          Nick Cox Thank you for your explanation to the code. I get it now. Your code is very clear. Thanks also for the feedback about finding the maximum gap. I will subtract 1 to get the proper values.

          Comment

          Working...
          X