  • Estimate spell gaps that occur between events

    Hi Statalist community,

    I have employment data, and I am trying to calculate gaps between employment. I define a gap between employment as a period of unemployment that is both preceded and followed by a period of employment.


    The sample dataset is a balanced panel. In the sample dataset, the variable worker_id identifies a unique individual and quarter_id identifies the wage quarter. The variable participated is a dummy variable equal to 1 if the person is employed in that quarter and 0 otherwise. The variable gap_between_employment is a dummy variable indicating whether the quarter is part of a gap between employment. The variable number_of_gaps numbers the gaps between employment (1 during a worker's first gap, 2 during the second, and so on, and 0 outside a gap). The variable longest_gap records the length, in quarters, of the worker's longest gap between employment.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(worker_id quarter_id participated gap_between_employment number_of_gaps longest_gap)
    1  1 1 0 0 0
    1  2 1 0 0 0
    1  3 1 0 0 0
    1  4 1 0 0 0
    1  5 1 0 0 0
    1  6 1 0 0 0
    1  7 1 0 0 0
    1  8 1 0 0 0
    1  9 1 0 0 0
    1 10 1 0 0 0
    1 11 1 0 0 0
    1 12 1 0 0 0
    2  1 1 0 0 0
    2  2 1 0 0 0
    2  3 1 0 0 0
    2  4 0 0 0 0
    2  5 0 0 0 0
    2  6 0 0 0 0
    2  7 0 0 0 0
    2  8 0 0 0 0
    2  9 0 0 0 0
    2 10 0 0 0 0
    2 11 0 0 0 0
    2 12 0 0 0 0
    3  1 1 0 1 3
    3  2 1 0 1 3
    3  3 0 1 1 3
    3  4 0 1 1 3
    3  5 0 1 1 3
    3  6 1 0 1 3
    3  7 1 0 1 3
    3  8 1 0 1 3
    3  9 1 0 1 3
    3 10 1 0 1 3
    3 11 1 0 1 3
    3 12 1 0 1 3
    4  1 1 0 0 4
    4  2 1 0 0 4
    4  3 0 1 1 4
    4  4 0 1 1 4
    4  5 1 0 0 4
    4  6 1 0 0 4
    4  7 1 0 0 4
    4  8 0 1 2 4
    4  9 0 1 2 4
    4 10 0 1 2 4
    4 11 0 1 2 4
    4 12 1 0 0 4
    5  1 1 0 0 3
    5  2 0 1 1 3
    5  3 0 1 1 3
    5  4 1 0 0 3
    5  5 0 1 2 3
    5  6 0 1 2 3
    5  7 0 1 2 3
    5  8 1 0 0 3
    5  9 0 0 0 3
    5 10 0 0 0 3
    5 11 0 0 0 3
    5 12 0 0 0 3
    end
    • Individual 1 was employed in all 12 quarters, so they had no gaps between employment.
    • Individual 2 was employed in quarters 1, 2, and 3 and was then unemployed for the rest of the quarters. Individual 2 is right-censored, since he never found reemployment within the quarters of observation. Thus, he also has no gaps between employment.
    • Individual 3 was employed in quarters 1 and 2, unemployed in quarters 3, 4, and 5, and then reemployed in quarters 6-12. Individual 3 had 1 gap between employment, with the longest gap being 3 quarters.
    • Individual 4 was employed in quarters 1 and 2, then unemployed in quarters 3 and 4, then employed in quarters 5, 6, and 7, then unemployed in quarters 8, 9, 10, and 11, and finally employed in quarter 12. Individual 4 had 2 gaps between employment, with the longest gap being 4 quarters.
    • Individual 5 was employed in quarter 1, unemployed in quarters 2 and 3, employed in quarter 4, unemployed in quarters 5, 6, and 7, employed in quarter 8, and then unemployed in quarters 9-12 (right-censored, like individual 2). Individual 5 had 2 gaps between employment, with the longest gap being 3 quarters.
    To account for right censoring, I ran the following code:

    Code:
    * last employed quarter; stored only on employed quarters (missing elsewhere)
    bysort worker_id: egen last_quarter = max(quarter_id) if participated==1
    Now, I'm stuck. I don't know how to create the variables gap_between_employment, number_of_gaps, and longest_gap. Any help would be appreciated. Thank you so much.
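    Extending the last_quarter idea above: if the first and last employed quarters are copied to every observation of a worker (rather than only to the employed quarters, as egen ... if participated==1 does), then an unemployed quarter belongs to a gap between employment exactly when it falls strictly between them. This is a minimal sketch, assuming a balanced panel with consecutive quarters; first_emp, last_emp, and in_gap are hypothetical names, and the first lines only rebuild the #1 example compactly.

```stata
* Rebuild the #1 example: one 12-character 0/1 employment string per worker
clear
input byte worker_id str12 pattern
1 "111111111111"
2 "111000000000"
3 "110001111111"
4 "110011100001"
5 "100100010000"
end
expand 12
bysort worker_id: gen byte quarter_id = _n
gen byte participated = real(substr(pattern, quarter_id, 1))

* First and last employed quarter, spread to every observation of the worker
* (cond() is missing on unemployed quarters, and egen ignores missing values)
by worker_id: egen first_emp = min(cond(participated == 1, quarter_id, .))
by worker_id: egen last_emp  = max(cond(participated == 1, quarter_id, .))

* Unemployed quarter with employment somewhere before it and somewhere after it
* (a worker never employed has first_emp missing, so the test is false)
gen byte in_gap = participated == 0 & quarter_id > first_emp & quarter_id < last_emp
```

    This reproduces gap_between_employment only; number_of_gaps and longest_gap still need the individual spells to be identified.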


  • #2
    This code assumes, and verifies, that there is only one observation for any given quarter for any given worker, and also that the quarters are consecutive with no gaps.

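    Code:
    * Verify one observation per worker-quarter and no gaps in quarter_id
    xtset worker_id quarter_id
    assert `r(gaps)' == 0
    
    * Number the runs (spells) of identical participated values within each worker
    by worker_id (quarter_id), sort: gen spell_start = (participated != participated[_n-1])
    by worker_id (quarter_id): gen spell_num = sum(spell_start)
    * Length in quarters of each unemployment spell (0 for employment spells)
    by worker_id spell_num, sort: gen duration = cond(participated, 0, _N)
    
    * Work with one observation per spell in a separate frame
    frame put worker_id spell_num participated duration if spell_start, into(spells)
    frame spells {
        isid worker_id spell_num, sort
        * A gap is an unemployment spell preceded and followed by employment
        by worker_id (spell_num): gen byte gap_between_employment ///
            = participated == 0 & participated[_n-1] == 1 & participated[_n+1] == 1
        * Running index of the gaps, reset to 0 outside gaps
        by worker_id (spell_num): gen number_of_gaps ///
            = sum(gap_between_employment)
        replace number_of_gaps = 0 if !gap_between_employment
        by worker_id (spell_num): egen longest_gap = ///
            max(cond(gap_between_employment, duration, 0))
    }
    
    * Copy the spell-level results back to every quarter; suffix() avoids a
    * clash with the hand-coded variables already present in the #1 data
    frlink m:1 worker_id spell_num, frame(spells)
    frget *gap*, from(spells) suffix(_calc)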
    will give you what you described in words. However, it does not agree with your hand-calculated values. In particular, for worker_id = 3, you show number_of_gaps = 1 in every quarter, but by your own definition it should be zero during the quarters when he/she is employed. (I assume your words are what you mean, because for all the other worker_ids the variable number_of_gaps does follow them.)
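    The frame command requires Stata 16 or newer. For older versions, the same three variables can be built directly in the original data. This is a sketch under the same assumptions as the code above (one observation per worker-quarter, no gaps in quarter_id); the _calc suffix and the helper n_spells are hypothetical names, and the first lines only rebuild the #1 example compactly. It relies on spells alternating between employment and unemployment, so an unemployed spell that is neither a worker's first nor last spell must be bounded by employment on both sides.

```stata
* Rebuild the #1 example: one 12-character 0/1 employment string per worker
clear
input byte worker_id str12 pattern
1 "111111111111"
2 "111000000000"
3 "110001111111"
4 "110011100001"
5 "100100010000"
end
expand 12
bysort worker_id: gen byte quarter_id = _n
gen byte participated = real(substr(pattern, quarter_id, 1))

* Same safety check as in the frames version
xtset worker_id quarter_id
assert `r(gaps)' == 0

* Number the spells within worker; spell_num never decreases over time,
* so its value in the last quarter is the worker's number of spells
by worker_id (quarter_id), sort: gen spell_start = participated != participated[_n-1]
by worker_id (quarter_id): gen spell_num = sum(spell_start)
by worker_id (quarter_id): gen n_spells = spell_num[_N]

* Unemployed spell that is neither the first nor the last spell = gap
gen byte gap_calc = participated == 0 & spell_num > 1 & spell_num < n_spells

* Running index of gaps (count gap spells as they start), 0 outside gaps
by worker_id (quarter_id): gen number_calc = sum(gap_calc & spell_start)
replace number_calc = 0 if !gap_calc

* Spell length as an observation count (valid only with no gaps in quarter_id)
by worker_id spell_num (quarter_id), sort: gen duration = _N
by worker_id: egen longest_calc = max(cond(gap_calc, duration, 0))
```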



    • #3
      Hi @Clyde Schechter,

      Thank you so much for your help. You are correct: what I said in words is right. I created a dummy dataset in Excel and made a mistake; for individual 3, number_of_gaps should be zero when s/he is employed. The code that you provided works like a charm. I see the logic in your code, and it's very clean. Thank you for introducing me to the frame command; it will be another tool in my toolbox. I have a quick question: why did you run the following code? Thanks.

      Code:
      assert `r(gaps)' == 0



      • #4
        Just for the record, I flag tsspell from SSC and https://www.stata-journal.com/articl...article=dm0029 as a discussion of basic principles.

        For example, with the data from #1 a minimal run could be

        Code:
        tsset worker_id quarter_id 
        tsspell, cond(part == 0)
        egen longest = max(_seq), by(worker_id)
        That is, tsspell requires tsset or xtset data and a definition of a spell (here, a condition that is true throughout a spell: being unemployed).

        tsspell creates three variables, by default named _spell, a counter of spells within each panel; _seq, a counter of observations within each spell; and _end, an indicator for being at the end of a spell.
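        The minimal run above takes the maximum of _seq over every unemployment spell, so it also counts spells cut off by the end of observation (worker 2's final 9 quarters, for instance). Keeping only spells bounded by employment on both sides could look like the sketch below. It assumes tsspell is installed from SSC and creates _spell, _seq, and _end as just described; left_ok, right_ok, left_spell, right_spell, gap, and longest_bounded are hypothetical names, and the first lines only rebuild the #1 example compactly.

```stata
* Rebuild the #1 example: one 12-character 0/1 employment string per worker
clear
input byte worker_id str12 pattern
1 "111111111111"
2 "111000000000"
3 "110001111111"
4 "110011100001"
5 "100100010000"
end
expand 12
bysort worker_id: gen byte quarter_id = _n
gen byte participated = real(substr(pattern, quarter_id, 1))

* tsspell is user-written (SSC)
capture which tsspell
if _rc {
    ssc install tsspell
}

tsset worker_id quarter_id
tsspell, cond(participated == 0)

* Bounded on the left: the quarter before the spell's first quarter is employed
* (L.participated is missing in a worker's first quarter, so that test fails)
gen byte left_ok  = participated == 0 & _seq == 1 & L.participated == 1
* Bounded on the right: the quarter after the spell's last quarter is employed
gen byte right_ok = participated == 0 & _end == 1 & F.participated == 1

* Spread both flags to every quarter of the spell, then combine them
egen byte left_spell  = max(left_ok),  by(worker_id _spell)
egen byte right_spell = max(right_ok), by(worker_id _spell)
gen byte gap = participated == 0 & left_spell & right_spell

* Within a gap, the largest _seq is the gap's length in quarters
egen longest_bounded = max(cond(gap, _seq, 0)), by(worker_id)
```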





        • #5
          Re #3: I calculated the duration of a spell simply as a count of the number of observations it contains. That's only correct if there are no gaps in the quarters. For example, if a spell contains observations for quarters 5, 6, and 7, then its duration is correctly calculated as 3. But if there are gaps: suppose there is no data for quarter 6, and the three observations are for quarters 5, 7, and 8. Then it is unclear what the duration should be (or even whether this is a single spell): it depends on what you assume about what happened in quarter 6. But it would be very difficult to defend counting the duration of that spell (if it is one) as 3 quarters. It is more likely 4, or perhaps it is two spells, one of duration 1 and another of duration 2.

          Anyway, to ensure that the duration of a spell is the same as the number of observations it contains, we need to know that there are no gaps in the quarter variable.
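          To see the problem concretely, the sketch below removes one quarter from the middle of a spell (worker 4's quarter 9, an arbitrary choice for illustration), then looks at what xtset reports, a hand-made consecutiveness check, and two candidate durations: the observation count used in #2 and the span of quarter_id. step_ok, n_obs, and span are hypothetical names, and the first lines only rebuild the #1 example compactly.

```stata
* Rebuild the #1 example: one 12-character 0/1 employment string per worker
clear
input byte worker_id str12 pattern
1 "111111111111"
2 "111000000000"
3 "110001111111"
4 "110011100001"
5 "100100010000"
end
expand 12
bysort worker_id: gen byte quarter_id = _n
gen byte participated = real(substr(pattern, quarter_id, 1))

* Create a hole in worker 4's unemployment spell (quarters 8-11)
drop if worker_id == 4 & quarter_id == 9

* xtset now reports a gap, so the assert in #2 would stop the code here
xtset worker_id quarter_id
display r(gaps)

* The same checks by hand: unique worker-quarters and consecutive quarters
isid worker_id quarter_id
by worker_id (quarter_id), sort: gen byte step_ok = _n == 1 | quarter_id == quarter_id[_n-1] + 1
list worker_id quarter_id if !step_ok

* Spell length as a count of observations vs. as the span of quarter_id
by worker_id (quarter_id): gen spell_num = sum(participated != participated[_n-1])
by worker_id spell_num (quarter_id), sort: gen n_obs = _N
by worker_id spell_num (quarter_id): gen span = quarter_id[_N] - quarter_id[1] + 1
```

          For the damaged spell, n_obs is 3 while span is 4; neither settles what happened in the missing quarter, which is why the code in #2 refuses to run rather than guess.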



          • #6
            @Clyde Schechter
            Thank you for the explanations. This makes sense and I will adopt this practice in the future.

            @Nick Cox
            Thanks for the Stata Journal reference. I will take a closer look. I think I could also derive the variables your way.

