  • Counting spell length

    Dear Statalist,

    I am doing an analysis in which I would like to control for the length of the last unemployment spell. The analysis looks at reemployed respondents with regard to a dichotomous dependent variable (here: "dep1"). I have been working on this problem for a while now, but I cannot find a definite solution, and I am grateful for any ideas on how to approach it.

    The problem is that I need to attribute the spell length to the observation (year) that carries the dependent variable. I think an illustration of my dataset will show why this is difficult:

    This is certainly a worst-case scenario in my data, but many such cases exist and I am trying to account for them in my approach. Further below is the -dataex- version of the whole observation period for this person.
    (Reemp = reemployed, 1=yes; Mon = month of interview; T1-T12 = Temp1-Temp12; Dep = dependent variable, dummy)
    ID  Year  Reemp  Mon  T1  T2  T3  T4  T5  T6  T7  T8  T9 T10 T11 T12  Dep
     1  2002      0    4  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2    .
     1  2003      0    4  -2   1   1   1   1   1   1   1   1   1   1   1    0
     1  2004      1    3   1  -2  -2   1   1  -2  -2   1   1  -2   1  -2    0
     1  2005      1    7   1   1   1   1   1  -2  -2  -2  -2  -2  -2  -2    0
     1  2006      0    8  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2    0
     1  2007      0    7  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2    0
    So, here is an example: in the table above there are two years (2004 and 2005) in which this person was reemployed and interviewed (and therefore has a value for the dependent variable). But as you can see, he has multiple unemployment spells. "Temp1"-"Temp12" are the unemployment months (1=unemployed) in the previous year! Month of interview is the calendar month of the interview (e.g., 3=March). In the -dataex- below, the dependent variable is "dep1".

    The problem: I can get the length of each spell by reshaping to long (person-year-month) format and using -tsspell- (from SSC) to count the length of each spell. But I would then need to tell Stata to which year (the year in which the dependent variable is observed) each spell length belongs. And since people like the one above often have multiple unemployment spells in a year, I can only think of one approach to create the length variable and attribute it to the year in which my dependent variable was asked.
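A minimal sketch of the reshape and -tsspell- step described above, using three rows of the example data. It assumes (as stated in the post) that temp1-temp12 are January-December of the year before syear, and that -tsspell- from SSC is installed; mdate and spell_length are illustrative names. Because every month gets its own calendar date, the run that starts in February 2002 (2003 row) and continues into January 2003 (2004 row) is counted as one 12-month spell.

```stata
* Sketch of the reshape + -tsspell- step (tsspell is from SSC: ssc install tsspell)
* Assumption: temp1-temp12 are Jan.-Dec. of the year before syear
clear
input long pid int syear byte(temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10 temp11 temp12)
1 2003 -2  1  1  1  1  1  1  1  1  1  1  1
1 2004  1 -2 -2  1  1 -2 -2  1  1 -2  1 -2
1 2005  1  1  1  1  1 -2 -2 -2 -2 -2 -2 -2
end

* one observation per person and calendar month
reshape long temp, i(pid syear) j(m)
gen mdate = ym(syear - 1, m)
format mdate %tm
tsset pid mdate

* _spell numbers the runs of temp == 1 within pid; _seq counts months within a run
tsspell, cond(temp == 1)

* attach the full length of each run to every month of that run
bysort pid _spell: egen spell_length = max(_seq)
replace spell_length = . if temp != 1
sort pid mdate
list mdate temp _spell _seq spell_length, noobs sepby(syear)
```

What remains after this step is linking each spell to the interview year, which is the question raised in the post.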

    But first let me show you how, in the end, it should look:
    (Reemp = reemployed, 1=yes; Mon = month of interview; T1-T12 = Temp1-Temp12; Dep = dependent variable, dummy; Length = length of last spell)
    ID  Year  Reemp  Mon  T1  T2  T3  T4  T5  T6  T7  T8  T9 T10 T11 T12  Dep  Length
     1  2002      0    4  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2    .
     1  2003      0    4  -2   1   1   1   1   1   1   1   1   1   1   1    0
     1  2004      1    3   1  -2  -2   1   1  -2  -2   1   1  -2   1  -2    0      12
     1  2005      1    7   1   1   1   1   1  -2  -2  -2  -2  -2  -2  -2    0       5
     1  2006      0    8  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2    0
     1  2007      0    7  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2  -2    0
    This is the only approach I can think of: Stata needs to count backwards. Start at the month of interview, count backwards through the consecutive unemployed months, and then attribute the total count to the row from which the month of interview was taken.

    This is where I stumble:

    1) I cannot think of a way to start the loop. A -while- loop does not seem to work, because there is no condition that stays true for the whole count. For example, I cannot say "while temp`n' == 1", because the loop often starts at a month where "temp`n' == -2". But I need a way to tell the loop that it should start at the month of interview and then count until "-2" appears again.
    2) Can Stata count across rows? That is, once it reaches e.g. "temp1", can it continue with "L.temp12"?
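On the two questions above: once the data are in long (person-month) form, no -while- loop is needed. Stata can refer to the previous observation with an explicit subscript such as run[_n-1] (after -tsset-, the lag operator L. can also refer to the previous month), and -replace- processes observations in order, so a running count carries over from the last month of one row to the first month of the next. A minimal sketch, assuming temp`j' is month j of the year before syear; run and spell_end are illustrative names.

```stata
* Sketch: count consecutive unemployed months across rows (no loop needed)
* Assumption: temp1-temp12 are Jan.-Dec. of the year before syear
clear
input long pid int syear byte(temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10 temp11 temp12)
1 2003 -2  1  1  1  1  1  1  1  1  1  1  1
1 2004  1 -2 -2  1  1 -2 -2  1  1 -2  1 -2
end

reshape long temp, i(pid syear) j(m)
gen mdate = ym(syear - 1, m)
format mdate %tm
sort pid mdate

* run = consecutive unemployed months up to and including this month;
* -replace- works observation by observation, so run[_n-1] is already updated
gen run = (temp == 1)
by pid: replace run = run[_n-1] + 1 if temp == 1 & mdate == mdate[_n-1] + 1

* a spell ends where the next month is not unemployed; run then holds its length
by pid: gen spell_end = temp == 1 & (temp[_n+1] != 1 | mdate[_n+1] != mdate + 1)
list mdate temp run if spell_end, noobs
```

The check mdate == mdate[_n-1] + 1 restarts the count if a month is missing, so a gap in the data is not mistaken for a continuous spell.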

    Really, I am grateful for any input, because I am starting to feel lost and to wonder whether I will be able to control for spell length at all. If you think, by the way, that this problem is much easier to handle in another program such as R, please let me know, because I would then switch to it.

    Thank you very much!




    -----

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pid int syear float reemp byte(pmonin temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10 temp11 temp12 dep1)
    1 1993 0  6 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    1 1994 1  4 -2 -2  1  1  1  1  1  1  1  1  1  1 .
    1 1995 0  3 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    1 1996 0  5 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
    1 1997 0  2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
    1 1998 1  4 -2 -2 -2 -2 -2 -2  1 -2 -2 -2 -2 -2 .
    1 1999 0  5 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    1 2000 0  2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
    1 2001 0  2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    1 2002 0  4 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    1 2003 0  4 -2  1  1  1  1  1  1  1  1  1  1  1 0
    1 2004 1  3  1 -2 -2  1  1 -2 -2  1  1 -2  1 -2 0
    1 2005 1  7  1  1  1  1  1 -2 -2 -2 -2 -2 -2 -2 0
    1 2006 0  8 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
    1 2007 0  7 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
    1 2008 0  6 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    1 2009 0  7 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    1 2010 1  7 -2 -2 -2 -2 -2  1  1 -2  1  1 -2  1 .
    1 2011 1  7  1  1  1  1  1  1  1  1  1  1  1  1 .
    1 2012 0  4 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    1 2013 0  6 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    1 2014 0  7 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
    end
    label values pmonin pmonin
    label def pmonin 2 "[2] February", modify
    label def pmonin 3 "[3] March", modify
    label def pmonin 4 "[4] April", modify
    label def pmonin 5 "[5] May", modify
    label def pmonin 6 "[6] June", modify
    label def pmonin 7 "[7] July", modify
    label def pmonin 8 "[8] August", modify
    Last edited by sladmin; 06 Feb 2018, 09:38. Reason: anonymize user

  • #2
    Guest, I'm unable to figure out what it is you are trying to calculate here.

    First, it doesn't seem to have anything to do with spells. In the observation where year == 2004, there are three different spells of unemployment, of length 2, 2, and 1. Your "how it should look" data shows an entry of 5 in year == 2005. So it appears that what you are interested in is the total number of months of unemployment, whether in a spell or not.

    But then in the observation where year == 2006, your "how it should look" data shows nothing at all for length of last spell, even though there were 5 months of unemployment in the year == 2005 observation (and they were even a single spell!) So why don't those count? What makes them different?

    In your year==2003 observation, you show length of spell as 12. There were, however, only 11 months of unemployment in 2002. I suppose you are also counting the 1 month of unemployment in January 2004. But I don't quite get why: is it because that January 2004 month of unemployment was contiguous with the 11 months of unemployment in 2002? Or was it because that month of unemployment precedes the month of the interview in 2004, which was March? If the person had remained continuously unemployed through, say, June 2004, would you have included all of January through June 2004 (plus the 11 months of 2003) for a total of 17 months? Or would you have only counted through February? Or through March? Or what? If this person had been employed in January 2004, but then unemployed in February 2004, would the result have just been 11, or would it still be 12 because the February 2004 unemployment preceded the interview month, even though it was not contiguous with the 11 months of unemployment in 2003?

    So, as you see, I am very confused as to what matters here: is it uninterrupted stretches (spells) of unemployment, or is it total months of unemployment between interviews (whether continuous or not) or something altogether different?
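To make this distinction concrete: the number of unemployed months in a row and the number of separate spells in it are different quantities. A minimal sketch on the 2004 and 2005 rows of the example data; unemp_months and n_runs are illustrative names, and -egen, anycount()- is official Stata. Counting within a single row treats the unemployed January at the start of the 2004 row as its own run, even though it continues the run recorded at the end of the 2003 row, which is one reason spells are easier to identify after reshaping to one observation per month.

```stata
* Sketch: two different quantities for the same interview row (wide format)
clear
input long pid int syear byte(temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10 temp11 temp12)
1 2004  1 -2 -2  1  1 -2 -2  1  1 -2  1 -2
1 2005  1  1  1  1  1 -2 -2 -2 -2 -2 -2 -2
end

* (a) total months of unemployment in the reference year, contiguous or not
egen unemp_months = anycount(temp1-temp12), values(1)

* (b) number of separate runs of unemployment within the row;
*     a run starts at an unemployed month whose previous month is not unemployed
gen n_runs = (temp1 == 1)
forvalues j = 2/12 {
    local i = `j' - 1
    replace n_runs = n_runs + 1 if temp`j' == 1 & temp`i' != 1
}
list syear unemp_months n_runs, noobs
```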
    Last edited by sladmin; 06 Feb 2018, 09:38. Reason: anonymize original poster


    • #3
      I agree with Clyde that this is all very confusing. The following reflects what I understood of the problem. If I'm wrong, it at least shows some techniques for handling this type of data management problem.

      I'm assuming that Guest wants the length of the most recent unemployment spell prior to the reemployment response, as measured on the interview date. I'm also assuming that temp1 to temp12 refer to Jan. to Dec. of the preceding year. The following code generates observations for each unemployment month and identifies spells of unemployment. The data are then reduced to one observation per unemployment spell, noting the start date and the length of the spell.

      With that information in hand, the code returns to the original dataset and keeps the reemployment observations. These are joined, using -rangejoin- (from SSC), with the unemployment spells that start in or before the month of the interview. For each reemployment observation, the data are reduced to the one with the most recent spell. These reemployment observations are merged back with the original dataset. Finally, if the most recent unemployment spell overshoots the interview date, the length of the spell is capped at the number of months up to and including the interview month.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long pid int syear float reemp byte(pmonin temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10 temp11 temp12 dep1)
      1 1993 0  6 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      1 1994 1  4 -2 -2  1  1  1  1  1  1  1  1  1  1 .
      1 1995 0  3 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      1 1996 0  5 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
      1 1997 0  2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
      1 1998 1  4 -2 -2 -2 -2 -2 -2  1 -2 -2 -2 -2 -2 .
      1 1999 0  5 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      1 2000 0  2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
      1 2001 0  2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      1 2002 0  4 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      1 2003 0  4 -2  1  1  1  1  1  1  1  1  1  1  1 0
      1 2004 1  3  1 -2 -2  1  1 -2 -2  1  1 -2  1 -2 0
      1 2005 1  7  1  1  1  1  1 -2 -2 -2 -2 -2 -2 -2 0
      1 2006 0  8 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
      1 2007 0  7 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0
      1 2008 0  6 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      1 2009 0  7 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      1 2010 1  7 -2 -2 -2 -2 -2  1  1 -2  1  1 -2  1 .
      1 2011 1  7  1  1  1  1  1  1  1  1  1  1  1  1 .
      1 2012 0  4 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      1 2013 0  6 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      1 2014 0  7 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 .
      end
      
      * the interview date in Stata monthly date
      gen imdate = ym(syear, pmonin)
      format %tm imdate
      
      save "statalist_example.dta", replace
      
      * work on monthly unemployment spells...
      reshape long temp, i(pid syear) j(m)
      
      * reduce to months of unemployment
      keep if temp == 1
      
      * assume that temp1 is Jan. of the preceding year
      * the unemployment monthly date
      gen umdate = ym(syear-1,m)
      format %tm umdate
      
      * reduce to relevant variables
      keep pid umdate
      
      * identify new spells by looking for gaps in the monthly date
      sort pid umdate
      by pid: gen newspell = umdate != umdate[_n-1] + 1
      
      * use a running sum to group observations by spell
      by pid: gen spellid = sum(newspell)
      sort pid spellid umdate
      
      * note the length of each spell and reduce to one obs per spell
      * by keeping the first month of the spell
      by pid spellid: gen spellmonths = _N
      by pid spellid: keep if _n == 1
      save "statalist_uspells.dta", replace
      list
      
      * go back to the original data, which already include the interview date
      use "statalist_example.dta", clear
      
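      * reduce to reemployment obs and join with previous unemployment spells
      * (-rangejoin- is from SSC and also requires -rangestat-:
      *  ssc install rangestat
      *  ssc install rangejoin)
      keep if reemp == 1
      rangejoin umdate . imdate using "statalist_uspells.dta", by(pid)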
      
      * the most recent unemployment spell is the one that is the closest
      * to the reemployment.
      sort pid syear umdate
      by pid syear: keep if _n == _N
      
      * combine with original data
      keep pid syear umdate spellmonths
      merge 1:1 pid syear using "statalist_example.dta", nogen
      
      * adjust unemployment spell length if it overshoots the interview date
      sort pid syear
      gen wanted = min(spellmonths, imdate - umdate + 1)
      order pid syear reemp imdate umdate spellmonths wanted dep1
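The final adjustment (gen wanted = min(spellmonths, imdate - umdate + 1)) caps the spell length at the months from the start of the spell through the interview month. A worked check of that arithmetic for one case, assuming the code's interpretation of temp1-temp12: on the example data, the most recent spell that starts on or before the March 2004 interview is the one in temp1-temp5 of the 2005 row (January-May 2004, 5 months), but only January through March 2004 had elapsed by the interview. The locals below are illustrative stand-ins for the variables of that row.

```stata
* Worked check of: gen wanted = min(spellmonths, imdate - umdate + 1)
* Values assumed for the 2004 interview of the example data (see above)
local imdate = ym(2004, 3)
local umdate = ym(2004, 1)
local spellmonths = 5
local wanted = min(`spellmonths', `imdate' - `umdate' + 1)
display "months counted up to the interview: `wanted'"
```

By the same arithmetic, the 2005 interview (July 2005) comes after that spell has ended, so the cap does not bind there and the full 5 months are kept.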
      Last edited by sladmin; 06 Feb 2018, 09:38. Reason: anonymize original poster


      • #4
        Dear Prof. Schechter, Prof. Picard,

        First of all, thank you so much for taking the time to try to understand my question and to ask about it. Prof. Picard, you are right: I am interested in the length (in months) of the most recent unemployment spell. So in 2005, e.g., the most recently reported consecutive months of registered unemployment lasted 5 months, and indeed, 'temp1' to 'temp12' refer to the months January to December. I am so deep into the question that I failed to explain the most basic things, and I am very sorry about that.

        Your code, Prof. Picard, works great! I have just checked it. Thank you so much for your help! I even learned about a new command, -rangejoin-.

        Thank you very much again for taking your time to help me!


        • #5
          Great that this worked out for you. But just so that others don't get the wrong impression, I should point out that I'm not a professor.
