Calculating spell length

Guest
#1

03 Apr 2017, 05:25

I am using Stata 14.

My current panel dataset is in person-year format (long). I only look at respondents who have changed from employment status A to B. I would like to know how long the respondents have been in status A. For that I have monthly spell data in the form of 12 variables per year, where 1 indicates being in status A and -2 indicates not being in status A.

I am not sure how to calculate the exact spell length. My first thought was to reshape the dataset into person-month format and then simply count consecutive months in status A.

So, like this:

Code:

reshape long d0, i(pid syear) j(month)

However, the month of the interview differs between respondents, so maybe I should take that into account. I am unsure whether this is the right approach, and in particular how to tie the interview to the right month.

This is my monthly spell data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pid int syear byte(d001 d002 d003 d004 d005 d006 d007 d008 d009 d010 d011 d012)
602 2000 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
602 2001 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
602 2002 -2 -2  1  1  1  1  1  1 -2  1  1  1
602 2003  1  1  1  1  1 -2 -2  1  1  1 -2  1
602 2004 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
end

The data is of retrospective nature. So the input in 2002 relates to January to December 2001.
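A minimal sketch of the person-month idea, using only the two informative rows for pid 602 from the example above. It assumes that the answers recorded in syear describe the previous calendar year, as just stated; the names status, mdate, newrun, run, seq and runlen are made up for illustration, and the interview month is not used yet.

```stata
clear
input long pid int syear byte(d001 d002 d003 d004 d005 d006 d007 d008 d009 d010 d011 d012)
602 2002 -2 -2  1  1  1  1  1  1 -2  1  1  1
602 2003  1  1  1  1  1 -2 -2  1  1  1 -2  1
end

* one row per person, survey year and calendar month
reshape long d0, i(pid syear) j(month)
rename d0 status

* assumption: entries for syear refer to the previous calendar year
gen mdate = ym(syear - 1, month)
format %tm mdate

* a new run starts whenever the status differs from the previous month
bysort pid (mdate): gen byte newrun = _n == 1 | status != status[_n-1]
by pid: gen run = sum(newrun)

* position within each run and total length of the run
bysort pid run (mdate): gen seq = _n
by pid run: gen runlen = _N

* show one line per run in status A
list pid mdate runlen if status == 1 & seq == 1
```

Under these assumptions the runs in status A for pid 602 have lengths 6, 8, 3 and 1. Because the runs are counted on a continuous monthly date, a spell that crosses a year boundary is counted as one spell rather than two.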

My current dataset looks like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pid int syear byte pmonin float age byte sex
8605 2003 2 24 1
8605 2005 2 26 1
8605 2007 2 28 1
8605 2008 4 29 1
9002 1986 3 20 1
9002 1989 2 23 1
9201 1990 3 35 0
9201 2000 2 45 0
9203 2001 2 21 0
9203 2007 3 26 0
9204 2006 3 23 1
9205 1993 2 33 1
9302 1987 3 23 0
9302 1989 3 25 0
9401 1993 4 32 0
9401 1998 2 37 0
9801 1993 4 35 0
9801 2003 5 45 0
9801 2006 6 48 0
9802 1985 3 27 1
9803 1989 2 29 1
end
label values pmonin interview_month
label def pmonin 2 "[2] February", modify
label def pmonin 3 "[3] March", modify
label def pmonin 4 "[4] April", modify
label def pmonin 5 "[5] May", modify
label def pmonin 6 "[6] June", modify
label values sex sex
label def sex 0 "fem", modify
label def sex 1 "male", modify

In the end, I would like a result like this, with the length of the A spell in the last column:

pid   syear   transitioned from A to B   covariates   length of A spell
602   2002    1                          x            6
602   2003    1                          x            8
602   2003    1                          x            3
602   2004    1                          x            1
603   ....

Last edited by sladmin; 06 Feb 2018, 09:39. Reason: anonymize user
Dave Airey

Join Date: Apr 2014

Posts: 407
#2

03 Apr 2017, 07:24

Googling "spell length in Stata" gives some choice options to look at. I personally would just change to wide data, concatenate to a single string variable, and operate on that with string or grep functions.

http://www.stata-journal.com/sjpdf.h...iclenum=dm0029
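The string idea could look roughly like the sketch below. It is only one possible reading of the suggestion, not necessarily what the linked article does, and the variable names seq, tail and head are made up for illustration. Each person-year row gets a 12-character string, and regular expressions measure the run of A months at the end and at the start of the year.

```stata
clear
input long pid int syear byte(d001 d002 d003 d004 d005 d006 d007 d008 d009 d010 d011 d012)
602 2002 -2 -2  1  1  1  1  1  1 -2  1  1  1
602 2003  1  1  1  1  1 -2 -2  1  1  1 -2  1
end

* one character per month: "A" for status A, "-" otherwise
gen str12 seq = ""
foreach v of varlist d001-d012 {
    replace seq = seq + cond(`v' == 1, "A", "-")
}

* length of the run of A months that ends in December
gen byte tail = strlen(regexs(0)) if regexm(seq, "A+$")
replace tail = 0 if missing(tail)

* length of the run of A months that starts in January
gen byte head = strlen(regexs(0)) if regexm(seq, "^A+")
replace head = 0 if missing(head)

list pid syear seq tail head
```

A spell that crosses a year boundary then has to be pieced together from the tail of one row and the head of the next, here 3 + 5 = 8 months.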

Guest

03 Apr 2017, 07:58

Interesting! I experimented with both string functions and loops and finally got the spell length using the old -tsspell-. However, I still want to make sure I grab the last spell before the interview, and that still puzzles me. Reshaped and with -tsspell- applied, my dataset now looks like this:

pmonin: month of interview
event: transitioned in this month
maxseq: length of spell in state A

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pid int syear byte(month pmonin event _spell) int _seq byte _end float maxseq
7402 1992 12 5 0 3 40 0  .
7402 1993  1 5 0 3 41 0  .
7402 1993  2 5 0 3 42 0  .
7402 1993  3 5 0 3 43 0  .
7402 1993  4 5 0 3 44 1  .
7402 1993  5 5 1 4  1 0  .
7402 1993  6 5 1 4  2 0  .
7402 1993  7 5 1 4  3 0  .
7402 1993  8 5 1 4  4 0  .
7402 1993  9 5 1 4  5 0  .
7402 1993 10 5 1 4  6 0  .
7402 1993 11 5 1 4  7 0  .
7402 1993 12 5 1 4  8 0  .
7402 1994  1 5 1 4  9 0  .
7402 1994  2 5 1 4 10 0  .
7402 1994  3 5 1 4 11 0  .
7402 1994  4 5 1 4 12 0  .
7402 1994  5 5 1 4 13 0  .
7402 1994  6 5 1 4 14 0  .
7402 1994  7 5 1 4 15 0  .
7402 1994  8 5 1 4 16 0  .
7402 1994  9 5 1 4 17 0  .
7402 1994 10 5 1 4 18 0  .
7402 1994 11 5 1 4 19 1 19
7402 1994 12 5 0 5  1 0  .
7402 1995  1 5 0 5  2 0  .
7402 1995  2 5 0 5  3 0  .
7402 1995  3 5 0 5  4 0  .
7402 1995  4 5 0 5  5 0  .
7402 1995  5 5 0 5  6 0  .
7402 1995  6 5 0 5  7 0  .
7402 1995  7 5 0 5  8 0  .
7402 1995  8 5 0 5  9 0  .
7402 1995  9 5 0 5 10 0  .
7402 1995 10 5 0 5 11 0  .
7402 1995 11 5 0 5 12 0  .
7402 1995 12 5 0 5 13 0  .
7402 1996  1 8 0 5 14 0  .
7402 1996  2 8 0 5 15 0  .
7402 1996  3 8 0 5 16 0  .
7402 1996  4 8 0 5 17 0  .
7402 1996  5 8 0 5 18 0  .
7402 1996  6 8 0 5 19 0  .
7402 1996  7 8 0 5 20 0  .
7402 1996  8 8 0 5 21 0  .
7402 1996  9 8 0 5 22 0  .
7402 1996 10 8 0 5 23 0  .
7402 1996 11 8 0 5 24 0  .
7402 1996 12 8 0 5 25 0  .
7402 1997  1 3 0 5 26 0  .
7402 1997  2 3 0 5 27 0  .
7402 1997  3 3 0 5 28 0  .
7402 1997  4 3 0 5 29 0  .
7402 1997  5 3 0 5 30 0  .
7402 1997  6 3 0 5 31 0  .
7402 1997  7 3 0 5 32 0  .
7402 1997  8 3 0 5 33 0  .
7402 1997  9 3 0 5 34 0  .
7402 1997 10 3 0 5 35 0  .
7402 1997 11 3 0 5 36 0  .
7402 1997 12 3 0 5 37 0  .
7402 1998  1 3 0 5 38 0  .
7402 1998  2 3 0 5 39 0  .
7402 1998  3 3 0 5 40 0  .
7402 1998  4 3 0 5 41 0  .
7402 1998  5 3 0 5 42 0  .
7402 1998  6 3 0 5 43 0  .
7402 1998  7 3 0 5 44 0  .
7402 1998  8 3 0 5 45 0  .
7402 1998  9 3 0 5 46 0  .
7402 1998 10 3 0 5 47 0  .
7402 1998 11 3 0 5 48 0  .
7402 1998 12 3 0 5 49 0  .
7402 1999  1 5 0 5 50 0  .
7402 1999  2 5 0 5 51 0  .
7402 1999  3 5 0 5 52 0  .
7402 1999  4 5 0 5 53 0  .
7402 1999  5 5 0 5 54 0  .
7402 1999  6 5 0 5 55 0  .
7402 1999  7 5 0 5 56 0  .
7402 1999  8 5 0 5 57 1  .
7402 1999  9 5 1 6  1 0  .
7402 1999 10 5 1 6  2 0  .
7402 1999 11 5 1 6  3 0  .
7402 1999 12 5 1 6  4 1  4
end
label values pmonin pmonin
label def pmonin 3 "[3] March", modify
label def pmonin 4 "[4] April", modify
label def pmonin 5 "[5] May", modify
label def pmonin 8 "[8] August", modify

So I now want to make sure I take "maxseq" of the last spell in state A and then, I think, I should -collapse- the dataset back into person-year format, because (I forgot to mention this) my dependent variable is only assessed once a year, at the time of the interview. For example, in the data above you can see that in 1994 the person reported 19 months in state A until 1993 and was interviewed in May 1994. Whether there was more time in state A I can of course only see in the follow-up data for 1995.

I am stuck on how to grab the last "maxseq" before the interview month and really confused about how to handle the data.

I think I need a variation of the code below. As written, however, it only grabs the immediately preceding observation (_n-1), not the last nonmissing value. The idea is to take the last value of maxseq and attach it to the month in which the interview takes place; I would then add all my other covariates to that month and keep only that month. In theory this sounds plausible; in practice I am still lost.

Code:

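* sort months chronologically within person (the year variable is syear)
sort pid syear month
* previous month's maxseq, recorded only in the interview month
by pid: gen prev = maxseq[_n-1] if month == pmonin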

Last edited by sladmin; 06 Feb 2018, 09:40. Reason: anonymize user


Nick Cox

Join Date: Mar 2014

Posts: 35782
#4

03 Apr 2017, 15:19

"Old" tsspell is a sprightly 14-year-old, thank you. As in http://quoteinvestigator.com/2010/10...ld-cary-grant/

How old Cary Grant?
Old Cary Grant fine. How you?

More seriously, I don't see how you applied tsspell (SSC). It requires a prior tsset and since you have monthly data, the most natural candidate is a monthly date variable, which I can't see. I don't understand what you're asking, but I can suggest that the last spell ends at a monthly date given by

Code:

gen mdate = ym(syear, month)
egen last_spell_end = max(mdate / _end), by(pid)
format %tm mdate last_spell_end

That may help.
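If the aim is to attach the most recent spell length to each interview month, one possible sketch, building on the same monthly date, is to carry the last nonmissing maxseq forward within person and then keep only the interview months. The toy data and the name lastA are hypothetical, and the sketch assumes that syear and month together identify the calendar month, which may not hold given the retrospective coding.

```stata
clear
input long pid int syear byte(month pmonin) float maxseq
1 1993 3 5 .
1 1993 4 5 2
1 1993 5 5 .
1 1993 6 5 .
1 1994 4 5 7
1 1994 5 5 .
1 1994 6 5 .
end

gen mdate = ym(syear, month)
format %tm mdate

* carry the most recent nonmissing maxseq forward within person;
* replace works through observations in order, so values cascade
bysort pid (mdate): gen lastA = maxseq
by pid: replace lastA = lastA[_n-1] if missing(lastA)

* keep one row per interview
keep if month == pmonin
list pid syear month lastA
```

A spell that ends in the interview month itself would also be picked up by this; if only spells ending strictly before the interview should count, lastA[_n-1] could be taken before the keep step.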