stsplit help

László Sándor

02 Sep 2014, 07:05

Hi all,
I am confused by the variables that -stsplit- generates, and I am not sure I have checked the transformation properly, so I would appreciate an independent review.

I have spell data on individuals and am analyzing their retirement decisions. In one setup, each spell starts at birth and ends with "failure" at retirement (or without failure, when my sample ends at a fixed point in calendar time). As I "only" observe all jobs from 1985 on, I also specify entry in -stset-. Exit coincides with the time recorded for the censored observations, but I set it just in case. Time is measured at monthly resolution, but I want to work in years.

All this said, I think this is the proper -stset- line:

Code:

stset retmonth, origin(time birthmonth) enter(time ym(1985,1)) exit(time ym(2011,12)) id(PNR) failure(retired) scale(12)

This results in lines like this one:

Code:

     +---------------------------------------------------------------------------------------------------------------+
     | PNR   year   retired   lastmo~h   lyear   cohort   age     birthm~h   retmonth   _st   _d      _t         _t0 |
     |---------------------------------------------------------------------------------------------------------------|
  1. |   9   2011         0         12    2011     1951   60.75     1951m3    2011m12     1    0   60.75   33.833333 |
     +---------------------------------------------------------------------------------------------------------------+
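
One way to sanity-check the -stset- result is to map analysis time back to calendar months. This is a minimal sketch, assuming the data are -stset- exactly as above (origin at `birthmonth`, `scale(12)`); the names `entry_m` and `exit_m` are made up for illustration. For the record listed above, the two new variables should come out as 1985m1 and 2011m12, matching the enter() and exit() options.

```stata
* Sketch: convert analysis time (years since birth) back to calendar months.
* Assumes birthmonth is a %tm monthly date and scale(12) was used in -stset-.
* entry_m and exit_m are hypothetical variable names.
generate long entry_m = birthmonth + round(_t0 * 12)
generate long exit_m  = birthmonth + round(_t  * 12)
format entry_m exit_m %tm

* Compare the recovered months with the entry and exit times given to -stset-
list PNR birthmonth entry_m exit_m _t0 _t if PNR == 9
```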

Now I want to analyze the effect of covariates on retirement and, as is common, most covariates come in annual panels in calendar time. Shouldn't the following lines let me merge in (1:1 on PNR and year) the data for 2000-2007?

Code:

stsplit y, after(time = ym(1999,12)) at(0(1)8) trim
replace year = y + 1999

But the generated y looks different:

Code:

. l

     +--------------------------------------------------------------------------------------------------------------------+
     | PNR   year   retired   lastmo~h   lyear   cohort   Alder   birthm~h   retmonth   _st   _d      _t         _t0    y |
     |--------------------------------------------------------------------------------------------------------------------|
  1. |   9   2011         .         12    2011     1951   60.75     1951m3    1999m12     0    0   48.75   33.833333   -1 |
  2. |   9   2011         .         12    2011     1951   60.75     1951m3    2000m12     1    0   49.75       48.75    0 |
  3. |   9   2011         .         12    2011     1951   60.75     1951m3    2001m12     1    0   50.75       49.75    1 |
  4. |   9   2011         .         12    2011     1951   60.75     1951m3    2002m12     1    0   51.75       50.75    2 |
  5. |   9   2011         .         12    2011     1951   60.75     1951m3    2003m12     1    0   52.75       51.75    3 |
     |--------------------------------------------------------------------------------------------------------------------|
  6. |   9   2011         .         12    2011     1951   60.75     1951m3    2004m12     1    0   53.75       52.75    4 |
  7. |   9   2011         .         12    2011     1951   60.75     1951m3    2005m12     1    0   54.75       53.75    5 |
  8. |   9   2011         .         12    2011     1951   60.75     1951m3    2006m12     1    0   55.75       54.75    6 |
  9. |   9   2011         .         12    2011     1951   60.75     1951m3    2007m12     1    0   56.75       55.75    7 |
 10. |   9   2011         0         12    2011     1951   60.75     1951m3    2011m12     0    0   60.75       56.75    8 |
     +--------------------------------------------------------------------------------------------------------------------+

For example, the failure variable is not filled in, even though -stsplit- could infer from the original data that the new spells end without failure. More importantly, I am confused by the scaling of the generated `y`. From the age information, shouldn't I infer that `year = y + 2000` instead? Why is that, and why did I get a y = -1 record for 1999 when I was splitting after December 1999, at 0? In any case, do I read it correctly that spell -1 starts in 1985 (the original entry time), and that the last record, y = 8, starts at 12 a.m. on January 1, 2008 and ends at exit or failure?
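
To see how `y` lines up with calendar time, the start and end month of each split record can be recovered from `_t0` and `_t`. This is only a sketch under the same assumptions as the -stset- line (origin at `birthmonth`, `scale(12)`); `start_m`, `end_m` and `year_check` are hypothetical names. With the values listed above, the y = 0 record would run from 1999m12 to 2000m12.

```stata
* Sketch: calendar months at which each split record starts and ends.
* start_m, end_m and year_check are hypothetical variable names.
generate long start_m = birthmonth + round(_t0 * 12)
generate long end_m   = birthmonth + round(_t  * 12)
format start_m end_m %tm

* Calendar year in which each record ends, to compare against y + 2000.
* Only meaningful for records kept in the analysis (_st == 1); the trimmed
* first and last records cover several calendar years.
generate int year_check = yofd(dofm(end_m)) if _st == 1

list PNR y start_m end_m year_check _st if PNR == 9
```

This only reads the record boundaries off the data; it does not by itself say which calendar year a covariate should be attached to.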

Constructively: would replacing `retired` with 0 when missing and using `year = y + 2000` give the correct data to use with time-varying covariates? (OK, how my covariates are measured also matters. They are measured at the end of the calendar year, so maybe they should predict only the next spell. That said, some flow covariates describe what happened over the same year, which affects the hazard in that year, not the following one. And I will merge in leads and lags to capture the timing of effects anyway.)
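
The proposal in the previous paragraph could be sketched as follows. This is not a confirmed answer: it assumes `year = y + 2000` is the right mapping, and the file name `covariates` is hypothetical, standing for an annual panel with one row per PNR and year.

```stata
* Sketch only: assumes y + 2000 is the intended calendar year.
* Fill the failure indicator on split records that -stsplit- left missing;
* _d == 0 restricts this to records that -stset- already treats as censored.
replace retired = 0 if missing(retired) & _d == 0

* Assign calendar years only to records kept in the analysis
replace year = y + 2000 if _st == 1

* covariates.dta is a hypothetical file name; m:1 is used rather than 1:1
* because the trimmed records may share a year value with retained ones.
merge m:1 PNR year using covariates, keep(master match)
```

Whether end-of-year covariates should instead attach to the following record would change only the offset in the `replace year` line.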
