  • Converting person-year files to person-spell files

    Hi all,

    I am not sure if a solution is possible, but any help would be appreciated. I am not working with a traditional event-history data file; instead I have a person-year data set (which looks like a typical panel). A key question asks respondents when they started working in their current job; this is answered every year. Another key question asks respondents whether they experienced a change of job over the last year. A sample of the data's current shape is listed below. The variable "idind" is a person-specific ID. "intyer" is the interview year. "round" is the survey round. "start" is the year the respondent started in their current job. "dur" is a generated variable that tries to capture duration (intyer - start). Finally, "newjob" asks people whether they changed their job in the last year (this is simplified to a 0/1 indicator in "event").

    Code:
     list idind intyer round start dur newjob event in 1/20, sepby(idind)
    
         +----------------------------------------------------------+
         | idind   intyer   round   start_~r   dur   newjob   event |
         |----------------------------------------------------------|
      1. |     1     2009      18       1998    11        1       0 |
      2. |     1     2010      19       1998    12        1       0 |
      3. |     1     2011      20       2011     0        4       1 |
      4. |     1     2012      21       2011     1        1       0 |
      5. |     1     2013      22       2011     2        1       0 |
      6. |     1     2014      23       2011     3        1       0 |
         |----------------------------------------------------------|
      7. |     3     2009      18       2006     3        1       0 |
      8. |     3     2011      20       2010     1        1       0 |
      9. |     3     2012      21       2010     2        1       0 |
     10. |     3     2013      22       2013     0        4       1 |
     11. |     3     2014      23       2012     2        1       0 |
         |----------------------------------------------------------|
     12. |     5     2009      18       2008     1        1       0 |
     13. |     5     2010      19       2008     2        1       0 |
     14. |     5     2011      20       2008     3        1       0 |
     15. |     5     2012      21       2012     0        3       1 |
     16. |     5     2013      22       2011     2        1       0 |
     17. |     5     2014      23       2012     2        1       0 |
         |----------------------------------------------------------|
     18. |    27     2011      20       2010     1        1       0 |
         |----------------------------------------------------------|
     19. |    28     2009      18       2003     6        1       0 |
     20. |    28     2010      19       2004     6        1       0 |
         +----------------------------------------------------------+
    
    I am trying to set this data up for a discrete-time survival analysis, but I can't figure out how to transform the data above into a person-period file. Is this even possible, or am I wasting my time? It seems like all the ingredients are there, but, for example, the duration variable keeps increasing by 1 each year as long as nothing changes.

    Again, any help would be appreciated.

  • #2
    As a small bump, would it help if I changed the data frame to a wide format and worked from there?



    • #3
      Hi Ivan,

      Please have a look at the FAQ (also linked at the top of each page) on how to post, especially the part on how to present data examples. Easy-to-reproduce data examples make it much easier to track down your problem and assist in solving it.

      That said, I did so with your data; I guess that the code below does, more or less, what you want to achieve:
      Code:
      clear
      input idind intyer round start dur newjob event
       1 2009 18 1998 11 1 0
       1 2010 19 1998 12 1 0
       1 2011 20 2011 0 4 1
       1 2012 21 2011 1 1 0
       1 2013 22 2011 2 1 0
       1 2014 23 2011 3 1 0
       3 2009 18 2006 3 1 0
       3 2011 20 2010 1 1 0
       3 2012 21 2010 2 1 0
       3 2013 22 2013 0 4 1
       3 2014 23 2012 2 1 0
       5 2009 18 2008 1 1 0
       5 2010 19 2008 2 1 0
       5 2011 20 2008 3 1 0
       5 2012 21 2012 0 3 1
       5 2013 22 2011 2 1 0
       5 2014 23 2012 2 1 0
       27 2011 20 2010 1 1 0
       28 2009 18 2003 6 1 0
       28 2010 19 2004 6 1 0
       end
       
       * step 1: create a job id per person
       // first reported job per person gets jobid 1
       bysort idind (intyer) : generate jobid=1 if (_n==1)
       // subsequent reported jobs per person get incremented jobids
       bysort idind (intyer) : replace jobid=cond(newjob==1,jobid[_n-1],jobid[_n-1]+1) if (_n!=1)
       
       * step 2: reduce data to one observation per job per person
       // fill in the first reported start date per job in all observations of the job
       bysort idind jobid (intyer) : replace start=start[1]
       // keep last observation per job per person
       bysort idind jobid (intyer) : drop if (_n!=_N)
       // create an end date [assumption: jobs ended in the year preceding the interview with a job change]
       generate end=start[_n+1]-1 if jobid[_n+1]!=jobid & idind==idind[_n+1]
       // missing end year now means "no reported end yet"; fill in with survey year
       replace end=intyer if (missing(end))
       generate duration=end-start // this is the duration of each job
       
       * step 3: for discrete time survival analysis, expand the dataset to one observation per year
       // expand episodes
       expand duration+1
       // create episode year identifier
       bysort idind jobid (intyer) : generate episodeyear=start+_n-1
       
       // inspect result
       list idind episodeyear jobid start end , sepby(idind jobid)
      Note that there are inconsistencies in your data; this can be seen, for instance, in the observations of idind==3: the event and newjob markers say that person 3 switched jobs once. However, the data contain four different start years (2006, 2010, 2012, 2013) across what we would expect to be two job episodes. Either this is an error, or I did not correctly understand the meaning of event and newjob.
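
      Inconsistencies of this kind can be flagged before any episodes are built. The following is only a sketch, using person 3's rows from the example; the variable name inconsistent is made up for illustration. A flag directly after a skipped round (here 2011, since 2010 is missing) may reflect an unobserved job change rather than a reporting error.

      ```stata
      clear
      input idind intyer start event
       3 2009 2006 0
       3 2011 2010 0
       3 2012 2010 0
       3 2013 2013 1
       3 2014 2012 0
      end

      * flag interviews where the reported start year differs from the one
      * given at the previous interview although no job change was reported
      * (the _n > 1 condition excludes each person's first interview)
      bysort idind (intyer) : generate byte inconsistent = ///
          (start != start[_n-1]) & (event == 0) & (_n > 1)

      list, noobs sepby(idind)
      ```

      With these five rows, the flag is set for the 2011 and 2014 interviews.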

      Regards
      Bela

      PS: My code could be shortened by using time series notation after -xtset-, of course; I wrote it in such detail in order to help you understand what I'm up to. Anyway, the answer to your question in post #2 is no: only on very rare occasions do things get easier in Stata after switching your data layout to wide. This is Schechter's law. From my understanding, this is not one of those rare occasions.
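
      As a minimal sketch of one way step 1 could be condensed: the version below uses a running sum instead of the time series operators mentioned, so it is an alternative rather than the -xtset- route itself. The name jobid2 is hypothetical, and the toy data are a cut-down version of the example.

      ```stata
      clear
      input idind intyer newjob
       1 2009 1
       1 2010 1
       1 2011 4
       1 2012 1
       3 2009 1
       3 2011 1
       3 2013 4
      end

      * long form, as in step 1 above
      bysort idind (intyer) : generate jobid = 1 if (_n == 1)
      bysort idind (intyer) : replace jobid = cond(newjob==1, jobid[_n-1], jobid[_n-1]+1) if (_n != 1)

      * compact form: 1 plus a running count of the reported job changes
      * after the first interview (sum() restarts within each by-group)
      bysort idind (intyer) : generate jobid2 = 1 + sum(newjob != 1 & _n > 1)

      * both versions should agree in every observation
      assert jobid == jobid2
      ```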
      Last edited by Daniel Bela; 10 Nov 2017, 03:53. Reason: added PS



      • #4
        My understanding is that annual surveys like this are conducted over a certain period of time, and it's unlikely that the same respondent was interviewed exactly one year to the day after the previous interview. Furthermore, people make mistakes when reporting past dates. You end up with situations like idind==5, rounds 21 to 23, where start and duration are not consistent.

        My instinct is always to focus on the variable that is likely to be the most reliable. With this data, I would use event (I assume this represents a change of job since the last annual interview). The problem, however, is that there is a missing round for idind==3. To address this, I would fill in missing rounds and use rangestat (from SSC) to look forward for the earliest job start year. I would then fill in missing event values based on the difference between the interview year and the start year.

        Identifying spells is then simply a matter of tracking new job events. The spell start and end depend on whether the job is ongoing at the first and last interview.

        None of this deals with job overlaps (multiple jobs).

        Code:
        clear
        input idind intyer round start dur newjob event
         1 2009 18 1998 11 1 0
         1 2010 19 1998 12 1 0
         1 2011 20 2011 0 4 1
         1 2012 21 2011 1 1 0
         1 2013 22 2011 2 1 0
         1 2014 23 2011 3 1 0
         3 2009 18 2006 3 1 0
         3 2011 20 2010 1 1 0
         3 2012 21 2010 2 1 0
         3 2013 22 2013 0 4 1
         3 2014 23 2012 2 1 0
         5 2009 18 2008 1 1 0
         5 2010 19 2008 2 1 0
         5 2011 20 2008 3 1 0
         5 2012 21 2012 0 3 1
         5 2013 22 2011 2 1 0
         5 2014 23 2012 2 1 0
         27 2011 20 2010 1 1 0
         28 2009 18 2003 6 1 0
         28 2010 19 2004 6 1 0
         end
         
        * verify assumptions about the data
        assert intyer >= start
        
        * fill in missing rounds
        xtset idind intyer
        tsfill
        
        * look forward to find start year for missing rounds
        rangestat (min) start, interval(intyer 0 .) by(idind)
        
        * fill in a new job event when missing
        gen new_job = event
        replace new_job = (intyer - start_min == 0) if mi(event)
        
        * group spells
        by idind: gen spell = sum(new_job)
        
        * spell start
        bysort idind spell (intyer): gen spell1 = cond(new_job[1], intyer[1], start[1])
        
        * spell end, note if the spell is ongoing at the last round
        by idind spell: gen spellN = intyer[_N]
        by idind: gen is_ongoing = spellN == intyer[_N]
        
        list , noobs compress sepby(idind spell)
        And here are the results from the list command:
        Code:
        . list , noobs compress sepby(idind spell)
        
          +-----------------------------------------------------------------------------------------------------+
          | idind   int~r   round   start   dur   newj~   event   sta~n   n~_~b   spell   spe~1   spe~N   is_~g |
          |-----------------------------------------------------------------------------------------------------|
          |     1    2009      18    1998    11       1       0    1998       0       0    1998    2010       0 |
          |     1    2010      19    1998    12       1       0    1998       0       0    1998    2010       0 |
          |-----------------------------------------------------------------------------------------------------|
          |     1    2011      20    2011     0       4       1    2011       1       1    2011    2014       1 |
          |     1    2012      21    2011     1       1       0    2011       0       1    2011    2014       1 |
          |     1    2013      22    2011     2       1       0    2011       0       1    2011    2014       1 |
          |     1    2014      23    2011     3       1       0    2011       0       1    2011    2014       1 |
          |-----------------------------------------------------------------------------------------------------|
          |     3    2009      18    2006     3       1       0    2006       0       0    2006    2009       0 |
          |-----------------------------------------------------------------------------------------------------|
          |     3    2010       .       .     .       .       .    2010       1       1    2010    2012       0 |
          |     3    2011      20    2010     1       1       0    2010       0       1    2010    2012       0 |
          |     3    2012      21    2010     2       1       0    2010       0       1    2010    2012       0 |
          |-----------------------------------------------------------------------------------------------------|
          |     3    2013      22    2013     0       4       1    2012       1       2    2013    2014       1 |
          |     3    2014      23    2012     2       1       0    2012       0       2    2013    2014       1 |
          |-----------------------------------------------------------------------------------------------------|
          |     5    2009      18    2008     1       1       0    2008       0       0    2008    2011       0 |
          |     5    2010      19    2008     2       1       0    2008       0       0    2008    2011       0 |
          |     5    2011      20    2008     3       1       0    2008       0       0    2008    2011       0 |
          |-----------------------------------------------------------------------------------------------------|
          |     5    2012      21    2012     0       3       1    2011       1       1    2012    2014       1 |
          |     5    2013      22    2011     2       1       0    2011       0       1    2012    2014       1 |
          |     5    2014      23    2012     2       1       0    2012       0       1    2012    2014       1 |
          |-----------------------------------------------------------------------------------------------------|
          |    27    2011      20    2010     1       1       0    2010       0       0    2010    2011       1 |
          |-----------------------------------------------------------------------------------------------------|
          |    28    2009      18    2003     6       1       0    2003       0       0    2003    2010       1 |
          |    28    2010      19    2004     6       1       0    2004       0       0    2003    2010       1 |
          +-----------------------------------------------------------------------------------------------------+
        If you want to go from there to a long layout:
        Code:
        by idind spell: keep if _n == 1
        keep idind spell spell1 spellN is_ongoing
        expand spellN - spell1 + 1
        bysort idind spell: gen year = spell1 + _n - 1
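
        For a discrete-time model, the long layout still needs a time-at-risk counter and an event indicator. A minimal sketch, assuming a spell counts as ended in its last observed year unless is_ongoing is 1; the names t and ended are hypothetical, and three spells are typed in from the listing above so that the block runs on its own:

        ```stata
        clear
        input idind spell spell1 spellN is_ongoing
          1 0 1998 2010 0
          1 1 2011 2014 1
         27 0 2010 2011 1
        end

        * one observation per spell-year, as in the block above
        expand spellN - spell1 + 1
        bysort idind spell: generate year = spell1 + _n - 1

        * years at risk within the spell, counted from 1
        generate t = year - spell1 + 1

        * event indicator: 1 only in the final year of a spell that ended,
        * 0 in all other years and in every year of an ongoing (censored) spell
        generate byte ended = (year == spellN) & !is_ongoing

        list, noobs sepby(idind spell)
        ```

        The ended variable could then serve as the outcome in, for example, logit or cloglog with t among the covariates. How to treat the years of a first spell that precede the first interview is a modelling decision this sketch does not address.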



        • #5
          Many thanks for answering this. Both of your answers contain whole blocks of code I haven't used before. I have a couple of books on survival analysis, including Mills' and the multilevel book by Rabe-Hesketh, but neither of these talks about turning this kind of data into a suitable format. Could you recommend anything that describes the techniques above? I've spent the morning running these blocks and coding a dependent variable. Will update soon!
