Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping with Multiple StartDate-EndDate Variables

    Dear Statalist,

    I have a dataset that with data on around 2,000 individuals and their employment histories. These histories cover a varying length of time upto about 50 years. The data are currently in a form {name, title, job1_location, job1_startdate job1_enddate, job1_type, job1_notes, job2_location, ..., job15_location, job15_startdate job15_enddate, job15_type, job15_notes, otherrole1_name, otherrole1_startdate, otherrole1_enddate, otherrole2_name, ..., boardposition1_name, boardposition1_startdate1,..., ... }

    In total there are around 1,100 variables. I would like a dataset of the form:

    {name, date, title, job, location, notes, otherrole1_name, otherrole2_name, boardposition1_name, boardpositoon2_name}

    That is individuals may only have one job at a time but they can have several (but not many) other roles and potentially several (but again not many) other board positions. The dates are recorded as being continuous but I was planning on assembling a dataset with a frequency of around 6months.

    I have searched ardently, and experimented at length with the reshape command but confess to being at this point extremely confused. I tried to split the data up in to many individual datasets which could then be reshaped individually but this seems not to be a good strategy as it is not always the case that job10 follows after job9, for example). There are also sometimes gaps in the data where people leave but re-enter the dataset. I apologise for any duplication in this question, and would be immensely grateful for any help or suggestions.

    Thanks,

    Stuart



  • #2
    Perhaps if you clarify some things. FIrst, your target data set has a single variable called date. How does this relate to the startdate and enddate variables in the source data set. Is it a start date? Or an end date? Or some date in between--if so, what is the rule to find it?)

    Second, or the other roles 1 and 2, and board positions 1 and 2 in the target data set, to have the same information as the variables with those names in the source data set, or will you be selecting 1 or 2 to keep from among the many? If selecting, by what criteria?

    Comment


    • #3
      Dear Clyde,

      Many thanks indeed for your response. The target dataset would have a date variable that took values of every six-months. So, for example my initial data looks like this:

      Name Title Job1_location Job1_StartDate Job1_EndDate Job1_Type Job2_Location Job2_StartDate Job2_EndDate Job2_Type Job3_Location Job3_StartDate Job3_EndDate Job3_Type
      David Smith Mr Phoenix, AZ 07/01/1990 04/07/1995 Sales Phoenix, AZ 07/05/1995 12/11/2001 Sales Washington, DC 01/01/2002 . Management
      Anna Other Mrs Washington, DC 04/01/1958 04/11/1972 Operations Washington, DC 04/12/1972 01/26/1974 Finance Washington, DC 01/27/1974 7/1/1988 Operations

      My target dataset would look like:

      Date Name Title Location Type
      07/01/1990 David Smith Mr Phoenix, AZ Sales
      01/01/1991 David Smith Mr Phoenix, AZ Sales
      07/01/1991 David Smith Mr Phoenix, AZ Sales
      etc
      07/01/1995 David Smith Mr Phoenix, AZ Sales
      etc
      07/01/2002 David Smith Mr Washington, DC Management
      07/01/1958 Anna Other Mrs Washington, DC Operations
      01/01/1959 Anna Other Mrs Washington, DC Operations
      etc
      07/01/1972 Anna Other Mrs Washington,DC Finance
      01/01/1973 Anna Other Mrs Washington, DC Finance
      etc
      07/01/1974 Anna Other Mrs Washington, DC Operations
      etc
      07/01/1988 Anna Other Mrs Washington, DC Operations
      But, each row in my original dataset also contains other variables specifying other roles held for other, overlapping periods. The number of these in the target dataset is given by the maximum number of positions simultaneously held by an individual - but I am confident it is not more than 3. So in this respect my data look like;

      Name Location(...)job15_type Otherrole_1_name OR_1_startdate OR_1_enddate OR_2_name OR_2_startdate OR_2_enddate boardposition_1 BP_1_startdate BP_1_enddate BP_2 BP_StartDate BP_Enddate
      David Smith First Aider 12/12/1992 12/12/1997 Key Holder 01/01/1995 01/01/2002 In Attendance 1/1/2008 1/1/2010 SalesDirector 1/1/2010 .
      Anna Other Recruitment Expert 04/01/1960 11/10/1966 First Aider 04/01/1974 7/1/1988
      Name Date OtherRole_1 OtherRole_2 OtherRole_3 BP_1 BP_2
      David Smith 01/01/1993 First Aider
      David Smith 07/01/1993 First Aider
      etc
      David Smith 01/01/1995 First Aider Key Holder
      etc
      David Smith 01/01/1998 Key Holder
      etc
      David Smith 01/01/2002
      etc
      David Smith 01/01/2008 In Attendance
      etc
      David Smith 01/01/2010 Sales Director
      I hope this makes my initial question a bit clearer - I apologise for any remaining lack of clarity.

      Thanks again,

      Stuart

      Comment


      • #4
        I'm afraid I'm still confused about what you want. For example, in your target row for David Smith with date 1/1/91, where does the information about location and type come from? I don't see any entries for David Smith having that date in your source data set.

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          I'm afraid I'm still confused about what you want. For example, in your target row for David Smith with date 1/1/91, where does the information about location and type come from? I don't see any entries for David Smith having that date in your source data set.
          Clyde,

          location and type information come from variables -JobXLocation- and -JobXType-. The date 1/1/91 comes from Stuart wanting to create observations that have dates in a 6-month interval. So if someone starts a job in 01/01/2010 and ends in 01/01/2011, he would like three observations for that person: one with date 01/01/2010, a second with 01/07/2010, a third with 01/01/2011. Also, he is "advancing" the dates of his original dataset. For example, Anna Other starts her first job in 04/01/1958, but his first observation for her in the target dataset is 07/01/1958. I don't know if this is usual procedure in his area of research, but information is lost with this strategy.

          That's my interpretation, at least.

          Stuart should follow advice on listing data in the forum. Below are some suggestions.
          You should:

          1. Read the FAQ carefully.

          2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

          3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

          4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

          Comment


          • #6
            Dear Clyde and Roberto,

            Again I apologise for not being sufficiently clear. My aim to transform the data to 6-monthly intervals is indeed wasteful of information. I confess that this reflects my, potentially erroneous, judgement that a dataset preserving exact dates would result in a dataset so large as to be unwieldily as there will be many unique dates. But, given the nature of the data it I believe it shouldn't matter too much. If there were a solution that did not result in an enormous dataset that preserved the fine date information in the data then I would be delighted to learn of it, but I was trying to make my problem as feasible as possible.

            Roberto's interpretation of what I am trying to do is correct. In the below I have followed his advice and reproduced my example as Stata output.


            Initially I have data that is of the following form:

            Code:
            . list, noobs
            
              +--------------------------------------------------------------------------------------------------------------------------+
              |        name | title |  job1_location | job1_sta~e | job1_end~e |  job1_type |  job2_location | job2_sta~e  | job2_end~e  |
              |  Anna Other |   Mrs | Washington, DC | 04/01/1958 | 04/11/1972 | Operations | Washington, DC | 04/12/1972  | 01/26/1974  |
              |---------------------------------------------------+------------+---------------------------------------------------------|
              | job2_t~e |  job3_location | job3_sta~e | job3_e~e |  job3_type | locati~e |   otherrole_1_name | or_1_sta~e | or_1_end~e |
              |  Finance | Washington, DC | 01/27/1974 | 7/1/1988 | Operations |        . | Recruitment Expert | 04/01/1960 | 11/10/1966 |
              |----------------------------------------+---------------------------------------------------------------------------------|
              |   or_2_name | or_2_sta~e | or_2_end~e  | boardpositi~1  | bp_1_s~e  | bp_1_e~e  |          bp_2  | bp_sta~e  | bp_end~e  |
              | First Aider | 04/01/1974 |   7/1/1988  |                |           |           |                |           |        .  |
              +--------------------------------------------------------------------------------------------------------------------------+
            
              +--------------------------------------------------------------------------------------------------------------------------+
              |        name | title |  job1_location | job1_sta~e | job1_end~e |  job1_type |  job2_location | job2_sta~e  | job2_end~e  |
              | David Smith |    Mr |    Phoenix, AZ | 07/01/1990 | 04/07/1995 |      Sales |    Phoenix, AZ | 07/05/1995  | 12/11/2001  |
              |---------------------------------------------------+------------+---------------------------------------------------------|
              | job2_t~e |  job3_location | job3_sta~e | job3_e~e |  job3_type | locati~e |   otherrole_1_name | or_1_sta~e | or_1_end~e |
              |    Sales | Washington, DC | 01/01/2002 |        . | Management |        . |        First Aider | 12/12/1992 | 12/12/1997 |
              |----------------------------------------+---------------------------------------------------------------------------------|
              |   or_2_name | or_2_sta~e | or_2_end~e  | boardpositi~1  | bp_1_s~e  | bp_1_e~e  |          bp_2  | bp_sta~e  | bp_end~e  |
              |  Key Holder | 01/01/1995 | 01/01/2002  | In Attendance  | 1/1/2008  | 1/1/2010  | SalesDirector  | 1/1/2010  |        .  |
              +--------------------------------------------------------------------------------------------------------------------------+
            I want to transform this such that instead of observations defined by name, they are defined by a name - date pair.

            Code:
            drop bp_2
            . drop otherrole_3
            
            . list ,noobs
            
              +-------------------------------------------------------------------------------------------------------------------+
              |        name         date          otherrole_1   otherrol~2             bp_1   title         location         type |
              |-------------------------------------------------------------------------------------------------------------------|
              |  Anna Other   01/01/1959                                                        Mrs   Washington, DC   Operations |
              |  Anna Other   01/01/1961   Recruitment Expert                                   Mrs   Washington, DC   Operations |
              |  Anna Other   01/01/1967                                                        Mrs   Washington, DC   Operations |
              |  Anna Other   01/01/1973                                                        Mrs   Washington, DC      Finance |
              |  Anna Other   01/01/1974                                                        Mrs   Washington, DC   Operations |
              |-------------------------------------------------------------------------------------------------------------------|
              |  Anna Other   07/01/1958                                                        Mrs   Washington, DC   Operations |
              |  Anna Other   07/01/1960   Recruitment Expert                                   Mrs   Washington, DC   Operations |
              |  Anna Other   07/01/1966   Recruitment Expert                                   Mrs   Washington, DC   Operations |
              |  Anna Other   07/01/1972                                                        Mrs   Washington, DC      Finance |
              |  Anna Other   07/01/1974          First Aider                                   Mrs   Washington, DC   Operations |
              |-------------------------------------------------------------------------------------------------------------------|
              |  Anna Other   07/01/1988          First Aider                                   Mrs   Washington, DC   Operations |
              | David Smith   01/01/1991                                                         Mr      Phoenix, AZ        Sales |
              | David Smith   01/01/1993          First Aider                                    Mr      Phoenix, AZ        Sales |
              | David Smith   01/01/1995          First Aider   Key Holder                       Mr      Phoenix, AZ        Sales |
              | David Smith   01/01/1998           Key Holder                                    Mr      Phoenix, AZ        Sales |
              |-------------------------------------------------------------------------------------------------------------------|
              | David Smith   01/01/2002           Key Holder                                    Mr      Phoenix, AZ        Sales |
              | David Smith   01/01/2008                                      In Attendance      Mr   Washington, DC   Management |
              | David Smith   01/01/2010                                     Sales Director      Mr   Washington, DC   Management |
              | David Smith   07/01/1990                                                         Mr      Phoenix, AZ        Sales |
              | David Smith   07/01/1991                                                         Mr      Phoenix, AZ        Sales |
              |-------------------------------------------------------------------------------------------------------------------|
              | David Smith   07/01/1993          First Aider                                    Mr      Phoenix, AZ        Sales |
              | David Smith   07/01/1995          First Aider                                    Mr      Phoenix, AZ        Sales |
              | David Smith   07/01/2002                                                         Mr   Washington, DC   Management |
              +-------------------------------------------------------------------------------------------------------------------+
            Please note - I have dropped bp_2 and otherrole_3 here in the interests of presentation. In the full dataset they would be populated.

            Many thanks for any help,


            James

            Comment


            • #7
              So, assuming Roberto has correctly interpreted your needs, I think this has to be done in several steps. The first thing is to create separate data sets, each long, containing the jobs, the other roles, and the board positions:

              Code:
              preserve
              keep name title job*
              reshape long job@_location job@_startdate job@_enddate job@_type job@_location, i(name title) j(seq)
              tempfile jobs
              save `jobs'
              restore, preserve
              keep name title otherrole*
              reshape long otherrole@_name otherrole@_startdate otherrole@_enddate, i(name title) j(seq)
              tempfile otherroles
              save `otherroles'
              restore, preserve
              keep name title boardposition*
              reshape long boardposition@_name boardposition@_startdate boardposition@_enddate, i(name title) j(seq)
              tempfile boardpositions
              save `boardpositions'
              Then the next step is to create a shell file with the names and the range of dates you are looking for for each person
              Code:
              // IDENETIFY FIRST AND LAST DATES OF ANY ACTIVITY
              // AND REDUCE TO ONE RECORD PER PERSON
              egen long firstdate = rowmin(*startdate)
              egen long lastdate = rowmax(*enddate)
              collapse (min) firstdate (max) lastdate, by(name position)
              
              // CONVERT TO STATA HALF-YEARLY DATES
              gen int begin = hofd(firstdate)
              gen int end = hofd(lastdate)
              format begin end %th
              keep name position begin end
              
              // GENERATE A NEW RECORD FOR EACH HALF-YEAR FROM BEGIN TO END
              expand end-begin+1
              by name position, sort: gen half_year = begin + _n - 1
              gen long date = dofh(half_year)
              format date %td
              keep name position date
              Now we have to connect this shell with the jobs, other roles and board positions files. I'll illustrate it just for the jobs, and the code for the others is analogous. The idea is to match every record for the person in the shell with all records in the job file for that person and then weed out the ones where the dates don't align.

              Code:
              joinby name position using `jobs', unmatched(master)
              drop if !inrange(date, job_startdate, job_enddate) & _merge != 1
              drop _merge
              I have not tested this, and I may have mangled some of your variable names. But I think this will at least set you on a fruitful path. I hope this helps.

              Comment


              • #8
                Dear Clyde,

                I wanted to thank you for all your help with this - it took me a bit of time to get it working with my full dataset, but the approach you suggested was an excellent solution.

                Thanks!

                Stuart

                Comment

                Working...
                X