Reshaping with Multiple StartDate-EndDate Variables

Stuart Morrison

Join Date: Sep 2014

Posts: 34
#1

Reshaping with Multiple StartDate-EndDate Variables

15 Sep 2014, 07:42

Dear Statalist,

I have a dataset that with data on around 2,000 individuals and their employment histories. These histories cover a varying length of time upto about 50 years. The data are currently in a form {name, title, job1_location, job1_startdate job1_enddate, job1_type, job1_notes, job2_location, ..., job15_location, job15_startdate job15_enddate, job15_type, job15_notes, otherrole1_name, otherrole1_startdate, otherrole1_enddate, otherrole2_name, ..., boardposition1_name, boardposition1_startdate1,..., ... }

In total there are around 1,100 variables. I would like a dataset of the form:

{name, date, title, job, location, notes, otherrole1_name, otherrole2_name, boardposition1_name, boardpositoon2_name}

That is individuals may only have one job at a time but they can have several (but not many) other roles and potentially several (but again not many) other board positions. The dates are recorded as being continuous but I was planning on assembling a dataset with a frequency of around 6months.

I have searched ardently, and experimented at length with the reshape command but confess to being at this point extremely confused. I tried to split the data up in to many individual datasets which could then be reshaped individually but this seems not to be a good strategy as it is not always the case that job10 follows after job9, for example). There are also sometimes gaps in the data where people leave but re-enter the dataset. I apologise for any duplication in this question, and would be immensely grateful for any help or suggestions.

Thanks,

Stuart
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30085
#2

15 Sep 2014, 13:54

Perhaps if you clarify some things. FIrst, your target data set has a single variable called date. How does this relate to the startdate and enddate variables in the source data set. Is it a start date? Or an end date? Or some date in between--if so, what is the rule to find it?)

Second, or the other roles 1 and 2, and board positions 1 and 2 in the target data set, to have the same information as the variables with those names in the source data set, or will you be selecting 1 or 2 to keep from among the many? If selecting, by what criteria?
Comment

Stuart Morrison

Join Date: Sep 2014
Posts: 34

15 Sep 2014, 15:07

Dear Clyde,

Many thanks indeed for your response. The target dataset would have a date variable that took values of every six-months. So, for example my initial data looks like this:

Name

Title

Job1_location

Job1_StartDate

Job1_EndDate

Job1_Type

Job2_Location

Job2_StartDate

Job2_EndDate

Job2_Type

Job3_Location

Job3_StartDate

Job3_EndDate

Job3_Type

David Smith

Phoenix, AZ

07/01/1990

04/07/1995

Sales

Phoenix, AZ

07/05/1995

12/11/2001

Sales

Washington, DC

01/01/2002

Management

Anna Other

Mrs

Washington, DC

04/01/1958

04/11/1972

Operations

Washington, DC

04/12/1972

01/26/1974

Finance

Washington, DC

01/27/1974

7/1/1988

Operations

My target dataset would look like:

Date	Name	Title	Location	Type
07/01/1990	David Smith	Mr	Phoenix, AZ	Sales
01/01/1991	David Smith	Mr	Phoenix, AZ	Sales
07/01/1991	David Smith	Mr	Phoenix, AZ	Sales
etc
07/01/1995	David Smith	Mr	Phoenix, AZ	Sales
etc
07/01/2002	David Smith	Mr	Washington, DC	Management
07/01/1958	Anna Other	Mrs	Washington, DC	Operations
01/01/1959	Anna Other	Mrs	Washington, DC	Operations
etc
07/01/1972	Anna Other	Mrs	Washington,DC	Finance
01/01/1973	Anna Other	Mrs	Washington, DC	Finance
etc
07/01/1974	Anna Other	Mrs	Washington, DC	Operations
etc
07/01/1988	Anna Other	Mrs	Washington, DC	Operations

But, each row in my original dataset also contains other variables specifying other roles held for other, overlapping periods. The number of these in the target dataset is given by the maximum number of positions simultaneously held by an individual - but I am confident it is not more than 3. So in this respect my data look like;

Name

Location(...)job15_type

Otherrole_1_name

OR_1_startdate

OR_1_enddate

OR_2_name

OR_2_startdate

OR_2_enddate

boardposition_1

BP_1_startdate

BP_1_enddate

BP_2

BP_StartDate

BP_Enddate

David Smith

First Aider

12/12/1992

12/12/1997

Key Holder

01/01/1995

01/01/2002

In Attendance

1/1/2008

1/1/2010

SalesDirector

1/1/2010

Anna Other

Recruitment Expert

04/01/1960

11/10/1966

First Aider

04/01/1974

7/1/1988

Name	Date	OtherRole_1	OtherRole_2	OtherRole_3	BP_1	BP_2
David Smith	01/01/1993	First Aider
David Smith	07/01/1993	First Aider
etc
David Smith	01/01/1995	First Aider	Key Holder
etc
David Smith	01/01/1998		Key Holder
etc
David Smith	01/01/2002
etc
David Smith	01/01/2008				In Attendance
etc
David Smith	01/01/2010				Sales Director

I hope this makes my initial question a bit clearer - I apologise for any remaining lack of clarity.

Thanks again,

Stuart

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30085
#4

15 Sep 2014, 20:54

I'm afraid I'm still confused about what you want. For example, in your target row for David Smith with date 1/1/91, where does the information about location and type come from? I don't see any entries for David Smith having that date in your source data set.
Comment
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#5

15 Sep 2014, 22:59

Originally posted by Clyde Schechter View Post

I'm afraid I'm still confused about what you want. For example, in your target row for David Smith with date 1/1/91, where does the information about location and type come from? I don't see any entries for David Smith having that date in your source data set.

Clyde,

location and type information come from variables -JobXLocation- and -JobXType-. The date 1/1/91 comes from Stuart wanting to create observations that have dates in a 6-month interval. So if someone starts a job in 01/01/2010 and ends in 01/01/2011, he would like three observations for that person: one with date 01/01/2010, a second with 01/07/2010, a third with 01/01/2011. Also, he is "advancing" the dates of his original dataset. For example, Anna Other starts her first job in 04/01/1958, but his first observation for her in the target dataset is 07/01/1958. I don't know if this is usual procedure in his area of research, but information is lost with this strategy.

That's my interpretation, at least.

Stuart should follow advice on listing data in the forum. Below are some suggestions.

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
Comment

Stuart Morrison

Join Date: Sep 2014
Posts: 34

16 Sep 2014, 09:26

Dear Clyde and Roberto,

Again I apologise for not being sufficiently clear. My aim to transform the data to 6-monthly intervals is indeed wasteful of information. I confess that this reflects my, potentially erroneous, judgement that a dataset preserving exact dates would result in a dataset so large as to be unwieldily as there will be many unique dates. But, given the nature of the data it I believe it shouldn't matter too much. If there were a solution that did not result in an enormous dataset that preserved the fine date information in the data then I would be delighted to learn of it, but I was trying to make my problem as feasible as possible.

Roberto's interpretation of what I am trying to do is correct. In the below I have followed his advice and reproduced my example as Stata output.

Initially I have data that is of the following form:

Code:

. list, noobs

  +--------------------------------------------------------------------------------------------------------------------------+
  |        name | title |  job1_location | job1_sta~e | job1_end~e |  job1_type |  job2_location | job2_sta~e  | job2_end~e  |
  |  Anna Other |   Mrs | Washington, DC | 04/01/1958 | 04/11/1972 | Operations | Washington, DC | 04/12/1972  | 01/26/1974  |
  |---------------------------------------------------+------------+---------------------------------------------------------|
  | job2_t~e |  job3_location | job3_sta~e | job3_e~e |  job3_type | locati~e |   otherrole_1_name | or_1_sta~e | or_1_end~e |
  |  Finance | Washington, DC | 01/27/1974 | 7/1/1988 | Operations |        . | Recruitment Expert | 04/01/1960 | 11/10/1966 |
  |----------------------------------------+---------------------------------------------------------------------------------|
  |   or_2_name | or_2_sta~e | or_2_end~e  | boardpositi~1  | bp_1_s~e  | bp_1_e~e  |          bp_2  | bp_sta~e  | bp_end~e  |
  | First Aider | 04/01/1974 |   7/1/1988  |                |           |           |                |           |        .  |
  +--------------------------------------------------------------------------------------------------------------------------+

  +--------------------------------------------------------------------------------------------------------------------------+
  |        name | title |  job1_location | job1_sta~e | job1_end~e |  job1_type |  job2_location | job2_sta~e  | job2_end~e  |
  | David Smith |    Mr |    Phoenix, AZ | 07/01/1990 | 04/07/1995 |      Sales |    Phoenix, AZ | 07/05/1995  | 12/11/2001  |
  |---------------------------------------------------+------------+---------------------------------------------------------|
  | job2_t~e |  job3_location | job3_sta~e | job3_e~e |  job3_type | locati~e |   otherrole_1_name | or_1_sta~e | or_1_end~e |
  |    Sales | Washington, DC | 01/01/2002 |        . | Management |        . |        First Aider | 12/12/1992 | 12/12/1997 |
  |----------------------------------------+---------------------------------------------------------------------------------|
  |   or_2_name | or_2_sta~e | or_2_end~e  | boardpositi~1  | bp_1_s~e  | bp_1_e~e  |          bp_2  | bp_sta~e  | bp_end~e  |
  |  Key Holder | 01/01/1995 | 01/01/2002  | In Attendance  | 1/1/2008  | 1/1/2010  | SalesDirector  | 1/1/2010  |        .  |
  +--------------------------------------------------------------------------------------------------------------------------+

I want to transform this such that instead of observations defined by name, they are defined by a name - date pair.

Code:

drop bp_2
. drop otherrole_3

. list ,noobs

  +-------------------------------------------------------------------------------------------------------------------+
  |        name         date          otherrole_1   otherrol~2             bp_1   title         location         type |
  |-------------------------------------------------------------------------------------------------------------------|
  |  Anna Other   01/01/1959                                                        Mrs   Washington, DC   Operations |
  |  Anna Other   01/01/1961   Recruitment Expert                                   Mrs   Washington, DC   Operations |
  |  Anna Other   01/01/1967                                                        Mrs   Washington, DC   Operations |
  |  Anna Other   01/01/1973                                                        Mrs   Washington, DC      Finance |
  |  Anna Other   01/01/1974                                                        Mrs   Washington, DC   Operations |
  |-------------------------------------------------------------------------------------------------------------------|
  |  Anna Other   07/01/1958                                                        Mrs   Washington, DC   Operations |
  |  Anna Other   07/01/1960   Recruitment Expert                                   Mrs   Washington, DC   Operations |
  |  Anna Other   07/01/1966   Recruitment Expert                                   Mrs   Washington, DC   Operations |
  |  Anna Other   07/01/1972                                                        Mrs   Washington, DC      Finance |
  |  Anna Other   07/01/1974          First Aider                                   Mrs   Washington, DC   Operations |
  |-------------------------------------------------------------------------------------------------------------------|
  |  Anna Other   07/01/1988          First Aider                                   Mrs   Washington, DC   Operations |
  | David Smith   01/01/1991                                                         Mr      Phoenix, AZ        Sales |
  | David Smith   01/01/1993          First Aider                                    Mr      Phoenix, AZ        Sales |
  | David Smith   01/01/1995          First Aider   Key Holder                       Mr      Phoenix, AZ        Sales |
  | David Smith   01/01/1998           Key Holder                                    Mr      Phoenix, AZ        Sales |
  |-------------------------------------------------------------------------------------------------------------------|
  | David Smith   01/01/2002           Key Holder                                    Mr      Phoenix, AZ        Sales |
  | David Smith   01/01/2008                                      In Attendance      Mr   Washington, DC   Management |
  | David Smith   01/01/2010                                     Sales Director      Mr   Washington, DC   Management |
  | David Smith   07/01/1990                                                         Mr      Phoenix, AZ        Sales |
  | David Smith   07/01/1991                                                         Mr      Phoenix, AZ        Sales |
  |-------------------------------------------------------------------------------------------------------------------|
  | David Smith   07/01/1993          First Aider                                    Mr      Phoenix, AZ        Sales |
  | David Smith   07/01/1995          First Aider                                    Mr      Phoenix, AZ        Sales |
  | David Smith   07/01/2002                                                         Mr   Washington, DC   Management |
  +-------------------------------------------------------------------------------------------------------------------+

Please note - I have dropped bp_2 and otherrole_3 here in the interests of presentation. In the full dataset they would be populated.

Many thanks for any help,

James

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30085

16 Sep 2014, 09:30

So, assuming Roberto has correctly interpreted your needs, I think this has to be done in several steps. The first thing is to create separate data sets, each long, containing the jobs, the other roles, and the board positions:

Code:

preserve
keep name title job*
reshape long job@_location job@_startdate job@_enddate job@_type job@_location, i(name title) j(seq)
tempfile jobs
save `jobs'
restore, preserve
keep name title otherrole*
reshape long otherrole@_name otherrole@_startdate otherrole@_enddate, i(name title) j(seq)
tempfile otherroles
save `otherroles'
restore, preserve
keep name title boardposition*
reshape long boardposition@_name boardposition@_startdate boardposition@_enddate, i(name title) j(seq)
tempfile boardpositions
save `boardpositions'

Then the next step is to create a shell file with the names and the range of dates you are looking for for each person

Code:

// IDENETIFY FIRST AND LAST DATES OF ANY ACTIVITY
// AND REDUCE TO ONE RECORD PER PERSON
egen long firstdate = rowmin(*startdate)
egen long lastdate = rowmax(*enddate)
collapse (min) firstdate (max) lastdate, by(name position)

// CONVERT TO STATA HALF-YEARLY DATES
gen int begin = hofd(firstdate)
gen int end = hofd(lastdate)
format begin end %th
keep name position begin end

// GENERATE A NEW RECORD FOR EACH HALF-YEAR FROM BEGIN TO END
expand end-begin+1
by name position, sort: gen half_year = begin + _n - 1
gen long date = dofh(half_year)
format date %td
keep name position date

Now we have to connect this shell with the jobs, other roles and board positions files. I'll illustrate it just for the jobs, and the code for the others is analogous. The idea is to match every record for the person in the shell with all records in the job file for that person and then weed out the ones where the dates don't align.

Code:

joinby name position using `jobs', unmatched(master)
drop if !inrange(date, job_startdate, job_enddate) & _merge != 1
drop _merge

I have not tested this, and I may have mangled some of your variable names. But I think this will at least set you on a fruitful path. I hope this helps.

Comment

Stuart Morrison

Join Date: Sep 2014

Posts: 34
#8

29 Sep 2014, 04:12

Dear Clyde,

I wanted to thank you for all your help with this - it took me a bit of time to get it working with my full dataset, but the approach you suggested was an excellent solution.

Thanks!

Stuart
Comment

Announcement