I have a dataset of worker employment spells, and I would like a general way to identify an individual's co-workers when the individual starts a job. Consider the following example dataset:
When worker 1 starts work in firm 1, she has no co-workers, same when she starts in firm 2. When worker 2 starts in firm 1, worker 1 is her co-worker until time period 4, and worker 3 is her co-worker from time period 6 on. When worker 3 starts, worker 2 is her co-worker for the time worker 3 is in . When worker 4 starts, she has no co-workers.
The following code illustrates something like what I'm after
I say "general way" because while this code would do what I want on test data, it would not in general work. I have 40 million spells, so looping, replacing, or sorting repeatedly will take far too long. The code could also fail in various ways: (i) I could easily hit the size limit for the macro -coworkers`i'-; (ii) the code assumes that each worker works a single spell in a firm; (iii) I'm sure people can come up with other reasonble cases in which this code would fail.
I realise this a broad, somewhat vague question, but it would already be great if anyone has suggestions for commands or features of stata's syntax I could exploit to radically speed this up and make is more robust, or could point me towards references that might have suggestions for how to go about solving this. Even improving individuals steps would be helpful.
There are many things I would like to subsequently do with these data. Two important ones are calculate the number of coworkers with certain charactersitics a worker has during a given spell, and see whether a worker has former coworkers in a firm in a subsequent spell with a different firm. I am using Stata 14.1
Code:
input worker_id firm_id start_date end_date 1 1 1 4 1 2 5 10 2 1 2 8 2 2 9 10 3 1 6 7 4 3 2 7 end
The following code illustrates something like what I'm after
Code:
levelsof firm_id, local(firms) tempfile worker_ds save "`worker_ds'" clear gen worker_id = . tempfile coworker_ds save "`coworker_ds'" foreach j of local firms { di "" di "results for firm `j'" di "" use "`worker_ds'", clear keep if firm_id == `j' count local obs = r(N) total_coworkers = . * identify set of coworkers for each worker in firm j forvalues i = 1/`obs' { local coworkers`i' "" local n_coworkers = 0 forvalues ii = 1/`obs' { if start_date[`ii'] < end_date[`i'] & end_date[`ii'] > start_date[`i'] & `i' != `ii' { local nextworker = worker_id[`ii'] local coworkers`i' `coworkers`i'' `nextworker' local n_coworkers = `n_coworkers' + 1 } } replace total_coworkers = `n_coworkers' if _n == `i' } * create a worker-coworker level dataset expand total_coworkers_all, generate(expand_obs) gen coworker_id = . sort worker_id local n = 1 * fill in values for coworker_id variable forvalues i = 1/`obs' { di "`i''s coworkers are `coworkers`i''" foreach id of local coworkers`i' { replace coworker_id = `id' if _n == `n' local n = `n' + 1 } } append using "`coworker_ds'" save "`coworker_ds'", replace }
I realise this a broad, somewhat vague question, but it would already be great if anyone has suggestions for commands or features of stata's syntax I could exploit to radically speed this up and make is more robust, or could point me towards references that might have suggestions for how to go about solving this. Even improving individuals steps would be helpful.
There are many things I would like to subsequently do with these data. Two important ones are calculate the number of coworkers with certain charactersitics a worker has during a given spell, and see whether a worker has former coworkers in a firm in a subsequent spell with a different firm. I am using Stata 14.1
Comment