Hi Statalisters
I am trying to calculate the number of unique coworkers a worker has in a given time window, as well as the number of coworkers with certain characteristics. The code below illustrates what I would like to do.
The problem with this code is the -joinby- step. My dataset is way too large to efficiently form all pairwise combinations like this. I have considered splitting it into smaller chunks but even then the computations would take weeks, if nothing goes wrong. I am wondering if there is a way to perform the same calculations on the original dataset, without the -joinby- step, perhaps using -rangestat-? The difficulty I keep running into is that a worker can have multiple discontinuous spells in the same firm during the time period and I can't think of an elegant way around it.
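To make the discontinuous-spell issue concrete, here is a minimal sketch using the same toy data as my example code. The variable names spell and n_spells are hypothetical, and it assumes a spell is a run of consecutive years in the same firm. It only flags the spells and does not solve the counting problem itself.

```stata
clear
input id firm year male
1 1 1 1
1 2 2 1
1 1 3 1
1 1 4 1
2 1 1 0
2 1 2 0
2 3 3 0
2 0 4 0
3 1 2 1
3 1 3 1
3 4 4 1
4 . 2 1
4 4 4 1
5 1 3 0
5 1 4 0
6 5 1 1
6 5 2 1
end

// number each run of consecutive years a worker spends in the same firm;
// the running sum restarts within each id-firm group
bysort id firm (year): gen spell = sum(_n == 1 | year != year[_n-1] + 1)

// total number of spells per worker-firm combination
bysort id firm: egen n_spells = max(spell)

// show worker-firm combinations with more than one spell
list id firm year spell if n_spells > 1, sepby(id firm)
```

In the toy data, worker 1 is in firm 1 in year 1, moves to firm 2 in year 2, and returns to firm 1 in years 3 and 4, so that worker has two separate spells in firm 1. Any window-based count has to avoid counting the same coworker twice across such spells.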
Thanks for any suggestions.
Code:
clear
input id firm year male
1 1 1 1
1 2 2 1
1 1 3 1
1 1 4 1
2 1 1 0
2 1 2 0
2 3 3 0
2 0 4 0
3 1 2 1
3 1 3 1
3 4 4 1
4 . 2 1
4 4 4 1
5 1 3 0
5 1 4 0
6 5 1 1
6 5 2 1
end
tempfile mydata
save `mydata'

// keep track of all workers
keep id
duplicates drop
tempfile idlist
save `idlist'

// create all pairs of coworkers
use `mydata'
rename (id male) i_=
joinby firm year using `mydata' // create all pairs of coworker-years
drop if id == i_id // drop self-matches
keep *id *male
duplicates drop // keep one observation per unique coworker pair.

// count coworkers with given characteristics
gen own_sex = male == i_male // characteristic of interest
collapse (count) id (sum) male own_sex, by(i_id)
rename (id male own_sex) (N N_male N_own_sex)

// add individuals with no coworkers back in
rename i_id id
merge 1:1 id using `idlist', nogen
foreach var of varlist N* {
    replace `var' = 0 if missing(`var')
}