I have an employer-employee data at quarterly frequency. Wid shows the worker id, firmid shows the firm she works, and Mday shows the number of days a particular worker (wid) works in that particular firm (firmid) at that quarter. There are gaps in my data. Moreover there are workers who work more than 1 firm in the same quarter. I want to clean the data so that for each quarter, a worker is assigned to one single firm only. As a first step, I assign the worker to the firm where she worked the longest. There is no problem here. But when there are ties, ( I mean when a worker works for equal days at different firms at the same period), I need a rule. Here is an example of my data with ties:
I want to specify a rule such that if there are ties at period t, then (1) choose the firm where she worked at the preceeding available date. To be more specific, for wid= 1 & date=2013q1, I want to assign the firm with firmid=10, since she worked there at 2012q4.
(2) If this is not the case or if the tie occurs at the first observation, then I want to assign the worker to the firm she works in the following avaliable date. For example for wid=2 and date=2012q4 I need to assign the firm 20 since she works there at 2013q2.
(3) And if neither of the previous two conditions hold, I want my rule to choose one of the tie firms randomly (for instance it may choose the one with largest firmid, for ex, I want my rule to assign firm=40 to wid=3 for the period 2013q1). I want my cleaned data to look like this:
How can I write such a rule to convert my data in a form one worker is matched with only one firm for each quarter? Many Thanks in advance.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int year byte(quarter wid firmid mday) 2012 4 1 10 30 2013 1 1 10 15 2013 1 1 20 15 2013 2 1 30 30 2013 3 1 30 30 2013 4 1 30 30 2014 1 1 30 30 2014 2 1 30 30 2014 3 1 30 30 2014 4 1 30 30 2012 4 2 10 20 2012 4 2 20 20 2012 4 2 30 20 2013 1 2 . . 2013 2 2 20 30 2013 3 2 20 30 2013 4 2 20 30 2014 1 2 20 30 2014 2 2 20 30 2014 3 2 20 30 2014 4 2 20 30 2012 4 3 20 30 2013 1 3 30 15 2013 1 3 40 15 2013 2 3 50 30 2013 3 3 50 30 2013 4 3 50 30 2014 1 3 50 30 2014 2 3 50 30 2014 3 3 50 30 2014 4 3 50 30 end
(2) If this is not the case or if the tie occurs at the first observation, then I want to assign the worker to the firm she works in the following avaliable date. For example for wid=2 and date=2012q4 I need to assign the firm 20 since she works there at 2013q2.
(3) And if neither of the previous two conditions hold, I want my rule to choose one of the tie firms randomly (for instance it may choose the one with largest firmid, for ex, I want my rule to assign firm=40 to wid=3 for the period 2013q1). I want my cleaned data to look like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int year byte(quarter wid firmid mday) 2012 4 1 10 30 2013 1 1 10 15 2013 2 1 30 30 2013 3 1 30 30 2013 4 1 30 30 2014 1 1 30 30 2014 2 1 30 30 2014 3 1 30 30 2014 4 1 30 30 2012 4 2 20 20 2013 1 2 . . 2013 2 2 20 30 2013 3 2 20 30 2013 4 2 20 30 2014 1 2 20 30 2014 2 2 20 30 2014 3 2 20 30 2014 4 2 20 30 2012 4 3 20 30 2013 1 3 40 15 2013 2 3 50 30 2013 3 3 50 30 2013 4 3 50 30 2014 1 3 50 30 2014 2 3 50 30 2014 3 3 50 30 2014 4 3 50 30 end
How can I write such a rule to convert my data in a form one worker is matched with only one firm for each quarter? Many Thanks in advance.

Comment