Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Convert a data with repeating observations to unique observation

    I have an employer-employee data at quarterly frequency. Wid shows the worker id, firmid shows the firm she works, and Mday shows the number of days a particular worker (wid) works in that particular firm (firmid) at that quarter. There are gaps in my data. Moreover there are workers who work more than 1 firm in the same quarter. I want to clean the data so that for each quarter, a worker is assigned to one single firm only. As a first step, I assign the worker to the firm where she worked the longest. There is no problem here. But when there are ties, ( I mean when a worker works for equal days at different firms at the same period), I need a rule. Here is an example of my data with ties:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year byte(quarter wid firmid mday)
    2012 4 1 10 30
    2013 1 1 10 15
    2013 1 1 20 15
    2013 2 1 30 30
    2013 3 1 30 30
    2013 4 1 30 30
    2014 1 1 30 30
    2014 2 1 30 30
    2014 3 1 30 30
    2014 4 1 30 30
    2012 4 2 10 20
    2012 4 2 20 20
    2012 4 2 30 20
    2013 1 2  .  .
    2013 2 2 20 30
    2013 3 2 20 30
    2013 4 2 20 30
    2014 1 2 20 30
    2014 2 2 20 30
    2014 3 2 20 30
    2014 4 2 20 30
    2012 4 3 20 30
    2013 1 3 30 15
    2013 1 3 40 15
    2013 2 3 50 30
    2013 3 3 50 30
    2013 4 3 50 30
    2014 1 3 50 30
    2014 2 3 50 30
    2014 3 3 50 30
    2014 4 3 50 30
    end
    I want to specify a rule such that if there are ties at period t, then (1) choose the firm where she worked at the preceeding available date. To be more specific, for wid= 1 & date=2013q1, I want to assign the firm with firmid=10, since she worked there at 2012q4.
    (2) If this is not the case or if the tie occurs at the first observation, then I want to assign the worker to the firm she works in the following avaliable date. For example for wid=2 and date=2012q4 I need to assign the firm 20 since she works there at 2013q2.
    (3) And if neither of the previous two conditions hold, I want my rule to choose one of the tie firms randomly (for instance it may choose the one with largest firmid, for ex, I want my rule to assign firm=40 to wid=3 for the period 2013q1). I want my cleaned data to look like this:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year byte(quarter wid firmid mday)
    2012 4 1 10 30
    2013 1 1 10 15
    2013 2 1 30 30
    2013 3 1 30 30
    2013 4 1 30 30
    2014 1 1 30 30
    2014 2 1 30 30
    2014 3 1 30 30
    2014 4 1 30 30
    2012 4 2 20 20
    2013 1 2  .  .
    2013 2 2 20 30
    2013 3 2 20 30
    2013 4 2 20 30
    2014 1 2 20 30
    2014 2 2 20 30
    2014 3 2 20 30
    2014 4 2 20 30
    2012 4 3 20 30
    2013 1 3 40 15
    2013 2 3 50 30
    2013 3 3 50 30
    2013 4 3 50 30
    2014 1 3 50 30
    2014 2 3 50 30
    2014 3 3 50 30
    2014 4 3 50 30
    end


    How can I write such a rule to convert my data in a form one worker is matched with only one firm for each quarter? Many Thanks in advance.

  • #2
    Code:
    drop if missing(firmid, mday)
    gen qdate = yq(year, quarter)
    format qdate %tq
    by wid (qdate), sort: gen seq = _n
    
    //  IDENTIFY OBSERVATIONS THAT HAVE MAXIMAL WORKDAYS FOR WID IN EACH QUARTER
    by wid qdate (mday), sort: gen byte candidate = (mday == mday[_N])
    
    
    //  IDENTIFY WHETHER WORKER HAS WORKED AT THE FIRM IN PREVIOUS QUARTER
    gen long obs_no = _n
    rangestat (count) previous = obs_no, interval(seq -1 -1) by(wid firmid)
    
    //  IDENTIFY WHETHER WORKER WORKS AT THE FIRM IN SUBSEQUENT QUARTER
    rangestat (count) subsequent = obs_no, interval(seq 1 1) by(wid firmid)
    
    mvencode previous subsequent, mv(0)
    
    //  CREATE A RANDOM ORDER AMONG THE SURVIVING CANDIDATES IF THERE ARE STILL UNRESOLVED TIES
    set seed 1234 // OR YOUR FAVORITE RANDOM NUMBER SEED
    gen double shuffle = runiform()
    
    by wid qdate(candidate previous subsequent shuffle), sort: keep if _n == _N
    Note: This does not exactly reproduce your desired results because the random selections that were applied when there was no suitable previous or next quarter firm were made differently from what you did.

    Added: -rangejoin- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
    Last edited by Clyde Schechter; 14 Aug 2022, 11:20.

    Comment


    • #3
      Dear Clyde, many thanks for helping, the code works perfectly!

      Comment

      Working...
      X