Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Breaking tie

    Dear Stata users

    I have the following dataset example:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(referal_CWid referal_year referal_officeid) double(nos_off nos_ref) float(tenure mult_off_dummy multi_off_dummy)
     22 2013  118 1 46 3 0 0
     22 2014  118 1 19 3 0 0
     22 2015 1830 2  1 1 1 0
     22 2015  118 2  1 3 1 0
    366 2006 1715 1  4 9 0 0
    366 2007 1715 1 16 9 0 0
    366 2008 1715 1 40 9 0 0
    366 2009 1730 3 29 3 0 1
    366 2009 1737 3  1 1 0 1
    366 2009 1715 3 24 9 0 1
    366 2010 1730 1 38 3 0 0
    366 2011 1730 2  9 3 1 0
    366 2011 1715 2  8 9 1 0
    366 2012 1715 1  6 9 0 0
    366 2013 1715 1  8 9 0 0
    366 2014 1715 1  5 9 0 0
    366 2015 1715 1  4 9 0 0
    end
    Here, referal_CWid is the person id, referal_officeid is office id and referal_year is year. It is also visible that few person works at different offices in a given year. Now, I would like to create a person year panel and for that I need to remove duplicates for a given year.
    My rule of removal is: (i) I would like to create a variable called major_office which will be the one in a given year that has the maximum nos_ref (number of referral). (ii) In case of a tie, I would like to keep the office for which tenure is maximum.

    Would highly appreciate if anyone can suggest how can I do this cleaning.

    Regards,
    Zariab Hossain
    Uppsala University

  • #2
    I created the following code which serves the purpose. Would be nice, if anyone can provide a suggestion

    bys referal_CWid referal_year (nos_ref ): egen major_off=max(nos_ref)
    replace major_off=. if nos_ref !=major_off
    replace major_off=referal_officeid if !missing(major_off)

    keep if !missing(major_off)
    drop major_off

    bys referal_CWid referal_year (tenure ): egen major_off=max(tenure)
    replace major_off=. if tenure !=major_off
    replace major_off=referal_officeid if !missing(major_off)

    keep if !missing(major_off)
    drop major_off


    In this way, I can create office by year panel.

    Thanks for any future suggestions.


    Comment


    • #3
      I think this can be simplified to a one-liner:
      Code:
      by referral_CWid referral_year (nos_ref tenure), sort: keep if _n == _N
      This will work correctly if, as is the case in the example data, referal_CWid, referal_year, and referal_officeid uniquely identify observations. But if the same referal_CWID and referal_officeid can have multiple occurrences in the data in the same year, then it would be slightly different:
      Code:
      by referal_CWid referal_year (nos_ref tenure), sort: ///
          keep if referal_officeid == referal_officeid[_N]
      This will retain the same referal_officeid as the preceding code, but it will retain all of the observations of that officeid for the given referal_CWid and year.

      Note: You may not have fully resolved the problem of tie breaking in any case. Is it possible that there will be a remaining tie even after taking number of referrals and tenure into account? If so, this code generalizes easily by just adding more variables after tenure inside the parentheses of the -by- prefix.

      Comment

      Working...
      X