Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • imputing values for a variable

    Dear Stata users

    I am working with a dataset like below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(referal_CWid referal_officeid) float referal_month double(nos_off nos_js)
    4595 1209 569 1 1
    4595 1209 570 1 2
    4595 1209 570 1 2
    4595 1209 576 1 1
    4595 1209 577 1 1
    4595 1209 578 1 1
    4595 1202 590 1 1
    4595 1202 594 1 2
    4595 1202 594 1 2
    4595 1227 598 1 1
    4595 1227 599 1 1
    4595 1227 604 1 1
    4595 1227 605 1 1
    4595 1227 606 1 1
    4595 1227 611 1 1
    4595 1227 613 1 1
    4595 1227 618 2 1
    4595 1229 618 2 1
    4595 1228 619 3 1
    4595 1229 619 3 1
    4595 1227 619 3 1
    4595 1227 621 1 1
    4595 1227 623 1 1
    4595 1227 624 1 1
    4595 1227 625 1 1
    4595 1227 626 1 1
    4595 1227 642 1 1
    4595 1227 650 1 1
    end
    format %tm referal_month
    referal_CWid is person id; referal_officeid is office id; referal_month is a time (monthly) variable; nos_off denotes the number of branches of an office a person worked in a specific month and finally nos_js is the number of client of the person at a specific office in a specific month. As you can probably detect, in some months the person works at multiple office branches. I want to remove duplicates month to have unique id time panel. In order to remove duplicate months, (i) I want to keep the office with highest number of clients. (ii) But there are cases where number of clients (nos_js) is a tie. In this particular case, I want to keep the office of the previous month when nos_off==1.

    I tried the following:

    bys referal_CWid referal_month (nos_js): egen major_off=max(nos_js)
    keep if nos_js==major_off


    It only works if I do not have case (ii). But it has flaws in the presence of case(ii).

    Any suggestions on this would be highly appreciated.

    PS: The above dataset is at the Client level and hence there are some duplicate rows.

    Zariab Hossain
    Uppsala University
    Last edited by Zariab Hossain; 07 Mar 2024, 12:39.

  • #2
    I'm not sure I understand what you want to do, but it might be this:
    Code:
    preserve
    keep if nos_off == 1
    keep referal_CWid referal_officeid referal_month
    tempfile single_offices
    save `single_offices'
    restore
    
    assert !missing(nos_js)
    by referal_CWid referal_month (nos_js), sort: keep if nos_js == nos_js[_N]
    duplicates tag referal_CWid referal_month, gen(flag)
    preserve
    keep if !flag
    drop flag
    tempfile results
    save `results'
    restore
    keep if flag
    drop flag
    rangejoin referal_month . -1 using `single_offices', by(referal_CWid)
    by referal_CWid referal_month (referal_month_U), sort: keep if _n == _N
    replace referal_officeid = referal_officeid_U
    drop *_U
    
    append using `results'
    isid referal_CWid referal_month, sort
    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    Note: this code will fail if for some referal_CWid in some referal_month there is no earlier month in which nos_off == 1. Of course, in that case, there is also no solution that meets your criteria.

    Comment


    • #3
      Many thanks for your kind help.

      Comment

      Working...
      X