  Discrete choice model with varying choice sets

    Which discrete choice model in Stata can fit data in which the choice sets vary both in terms of the potential alternatives and in terms of the number of zeros (non-chosen alternatives)?

    Detailed Explanation:
    I would like to analyze the effect of workload on the probability of being staffed on (assigned to) new clients. There are m branches, each with a varying number of employees. Each time a new client arrives, the branch assigns exactly one employee to that client, choosing from all the employees available at the time of the assignment decision. Because the unit of analysis is the employee-client pair, the dependent variable (chosen) is an indicator that takes the value 1 if the employee has been staffed on the client and 0 otherwise. For each branch and each staffing (assignment) decision, I identified the set of employees who could be assigned to the client. Next, for each staffing decision, I calculated the workload (aggregated clients' size) of every potential employee in that branch one day before the decision. The structure of the data is shown below; the code for the simulated dataset is at the end of this post for reference.

    Code:
    clear all
    input assign_id br_id chosen emp_id emp_chosen workload
    1    1    1    8    8    42
    1    1    0    7    8    120
    2    3    1    16    16    3
    2    3    0    12    16    14
    2    3    0    13    16    210
    2    3    0    14    16    20
    2    3    0    15    16    52
    2    3    0    18    16    37
    end
    Variable definitions:
    • assign_id is the Assignment ID
    • br_id is the Branch ID
    • chosen is an indicator variable that takes a value of 1 if the employee has been staffed on the client and 0 otherwise.
    • emp_id is the Employee ID
    • emp_chosen is the ID of the Employee who has been staffed
    • workload is the workload of the employee (aggregated clients' size), measured one day before the assignment date
    Problem:
    The problem is that the set of theoretically possible responses (the choice set) varies in two respects:
    • First, the composition of the choice set varies: in the example above, employees 7 and 8 form the staffing opportunity set in branch 1, while employees 12, 13, 14, 15, 16, and 18 form it in branch 3.
    • Second, the size of the choice set varies: branch 1 has 2 available employees as a staffing opportunity, while branch 3 has 6.
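    Both kinds of variation can be seen directly by counting the candidate employees per assignment. The minimal sketch below re-enters the example data shown above, computes the size of each choice set, and checks that each assignment has exactly one staffed employee and that chosen agrees with emp_chosen. The helper variables n_alt and n_chosen are hypothetical names introduced only for this illustration.

```stata
clear
input assign_id br_id chosen emp_id emp_chosen workload
1 1 1  8  8  42
1 1 0  7  8 120
2 3 1 16 16   3
2 3 0 12 16  14
2 3 0 13 16 210
2 3 0 14 16  20
2 3 0 15 16  52
2 3 0 18 16  37
end

* Size of each assignment's choice set (number of candidate employees)
bysort assign_id: gen n_alt = _N

* Number of staffed employees per assignment (should be exactly 1)
egen n_chosen = total(chosen), by(assign_id)

* chosen should agree with emp_chosen on every row
assert chosen == (emp_id == emp_chosen)

* One row per assignment: branch, choice-set size, staffed employee
list assign_id br_id n_alt emp_chosen if chosen, noobs
```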
    I have been informed that the standard conditional logit (CL) model fits my situation as long as I am willing to accept the assumptions on which the model rests (e.g., the IIA assumption). Essentially, I would be assuming that, once I condition on observed characteristics, nothing systematic makes any client-employee pair intrinsically unique (i.e., the remaining differences are in the error term). Under that assumption, I believe CL should work in theory, because all differences across employees and clients other than their modeled characteristics are absorbed by the error term. I say "in theory" because some software implementations of CL might assume that the choice sets are fixed, or at least that every observation has the same number of alternatives.
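    As a sketch of what the CL specification could look like in Stata, the code below simulates hypothetical data (not the demo dataset) in which each assignment has between 2 and 10 candidate employees, and the staffed employee is the one with the highest utility under an assumed workload coefficient of -0.02 plus type I extreme value errors, i.e., a conditional logit data-generating process. It then fits clogit with one group per assignment. clogit does not require every group to contain the same number of observations, so the unbalanced choice sets enter directly; whether IIA is acceptable remains a separate modeling question. The sample size, seed, workload range, and coefficient are all illustrative assumptions.

```stata
clear all
set seed 12345

* 500 hypothetical assignment decisions
set obs 500
gen assign_id = _n

* Varying choice-set size: 2 to 10 candidate employees per decision
gen n_alt = runiformint(2, 10)
expand n_alt
bysort assign_id: gen alt = _n

* Hypothetical workload for each candidate employee
gen workload = runiformint(0, 200)

* Utility = assumed beta (-0.02) * workload + Gumbel error -ln(-ln(U))
gen double u = -0.02*workload - ln(-ln(runiform()))

* The candidate with the highest utility in each assignment is staffed
bysort assign_id (u): gen chosen = (_n == _N)

* Conditional logit: one group per assignment, unequal group sizes allowed
clogit chosen workload, group(assign_id)
```

    With the real data, the same call would use chosen, workload, and assign_id from the demo dataset.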

    Question:
    I would be grateful for any suggested solution or source that addresses the issues above.

    I truly appreciate your time and consideration.



    Code for the demonstration dataset:
    Code:
    ssc install rangestat
    ssc install rangejoin
    ssc install rangerun
    clear all
    set seed 3213
    set obs 20
    gen br_id = _n
    gen long empl_count = runiformint(2,10)
    expand empl_count
    bysort br_id: gen emp_id = _n
    gen long cl_count = empl_count * runiformint(2,5)
    expand cl_count
    bysort br_id emp_id: gen cl_id = _n
    gen emclstdate = runiformint(mdy(1,1,2001), mdy(12,31,2017))
    gen emclendate = runiformint(emclstdate, emclstdate + 365*10)
    format %td emclstdate emclendate
    gen clsize = runiformint(1,99)
    drop empl_count cl_count
    isid br_id emclstdate emp_id cl_id, sort
    gen contract = _n
    save "\contracts.dta", replace
     
    clear all
    use "\contracts.dta", clear
    collapse (min) day1=emclstdate (max) dayN=emclendate (min) clsize, by(br_id cl_id)
    gen years = year(dayN) - year(day1) + 1
    expand years
    bysort br_id cl_id: gen year = year(day1) + _n - 1
    by br_id cl_id: replace clsize = clsize + runiformint(-clsize+5,clsize)
    gen clsize_date = mdy(12,31,year)
    replace clsize_date = day1 if year == year(day1)
    replace clsize_date = dayN if year == year(dayN)
    format %td clsize_date
    drop years
    isid br_id cl_id year, sort
    save "\clsize.dta", replace
     
    use "\contracts.dta", clear
    drop clsize
    gen year1 = year(emclstdate)
    gen yearN = year(emclendate)
    rangejoin year year1 yearN using "\clsize.dta", by(br_id cl_id)
    isid br_id emclstdate emp_id cl_id year, sort
    save "\contracts_annual.dta", replace
     
    clear all
    use "\contracts.dta", clear
    collapse (min) day1=emclstdate (max) dayN=emclendate, by(br_id emp_id)
    rangejoin emclstdate day1 dayN using "\contracts.dta", by(br_id) keep(emclstdate contract emp_id)
    rename emp_id_U emp_chosen
    gen chosen = emp_id == emp_chosen
    gen clsize_date = emclstdate - 1
    isid br_id contract emp_id, sort
    format %td clsize_date
    keep br_id emp_id day1 dayN emp_chosen emclstdate contract chosen clsize_date
    save "\opset.dta", replace
     
    append using "\contracts_annual.dta"
    isid br_id contract clsize_date emp_id, sort missok
    program do1
        drop if !mi(chosen)
        bysort cl_id (year): keep if _n == _N
        gen load = sum(clsize)
    end
    gen low = cond(!mi(chosen), ., mdy(12,31,2099))
    format %td low
    rangerun do1, interval(clsize_date low clsize_date) by(br_id emp_id)
    isid br_id contract clsize_date emp_id, sort missok
    drop if mi(chosen)
    keep br_id emp_id emp_chosen emclstdate contract chosen load
    replace load=0 if load==.
    egen N_emp = count(emp_id), by(contract)
    drop if N_emp<2
    rename load workload
    egen N_assign = count(contract), by(br_id)
    sort contract
    egen assign_id = group(contract)
    sort br_id emp_id
    egen emp_id2 = group(br_id emp_id)
    drop emp_chosen
    sort contract emp_id2
    drop emp_id contract
    rename emp_id2 emp_id
    order assign_id emclstdate emp_id chosen workload br_id N_emp
    sort assign_id emp_id
    gen yr = year(emclstdate)
    label variable yr "Year"
    label variable br_id "Branch ID"
    label variable emp_id "Employee ID"
    label variable emclstdate "The assignment's date"
    label variable assign_id "Assignment ID"
    label variable chosen "Whether the employee has been chosen (staffed) - dependent variable"
    label variable workload "The employee's workload one day before the assignment date"
    label variable N_emp "Number of candidate employees for the assignment"
    label variable N_assign "Total number of assignments in the branch"
    list if assign_id <= 5, sepby(assign_id)
    save "\demo.dta", replace
    Last edited by Amin Sofla; 22 Jun 2018, 05:12.