Discrete choice model with varying choice sets

Amin Sofla

22 Jun 2018, 04:59

Which discrete choice model in Stata can fit data in which the choice sets vary both in the alternatives they contain and in their size (that is, in the number of zeros per choice set)?

Detailed Explanation:
I would like to analyze the effect of workload on the probability of being staffed on (assigned to) new clients. There are m branches, each with a varying number of employees. Whenever a new client arrives, the branch assigns exactly one employee to that client, chosen from all employees at the branch at the time of the assignment decision. The unit of analysis is an employee-client pair, so the dependent variable (chosen) is an indicator that equals 1 if the employee was staffed on the client and 0 otherwise. For each staffing (assignment) decision in each branch, I identified the set of employees who could have been assigned to the client, and I calculated each of those employees' workload (the aggregate size of their clients) as of the day before the staffing decision. The structure of the data is shown below. The code for a simulated dataset is at the end of this post, in case it is useful.

Code:

clear all
input assign_id br_id chosen emp_id emp_chosen workload
1    1    1    8    8    42
1    1    0    7    8    120
2    3    1    16    16    3
2    3    0    12    16    14
2    3    0    13    16    210
2    3    0    14    16    20
2    3    0    15    16    52
2    3    0    18    16    37
end

Variable definition:

assign_id is the Assignment ID
br_id is the Branch ID
chosen is an indicator variable that takes a value of 1 if the employee has been staffed on the client and 0 otherwise.
emp_id is the Employee ID
emp_chosen is the ID of the Employee who has been staffed
workload is the workload of the employee on the assignment date
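
The variable definitions above imply a few structural properties that can be checked directly. The following is only a minimal sketch on the eight example rows; the helper variable n_chosen is a name introduced here for illustration and is not part of the original data.

```stata
clear all
input assign_id br_id chosen emp_id emp_chosen workload
1    1    1    8    8    42
1    1    0    7    8    120
2    3    1    16    16    3
2    3    0    12    16    14
2    3    0    13    16    210
2    3    0    14    16    20
2    3    0    15    16    52
2    3    0    18    16    37
end

* Each assignment should have exactly one staffed employee
* (n_chosen is a hypothetical helper variable)
bysort assign_id: egen n_chosen = total(chosen)
assert n_chosen == 1

* chosen should agree with emp_chosen on every row
assert chosen == (emp_id == emp_chosen)

* An employee should appear at most once within an assignment
isid assign_id emp_id
```

If any assert fails, Stata stops with an error, which would point to a problem in how the choice sets were built.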

Problem:
The problem is that the set of theoretically possible responses (the choice set) varies in two respects:

First, the composition of the choice set varies: in branch 1 (see above), employees 7 and 8 are in the staffing opportunity set, while in branch 3 the set consists of employees 12, 13, 14, 15, 16, and 18.
Second, the size of the choice set varies: branch 1 has 2 available employees as staffing opportunities, while branch 3 has 6.
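
To make the second point concrete, the size of each choice set can be computed from the example rows. This is a minimal sketch; set_size and tag are helper names introduced here for illustration only.

```stata
clear all
input assign_id br_id chosen emp_id emp_chosen workload
1    1    1    8    8    42
1    1    0    7    8    120
2    3    1    16    16    3
2    3    0    12    16    14
2    3    0    13    16    210
2    3    0    14    16    20
2    3    0    15    16    52
2    3    0    18    16    37
end

* Number of candidate employees in each assignment decision
bysort assign_id: gen set_size = _N

* Flag one row per assignment so each choice set is counted once
egen tag = tag(assign_id)

* Distribution of choice-set sizes across assignments
tabulate set_size if tag
list assign_id br_id set_size if tag, noobs
```

On these rows, set_size is 2 for assignment 1 and 6 for assignment 2, so each assignment contributes set_size minus one zeros to chosen.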

I have been informed that the standard conditional logit model (CL) will fit my situation as long as I am willing to make the assumptions on which the model rests (e.g., the IIA assumption). Essentially, I would be assuming that nothing systematic makes each client-employee pair intrinsically unique once I condition on observed characteristics; that is, all differences across employees and clients other than their modeled characteristics are in the error term. Under that assumption, I believe that the CL should work in theory. I say "in theory" because some software implementations of CL might assume that the choice sets are fixed, or at least that every choice situation contains the same number of alternatives.
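
One way to probe the software concern is to try Stata's clogit on data whose groups differ in size; clogit identifies each choice situation through group() and does not require groups of equal size. The sketch below uses entirely simulated data and is not my real data: the seed, the 200 assignments, the set sizes of 2 to 6, and the assumed workload coefficient of -0.02 are all hypothetical values chosen for illustration. The eight example rows are not used because the staffed employee there always has the lowest workload, which would perfectly predict the outcome.

```stata
clear all
set seed 12345

* Hypothetical setup: 200 assignment decisions with 2 to 6 candidates each
set obs 200
gen assign_id = _n
gen set_size = runiformint(2, 6)
expand set_size
bysort assign_id: gen emp_id = _n

* Hypothetical workload for each candidate employee
gen workload = runiformint(0, 200)

* Random utility with an assumed coefficient of -0.02 on workload
* and a type I extreme value (Gumbel) error
gen u = -0.02*workload - ln(-ln(runiform()))

* The candidate with the highest utility in each set is staffed
bysort assign_id (u): gen chosen = (_n == _N)

* Conditional logit with choice sets of unequal size
clogit chosen workload, group(assign_id)
```

If this runs without complaint about unequal group sizes, the same call with my own variables (chosen, workload, assign_id) should be a reasonable starting point, subject to the IIA caveat above.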

Question:
I would be grateful if you could suggest any solution or source that addresses the above issues.

I truly appreciate your time and consideration.

Code for the demonstration dataset:

Code:

ssc install rangestat
ssc install rangejoin
ssc install rangerun
clear all
set seed 3213
set obs 20
gen br_id = _n
gen long empl_count = runiformint(2,10)
expand empl_count
bysort br_id: gen emp_id = _n
gen long cl_count = empl_count * runiformint(2,5)
expand cl_count
bysort br_id emp_id: gen cl_id = _n
gen emclstdate = runiformint(mdy(1,1,2001), mdy(12,31,2017))
gen emclendate = runiformint(emclstdate, emclstdate + 365*10)
format %td emclstdate emclendate
gen clsize = runiformint(1,99)
drop empl_count cl_count
isid br_id emclstdate emp_id cl_id, sort
gen contract = _n
save "\contracts.dta", replace
 
clear all
use "\contracts.dta", clear
collapse (min) day1=emclstdate (max) dayN=emclendate (min) clsize, by(br_id cl_id)
gen years = year(dayN) - year(day1) + 1
expand years
bysort br_id cl_id: gen year = year(day1) + _n - 1
by br_id cl_id: replace clsize = clsize + runiformint(-clsize+5,clsize)
gen clsize_date = mdy(12,31,year)
replace clsize_date = day1 if year == year(day1)
replace clsize_date = dayN if year == year(dayN)
format %td clsize_date
drop years
isid br_id cl_id year, sort
save "\clsize.dta", replace
 
use "\contracts.dta", clear
drop clsize
gen year1 = year(emclstdate)
gen yearN = year(emclendate)
rangejoin year year1 yearN using "\clsize.dta", by(br_id cl_id)
isid br_id emclstdate emp_id cl_id year, sort
save "\contracts_annual.dta", replace
 
clear all
use "\contracts.dta", clear
collapse (min) day1=emclstdate (max) dayN=emclendate, by(br_id emp_id)
rangejoin emclstdate day1 dayN using "\contracts.dta", by(br_id) keep(emclstdate contract emp_id)
rename emp_id_U emp_chosen
gen chosen = emp_id == emp_chosen
gen clsize_date = emclstdate - 1
isid br_id contract emp_id, sort
format %td clsize_date
keep br_id emp_id day1 dayN emp_chosen emclstdate contract chosen clsize_date
save "\opset.dta", replace
 
append using "\contracts_annual.dta"
isid br_id contract clsize_date emp_id, sort missok
program do1
    drop if !mi(chosen)
    bysort cl_id (year): keep if _n == _N
    gen load = sum(clsize)
end
gen low = cond(!mi(chosen), ., mdy(12,31,2099))
format %td low
rangerun do1, interval(clsize_date low clsize_date) by(br_id emp_id)
isid br_id contract clsize_date emp_id, sort missok
drop if mi(chosen)
keep br_id emp_id emp_chosen emclstdate contract chosen load
replace load=0 if load==.
egen N_emp = count(emp_id), by (contract)
drop if N_emp<2
rename load workload
egen N_assign = count(contract), by (br_id)
sort contract
egen assign_id = group(contract)
sort br_id emp_id
egen emp_id2 = group(br_id emp_id)
drop emp_chosen
sort contract emp_id2
drop emp_id contract
rename emp_id2 emp_id
order assign_id emclstdate emp_id chosen workload br_id N_emp
sort assign_id emp_id
gen yr=year(emclstdate)
   label variable yr "Year"
   label variable br_id "Branch ID"
   label variable emp_id "Employee ID"
   label variable emclstdate "The assignment's date"
   label variable assign_id "Assignment ID"
   label variable chosen "Whether the employee has been chosen (staffed) - dependent Variable"
   label variable workload "The workload on the employee at the assignment date"
   label variable N_emp "Number of employees in the branch "
   label variable N_assign "Total number of assignments in the branch"
list if assign_id <= 5,sepby(assign_id)
save "\demo.dta", replace

Last edited by Amin Sofla; 22 Jun 2018, 05:12.

Tags: conditional logit, discrete choice model, varying choice sets
