Which discrete choice model in Stata fits data in which the choice sets vary in terms of the potential choices and in terms of the number of zeros?
Detailed Explanation:
I would like to analyze the effect of workload on the probability of being staffed on (assigned to) a new client. There are m branches, each with a varying number of employees. Each time a new client arrives, the branch assigns exactly one employee to that client, chosen from all the employees available at the time of the assignment decision. Because the unit of analysis is an employee-client pair, the dependent variable (chosen) is an indicator that equals 1 if the employee was staffed on the client and 0 otherwise. For each staffing (assignment) decision in each branch, I identified the set of employees who could have been assigned to the client. Then, for each decision, I calculated the workload (aggregated client size) of every potential employee in that branch as of one day before the staffing decision. The following shows the structure of the data. The code for the simulated dataset is at the end of this post, in case it is useful.
Code:
clear all
input assign_id br_id chosen emp_id emp_chosen workload
1 1 1 8 8 42
1 1 0 7 8 120
2 3 1 16 16 3
2 3 0 12 16 14
2 3 0 13 16 210
2 3 0 14 16 20
2 3 0 15 16 52
2 3 0 18 16 37
end
Variable definition:
- assign_id is the Assignment ID
- br_id is the Branch ID
- chosen is an indicator variable that takes a value of 1 if the employee has been staffed on the client and 0 otherwise.
- emp_id is the Employee ID
- emp_chosen is the ID of the Employee who has been staffed
- workload is the employee's workload (aggregated client size) one day before the assignment date
The problem is that the choice set (the subset of theoretically possible responses) varies across staffing decisions in two respects:
- First, the identity of the alternatives varies: in branch 1 (see above), employees 7 and 8 are in the staffing opportunity set, while in branch 3, employees 12, 13, 14, 15, 16, and 18 are in the staffing opportunity set.
- Second, the number of alternatives, and therefore the number of zeros in chosen, varies: branch 1 has 2 available employees in the staffing opportunity set, while branch 3 has 6.
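To make the two sources of variation concrete, here is a minimal sketch on the eight-row example from above. The names set_size, n_chosen, n_zeros, and first are helper variables introduced only for this illustration; they are not part of the dataset. The sketch also checks two things the description assumes: that chosen agrees with emp_chosen, and that each staffing decision selects exactly one employee.

```stata
* Re-enter the eight-row example so the sketch runs on its own
clear
input assign_id br_id chosen emp_id emp_chosen workload
1 1 1 8 8 42
1 1 0 7 8 120
2 3 1 16 16 3
2 3 0 12 16 14
2 3 0 13 16 210
2 3 0 14 16 20
2 3 0 15 16 52
2 3 0 18 16 37
end

* chosen should flag exactly the employee recorded in emp_chosen
assert chosen == (emp_id == emp_chosen)

* Hypothetical helper variables: size of each choice set and its number of zeros
bysort assign_id: gen set_size = _N
by assign_id: egen n_chosen = total(chosen)
gen n_zeros = set_size - n_chosen

* Every staffing decision should select exactly one employee
assert n_chosen == 1

* One row per assignment: choice set size and number of zeros
egen byte first = tag(assign_id)
list assign_id br_id set_size n_zeros if first, noobs
```

On the example, assignment 1 has 2 alternatives and 1 zero, while assignment 2 has 6 alternatives and 5 zeros.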
Question:
Which discrete choice model in Stata can accommodate choice sets that vary in both composition and size? I would be grateful for any solution or source that addresses the above issues.
I truly appreciate your time and consideration.
Code for the demonstration dataset:
Code:
* Install the user-written commands used below (needed only once)
ssc install rangestat
ssc install rangejoin
ssc install rangerun

* Simulate contracts: 20 branches, 2-10 employees each, several clients per employee
clear all
set seed 3213
set obs 20
gen br_id = _n
gen long empl_count = runiformint(2,10)
expand empl_count
bysort br_id: gen emp_id = _n
gen long cl_count = empl_count * runiformint(2,5)
expand cl_count
bysort br_id emp_id: gen cl_id = _n
gen emclstdate = runiformint(mdy(1,1,2001), mdy(12,31,2017))
gen emclendate = runiformint(emclstdate, emclstdate + 365*10)
format %td emclstdate emclendate
gen clsize = runiformint(1,99)
drop empl_count cl_count
isid br_id emclstdate emp_id cl_id, sort
gen contract = _n
save "\contracts.dta", replace

* Annual client size for each branch-client pair
clear all
use "\contracts.dta", clear
collapse (min) day1=emclstdate (max) dayN=emclendate (min) clsize, by(br_id cl_id)
gen years = year(dayN) - year(day1) + 1
expand years
bysort br_id cl_id: gen year = year(day1) + _n - 1
by br_id cl_id: replace clsize = clsize + runiformint(-clsize+5,clsize)
gen clsize_date = mdy(12,31,year)
replace clsize_date = day1 if year == year(day1)
replace clsize_date = dayN if year == year(dayN)
format %td clsize_date
drop years
isid br_id cl_id year, sort
save "\clsize.dta", replace

* Contract-year records carrying the client's size in each year
use "\contracts.dta", clear
drop clsize
gen year1 = year(emclstdate)
gen yearN = year(emclendate)
rangejoin year year1 yearN using "\clsize.dta", by(br_id cl_id)
isid br_id emclstdate emp_id cl_id year, sort
save "\contracts_annual.dta", replace

* Opportunity set: pair each employee with every contract in the branch
* that starts between the employee's first and last contract dates
clear all
use "\contracts.dta", clear
collapse (min) day1=emclstdate (max) dayN=emclendate, by(br_id emp_id)
rangejoin emclstdate day1 dayN using "\contracts.dta", by(br_id) keep(emclstdate contract emp_id)
rename emp_id_U emp_chosen
gen chosen = emp_id == emp_chosen
gen clsize_date = emclstdate - 1
isid br_id contract emp_id, sort
format %td clsize_date
keep br_id emp_id day1 dayN emp_chosen emclstdate contract chosen clsize_date
save "\opset.dta", replace

append using "\contracts_annual.dta"
isid br_id contract clsize_date emp_id, sort missok

* do1 runs on each employee's records dated up to the day before the assignment:
* keep the latest size record per client and sum the sizes
program do1
    drop if !mi(chosen)
    bysort cl_id (year): keep if _n == _N
    gen load = sum(clsize)
end

gen low = cond(!mi(chosen), ., mdy(12,31,2099))
format %td low
rangerun do1, interval(clsize_date low clsize_date) by(br_id emp_id)
isid br_id contract clsize_date emp_id, sort missok
drop if mi(chosen)
keep br_id emp_id emp_chosen emclstdate contract chosen load
replace load = 0 if load == .
egen N_emp = count(emp_id), by(contract)
drop if N_emp < 2
rename load workload
egen N_assign = count(contract), by(br_id)

* Build clean assignment and employee identifiers
sort contract
egen assign_id = group(contract)
sort br_id emp_id
egen emp_id2 = group(br_id emp_id)
drop emp_chosen
sort contract emp_id2
drop emp_id contract
rename emp_id2 emp_id
order assign_id emclstdate emp_id chosen workload br_id N_emp
sort assign_id emp_id
gen yr = year(emclstdate)

label variable yr "Year"
label variable br_id "Branch ID"
label variable emp_id "Employee ID"
label variable emclstdate "The assignment's date"
label variable assign_id "Assignment ID"
label variable chosen "Whether the employee has been chosen (staffed) - dependent variable"
label variable workload "The workload on the employee at the assignment date"
label variable N_emp "Number of employees in the branch"
label variable N_assign "Total number of assignments in the branch"

list if assign_id <= 5, sepby(assign_id)
save "\demo.dta", replace