Dear Statalist,
I would like to analyze the effect of workload on the probability of being staffed for the new clients. In the 'Dataset 1' (below), we observe all the branches with their corresponding employees and clients. In addition, we observe every employee that was staffed by the specific client, and the corresponding starting and ending date. I limit the sample to one year for the sake of brevity. Variable definition: br_id is the branch Id; emp_id is the employee Id, cl_id is the client's Id; assign_id is the assignment decision's Id; emclstdate is the starting date, emclendate is the ending date; newassign is an indicator that shows whether there is an assignment in the current year or no; clsize is the client's size.
Dataset 1:
The unit of observation for the final analysis will be the employee - his(her) new client dyad, which represents a staffing opportunity. In order to test my hypotheses, in each branch, and for each staffing (assignment) decision, I need to identify a set of employees that could be assigned to the client in each staffing decision. I directly observe every employee that was staffed through the 'Dataset 1' and include them in the opportunity set. In order to identify non-staffed employees that could have been staffed on a specific transaction, I would like to consider all the non-staffed employees who worked for the branch on the specific assignment (staffing) date. Next, for each transaction (staffing decision), I would like to calculate the workload (aggregated clients’ size) for all the potential employees in the branch, on one day before staffing decision (see the workload). The final dataset should look like the 'Dataset 2'.The dependent variable (staffed in the 'Dataset 2') is an indicator variable that takes a value of one if the employee is assigned to the client (from 'Dataset 1') and zero otherwise.
Dataset 2:
To summarize:
How can I create the staffing opportunity set (In each office, for each staffing decision, find all the staffed and non-staffed employees in the branch- and merge altogether – The Dataset 2)?
How can I calculate the aggregated workload of each staffed and non-staffed employees at each staffing (assignment) date (actually one day before the decision)? (Dataset 2 – workload variable)
More importantly, how can I improve the efficiency of the program? Please also note that in each year, there are in total about 1000 branches, 50,000 assignment (staffing) decisions, 5,000 employees, and 250,000 clients.
I truly appreciate your time and consideration,
I would like to analyze the effect of workload on the probability of being staffed for the new clients. In the 'Dataset 1' (below), we observe all the branches with their corresponding employees and clients. In addition, we observe every employee that was staffed by the specific client, and the corresponding starting and ending date. I limit the sample to one year for the sake of brevity. Variable definition: br_id is the branch Id; emp_id is the employee Id, cl_id is the client's Id; assign_id is the assignment decision's Id; emclstdate is the starting date, emclendate is the ending date; newassign is an indicator that shows whether there is an assignment in the current year or no; clsize is the client's size.
Dataset 1:
yeare | br_id | emp_id | cl_id | newassign | assign_id | emclstdate | emclendate | clsize |
2005 | 1 | 1 | 12 | 0 | 09/01/2001 | 23/05/2007 | 60 | |
2005 | 1 | 1 | 2 | 0 | 31/10/2003 | 22/09/2014 | 80 | |
2005 | 1 | 1 | 5 | 0 | 01/11/2003 | 06/02/2006 | 20 | |
2005 | 1 | 2 | 7 | 0 | 01/11/2003 | 05/05/2009 | 90 | |
2005 | 1 | 2 | 4 | 0 | 03/08/2004 | 16/05/2006 | 90 | |
2005 | 1 | 2 | 8 | 1 | 3 | 01/12/2005 | 02/05/2006 | 60 |
2005 | 1 | 3 | 11 | 0 | 24/01/2004 | 31/03/2017 | 50 | |
2005 | 1 | 3 | 6 | 0 | 24/11/2004 | 02/05/2006 | 80 | |
2005 | 1 | 3 | 1 | 1 | 1 | 01/01/2005 | 16/05/2006 | 30 |
2005 | 2 | 4 | 3 | 0 | 14/12/2004 | 09/03/2010 | 70 | |
2005 | 2 | 4 | 10 | 1 | 4 | 20/01/2005 | 12/03/2010 | 30 |
2005 | 2 | 5 | 13 | 0 | 14/12/2004 | 27/12/2006 | 20 | |
2005 | 2 | 5 | 9 | 1 | 2 | 20/12/2005 | 27/12/2005 | 10 |
Dataset 2:
yeare | br_id | assign_id | staffed | emp_id | emclstdate | workload |
2005 | 1 | 1 | 0 | 1 | 01/01/2005 | 160 |
2005 | 1 | 1 | 0 | 2 | 01/01/2005 | 180 |
2005 | 1 | 1 | 1 | 3 | 01/01/2005 | 130 |
2005 | 2 | 2 | 0 | 4 | 20/12/2005 | 100 |
2005 | 2 | 2 | 1 | 5 | 20/12/2005 | 20 |
2005 | 1 | 3 | 0 | 1 | 01/12/2005 | 160 |
2005 | 1 | 3 | 1 | 2 | 01/12/2005 | 180 |
2005 | 1 | 3 | 0 | 3 | 01/12/2005 | 160 |
2005 | 2 | 4 | 1 | 4 | 20/01/2005 | 70 |
2005 | 2 | 4 | 0 | 5 | 20/01/2005 | 20 |
How can I create the staffing opportunity set (In each office, for each staffing decision, find all the staffed and non-staffed employees in the branch- and merge altogether – The Dataset 2)?
How can I calculate the aggregated workload of each staffed and non-staffed employees at each staffing (assignment) date (actually one day before the decision)? (Dataset 2 – workload variable)
More importantly, how can I improve the efficiency of the program? Please also note that in each year, there are in total about 1000 branches, 50,000 assignment (staffing) decisions, 5,000 employees, and 250,000 clients.
I truly appreciate your time and consideration,
Comment