Hi All,
I have a dataset on the employees of an organization for 2013 to 2015, with several variables. Using this dataset, I want to model the probability of retention over time for future job applicants. In other words, I want to predict the probability that any given future applicant is retained at each point in his or her tenure.
The big problem is that performance is one of the predictors of retention in this case, but I only have performance data for 2013-2015. Some employees started working at the company many years ago, some started after 2013, and some left at different points between 2013 and 2015.
So I don't have an equal number of performance observations for everyone, and I don't have a common starting point for all employees.
The attached sample data are made up and simply show the different cases that occur in my dataset.
Is it possible to model retention with my data? Do you have any suggestions for me? I really appreciate your help.
start shows the employee's start date (year) at the company
left_2014 and left_2015 show whether the employee left in 2014 or 2015
perf variables show performance for each year
Code:
input int female int age int start int left_2014 int left_2015 int perf2013 int perf2014 int perf2015 int tenure
1 54 2000 1 . 3 . . 16
1 50 2002 0 1 3 4 . 14
1 43 2008 0 0 4 4 5 8
0 36 2011 0 0 3 4 3 5
1 29 2013 1 . 2 . . 3
1 32 2013 0 1 3 3 . 3
0 30 2013 0 0 2 3 3 3
1 27 2014 . 1 . 4 4 2
0 26 2014 . 0 4 3 5 2
0 28 2015 . . 3 4 4 1
end
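To make the structure concrete, here is a minimal sketch of how the example data could be put into a long layout with one row per employee per year, which I understand is a common starting point for discrete-time survival models. This is only an illustration under my own assumptions, not a claim that it is the right approach: `id` is a hypothetical identifier that I generate here, `left2014` and `left2015` are renamed only so that `reshape` can find a common stub, and the rule for dropping rows is my guess at which employee-years carry information.

```stata
* Sketch only: assumes the example data above are already in memory.
* id is a hypothetical identifier created for this illustration.
generate int id = _n

* Give the "left" indicators the same stub + year pattern as perf2013-perf2015.
rename left_2014 left2014
rename left_2015 left2015

* One row per employee per year. There is no left2013, so left should be
* missing in every 2013 row.
reshape long perf left, i(id) j(year)

* Assumed rule: drop years before the start date and years with no
* information at all (for example, years after the employee has left).
drop if year < start | (missing(perf) & missing(left))

sort id year
list id year start perf left, sepby(id)
```

In this layout each employee contributes a different number of rows, which matches the unequal number of performance observations described above.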