I am trying to calculate teacher retention by category for my data set. I used the following code; however, it gave me an overall teacher retention rate and not a retention rate by category by year. I would like an overall retention rate for each year, but I would also like a retention rate by category. Any suggestions? I am very new to Stata, so I apologize if I sound like I don't know what I'm talking about.
[CODE]
by teacher_id (year), sort: gen byte retained = (year[_n+1] == year+1) if year < 2024
egen tag = tag(teacher_id year)
egen count = total(tag), by(year)
tabdisp year, cell(count)
rename count total
egen count = total(retained), by(year)
rename count totalretained
rename total totalteachers
destring totalretained, replace
destring totalteachers, replace
gen pp_retained = totalretained/totalteachers
gen pp_retained2 = totalretained/totalteachers*100
tabdisp category year, cell(pp_retained2)
[/CODE]
[CODE]
* Example generated by -dataex-. For more info, type help dataex
clear
input long teacher_id int year float category
1054 2015 0
1054 2016 0
1054 2017 0
1062 2015 2
1062 2016 2
1062 2017 2
1062 2018 2
1074 2015 1
1076 2015 0
1076 2016 0
1082 2015 2
1086 2019 3
1086 2020 3
1086 2021 3
1086 2022 3
1086 2023 3
1091 2015 1
1091 2016 1
1091 2017 1
1091 2018 1
1091 2019 1
1091 2020 1
1091 2021 1
1091 2022 1
1091 2023 1
1095 2017 3
1095 2018 3
1095 2019 3
1095 2020 3
1095 2021 3
1095 2022 3
1095 2023 3
1098 2016 0
1098 2017 0
1098 2018 0
1098 2019 0
1098 2020 0
1098 2021 0
1098 2022 0
1100 2015 0
1100 2016 0
1100 2017 0
1100 2018 0
1100 2019 3
1100 2020 2
1100 2021 3
1100 2022 1
1100 2023 2
1122 2015 2
1123 2015 0
1123 2016 0
1123 2017 0
1123 2018 0
1123 2019 0
1123 2020 0
1123 2021 0
1123 2022 0
1123 2023 0
1126 2015 1
1126 2016 1
1126 2017 1
1126 2018 1
1126 2019 1
1126 2020 1
1126 2021 1
1130 2016 0
1130 2017 0
1130 2020 0
1130 2021 0
1130 2022 0
1130 2023 3
1135 2015 3
1135 2016 3
1135 2017 3
1135 2018 0
1135 2019 3
1135 2020 0
1135 2021 0
1135 2022 0
1135 2023 0
1138 2017 0
1138 2020 0
1142 2015 0
1142 2016 0
1142 2017 0
1142 2018 0
1142 2019 3
1142 2020 3
1142 2021 3
1144 2015 2
1144 2016 2
1144 2017 2
1144 2022 0
1150 2015 2
1150 2016 2
1150 2018 0
1158 2015 3
1158 2016 3
1158 2017 3
1158 2018 3
end
[/CODE]
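One possible way to get both rates is sketched below. In the code above the totals are computed with by(year) only, so every category in a given year shows the same overall figure. The sketch assumes one row per teacher per year (as in the example data), treats the last year in the data as having unknown retention, and assigns each teacher to the category held in the base year. The variable names retained_next, pct_year, and pct_cat are made up for illustration, and the sketch is meant to be run on the example data as loaded, before any of the variables above are created.

[CODE]
* Assumption: one row per teacher_id-year; -isid- stops with an error if not
isid teacher_id year

* Find the last year in the data; retention cannot be observed for it
summarize year, meanonly
local lastyear = r(max)

* 1 if the teacher also appears in the next calendar year, 0 otherwise,
* and missing in the last year
bysort teacher_id (year): gen byte retained_next = (year[_n+1] == year + 1) if year < `lastyear'

* Overall retention rate (%) for each year
egen pct_year = mean(100 * retained_next), by(year)

* Retention rate (%) for each category within each year
egen pct_cat = mean(100 * retained_next), by(category year)

tabdisp year, cell(pct_year) format(%5.1f)
tabdisp category year, cell(pct_cat) format(%5.1f)
[/CODE]

Taking the mean of a 0/1 indicator gives the share retained directly, so separate numerator and denominator variables are not needed. Cells for the last year come out missing rather than 0, because the indicator is left missing there.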