  • Need help calculating retention

    I am trying to calculate teacher retention by category for my data set. I used the following code, but it gave me an overall teacher retention rate rather than retention by category by year. I would like an overall retention rate for each year, but I would also like retention by category. Any suggestions? I am very new to Stata, so I apologize if I sound like I don't know what I'm talking about.

    by teacher_id (year), sort: gen byte retained = (year[_n+1] == year+1) if year < 2024

    egen tag = tag(teacher_id year)
    egen count = total(tag), by(year)
    tabdisp year, cell(count)

    rename count total

    egen count = total(retained), by(year)

    rename count totalretained
    rename total totalteachers

    destring totalretained, replace
    destring totalteachers, replace

    gen pp_retained = totalretained/totalteachers
    gen pp_retained2 = totalretained/totalteachers*100

    tabdisp category year, cell(pp_retained2)


    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long teacher_id int year float category
    1054 2015 0
    1054 2016 0
    1054 2017 0
    1062 2015 2
    1062 2016 2
    1062 2017 2
    1062 2018 2
    1074 2015 1
    1076 2015 0
    1076 2016 0
    1082 2015 2
    1086 2019 3
    1086 2020 3
    1086 2021 3
    1086 2022 3
    1086 2023 3
    1091 2015 1
    1091 2016 1
    1091 2017 1
    1091 2018 1
    1091 2019 1
    1091 2020 1
    1091 2021 1
    1091 2022 1
    1091 2023 1
    1095 2017 3
    1095 2018 3
    1095 2019 3
    1095 2020 3
    1095 2021 3
    1095 2022 3
    1095 2023 3
    1098 2016 0
    1098 2017 0
    1098 2018 0
    1098 2019 0
    1098 2020 0
    1098 2021 0
    1098 2022 0
    1100 2015 0
    1100 2016 0
    1100 2017 0
    1100 2018 0
    1100 2019 3
    1100 2020 2
    1100 2021 3
    1100 2022 1
    1100 2023 2
    1122 2015 2
    1123 2015 0
    1123 2016 0
    1123 2017 0
    1123 2018 0
    1123 2019 0
    1123 2020 0
    1123 2021 0
    1123 2022 0
    1123 2023 0
    1126 2015 1
    1126 2016 1
    1126 2017 1
    1126 2018 1
    1126 2019 1
    1126 2020 1
    1126 2021 1
    1130 2016 0
    1130 2017 0
    1130 2020 0
    1130 2021 0
    1130 2022 0
    1130 2023 3
    1135 2015 3
    1135 2016 3
    1135 2017 3
    1135 2018 0
    1135 2019 3
    1135 2020 0
    1135 2021 0
    1135 2022 0
    1135 2023 0
    1138 2017 0
    1138 2020 0
    1142 2015 0
    1142 2016 0
    1142 2017 0
    1142 2018 0
    1142 2019 3
    1142 2020 3
    1142 2021 3
    1144 2015 2
    1144 2016 2
    1144 2017 2
    1144 2022 0
    1150 2015 2
    1150 2016 2
    1150 2018 0
    1158 2015 3
    1158 2016 3
    1158 2017 3
    1158 2018 3
    end

  • #2
    You've written some fairly convoluted code to do something that is actually pretty straightforward.

    Code:
    isid teacher_id year
    by teacher_id (year), sort: gen byte retained = (year[_n+1] == year + 1)
    
    //    OVERALL RETENTION RATES BY YEAR
    tabstat retained, statistics(mean) by(year) format(%03.2f)
    
    //    OVERALL RETENTION BY CATEGORY BY YEAR
    table (year) (category), statistic(mean retained) nformat(%03.2f mean)
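    One caveat, offered only as a sketch: in the last year of the data no teacher has a following observation, so retained is 0 for everyone in that year. If that final year (2023 in the example data) should count as unknown rather than as zero retention, which is an assumption on my part and not something stated in the question, it can be set to missing before tabulating. The -input- block below is a small subset of the example data.

```stata
clear
input long teacher_id int year float category
1054 2015 0
1054 2016 0
1054 2017 0
1086 2022 3
1086 2023 3
1091 2022 1
1091 2023 1
end

isid teacher_id year
by teacher_id (year), sort: gen byte retained = (year[_n+1] == year + 1)

// ASSUMPTION: the last year in the data is the end of the observation
// window, so retention in that year is unknown rather than zero
summarize year, meanonly
replace retained = . if year == r(max)

tabstat retained, statistics(mean) by(year) format(%03.2f)
```

    Because -tabstat- and -table- ignore missing values when computing means, the final year then drops out of the retention rates instead of showing up as 0.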
    By the way, I note that some teachers change category over the years. As you haven't labeled the category variable, I can't tell whether this variable is supposed to denote unalterable aspects of the teacher, or things that are expected to change over time. If the latter, then this is no problem. But if it's the former, then something is wrong with your data.
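    If it helps, here is a minimal sketch of one way to list the teachers whose category changes. The variable names cat_changes and one_per_teacher are made up for illustration, and the -input- block is a small subset of the example data.

```stata
clear
input long teacher_id int year float category
1054 2015 0
1054 2016 0
1054 2017 0
1100 2018 0
1100 2019 3
1100 2020 2
end

// Sort by category within teacher: the first and last values differ
// only if the teacher has more than one category
by teacher_id (category), sort: gen byte cat_changes = (category[1] != category[_N])

// Tag one observation per teacher, then count teachers who ever change
egen one_per_teacher = tag(teacher_id)
tabulate cat_changes if one_per_teacher

// List the affected teachers' full histories
sort teacher_id year
list teacher_id year category if cat_changes, sepby(teacher_id)
```

    Note that a missing value of category sorts last, so a teacher with some missing categories would also be flagged by this check.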

    Added: The code above gives tables showing the proportions (0-1 range) of teachers retained. I notice that in your code you were looking for percentages. That's just a matter of multiplying everything by 100. And the simplest way to accomplish that is to replace the -gen- command above with
    Code:
    by teacher_id (year), sort: gen byte retained = 100*(year[_n+1] == year + 1)
    Also, to get a nice display of the percentages, change the formats from %03.2f to %5.1f in the -tabstat- and -table- commands.
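    Putting those two changes together, the percentage version might look like the sketch below. It runs on a small subset of the example data, and the -table- syntax shown assumes Stata 17 or later.

```stata
clear
input long teacher_id int year float category
1054 2015 0
1054 2016 0
1054 2017 0
1074 2015 1
1076 2015 0
1076 2016 0
end

isid teacher_id year

// Multiply the 0/1 indicator by 100 so that means are percentages
by teacher_id (year), sort: gen byte retained = 100*(year[_n+1] == year + 1)

//    OVERALL RETENTION PERCENTAGES BY YEAR
tabstat retained, statistics(mean) by(year) format(%5.1f)

//    RETENTION PERCENTAGES BY CATEGORY BY YEAR
table (year) (category), statistic(mean retained) nformat(%5.1f mean)
```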
    Last edited by Clyde Schechter; 05 Dec 2023, 17:56.
