  • How to best aggregate monthly episode data?

    Dear all,

    I have a monthly spell dataset with information about pupils' educational achievements and their final grades (see the dataex example below). My aim is to aggregate the information in the spell file so that, in the end, I have variables showing which school-leaving certificate and overall grade each individual has achieved.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id byte wave_sp long splink byte certificate double grade byte(splast endm) int endy byte tag
    4010912 5 220002 3 1.2999999523162842 2 6 2012 1
    4010912 9 220003 5  1.600000023841858 2 7 2015 1
    4010913 9 220004 5                  3 2 7 2015 1
    4010913 9 220003 3 2.0999999046325684 2 7 2012 1
    4010914 5 220002 3                2.5 2 7 2012 0
    4010915 5 220004 2  3.200000047683716 2 7 2012 0
    4010918 9 220003 4                  3 2 7 2015 1
    4010918 7 220002 3                -20 2 7 2013 1
    4010981 9 220005 5  1.899999976158142 2 6 2015 1
    4010981 5 220004 3  1.899999976158142 2 6 2012 1
    4011534 5 220003 3  2.799999952316284 2 9 2012 2
    4011534 5 220002 2                  3 2 8 2011 2
    4011534 7 220004 5                  4 2 7 2013 2
    end
    label values wave_sp en2574
    label def en2574 5 "2012/2013", modify
    label def en2574 7 "2013/2014", modify
    label def en2574 9 "2015/2016", modify
    label values certificate en2978ext1
    label values grade enext1
    label values splast en177
    label def en177 2 "No", modify
    label values endm en1874
    label def en1874 6 "June", modify
    label def en1874 7 "July", modify
    label def en1874 8 "August", modify
    label def en1874 9 "September", modify
    label values endy enmiss

    So far I have used the following code to restrict my data so that only the last school-leaving certificate remains for each individual.

    PRIOR STEPS:
    • Here I merged two data files and selected the variables I need for my analysis.
    • Furthermore, I dropped observations that were duplicates in terms of id, endy, certificate and splast.
    • I also tagged the duplicates that remained in my dataset.
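    The two duplicate-handling steps above might look roughly like the sketch below. The variable lists are assumptions: the first follows the description in the list, and tagging on id alone is only a guess that happens to match the tag values in the dataex example.

```stata
* Sketch of the prior steps; the exact variable lists are assumptions
* Drop surplus copies that agree on id, endy, certificate and splast
duplicates drop id endy certificate splast, force
* Count the surplus rows that still share an id (0 = id occurs only once)
duplicates tag id, generate(tag)
```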
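    CODE IN FOCUS:
    Code:
    tempvar last_year last_month
    * Keep each person's latest end year first
    bysort id: egen `last_year' = max(endy)
    keep if `last_year' == endy
    * Then take the latest month within that final year only, not across all years
    bysort id: egen `last_month' = max(endm)
    keep if `last_month' == endm
    drop tag `last_year' `last_month'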
    LATER STEPS:
    • Combine the data from above with another dataset.
    Now I’m wondering whether there is a better way to capture not only the last school-leaving certificate but also the earlier ones, without rerunning the steps above for each time point. I could open the dataset, restrict it to one specific time point (e.g., keep if endy == 2011), run my code for that year, and then repeat this for every other time point, most likely in a loop.
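    For concreteness, the loop described above could be sketched as follows. This is only an illustration of the year-by-year approach, not a recommended solution; the tempfile names spells and result are made up for the example, and it assumes endy and endm contain no missing values.

```stata
* Sketch only: one pass per end year, with the results stacked at the end
tempfile spells result
save `spells'
levelsof endy, local(years)
clear
save `result', emptyok
foreach y of local years {
    use `spells', clear
    keep if endy == `y'
    * Keep each person's latest spell within year `y'
    bysort id (endm): keep if endm == endm[_N]
    append using `result'
    save `result', replace
}
use `result', clear
sort id endy endm
```

    This leaves one block of rows per person and end year, still in long format.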

    Anyway, I thought that one of you might know a better way to do all of this in one step.

    Thank you in advance for your help. I'm curious to see what solutions you may find. If it matters, I'm using Stata 14.

    Jonas
    Last edited by Jonas Jakobi; 06 Sep 2018, 06:52.