How to best aggregate monthly episode data?

Jonas Jakobi

Join Date: Sep 2018

Posts: 19
#1

06 Sep 2018, 06:48

Dear all,

I have a monthly spell dataset with information about pupils' educational achievements and their final grades (see the dataex example below). My aim is to aggregate the information in the spell file so that, in the end, I have a variable that shows for each individual which school-leaving certificate and overall grade they achieved.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id byte wave_sp long splink byte certificate double grade byte(splast endm) int endy byte tag
4010912 5 220002 3 1.2999999523162842 2 6 2012 1
4010912 9 220003 5  1.600000023841858 2 7 2015 1
4010913 9 220004 5                  3 2 7 2015 1
4010913 9 220003 3 2.0999999046325684 2 7 2012 1
4010914 5 220002 3                2.5 2 7 2012 0
4010915 5 220004 2  3.200000047683716 2 7 2012 0
4010918 9 220003 4                  3 2 7 2015 1
4010918 7 220002 3                -20 2 7 2013 1
4010981 9 220005 5  1.899999976158142 2 6 2015 1
4010981 5 220004 3  1.899999976158142 2 6 2012 1
4011534 5 220003 3  2.799999952316284 2 9 2012 2
4011534 5 220002 2                  3 2 8 2011 2
4011534 7 220004 5                  4 2 7 2013 2
end
label values wave_sp en2574
label def en2574 5 "2012/2013", modify
label def en2574 7 "2013/2014", modify
label def en2574 9 "2015/2016", modify
label values certificate en2978ext1
label values grade enext1
label values splast en177
label def en177 2 "No", modify
label values endm en1874
label def en1874 6 "June", modify
label def en1874 7 "July", modify
label def en1874 8 "August", modify
label def en1874 9 "September", modify
label values endy enmiss

Thus far, I have used the following code to restrict my data so that only the last school-leaving certificate remains for each individual.

PRIOR STEPS:
Here I merged two data files and selected the variables I need for my analysis.

Furthermore, I dropped observations that were duplicates in terms of id, endy, certificate and splast.

I also tagged the duplicates that remained in my dataset.

CODE IN FOCUS:

Code:

tempvar last_year
tempvar last_month
bysort id: egen `last_year' = max(endy)
bysort id: egen `last_month' = max(endm)
keep if `last_year' == endy
keep if `last_month' == endm
drop tag `last_year'
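A caveat on the code above, offered as a sketch rather than a tested fix: the maxima of endy and endm are taken separately, so the latest year and the highest month number need not belong to the same spell. In the example data, id 4011534 finishes in July 2013 but has September as its highest month (in 2012), so no row satisfies both keep conditions and that person drops out entirely. Combining year and month into a single monthly date compares them jointly. The variable name mdate is hypothetical, and the input rows are an abridged copy of the dataex example with grades rounded.

```stata
* Sketch only: mdate is a hypothetical name, not part of the original data.
clear
input long id byte certificate double grade byte endm int endy
4010912 3 1.3 6 2012
4010912 5 1.6 7 2015
4011534 3 2.8 9 2012
4011534 2 3.0 8 2011
4011534 5 4.0 7 2013
end

* One monthly date per spell, so year and month are compared jointly
gen mdate = ym(endy, endm)
format mdate %tm

* Sort spells chronologically within person and keep only the latest one
bysort id (mdate): keep if _n == _N
```

If two spells of the same person end in the same month, this keeps one of them arbitrarily, so a tie-breaking variable would have to be added inside the parentheses of bysort.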

LATER STEPS:
Combine the data from above with another dataset.

Now I'm wondering whether there is a better way to capture not only the last school-leaving certificate but also the earlier ones, without running the steps above again and again for different time points. I could open the dataset, restrict it to one specific time point (e.g., keep if endy == 2011) and run my code for that year, then repeat this for every other time point, most likely in a loop.

Anyway, I thought that maybe one of you knows a better way to do all of this in one step.
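One direction that might avoid the loop, sketched here under assumptions rather than as a finished solution: number each person's certificates in chronological order and then reshape to one row per person. The names mdate and cert_no are hypothetical, and the input rows are an abridged copy of the dataex example with grades rounded.

```stata
* Sketch only: mdate and cert_no are hypothetical names.
clear
input long id byte certificate double grade byte endm int endy
4010912 3 1.3 6 2012
4010912 5 1.6 7 2015
4011534 3 2.8 9 2012
4011534 2 3.0 8 2011
4011534 5 4.0 7 2013
end

* Monthly end date of each spell
gen mdate = ym(endy, endm)
format mdate %tm

* Number each person's certificates from earliest (1) to latest
bysort id (mdate): gen cert_no = _n

* One row per person: certificate1 grade1 mdate1 certificate2 grade2 ...
drop endm endy
reshape wide certificate grade mdate, i(id) j(cert_no)
```

People with fewer certificates than the maximum simply get missing values in the higher-numbered variables. With the full dataset, any other variable that varies within id would need to be listed in reshape or dropped first.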

Thank you in advance for your help. I'm curious to see what solutions you may find. If it matters, I'm using Stata 14.

Jonas

Last edited by Jonas Jakobi; 06 Sep 2018, 06:52.
Tags: panel data, syntax, Time Series
