Hi everybody
I am considering whether it is wise to use survival analysis to study the attainment of vocational training after finishing high school. I have looked at the dataset used in the manual (use https://www.stata-press.com/data/r18/drug2, clear). In my case, however, I have individual-level data from official registries, so I know exactly whether individuals attain vocational training within the time period.
I have high school students from 2010 and 2011, which means that I have records covering 10 and 9 years after high school, respectively. My question is what to do with those students who do not attain vocational training (either because they do not get any further education or because they decided to pursue some other education). For those who do not receive vocational training, I have decided to code the time variable as either 10 or 9, depending on the year in which they obtained their high school diploma. This resembles the dataset in the manual; however, in the drug2 dataset there are individuals who drop out of the study. This is not the case with my dataset, as I know exactly what they do after high school. So, is it meaningful to set time to the maximum time after high school for those who do not do vocational training?
In my understanding, survival analysis is an interesting approach because it allows me to use both the 2010 and the 2011 high school students without having to trim the dataset (e.g., by studying the whole population only up to 9 years after high school). As such, I can use the full information (additionally, sts graph provides some great visuals of the development over time).
I have created a dataset that resembles the data I am working with.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float time_new byte educational_attainment float(gender finished_high_school)
5 1 0 2011
7 1 1 2010
9 1 0 2010
5 1 1 2011
10 1 0 2010
9 1 0 2010
3 1 0 2011
5 1 0 2010
6 1 1 2010
1 1 1 2011
7 1 1 2010
9 1 1 2011
6 1 0 2011
9 1 0 2010
7 1 0 2010
10 1 1 2010
9 1 1 2011
8 1 1 2011
6 1 0 2010
9 1 0 2011
5 1 0 2011
6 1 0 2011
9 1 1 2011
9 1 0 2010
7 1 0 2011
10 1 0 2010
6 1 1 2011
10 1 1 2010
7 1 1 2011
4 1 0 2011
7 1 1 2010
6 1 1 2011
7 1 0 2010
6 1 0 2011
5 1 0 2011
5 1 1 2010
9 1 0 2011
9 1 1 2010
7 1 0 2011
9 1 1 2011
6 1 1 2010
7 1 1 2010
9 1 1 2011
9 1 0 2010
9 1 0 2011
9 1 1 2010
7 1 1 2010
5 1 1 2011
6 1 1 2011
9 1 1 2010
6 1 0 2010
9 1 1 2011
9 1 0 2011
6 1 0 2010
5 1 1 2011
9 1 0 2010
2 1 0 2010
7 1 1 2011
7 1 1 2011
7 1 1 2010
5 1 1 2011
6 1 1 2010
2 1 0 2010
10 1 0 2010
6 1 0 2010
7 1 0 2010
5 1 0 2010
7 1 1 2010
9 1 0 2010
8 1 1 2011
6 1 0 2011
4 1 0 2010
5 1 0 2010
7 1 0 2011
5 1 1 2010
9 1 0 2011
9 1 0 2011
9 1 0 2011
5 1 0 2011
5 1 0 2010
6 1 1 2010
9 1 0 2011
6 1 0 2010
9 1 0 2011
6 1 0 2010
10 1 1 2010
5 1 1 2011
5 1 0 2011
5 1 0 2010
9 1 0 2011
6 1 1 2010
7 1 0 2010
9 1 1 2011
9 0 0 2011
9 0 1 2011
9 0 0 2011
9 0 0 2011
9 0 0 2011
10 0 1 2010
10 0 0 2010
9 0 0 2011
9 0 0 2011
10 0 0 2010
9 0 1 2011
9 0 0 2011
10 0 0 2010
9 0 0 2011
10 0 1 2010
10 0 1 2010
9 0 1 2011
10 0 1 2010
10 0 0 2010
9 0 1 2011
9 0 0 2011
9 0 1 2011
10 0 0 2010
10 0 1 2010
9 0 0 2011
9 0 1 2011
10 0 0 2010
10 0 1 2010
10 0 0 2010
9 0 1 2011
9 0 1 2011
9 0 0 2011
9 0 1 2011
10 0 0 2010
9 0 0 2011
10 0 0 2010
9 0 0 2011
9 0 1 2011
9 0 1 2011
9 0 0 2011
10 0 1 2010
10 0 1 2010
10 0 1 2010
10 0 0 2010
10 0 0 2010
10 0 0 2010
10 0 0 2010
10 0 1 2010
10 0 0 2010
10 0 1 2010
10 0 0 2010
10 0 1 2010
9 0 1 2011
9 0 1 2011
10 0 1 2010
9 0 0 2011
10 0 1 2010
10 0 0 2010
10 0 1 2010
10 0 0 2010
10 0 1 2010
9 0 0 2011
10 0 1 2010
9 0 0 2011
9 0 0 2011
9 0 1 2011
9 0 0 2011
9 0 0 2011
9 0 1 2011
10 0 0 2010
10 0 0 2010
10 0 0 2010
9 0 1 2011
10 0 0 2010
10 0 1 2010
10 0 1 2010
9 0 1 2011
10 0 1 2010
9 0 0 2011
10 0 1 2010
10 0 0 2010
9 0 0 2011
10 0 0 2010
9 0 0 2011
9 0 1 2011
9 0 0 2011
10 0 1 2010
9 0 0 2011
10 0 1 2010
9 0 0 2011
9 0 0 2011
10 0 1 2010
end
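For reference, here is a minimal sketch of how the example data could be declared as survival data under the coding described above. It assumes that time_new is the number of years since high school and that educational_attainment == 1 marks attaining vocational training, so that observations with 0 are treated as censored at 9 or 10 years. Grouping by finished_high_school is only for illustration.

```stata
* Declare survival-time data: time_new is the analysis time and
* educational_attainment == 1 is the event (assumption: 0 = censored
* at the end of the observation window, i.e. after 9 or 10 years)
stset time_new, failure(educational_attainment == 1)

* Kaplan-Meier failure curves (cumulative attainment) by graduation cohort
sts graph, failure by(finished_high_school)

* Log-rank test of equality of the survivor functions across cohorts
sts test finished_high_school
```

With this setup, the students who never attain vocational training stay in the risk set until year 9 (2011 cohort) or year 10 (2010 cohort) and then leave it without an event.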