
  • survival analysis for educational attainment

    Hi everybody

    I am considering whether it is wise to use survival analysis to study attainment of vocational training after finishing high school. I have looked into the dataset used in the manual (use https://www.stata-press.com/data/r18/drug2, clear). In my case, however, I have individual-level data from official registries, so I know exactly whether or not individuals attain vocational training within the time period.

    My students finished high school in 2010 and 2011, which means that I observe them for 10 and 9 years after high school, respectively. My question is what to do with those students who do not attain vocational training (either because they do not get further education or because they decide to pursue some other education). For those who do not receive vocational training, I have decided to code the time variable as either 10 or 9, depending on the year they obtained their high school diploma. This resembles the dataset in the manual; however, in the drug2 dataset some individuals drop out of the study. This is not the case with my dataset, as I know exactly what they do after high school. So, is it meaningful to set time to the maximum time after high school for those who do not attain vocational training?

    As I understand it, survival analysis is an interesting approach because it allows me to use high school graduates from both 2010 and 2011 without having to trim the dataset (e.g., only studying the population up to 9 years after high school). As such, I can use the full information (additionally, sts graphs provide some great visuals of the development over time).

    I have created a dataset that resembles the data I am working with.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float time_new byte educational_attainment float(gender finished_high_school)
     5 1 0 2011
     7 1 1 2010
     9 1 0 2010
     5 1 1 2011
    10 1 0 2010
     9 1 0 2010
     3 1 0 2011
     5 1 0 2010
     6 1 1 2010
     1 1 1 2011
     7 1 1 2010
     9 1 1 2011
     6 1 0 2011
     9 1 0 2010
     7 1 0 2010
    10 1 1 2010
     9 1 1 2011
     8 1 1 2011
     6 1 0 2010
     9 1 0 2011
     5 1 0 2011
     6 1 0 2011
     9 1 1 2011
     9 1 0 2010
     7 1 0 2011
    10 1 0 2010
     6 1 1 2011
    10 1 1 2010
     7 1 1 2011
     4 1 0 2011
     7 1 1 2010
     6 1 1 2011
     7 1 0 2010
     6 1 0 2011
     5 1 0 2011
     5 1 1 2010
     9 1 0 2011
     9 1 1 2010
     7 1 0 2011
     9 1 1 2011
     6 1 1 2010
     7 1 1 2010
     9 1 1 2011
     9 1 0 2010
     9 1 0 2011
     9 1 1 2010
     7 1 1 2010
     5 1 1 2011
     6 1 1 2011
     9 1 1 2010
     6 1 0 2010
     9 1 1 2011
     9 1 0 2011
     6 1 0 2010
     5 1 1 2011
     9 1 0 2010
     2 1 0 2010
     7 1 1 2011
     7 1 1 2011
     7 1 1 2010
     5 1 1 2011
     6 1 1 2010
     2 1 0 2010
    10 1 0 2010
     6 1 0 2010
     7 1 0 2010
     5 1 0 2010
     7 1 1 2010
     9 1 0 2010
     8 1 1 2011
     6 1 0 2011
     4 1 0 2010
     5 1 0 2010
     7 1 0 2011
     5 1 1 2010
     9 1 0 2011
     9 1 0 2011
     9 1 0 2011
     5 1 0 2011
     5 1 0 2010
     6 1 1 2010
     9 1 0 2011
     6 1 0 2010
     9 1 0 2011
     6 1 0 2010
    10 1 1 2010
     5 1 1 2011
     5 1 0 2011
     5 1 0 2010
     9 1 0 2011
     6 1 1 2010
     7 1 0 2010
     9 1 1 2011
     9 0 0 2011
     9 0 1 2011
     9 0 0 2011
     9 0 0 2011
     9 0 0 2011
    10 0 1 2010
    10 0 0 2010
     9 0 0 2011
     9 0 0 2011
    10 0 0 2010
     9 0 1 2011
     9 0 0 2011
    10 0 0 2010
     9 0 0 2011
    10 0 1 2010
    10 0 1 2010
     9 0 1 2011
    10 0 1 2010
    10 0 0 2010
     9 0 1 2011
     9 0 0 2011
     9 0 1 2011
    10 0 0 2010
    10 0 1 2010
     9 0 0 2011
     9 0 1 2011
    10 0 0 2010
    10 0 1 2010
    10 0 0 2010
     9 0 1 2011
     9 0 1 2011
     9 0 0 2011
     9 0 1 2011
    10 0 0 2010
     9 0 0 2011
    10 0 0 2010
     9 0 0 2011
     9 0 1 2011
     9 0 1 2011
     9 0 0 2011
    10 0 1 2010
    10 0 1 2010
    10 0 1 2010
    10 0 0 2010
    10 0 0 2010
    10 0 0 2010
    10 0 0 2010
    10 0 1 2010
    10 0 0 2010
    10 0 1 2010
    10 0 0 2010
    10 0 1 2010
     9 0 1 2011
     9 0 1 2011
    10 0 1 2010
     9 0 0 2011
    10 0 1 2010
    10 0 0 2010
    10 0 1 2010
    10 0 0 2010
    10 0 1 2010
     9 0 0 2011
    10 0 1 2010
     9 0 0 2011
     9 0 0 2011
     9 0 1 2011
     9 0 0 2011
     9 0 0 2011
     9 0 1 2011
    10 0 0 2010
    10 0 0 2010
    10 0 0 2010
     9 0 1 2011
    10 0 0 2010
    10 0 1 2010
    10 0 1 2010
     9 0 1 2011
    10 0 1 2010
     9 0 0 2011
    10 0 1 2010
    10 0 0 2010
     9 0 0 2011
    10 0 0 2010
     9 0 0 2011
     9 0 1 2011
     9 0 0 2011
    10 0 1 2010
     9 0 0 2011
    10 0 1 2010
     9 0 0 2011
     9 0 0 2011
    10 0 1 2010
    end
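A minimal sketch of how this setup maps onto Stata's survival-time settings, assuming the variable names of the -dataex- example; the handful of rows below are hypothetical stand-ins so the block runs on its own. People who have not attained vocational training by 2020 are right-censored: their time is the end of their observation window (10 years for the 2010 cohort, 9 for the 2011 cohort) and their failure indicator is 0. Censoring does not require anyone to drop out of the study; it only means that attainment after 2020 is unobserved. Kaplan-Meier and Cox estimates assume censoring is unrelated to the chance of attaining, which a fixed calendar cutoff like this usually satisfies, but that remains an assumption.

```stata
* Hypothetical toy rows in the same layout as the -dataex- example above
clear
input float time_new byte educational_attainment float(gender finished_high_school)
 5 1 0 2011
 7 1 1 2010
10 1 0 2010
 3 1 0 2011
 9 0 0 2011
10 0 1 2010
 9 0 1 2011
10 0 0 2010
end

* Sanity check: nobody is followed beyond 2020
assert time_new <= 2020 - finished_high_school

* educational_attainment == 0 marks a right-censored spell (still "at risk" in 2020)
stset time_new, failure(educational_attainment == 1)

* Cumulative proportion attaining vocational training, by cohort;
* the 2011 curve simply ends at year 9 instead of the data being trimmed
sts graph, by(finished_high_school) failure
```

Because the shorter window is expressed through censoring, both cohorts contribute to the estimates for years 1 to 9, and only the 2010 cohort contributes to year 10.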

  • #2
    Using a discrete time survival analysis model is a classic approach for educational attainment. It is typically referred to as a Mare model, after the late Rob Mare (Mare 1980), or as a sequential logit model. It is usually not called a discrete time survival model, and its potential as one is often not used, but that is exactly what it is. You seem to want to apply it to an educational system that is not just a series of continue/drop-out decisions but has some branching. That is possible too; see Breen & Jonsson (2000).

    Richard Breen & Jan O. Jonsson (2000) Analyzing educational careers: A multinomial transition model. American Sociological Review 65:754-772. https://doi.org/10.2307/2657545

    Robert D. Mare (1980) Social background and school continuation decisions. Journal of the American Statistical Association 75:295-305. https://doi.org/10.2307/2287448
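To make the equivalence described above concrete, a minimal sketch of a discrete-time survival model in Stata, on hypothetical toy data using the variable names from the question: each person is expanded into one row per year at risk, the outcome is 1 only in the year attainment happens, and a logit on year dummies estimates the discrete-time hazard. Censored persons contribute a row for every year they were observed but never a 1, so cohorts with different follow-up lengths enter the same model. The yearly time unit and the new names id, year and attain are illustrative assumptions.

```stata
* Hypothetical toy data: years until attainment or censoring, attainment flag, gender
clear
input float time_new byte educational_attainment float gender
2 1 0
3 1 1
3 1 0
4 1 1
4 0 0
2 1 1
4 0 1
3 1 0
1 1 0
4 0 0
end
generate id = _n

* Person-period format: one row for each year a person is at risk
expand time_new
bysort id: generate year = _n

* Event indicator: 1 only in the year attainment occurs; censored spells stay 0
generate attain = (educational_attainment == 1 & year == time_new)

* Discrete-time hazard: year dummies give the baseline hazard, gender shifts it
logit attain i.year gender
```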
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Given how you set up your data, you seem to want to look at how long it takes before someone finishes vocational education. I would not do that, as it then becomes very hard to distinguish two very different cases:
      • Some vocational programs take longer than others, and in that case a longer program typically means a "higher" program.
      • Some students take longer than others to finish a given program, and here taking more time is typically a "bad" sign.
      Instead, think of it as a set of transitions: each row in your data should be a transition, with one variable indicating the level at which that person started that transition and another indicating the level attained in that transition.
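A minimal sketch of that layout, using a hypothetical simplified system with three levels (0 = compulsory only, 1 = high school, 2 = further education) and made-up data: every person is at risk of the first transition, only those who reached level 1 are at risk of the second, from_level records where a transition started, and passed records whether the next level was reached. A logit with transition-specific gender effects is then a Mare-type sequential logit. With branching, as in Breen & Jonsson (2000), passed would be replaced by the level attained and each transition modelled with a multinomial logit (mlogit) instead.

```stata
* Hypothetical toy data: gender and highest level reached (0, 1 or 2)
clear
input byte(gender highest)
0 0
0 1
0 2
0 2
0 2
0 0
1 0
1 1
1 2
1 1
1 2
1 0
end
generate id = _n

* Everyone faces transition 0 -> 1; only those who reached level 1 face 1 -> 2
generate byte n_trans = 1 + (highest >= 1)
expand n_trans

* One row per transition: where it started and whether the next level was reached
bysort id: generate byte from_level = _n - 1
generate byte passed = (highest > from_level)

* Sequential (Mare) logit: separate intercept and gender effect for each transition
logit passed i.from_level i.from_level#c.gender
```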



      • #4
        Hi Maarten

        Many thanks for your reply and the references. They are very interesting.

        Your comment in #3 makes great sense, and I would of course make sure to account for differences between vocational programs (and, fortunately, my original dataset allows me to do so).

        I agree that it would be better to arrange the data so that each row is a transition (and in other parts of my analysis, I do, in fact, use sequence analysis where the data are structured as you suggested). However, the reason I am now looking into survival analysis is that, at least to my understanding, it allows me to analyze data where some observations are "shorter" than others. For instance, high school graduates from 2010 are followed until 2020 (10 years), whereas those from 2011 are followed until 2020 (9 years). With survival analysis, I believe I would be able to use both. That was what I tried to illustrate with the fabricated dataset above.

        (In other analyses, where I do not use survival analysis, I usually trim the dataset, e.g., only studying the population up to 9 years after high school, even though I actually have 10 years of data for half of the population. This is a shame because I lose information.)



        • #5
          Originally posted by Gustav Egede Hansen
          I agree that it would be better to arrange the data so that each row is a transition (and in other parts of my analysis, I do, in fact, use sequence analysis where the data are structured as you suggested). However, the reason I am now looking into survival analysis is that, at least to my understanding, it allows me to analyze data where some observations are "shorter" than others. For instance, high school graduates from 2010 are followed until 2020 (10 years), whereas those from 2011 are followed until 2020 (9 years). With survival analysis, I believe I would be able to use both.
          This is exactly the point I made in #2. Read those references and read up on discrete time survival analysis, and you will see that they are the exact same model. Time in school, after controlling for level and/or whether a respondent is repeating a year, is hardly ever of interest. So forget about survival analysis in terms of time; instead, focus on discrete "time" survival analysis in terms of transitions. Since education is fairly standardized, I don't see much added value in a sequence analysis either. Anyhow, a lot of research has happened in this area, so starting from the references I gave you, you can easily find many, many articles on how this has been studied. You don't have to reinvent the wheel.





          • #6
            Thank you very much, Maarten! I'll take a look at the references.

            Best,
            Gustav
