  • Adding category to variable and sustain response after point of change in longitudinal dataset

    Dear Stata Listers,

    I am working with a 19-wave longitudinal dataset of around 1,000 respondents. I have filled in rows for respondents with missing waves, so every respondent has 19 observations. I am trying to set up the dataset for a sequence analysis with the following aim: I want to determine if having a certain job status leads to getting married sooner than other job statuses.

    The approach I thought of to do this would be to add a new category to the job status variable (job_status_combo) that would be captured in the sequence analysis to indicate when the respondent gets married. Essentially, my aim is to somehow merge or add the 'mar_change' response to the 'job_status_combo' variable so the respondent would be captured in the same wave they got married, and then also apply the married status to all remaining waves thereafter.
    'mar_change' equals 1 only in the year the respondent got married, and is missing otherwise.

    I'm not sure whether this is the best method to achieve my goal. For sequence analysis, I believe I need to contain all sequential options in one variable. I'm also not sure if or how the married status could 'override' the other job statuses within the same variable once the respondent is married.

    If this approach is appropriate, there are two steps I cannot figure out:

    1) how to add a category to the job status variable to have an additional category indicating when someone gets married. The job status variable currently has 5 categories; a 6th category would indicate if they got married. I could not figure out how to use "replace" to add this category only when it occurs chronologically (it filled all years with a 6 when I tried it).

    2) how to tell Stata to apply the added category of being married to all waves that follow the year they got married. Once a respondent is married, I will ignore their job status.

    For instance: respondent 203 (pid) got married in wave 14 (indicated by the '1' in the 4th column), so I want to change the 5th column to a 6 in that wave, and the 6 would be sustained for the remainder of that individual's time in the dataset. By contrast, respondent 501 never got married, so the 5th column would not change.

    My variables are:

    pid (id#), wave (1-19), p_married (marital status, 3 categories), mar_change (equals 1 in the wave the respondent got married, missing otherwise), and job_status_combo (currently 5-category job status)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pid byte(wave p_married) float(mar_change job_status_combo)
     203  1 1 . .
     203  2 1 . .
     203  3 1 . .
     203  4 1 . .
     203  5 1 . .
     203  6 1 . 1
     203  7 1 . 2
     203  8 1 . 2
     203  9 1 . 2
     203 10 1 . 2
     203 11 1 . 2
     203 12 1 . 2
     203 13 1 . 2
     203 14 1 1 2
     203 15 . . .
     203 16 2 . .
     203 17 2 . .
     203 18 2 . .
     203 19 2 . .
     501  1 1 . 1
     501  2 1 . 1
     501  3 . . .
     501  4 1 . 1
     501  5 . . .
     501  6 . . .
     501  7 . . .
     501  8 1 . 1
     501  9 . . .
     501 10 . . .
     501 11 . . .
     501 12 . . .
     501 13 . . .
     501 14 . . .
     501 15 . . .
     501 16 . . .
     501 17 . . .
     501 18 . . .
     501 19 . . .
     801  1 1 . 1
     801  2 1 . 1
     801  3 1 . 1
     801  4 1 . 1
     801  5 1 . 1
     801  6 1 1 1
     801  7 2 . 1
     801  8 2 . .
     801  9 . . .
     801 10 2 . 1
     801 11 2 . 1
     801 12 2 . 1
     801 13 2 . 1
     801 14 2 . 1
     801 15 . . .
     801 16 . . .
     801 17 . . .
     801 18 . . .
     801 19 . . .
    end
    label var pid "Personal ID" 
    label var wave "Wave of Survey" 
    label var p_married "Marital Status" 
    label var mar_change "Year respondent got married" 
    label var job_status_combo "Job and Marital Status Combined"

    Above is what I currently have. Below is what I want to create for respondent 203 (notice the change in the 5th column to 6 from wave 14 onward):
    203 1 1 . .
    203 2 1 . .
    203 3 1 . .
    203 4 1 . .
    203 5 1 . .
    203 6 1 . 1
    203 7 1 . 2
    203 8 1 . 2
    203 9 1 . 2
    203 10 1 . 2
    203 11 1 . 2
    203 12 1 . 2
    203 13 1 . 2
    203 14 1 1 6
    203 15 . . 6
    203 16 2 . 6
    203 17 2 . 6
    203 18 2 . 6
    203 19 2 . 6


    Any thoughts or suggestions would be greatly appreciated!!

    All the best,
    Wyatt

  • #2
    I'm at best agnostic (per below) with respect to whether what you want to do is a good idea, but I believe this does what you want:
    Code:
    bysort pid (wave): replace job_status_combo = 6 if (sum(mar_change == 1) > 0)
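    A more conservative variant of the same idea, as a minimal sketch: it leaves job_status_combo untouched and writes the result to a new variable. It assumes the -dataex- example from post #1 is in memory. The names ever_married, job_mar_seq, and the value label jobmar are illustrative and do not appear in the original data.

```stata
* Sketch only: ever_married, job_mar_seq and jobmar are illustrative names.
* Assumes the -dataex- example from post #1 is in memory.

* Running indicator: 1 in the marriage wave and every later wave, per respondent.
* mar_change == 1 evaluates to 0 when mar_change is missing, so the sum stays 0
* until the wave of marriage.
bysort pid (wave): generate byte ever_married = sum(mar_change == 1) > 0

* Copy the job status, then overwrite it with 6 from the marriage wave onward.
generate byte job_mar_seq = job_status_combo
replace job_mar_seq = 6 if ever_married

* Label only the new category; the meaning of codes 1-5 is not given in the post.
label define jobmar 6 "Married"
label values job_mar_seq jobmar

* Inspect respondent 203, who marries in wave 14.
list pid wave mar_change job_status_combo job_mar_seq if pid == 203, sepby(pid)
```

    Keeping the original variable makes it easy to compare the two codings, or to undo the recode if the sequence analysis is set up differently later.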
    The statement "I want to determine if having a certain job status leads to getting married sooner than other job statuses" sounds well-suited to a discrete-time survival model, and "*having* a certain job status" does not sound like a sequential variable to me. "Having job status 3 before job status 5 or 2" sounds like a sequential variable to me, but then I know almost nothing about sequence analysis.
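    As a rough illustration of the discrete-time survival setup mentioned above, and not a recommended specification, the person-wave data could be prepared along these lines. The sketch assumes every respondent is unmarried at wave 1 and that mar_change marks the only marriage of interest. The names married_yet, at_risk, and event are illustrative, and entering wave linearly is only one possible assumption about the baseline hazard.

```stata
* Sketch only: married_yet, at_risk and event are illustrative names.
* Assumes every respondent is unmarried at wave 1.

* Running indicator: 1 from the marriage wave onward, per respondent.
bysort pid (wave): generate byte married_yet = sum(mar_change == 1) > 0

* A person-wave is at risk up to and including the marriage wave,
* i.e. when the respondent was not yet married in the previous wave.
bysort pid (wave): generate byte at_risk = (_n == 1) | (married_yet[_n-1] == 0)

* Event indicator: 1 in the marriage wave, 0 in earlier at-risk waves,
* missing once the respondent has left the risk set.
generate byte event = (mar_change == 1) if at_risk

* Discrete-time hazard model on the at-risk person-waves, with standard
* errors clustered on respondent. Intended for the full dataset; the
* three-respondent example is too small for meaningful estimates.
logit event i.job_status_combo c.wave if at_risk, vce(cluster pid)
```

    Waves with missing job_status_combo drop out of the estimation sample, so how those gaps are handled matters for either approach.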


    • #3
      Dear Mike,

      Thank you so much for your help! This did exactly what I was trying to do. I hadn't seen this method, but it's very useful!

      Also, I appreciate your feedback on my approach. I am investigating life transitions and trajectories, so in my search for methods, sequence analysis seemed most appropriate, but I will look at it again from the perspective of discrete-time survival analysis.

      Thank you very much!

      All the best,
      Wyatt
