
  • Survival Data

    Dear all,
    I am trying to restructure/compress my dataset because it is currently too large to work with. I have around 200,000 individuals, each observed over 1,080 time periods (days), giving more than 200 million observations.
    I am using it for a survival analysis, and it currently looks like this:
    id t0 t1 y var1 var2 var3
    1 0 1 0 0 0 4
    1 1 2 0 2 1 4
    1 2 3 0 2 1 4
    1 3 4 0 3 1 4
    1 4 5 1 5 0 4
    I.e. individual 1’s failure time is t1==5.
    var1 and var2 are time-varying variables and var3 is constant.
    I am mainly interested in the effect of the time-varying variables var1 and var2.
    For instance, I want to run the following cox regression model
    stset t1, failure(y==1) time0(t0) id(id)
    stcox var1 var2 var3
    Here, the exponentiated coefficient of var1 is the hazard ratio associated with a one-unit increase in var1 on a given day.
    However, because of the size of the dataset I am thinking about restructuring to something like:
    id t0 t1 y var1 var2 var3
    1 0 1 0 0 0 4
    1 1 3 0 2 1 4
    1 3 4 0 3 1 4
    1 4 5 1 5 0 4
    I.e. I want to collapse consecutive rows in which var1 and var2 do not change, ending up with fewer observations. This can be done with:
    collapse (first) t0 (last) t1 (first) var3, by(id var1 var2 y)
    My question is:
    Will the interpretation of var1 remain the same? That is, will Stata still know that individual 1 had var1==2 at both t1==2 and t1==3?
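    To make the intended restructuring concrete, here is a minimal pure-Python sketch (an illustration only, not Stata; the `rows` data, the `collapse_spells` name, and the column order are assumptions mirroring the example above). It merges only *consecutive* rows that share (id, var1, var2), keeping the first t0, the last t1, and the last y of each run:

```python
from itertools import groupby

# Hypothetical rows mirroring the example data in the post:
# (id, t0, t1, y, var1, var2, var3)
rows = [
    (1, 0, 1, 0, 0, 0, 4),
    (1, 1, 2, 0, 2, 1, 4),
    (1, 2, 3, 0, 2, 1, 4),
    (1, 3, 4, 0, 3, 1, 4),
    (1, 4, 5, 1, 5, 0, 4),
]

def collapse_spells(rows):
    """Merge consecutive rows sharing (id, var1, var2) into one
    interval: first t0, last t1, last y; var3 is constant within id."""
    out = []
    for _, run in groupby(rows, key=lambda r: (r[0], r[4], r[5])):
        run = list(run)
        first, last = run[0], run[-1]
        out.append((first[0],  # id
                    first[1],  # t0 of the first row in the run
                    last[2],   # t1 of the last row in the run
                    last[3],   # y of the last row in the run
                    first[4], first[5], first[6]))
    return out

for r in collapse_spells(rows):
    print(r)
```

    Note the design choice: grouping consecutive runs (rather than pooling all rows with equal covariate values, regardless of adjacency) keeps two separate spells with the same (var1, var2) as distinct intervals.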

  • #2
    Any chance I can delete this post? Want to reformulate the problem in another post.



    • #3
      No chance, as far as I know. You can simply reformulate the problem in another post.
      Kind regards,
      Carlo
      (Stata 19.0)
