  • stsum percentiles do not coincide with summary on spell length variable

    Dear Stata Users,

    I am puzzled by the output of the Stata command stsum. I am working with multi-episode data (split as suggested in the stsum documentation), grouped by category, and would like to determine within-category percentiles of survival time. I also have a variable, spellLength, which is the difference in days between the end of the spell (endDate) and the start of the spell (startDate). However, running summarize with the by prefix on spellLength gives different percentiles than stsum with the by() option. I have also performed the Kaplan-Meier calculation in Excel and get the same results as summarize in Stata.

    Could someone please help me understand what may be leading to this discrepancy?

    I am running Stata 15.1 on a Unix machine.

    Thank you!

    Jonathan Gomez Martinez

  • #2
    This is how I am setting up the data and producing the discrepancy outlined above. Any help would be greatly appreciated.

    Code:
    sort IntProductID releaseDate
    by IntProductID: egen firstObs = min(releaseDate)
    gen startTime = releaseDate - firstObs
    
    gen Update = 0
    by IntProductID: replace Update = 1 if IntProductID == IntProductID[_n+1]
    
    by IntProductID: gen endTime = startTime[_n+1]
    replace endTime=date("31dec2017","DMY") - firstObs if endTime==.
    
    gen timeToUpdate = endTime - startTime
    
    drop if Update==0
    
    stset endTime, f(Update == 1) id(IntProductID) time0(startTime) exit(time .)
    
    do CreateVars
    
    save NotRightCensored, replace
    
    stsum, by(CategoryCluster)
    
    by CategoryCluster, sort : stsum
    
    by CategoryCluster, sort : summarize timeToUpdate, detail
