  • Out-of-sample prediction with a user-written command

    Hi,

    I have fitted a multi-trajectory model using the user-written command traj, downloadable from https://www.andrew.cmu.edu/user/bjones/example.htm. It is a form of finite mixture modeling and uses an Expectation-Maximization algorithm to calculate the likelihood, BIC and AIC. It identifies groups of patients whose outcome develops similarly over time (in my case, organ failure in the intensive care unit), i.e. a way of longitudinal clustering. I have fitted a model with 5 trajectories using 600 of my 660 patients. After successful model estimation you get Posterior Probabilities of Group Membership (PPGM) for all patients used in model building, i.e. a probability of membership in each trajectory group based on the patient's individual outcome trajectory. In my case, the 600 patients are assigned to trajectory groups 1-5 according to their PPGM.

    I would now like to test my model using the 60 out-of-sample patients. The command traj saves (amongst other things) the coefficient vector in e(b) and the variance-covariance matrix of the estimators in e(V). I just do not know how to use the estimated coefficients (or something else) to predict trajectory group membership (or, to be more precise, the PPGMs) for the out-of-sample patients.


    These are the coefficients for the model (however, simply using predict after model estimation does not seem to work):
    Code:
    mat list e(b)
    e(b)[1,93]
        intercG1M1  linearG1M1  quadraG1M1  intercG2M1  linearG2M1  quadraG2M1
    y1  -.24828183    .0957084  -.18756259   .94245807   .15709564  -.01689182
    
        intercG3M1  linearG3M1  quadraG3M1   cubicG3M1  intercG4M1  linearG4M1
    y1   .47850806   .41861559  -.11403015   .00434383   4.8893951  -.14670883
    
        intercG5M1  linearG5M1     sigmaM1  intercG1M2  linearG1M2  intercG2M2
    y1   2.8395839  -.08377488   1.7084579  -1.9743456  -1.0622643  -1.7178047
    
        linearG2M2  quadraG2M2  intercG3M2  linearG3M2  quadraG3M2  intercG4M2
    y1   .19386034  -.01960231  -2.9482756   .28555771  -.07082197  -1.7504799
    
        linearG4M2  intercG5M2  linearG5M2  quadraG5M2     sigmaM2  intercG1M3
    y1  -.23470341   1.0383231   .47755237  -.03152176   2.5935315   1.1313146
    
        linearG1M3  quadraG1M3  intercG2M3  linearG2M3  intercG3M3  linearG3M3
    y1  -.02527667  -.30855228   3.9644703  -.37858437   3.3076506  -.40766561
    
        quadraG3M3  intercG4M3  linearG4M3  intercG5M3  linearG5M3  quadraG5M3
    y1  -.04323597   4.8143053  -.27388504   5.8055347  -.53495029   .02150391
    
           sigmaM3  intercG1M4  linearG1M4  intercG2M4  linearG2M4  quadraG2M4
    y1   2.0796368  -.18207435  -.77078781  -1.3592468   .29705347  -.02165018
    
        intercG3M4  linearG3M4  quadraG3M4  intercG4M4  linearG4M4  quadraG4M4
    y1  -1.0767241   .22194989  -.04113865    -.952448   .16721562  -.01153703
    
        intercG5M4  linearG5M4  quadraG5M4     sigmaM4  intercG1M5  linearG1M5
    y1  -.40631714   .48732439  -.02192546   1.7242394  -1.3547466   1.5372866
    
        quadraG1M5  intercG2M5  linearG2M5  quadraG2M5  intercG3M5  linearG3M5
    y1  -.40166623   1.0632533  -.12456124  -.01900371   -.2377571   .80695618
    
        quadraG3M5  intercG4M5  linearG4M5  quadraG4M5  intercG5M5  linearG5M5
    y1   -.1548212  -.22852551   .53338764  -.10662458   1.5454625   .06741163
    
        quadraG5M5     sigmaM5  intercG1M6  linearG1M6  quadraG1M6  intercG2M6
    y1  -.02318965   1.3083379    1.384895   .22452482   -.2026116   2.1075309
    
        linearG2M6  quadraG2M6  intercG3M6  linearG3M6  quadraG3M6  intercG4M6
    y1   .14149251  -.01543762   1.7952867   .23408158  -.05830794   2.0573176
    
        linearG4M6  quadraG4M6  intercG5M6  linearG5M6     sigmaM6    mthetaG2
    y1   .10300466  -.00803982   2.7987022   -.0460269   1.0136497  -1.2123054
    
          mthetaG3    mthetaG4    mthetaG5
    y1  -.79020097  -1.1170229  -2.0183256
    
    . predict trajectory_group_hat
    variable intercG1M1 not found
    Edit: intercG1M1 is the coefficient for the intercept for group 1 and outcome 1.
    quadraG4M5 is the coefficient for the quadratic term for group 4 and outcome 5.
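
    The error most likely arises because predict, when a command supplies no prediction routine of its own, forms a linear prediction by treating the column names of e(b) as variable names, and no variable called intercG1M1 exists in the data. A minimal sketch of reading the coefficients by name instead, assuming traj has just been run so that e(b) is still in memory. The time value 3 and the scalar names are hypothetical, time is assumed to be on the same scale as the indep variables, and for a cnorm model the result is the mean of the latent (uncensored) outcome:

    Code:
    * Copy the coefficient vector so it can be indexed by column name
    matrix b = e(b)
    scalar g1m1_b0 = b[1, colnumb(b, "intercG1M1")]
    scalar g1m1_b1 = b[1, colnumb(b, "linearG1M1")]
    scalar g1m1_b2 = b[1, colnumb(b, "quadraG1M1")]

    * Fitted latent mean for group 1, outcome 1, at a hypothetical time value
    scalar tval = 3
    display "Group 1, outcome 1: " g1m1_b0 + g1m1_b1*tval + g1m1_b2*tval^2

    This only recovers the fitted curve of one group; turning curves into PPGMs additionally requires the outcome density and the group-membership parameters.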



    I am new to prediction models, and since this is a user-written command, most guides and help files do not bring me any closer to a solution. If anyone needs more information on my dataset, the code for the traj command or anything else, please ask. I tried to keep the first post somewhat limited because I am uncertain what information is necessary for a solution.
    Any ideas would be much appreciated.

    All the best,


    Jesper Eriksson, Stockholm, Sweden.







    Last edited by Jesper Eriksson; 24 Sep 2020, 03:31.

  • #2
    User-written procedures often do not play well with other components of Stata.

    By the way, it is often a good idea to post the simplest model that demonstrates your problem when you post to Statalist. I would assume your problem can be illustrated with fewer than 70 or however many variables.

    I did not see it in my brief glance at the documentation, but normally such procedures and papers include a program that runs on specific data to produce an example. I would first run that program on the sample data set used in the documentation to see whether it works. If all else fails, you may need to contact the authors.



    • #3
      Thanks Phil.
      I found an article (J Elmer, Resuscitation, 2020) that used the option outofsample to exclude observations from model building. So it turned out that traj actually has an option for this exact purpose. I could not find the option anywhere in the documentation, though, which surprised me somewhat.

      To close the thread (dummy code):
      (outofsample is coded 1 for observations that are not to be used in model estimation; their probabilities are instead calculated from the observation's own values using the estimated coefficients of the model)
      (probupdates gives posterior probabilities at each time point, rather than after the full duration of time)
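
      How a held-out observation is scored can be written out. A sketch of the standard group-based trajectory posterior, assuming (none of this is confirmed in the thread) that mthetaG2 to mthetaG5 are multinomial-logit parameters with group 1 as the reference, that the outcomes m are independent given the group g, and writing f for the outcome density (censored normal for cnorm):

      \[
      \pi_g = \frac{e^{\theta_g}}{\sum_{k=1}^{5} e^{\theta_k}}, \qquad \theta_1 = 0
      \]
      \[
      \mu_{gm}(t) = \beta_{0gm} + \beta_{1gm}\,t + \beta_{2gm}\,t^2 + \beta_{3gm}\,t^3
      \]
      \[
      P(g \mid y_i) = \frac{\pi_g \prod_{m}\prod_{t} f\bigl(y_{imt};\ \mu_{gm}(t),\ \sigma_m\bigr)}{\sum_{k=1}^{5} \pi_k \prod_{m}\prod_{t} f\bigl(y_{imt};\ \mu_{km}(t),\ \sigma_m\bigr)}
      \]

      Terms of the polynomial above the fitted order are zero. With probupdates, the product over t presumably runs only up to the time point in question.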

      Code:
      traj, multgroups(5) ///
          var1(a_1-a_14) indep1(time_1-time_14) model1(cnorm) max1(4) order1(1 1 1 1 1) ///
          var2(b_1-b_14) indep2(time_1-time_14) model2(cnorm) max2(4) order2(2 2 2 2 2) ///
          var3(c_1-c_14) indep3(time_1-time_14) model3(cnorm) max3(4) order3(3 3 3 3 3) ///
          detail start(some_startvalue_matrix) probupdates ///
          outofsample(indicator_varname_not_to_be_used_in_model)
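
      A minimal sketch of building the indicator and inspecting the result, assuming one row per patient and a hypothetical identifier patient_id. It also assumes that traj leaves the assigned group and the posterior probabilities in variables named _traj_Group and _traj_ProbG1 to _traj_ProbG5; these names are not confirmed in this thread, so check with describe _traj_* first:

      Code:
      * Randomly flag 60 of the 660 patients as held out (1 = not used in estimation)
      set seed 12345
      generate double u = runiform()
      sort u
      generate byte holdout = (_n <= 60)
      drop u

      * ... run the traj command above with outofsample(holdout) ...

      * Assigned group and posterior probabilities for the held-out patients
      list patient_id _traj_Group _traj_ProbG* if holdout == 1
      tabulate _traj_Group holdout

      The tabulate line gives a quick comparison of how the held-out and estimation patients are distributed across the five groups.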



      • #4
        Hi Jesper - thanks for sharing! I'm running a group-based trajectory modeling (GBTM) analysis using the traj package, and the Stata code presented in the Elmer 2020 paper was helpful to read.

        Two follow-up questions for you, if you don't mind:

        1) How do you calculate the average posterior probability over the entire duration of observation (as opposed to the posterior probability following each time epoch, per the "probupdates" option)?

        2) How do you calculate the odds of correct classification?

        Would greatly appreciate any/all suggestions you might have!



        • #5
          I managed to find a solution to the above questions regarding average posterior probabilities and odds of correct classification, courtesy of Andrew P. Wheeler -- posting here in case it's helpful to others: https://andrewpwheeler.com/2016/10/0...it-statistics/
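
          For readers who prefer not to follow the link, the two quantities can be sketched directly. This assumes the usual definitions: the average posterior probability of a group is the mean probability of that group among the patients assigned to it, and the odds of correct classification is the odds of that average divided by the odds of the group's estimated proportion. It further assumes that traj (run without probupdates) has left _traj_Group and _traj_ProbG1 to _traj_ProbG5 in the data, and it approximates each group proportion by the sample mean of the corresponding posterior probability rather than taking it from the estimated parameters:

          Code:
          forvalues g = 1/5 {
              * Average posterior probability among patients assigned to group g
              quietly summarize _traj_ProbG`g' if _traj_Group == `g'
              scalar avepp = r(mean)

              * Approximate group proportion: mean posterior probability over all patients
              quietly summarize _traj_ProbG`g'
              scalar pihat = r(mean)

              * Odds of correct classification
              scalar occ = (avepp / (1 - avepp)) / (pihat / (1 - pihat))
              display "Group `g': AvePP = " %6.3f avepp "  OCC = " %8.2f occ
          }

          Commonly cited rules of thumb are an average posterior probability above 0.7 and odds of correct classification above 5 for every group.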

