
  • stcurve after stcrreg with time varying covariates of interest

    Dear All,
    I have 32 months of patient-level panel data, which I want to use to estimate the impact of COVID-19 infection (coded as the binary variable "COVID") on time-to-event "Y". There is also the competing risk of death ("dead"), which may preclude observing "Y". Patients vary in when they have COVID-19 during the 32-month period (variation in the timing of treatment), and, of course, not all patients have COVID-19 during the study period (control arm). Here is how my data are stset and what they look like:

    Code:
    . stset stop, id(ID) enter(start) failure(d=1) time0(start)
    
    Survival-time data settings
    
               ID variable: ID
             Failure event: d==1
    Observed time interval: (start, stop]
         Enter on or after: time start
         Exit on or before: failure
    
    --------------------------------------------------------------------------
        700,185  total observations
              0  exclusions
    --------------------------------------------------------------------------
        700,185  observations remaining, representing
         26,147  subjects
          4,072  failures in single-failure-per-subject data
        700,185  total analysis time at risk and under observation
                                                    At risk from t =         0
                                         Earliest observed entry t =         0
                                              Last observed exit t =        31
    
    . dataex date ID Female age80plus x1 x2 z1 z2 z3 z4 z5 start Y dead COVID d t0 failtime stop _st _d _t _t0  
    
    ----------------------- copy starting from the next line -----------------------
    
    
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(date ID) byte(Female age80plus) float(x1 x2 z1) double z2 float(z3 z4 z5 start Y dead COVID d t0 failtime stop) byte(_st _d _t _t0)
    714     1 0 0 0 1        0 108734         0         0         0  0 0 . 0 0 0 31  1 1 0  1  0
    715     1 0 0 0 1        0 116834         0         0         0  1 0 . 0 0 0 31  2 1 0  2  1
    716     1 0 0 0 1        0 120565         0         0         0  2 0 . 0 0 0 31  3 1 0  3  2
    717     1 0 0 0 1        0 115212         0         0         0  3 0 . 0 0 0 31  4 1 0  4  3
    718     1 0 0 0 1        0 112542         0         0         0  4 0 . 0 0 0 31  5 1 0  5  4
    719     1 0 0 0 1        0 125570         0         0         0  5 0 . 0 0 0 31  6 1 0  6  5
    720     1 0 0 0 1        0 143446         0         0         0  6 0 . 0 0 0 31  7 1 0  7  6
    721     1 0 0 0 1        0 130334         0         0         0  7 0 . 0 0 0 31  8 1 0  8  7
    722     1 0 0 0 1        0 103759 .24440205 .24440205         0  8 0 . 0 0 0 31  9 1 0  9  8
    723     1 0 0 0 1        0  78091  5.556073  5.311671         0  9 0 . 0 0 0 31 10 1 0 10  9
    724     1 0 0 0 1 .8333333  82506  12.77408  7.218007         0 10 0 . 0 0 0 31 11 1 0 11 10
    725     1 0 0 0 1 .8333333  86526 16.896328  4.122248         0 11 0 . 0 0 0 31 12 1 0 12 11
    726     1 0 0 0 1 .8333333  87483  21.26298   4.36665         0 12 0 . 0 0 0 31 13 1 0 13 12
    727     1 0 0 0 1 .8333333  88849   26.4443  5.181324         0 13 0 . 0 0 0 31 14 1 0 14 13
    728     1 0 0 0 1 .8333333  89153 35.145016  8.700713         0 14 0 . 0 0 0 31 15 1 0 15 14
    729     1 0 0 0 1 .8333333  92720  50.34682 15.201808         0 15 0 . 0 0 0 31 16 1 0 16 15
    730     1 0 0 0 1 .8333333  85388   64.8643 14.517482         0 16 0 . 0 0 0 31 17 1 0 17 16
    731     1 0 0 0 1 .8333333 103097  95.91966 31.055355   1.28795 17 0 . 0 0 0 31 18 1 0 18 17
    732     1 0 0 0 1 .8333333 104865 116.92194 21.002283  8.336668 18 0 . 0 0 0 31 19 1 0 19 18
    733     1 0 0 0 1 .8333333  96550 136.58817  19.66622  20.84593 19 0 . 0 0 0 31 20 1 0 20 19
    734     1 0 0 0 1 .8333333  97024 146.26648  9.678321   41.1382 20 0 . 0 0 0 31 21 1 0 21 20
    735     1 0 0 0 1 .8333333  99512 150.60054  4.334063  64.93627 21 0 . 0 0 0 31 22 1 0 22 21
    736     1 0 0 0 1 .8333333 127138 157.65562  7.055073  76.24552 22 0 . 0 0 0 31 23 1 0 23 22
    737     1 0 0 0 1 .8333333 125215 161.61493  3.959313  83.28538 23 0 . 0 0 0 31 24 1 0 24 23
    738     1 0 0 0 1 .8333333 127337 167.98567  6.370747  89.34604 24 0 . 0 0 0 31 25 1 0 25 24
    739     1 0 0 0 1 .8333333 131623  183.6437 15.658025    96.881 25 0 . 0 0 0 31 26 1 0 26 25
    740     1 0 0 0 1 .8333333 127392    198.65 15.006286 102.72098 26 0 . 0 0 0 31 27 1 0 27 26
    741     1 0 0 0 1 .8333333 134953 208.18167   9.53168 111.11894 27 0 . 0 0 0 31 28 1 0 28 27
    742     1 0 0 0 1 .8333333 131271 253.08647   44.9048 120.57523 28 0 . 0 0 0 31 29 1 0 29 28
    743     1 0 0 0 1 .8333333 137164 264.32898 11.242495 131.78543 29 0 . 0 0 0 31 30 1 0 30 29
    744     1 0 0 0 1 .8333333      0 283.84854 19.519577 138.48784 30 0 0 0 0 0 31 31 1 0 31 30
    714   227 0 0 0 0        0 105688         0         0         0  0 0 . 0 0 0 31  1 1 0  1  0
    715   227 0 0 0 0        0 109137         0         0         0  1 0 . 0 0 0 31  2 1 0  2  1
    716   227 0 0 0 0        0 112341         0         0         0  2 0 . 0 0 0 31  3 1 0  3  2
    717   227 0 0 0 0        0 107216         0         0         0  3 0 . 0 0 0 31  4 1 0  4  3
    718   227 0 0 0 0        0 103992         0         0         0  4 0 . 0 0 0 31  5 1 0  5  4
    719   227 0 0 0 0        0 113248         0         0         0  5 0 . 0 0 0 31  6 1 0  6  5
    720   227 0 0 0 0        0 128925         0         0         0  6 0 . 0 0 0 31  7 1 0  7  6
    721   227 0 0 0 0        0 115043         0         0         0  7 0 . 0 0 0 31  8 1 0  8  7
    722   227 0 0 0 0        0  93962 1.1867286 1.1867286         0  8 0 . 0 0 0 31  9 1 0  9  8
    723   227 0 0 0 0        0  68791 10.558118  9.371388         0  9 0 . 0 0 0 31 10 1 0 10  9
    724   227 0 0 0 0        0  73578 19.015913  8.457796         0 10 0 . 0 0 0 31 11 1 0 11 10
    725   227 0 0 0 0        0  77321  25.97617  6.960258         0 11 0 . 0 0 0 31 12 1 0 12 11
    726   227 0 0 0 0        0  76899   34.6035  8.627329         0 12 0 . 0 0 0 31 13 1 0 13 12
    727   227 0 0 0 0        0  77769  51.85816 17.254658         0 13 0 . 0 0 0 31 14 1 0 14 13
    728   227 0 0 0 0        0  76386   64.6202 12.762042         0 14 0 . 0 0 0 31 15 1 0 15 14
    729   227 0 0 0 0        0  79724  73.36997  8.749769         0 15 0 . 0 0 0 31 16 1 0 16 15
    730   227 0 0 0 0        0  73021  86.63119 13.261222         0 16 0 . 0 0 0 31 17 1 0 17 16
    731   227 0 0 0 0        0  81521  99.72288  13.09169  .7078648 17 0 . 0 0 0 31 18 1 0 18 17
    732   227 0 0 0 0        0  80023 129.76784 30.044956  8.607889 18 0 . 0 0 0 31 19 1 0 19 18
    733   227 0 0 0 0        0  74881  157.8255 28.057655  19.51934 19 0 . 1 0 0 31 20 1 0 20 19
    734   227 0 0 0 0        0  71290  173.9876 16.162113 36.316338 20 0 . 1 0 0 31 21 1 0 21 20
    735   227 0 0 0 0        0  76012 184.26317 10.275563   59.3938 21 0 . 1 0 0 31 22 1 0 22 21
    736   227 0 0 0 0        0  90849 190.06496  5.801785  71.21994 22 0 . 1 0 0 31 23 1 0 23 22
    737   227 0 0 0 0        0  87359  195.2357  5.170746  79.33001 23 0 . 1 0 0 31 24 1 0 24 23
    738   227 0 0 0 0        0  88531  197.5809 2.3452017  84.21228 24 0 . 1 0 0 31 25 1 0 25 24
    739   227 0 0 0 0        0  86848  207.2725  9.691617  91.53414 25 0 . 1 0 0 31 26 1 0 26 25
    740   227 0 0 0 0        0  85626  237.5718 30.299253  100.5245 26 0 . 1 0 0 31 27 1 0 27 26
    741   227 0 0 0 0        0  90028 264.07538 26.503607 108.14006 27 0 . 1 0 0 31 28 1 0 28 27
    742   227 0 0 0 0        0  88215  276.7809  12.70553 116.46112 28 0 . 1 0 0 31 29 1 0 29 28
    743   227 0 0 0 0        0  92667 285.21988  8.438959 125.56448 29 0 . 1 0 0 31 30 1 0 30 29
    744   227 0 0 0 0        0      0  298.5376 13.317733 131.98718 30 0 0 1 0 0 31 31 1 0 31 30
    714   737 0 1 0 0        0  49437         0         0         0  0 1 . 0 0 0 14  1 1 0  1  0
    715   737 0 1 0 0        0  49238         0         0         0  1 1 . 0 0 0 14  2 1 0  2  1
    716   737 0 1 0 0        0  51998         0         0         0  2 1 . 0 0 0 14  3 1 0  3  2
    717   737 0 1 0 0        0  49558         0         0         0  3 1 . 0 0 0 14  4 1 0  4  3
    718   737 0 1 0 0        0  48611         0         0         0  4 1 . 0 0 0 14  5 1 0  5  4
    719   737 0 1 0 0        0  55345         0         0         0  5 1 . 0 0 0 14  6 1 0  6  5
    720   737 0 1 0 0        0  60259         0         0         0  6 1 . 0 0 0 14  7 1 0  7  6
    721   737 0 1 0 0        0  53299 .01313216 .01313216         0  7 1 . 0 0 0 14  8 1 0  8  7
    722   737 0 1 0 0        0  38747  2.967868  2.954736         0  8 1 . 0 0 0 14  9 1 0  9  8
    723   737 0 1 0 0       .3  28353 10.794636  7.826768         0  9 1 . 0 0 0 14 10 1 0 10  9
    724   737 0 1 0 0      1.3  30530 14.786813  3.992177         0 10 1 . 0 0 0 14 11 1 0 11 10
    725   737 0 1 0 0      1.3  32364  17.50517  2.718357         0 11 1 . 0 0 0 14 12 1 0 12 11
    726   737 0 1 0 0      1.3  32721  21.70746 4.2022915         0 12 1 . 0 0 0 14 13 1 0 13 12
    727   737 0 1 0 0      1.3  31891 26.290586  4.583124         0 13 1 0 0 1 0 14 14 1 1 14 13
    714 26657 0 1 0 0        0  61743         0         0         0  0 0 . 0 0 0 31  1 1 0  1  0
    715 26657 0 1 0 0        0  63705         0         0         0  1 0 . 0 0 0 31  2 1 0  2  1
    716 26657 0 1 0 0        0  64712         0         0         0  2 0 . 0 0 0 31  3 1 0  3  2
    717 26657 0 1 0 0        0  63334         0         0         0  3 0 . 0 0 0 31  4 1 0  4  3
    718 26657 0 1 0 0        0  61209         0         0         0  4 0 . 0 0 0 31  5 1 0  5  4
    719 26657 0 1 0 0        0  66799         0         0         0  5 0 . 0 0 0 31  6 1 0  6  5
    720 26657 0 1 0 0        0  76538         0         0         0  6 0 . 0 0 0 31  7 1 0  7  6
    721 26657 0 1 0 0        0  68623         0         0         0  7 0 . 0 0 0 31  8 1 0  8  7
    722 26657 0 1 0 0        0  56262 .42729115 .42729115         0  8 0 . 0 0 0 31  9 1 0  9  8
    723 26657 0 1 0 0        0  42645 4.7390475  4.311756         0  9 0 . 0 0 0 31 10 1 0 10  9
    724 26657 0 1 0 0        0  44820  9.594629  4.855581         0 10 0 . 0 0 0 31 11 1 0 11 10
    725 26657 0 1 0 0        0  46741   14.3531 4.7584696         0 11 0 . 0 0 0 31 12 1 0 12 11
    726 26657 0 1 0 0        0  45964 33.251022 18.897923         0 12 0 . 0 0 0 31 13 1 0 13 12
    727 26657 0 1 0 0        0  46146  52.82872 19.577703         0 13 0 . 0 0 0 31 14 1 0 14 13
    728 26657 0 1 0 0        0  46625  65.60862  12.77989         0 14 0 . 0 0 0 31 15 1 0 15 14
    729 26657 0 1 0 0        0  49075  76.42685 10.818235         0 15 0 . 0 0 0 31 16 1 0 16 15
    730 26657 0 1 0 0        0  44179  85.08921  8.662357         0 16 0 . 0 0 0 31 17 1 0 17 16
    731 26657 0 1 0 0        0  57722 102.86063 17.771427  .9761273 17 0 . 0 0 0 31 18 1 0 18 17
    732 26657 0 1 0 0        0  51584 136.77202  33.91138  8.536908 18 0 . 0 0 0 31 19 1 0 19 18
    733 26657 0 1 0 0        0  51467  165.9832  29.21118  21.10333 19 0 . 0 0 0 31 20 1 0 20 19
    734 26657 0 1 0 0        0  46464  177.6366 11.653396  41.76697 20 0 . 0 0 0 31 21 1 0 21 20
    735 26657 0 1 0 0        0  59542 184.45383  6.817236  62.96244 21 0 . 0 0 0 31 22 1 0 22 21
    736 26657 0 1 0 0        0  66349 189.09576  4.641936  73.31761 22 0 . 0 0 0 31 23 1 0 23 22
    737 26657 0 1 0 0        0  66952  190.7078  1.612053  81.38069 23 0 . 0 0 0 31 24 1 0 24 23
    end
    format %tm date
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 700185 observations
    Use the count() option to list more

    To capture the effect of the time-varying treatment "COVID" on "Y", with "dead" as a competing risk, I estimate a competing-risks regression and then try to graph the cumulative incidence function (CIF) as follows:

    Code:
    . stcrreg Female age80plus, tvc(z1 z2 z3 z4 z5 COVID) compete(dead)
    
             Failure _d: d==1
       Analysis time _t: stop
      Enter on or after: time start
            ID variable: ID
    
    Iteration 0:  Log pseudolikelihood = -40484.897  
    Iteration 1:  Log pseudolikelihood = -40418.002  
    Iteration 2:  Log pseudolikelihood = -40417.681  
    Iteration 3:  Log pseudolikelihood = -40417.681  
    
    Competing-risks regression                        No. of obs      =    700,185
                                                      No. of subjects =     26,147
    Failure event:    d == 1                          No. failed      =      4,072
    Competing events: dead nonzero, nonmissing        No. competing   =      2,357
                                                      No. censored    =     19,718
    
                                                      Wald chi2(8)    =    1484.28
    Log pseudolikelihood = -40417.681                 Prob > chi2     =     0.0000
    
                                    (Std. err. adjusted for 26,147 clusters in ID)
    ------------------------------------------------------------------------------
                 |               Robust
              _t |        SHR   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    main         |
          Female |   1.102109   .0351884     3.05   0.002     1.035254    1.173281
       age80plus |   3.086966   .0967383    35.97   0.000     2.903068    3.282514
    -------------+----------------------------------------------------------------
    tvc          |
              z1 |   .9950684   .0017935    -2.74   0.006     .9915595    .9985898
              z2 |          1   1.13e-08     2.27   0.023            1           1
              z3 |     1.0001   .0000206     4.83   0.000     1.000059     1.00014
              z4 |   1.000123   .0001134     1.08   0.279     .9999004    1.000345
              z5 |   1.000227   .0000315     7.22   0.000     1.000165    1.000289
           COVID |   1.014416   .0040336     3.60   0.000     1.006541    1.022353
    ------------------------------------------------------------------------------
    Note: Variables in tvc equation interacted with _t.
    
    .
    end of do-file
    
    . stcurve, cif at(COVID=(0 1))
    this post-estimation command is not allowed after estimation with tvc();
    see tvc note for an alternative to the tvc() option
    r(198);
    I tried to follow -help tvc note-, but I can't figure out how to produce the graph manually. I would be grateful for any guidance you can offer.

    Sincerely,
    Sumedha

  • #2
    I would try -stsplit-ing the data, breaking observations of each person who got covid into pre- and post-covid observations. Then I would do the -stcrreg- including i.COVID#c._t as one of the predictors. This creates an interaction term between COVID and _t, and the results for this interaction will reflect the extent to which COVID modifies the hazard ratio. And I think -stcurve- will run without objections after that.



    • #3
      Thank you for chiming in, Prof. Schechter. To clarify, I believe the data are already split: for instance, see the observations for ID 227 in the dataex above, for whom COVID==0 for the first 19 periods and COVID==1 thereafter (periods 20-31). Do I need to stsplit the data further?



      • #4
        Ah, yes, I hadn't noticed that in your example data. It looked unsplit, but I see it is, indeed, already split. So I would just do -stcrreg- with i.COVID#c._t included as a predictor with this data, and then try -stcurve-.



        • #5
          Ok, great.
          But I must still be doing something wrong, because the standard errors, test statistics, and confidence intervals are now all missing:

          Code:
          . stcrreg Female age80plus (c.z1 c.z2 c.z3 c.z4 c.z5 i.COVID)#c._t, compete(dead) vce(cluster ID)
          
                   Failure _d: d==1
             Analysis time _t: stop
            Enter on or after: time start
                  ID variable: ID
          
          Iteration 0:  Log pseudolikelihood = -241.29089  
          Iteration 1:  Log pseudolikelihood = -239.30785  
          Iteration 2:  Log pseudolikelihood =  -239.0089  
          Iteration 3:  Log pseudolikelihood =  -238.9001  
          Iteration 4:  Log pseudolikelihood =   -238.857  
          Iteration 5:  Log pseudolikelihood =  -238.8404  
          Iteration 6:  Log pseudolikelihood = -238.83631  
          Iteration 7:  Log pseudolikelihood = -238.83528  
          Iteration 8:  Log pseudolikelihood = -238.83504  
          Iteration 9:  Log pseudolikelihood =   -238.835  
          Iteration 10: Log pseudolikelihood = -238.83498  
          warning: variance matrix is nonsymmetric or highly singular.
          
          Competing-risks regression                        No. of obs      =      6,956
                                                            No. of subjects =        264
          Failure event:    d == 1                          No. failed      =         45
          Competing events: dead nonzero, nonmissing        No. competing   =         30
                                                            No. censored    =        189
          
                                                            Wald chi2(0)    =          .
          Log pseudolikelihood = -238.83498                 Prob > chi2     =          .
          
                                             (Std. err. adjusted for 264 clusters in ID)
          ------------------------------------------------------------------------------
                       |               Robust
                    _t |        SHR   std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                Female |   1.060472          .        .       .            .           .
             age80plus |   1.760689          .        .       .            .           .
                       |
             c.z1#c._t |   .9767032          .        .       .            .           .
                       |
             c.z2#c._t |   .9999997          .        .       .            .           .
                       |
             c.z3#c._t |   1.000019          .        .       .            .           .
                       |
             c.z4#c._t |   .9985852          .        .       .            .           .
                       |
             c.z5#c._t |   1.000254          .        .       .            .           .
                       |
            COVID#c._t |
                    0  |   50841.43          .        .       .            .           .
                    1  |    54135.5          .        .       .            .           .
          ------------------------------------------------------------------------------
          
          .
          end of do-file
          What did I do wrong? Any help would be appreciated.

          Gratefully,
          Sumedha



          • #6

            Hmm. Sorry, I'm not sure what's going wrong here. I do realize that my advice in #2 was misguided: -tvc()- used without -texp()- does create a direct interaction with _t. I had mistakenly assumed that it created an interaction with ln(_t) by default. I'm not sure why I thought that.

            It isn't possible to troubleshoot with the example data you provided because it is too scanty. You have only 4 different IDs, 1 failure, and no competing events. There is no way to get sensible results from -stcrreg- with it. And I imagine your full data set is far too large to include here in a post.

            To troubleshoot this yourself, I suggest you start over with the simplest model with no time varying covariates, and make sure -stcrreg- converges to a sensible result that way. Then add your time varying variables, interacted with _t, one at a time until something breaks down. That will point you to what aspect of your data set -stcrreg- finds troublesome. You may have to settle for a smaller model than you initially wanted. Alternatively, you can stick with your results with the -tvc()- option, but may not be able to produce a graph.
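A minimal sketch of this one-at-a-time strategy, assuming the variable names from #1 and that the data have already been -stset- as shown there. The local macro name covars is hypothetical. In anticipation of the problem with discrete covariates described in #7, COVID is entered as c.COVID rather than i.COVID. Check the iteration log and the standard errors after each fit: the first model whose output breaks down points to the troublesome covariate.

```stata
* Step 1: the simplest model, with no time-varying covariates
local covars Female age80plus
stcrreg `covars', compete(dead) vce(cluster ID)

* Step 2: add each time-varying covariate, interacted with _t, one at a time.
* COVID is treated as continuous (c.) to avoid the i.COVID#c._t specification.
foreach v in z1 z2 z3 z4 z5 COVID {
    local covars `covars' c.`v'#c._t
    display as text _n "Model now includes: `covars'"
    stcrreg `covars', compete(dead) vce(cluster ID)
}
```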

            It is also possible that I am missing something here, and if somebody else has any suggestions, I hope he or she will chime in.







            • #7
              I've just been playing around with the example shown in -help tvc note-. First, they do not use factor-variable notation in that example: they create the interactions by multiplying the variables concerned by _t and adding those. I tried to see if it works with factor variable notation and I discovered that for continuous time-varying covariates it does, but for discrete ones it leads to non-convergence or convergence to implausible results after a large number of iterations. As I'm not in familiar territory here, I'm not going to specifically declare this to be a bug in -stcrreg-, but I tend to think it is. In any case, your only discrete time-varying covariate here is COVID. As that variable is dichotomous and we are not going to use -margins-, this suggests trying the analysis using c.COVID instead. I would try this before the more time-consuming troubleshooting approach I mentioned in #6 since it will be quick, and I think there is a good chance it will work.



              • #8
                [Image: Graph_statalist.png]

                You are the best! Yes, that worked!

                Code:
                . foreach tvc in x1 x2 z1 z2 z3 z4 z5 COVID {
                  2.         gen `tvc'tvc=`tvc'*(_t)
                  3. }
                
                . stcrreg Female age80plus c.z1 c.z2 c.z3 c.z4 c.z5 c.COVID c.z1tvc c.z2tvc c.z3tvc c.z4tvc c.z5tvc c.COVIDtvc, compete(dead) vce(cluster ID)
                
                         Failure _d: d==1
                   Analysis time _t: stop
                  Enter on or after: time start
                        ID variable: ID
                
                Iteration 0:  Log pseudolikelihood = -239.52031  
                Iteration 1:  Log pseudolikelihood = -239.12838  
                Iteration 2:  Log pseudolikelihood = -239.12643  
                Iteration 3:  Log pseudolikelihood = -239.12643  
                
                Competing-risks regression                        No. of obs      =      6,956
                                                                  No. of subjects =        264
                Failure event:    d == 1                          No. failed      =         45
                Competing events: dead nonzero, nonmissing        No. competing   =         30
                                                                  No. censored    =        189
                
                                                                  Wald chi2(14)   =      34.12
                Log pseudolikelihood = -239.12643                 Prob > chi2     =     0.0020
                
                                                   (Std. err. adjusted for 264 clusters in ID)
                ------------------------------------------------------------------------------
                             |               Robust
                          _t |        SHR   std. err.      z    P>|z|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                      Female |   1.085202   .3453661     0.26   0.797     .5815878    2.024911
                   age80plus |   1.692023   .5487433     1.62   0.105     .8960896    3.194928
                          z1 |   .3194986   .5694249    -0.64   0.522     .0097145    10.50797
                          z2 |   .9999984   2.06e-06    -0.76   0.449     .9999944    1.000002
                          z3 |   1.004033   .0166543     0.24   0.808     .9719164    1.037212
                          z4 |   .8690125   .0892814    -1.37   0.172      .710517    1.062864
                          z5 |   1.011105   .0622987     0.18   0.858     .8960861    1.140887
                       COVID |    7.85115   27.61417     0.59   0.558      .007963    7740.892
                       z1tvc |   1.037434   .1078433     0.35   0.724     .8462067    1.271876
                       z2tvc |   .9999998   1.87e-07    -0.89   0.373     .9999995           1
                       z3tvc |   .9998533   .0008061    -0.18   0.856     .9982748    1.001434
                       z4tvc |   1.006519    .005446     1.20   0.230     .9959014     1.01725
                       z5tvc |   .9999654   .0019668    -0.02   0.986     .9961179    1.003828
                    COVIDtvc |   .9690446   .1599352    -0.19   0.849     .7012258    1.339151
                ------------------------------------------------------------------------------
                
                . stcurve, cif at(COVID=(0 1)) legend(pos(6) cols(2)) lcolor(navy khaki)
                note: function evaluated at specified values of selected covariates and overall means of other covariates (if any).
                Sorry to ask for more help, but is this the correct way to examine the effect of COVID on the CIF over the study period? I am not sure how to interpret it. Or should I be graphing COVIDtvc, or a linear combination of COVID + COVIDtvc? I tried graphing COVIDtvc, but it looks really messy:

                Code:
                . tab COVIDtvc
                
                   COVIDtvc |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          0 |      6,783       97.51       97.51
                         11 |          2        0.03       97.54
                         12 |          3        0.04       97.58
                         13 |          3        0.04       97.63
                         14 |          3        0.04       97.67
                         15 |          3        0.04       97.71
                         16 |          3        0.04       97.76
                         17 |          4        0.06       97.81
                         18 |          7        0.10       97.92
                         19 |          8        0.12       98.03
                         20 |         11        0.16       98.19
                         21 |         11        0.16       98.35
                         22 |         11        0.16       98.50
                         23 |         11        0.16       98.66
                         24 |         11        0.16       98.82
                         25 |         10        0.14       98.96
                         26 |         10        0.14       99.11
                         27 |         10        0.14       99.25
                         28 |         11        0.16       99.41
                         29 |         13        0.19       99.60
                         30 |         14        0.20       99.80
                         31 |         14        0.20      100.00
                ------------+-----------------------------------
                      Total |      6,956      100.00
                
                . stcurve, cif at(COVIDtvc=(11 12 13 14 15 16 17 18 19 20 21)) legend(pos(6) cols(2)) lcolor(navy khaki)
                note: function evaluated at specified values of selected covariates and overall means of other covariates (if any).
                
                
                [Image: Graph_statalist_COVIDtvc.png]
                Thank you so much for your guidance. Very grateful.
                Sumedha
                Last edited by Sumedha Gupta; 23 Sep 2023, 19:29.



                • #9
                  The first analysis and graph in #8 look right to me. You have two curves: one shows the cumulative incidence function when COVID == 0 and the other shows it when COVID == 1. And by including the COVID#_t interaction, your analysis overcomes the violation of the proportional (sub)hazards assumption that arises in a simpler model.

                  The second graph you show, in addition to being messy, frankly doesn't even make sense to me. I don't grasp what those separate curves all mean. In fact, I don't see any sense in graphing CIF vs _t in a curve that is supposedly conditioned on, for example, COVIDtvc == 11, since, by definition of COVIDtvc, this entails that _t is restricted to being 11. So graphing that against a range of _t makes no sense that I can see.

                  Added: I do have a concern about the analysis, however. In the original model in #1, with the use of -tvc()-, the sample size is well over 700,000. But the current analysis uses a sample size just under 7,000, i.e. less than 1% of the original data. What is going on there? Where did the rest of the data go? It cannot be due to more observations with missing values associated with the homebrew interaction variables, as the interactions only involve variables that were already in the original model. Moreover, at least in the example data, you have almost no missing values in the data. The only ones I see are in the variable dead which encodes the competing event. I don't see why that should matter in the model of #8 but not the model of #1, but it is generally a bad idea in Stata to use missing values in situations where the value is actually known to be 0. So I would replace all of those missing values of dead by zeroes. (It is an event, and it can only occur at one point in time for each id, so if it is not known to have occurred, it must be zero.) I don't know if that will resolve this issue, but it is worth a try.
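A minimal sketch of the suggested recode, assuming (as in the example data in #1) that dead is missing on every record where no death was recorded. After running it, it is worth refitting the model and comparing the reported No. of obs with the 700,185 observations in #1.

```stata
* An unrecorded death at a given record is a known 0, not an unknown value
replace dead = 0 if missing(dead)

* Confirm that no missing values remain, then inspect the coding
assert !missing(dead)
tabulate dead, missing
```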
                  Last edited by Clyde Schechter; 24 Sep 2023, 11:04.



                  • #10
                    Thank you for the explanation, Prof. Schechter. To make sure I am interpreting the estimates and the CIF graph correctly: does the COVID SHR estimate of 7.85115 imply that the subhazard of pain for those with COVID==1 is 7.85115 times the subhazard of pain for those without COVID (COVID==0)? Furthermore, since the SHR for the COVID#_t interaction (variable COVIDtvc) is .9690446, does it mean that with every additional time period the overall COVID SHR gets multiplied by the interaction SHR, i.e. 7.85115 in period _t=0, 7.85115*.9690446=7.6081145 in period _t=1, 7.85115*.9690446*.9690446=7.3726023 in period _t=2, and so on? If that is correct, how should the declining SHR of COVID over time be interpreted? Does it mean that the SHR of COVID on pain declines over time? Is that because of the competing risk of dead?
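The arithmetic in this question can be written out under the model in #8, assuming (as in the loop that generated it there) that COVIDtvc = COVID × _t and that all other covariates are held fixed. Because each reported SHR is the exponentiated coefficient, the subhazard ratio comparing COVID == 1 with COVID == 0 at analysis time t would be:

```latex
\begin{aligned}
\mathrm{SHR}_{\mathrm{COVID}}(t) &= \exp\!\left(\beta_{\mathrm{COVID}} + \beta_{\mathrm{COVIDtvc}}\, t\right)
  = e^{\beta_{\mathrm{COVID}}} \left(e^{\beta_{\mathrm{COVIDtvc}}}\right)^{t}
  = 7.85115 \times 0.9690446^{\,t} \\
\mathrm{SHR}_{\mathrm{COVID}}(0) &= 7.85115, \qquad
\mathrm{SHR}_{\mathrm{COVID}}(1) \approx 7.6081, \qquad
\mathrm{SHR}_{\mathrm{COVID}}(2) \approx 7.3726
\end{aligned}
```

This uses only the point estimates; the 95% confidence intervals reported in #8 for COVID (.007963 to 7740.892) and COVIDtvc (.7012258 to 1.339151) are very wide.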

                    Second, if I wanted to further check whether the impact of COVID differs by Female, should I estimate Option A or Option B below?

                    Code:
                    . *OPTION A
                    . stcrreg i.Female age80plus c.z1 c.z2 c.z3 c.z4 c.z5 c.COVID c.z1tvc c.z2tvc c.z3tvc c.z4tvc c.z5tvc c.COVIDtvc (c.COVID c.COVIDtvc)#i.Female, compete(dead) vce(cluster ID)
                    
                             Failure _d: d==1
                       Analysis time _t: stop
                      Enter on or after: time start
                            ID variable: ID
                    
                    Iteration 0:  Log pseudolikelihood = -234.90251  
                    Iteration 1:  Log pseudolikelihood = -234.41716  
                    Iteration 2:  Log pseudolikelihood =   -234.414  
                    Iteration 3:  Log pseudolikelihood = -234.41398  
                    
                    Competing-risks regression                        No. of obs      =      6,956
                                                                      No. of subjects =        264
                    Failure event:    d == 1                          No. failed      =         45
                    Competing events: dead nonzero, nonmissing        No. competing   =         30
                                                                      No. censored    =        189
                    
                                                                      Wald chi2(16)   =     605.15
                    Log pseudolikelihood = -234.41398                 Prob > chi2     =     0.0000
                    
                                                            (Std. err. adjusted for 264 clusters in ID)
                    -----------------------------------------------------------------------------------
                                      |               Robust
                                   _t |        SHR   std. err.      z    P>|z|     [95% conf. interval]
                    ------------------+----------------------------------------------------------------
                             1.Female |   1.227999   .4055069     0.62   0.534     .6428602     2.34574
                            age80plus |   1.561727   .5208886     1.34   0.181     .8122704    3.002683
                                   z1 |   .3313203   .5815692    -0.63   0.529     .0106201    10.33639
                                   z2 |   .9999983   2.04e-06    -0.84   0.400     .9999943    1.000002
                                   z3 |   .9824367   .0167259    -1.04   0.298     .9501955    1.015772
                                   z4 |   .8839783   .0892156    -1.22   0.222     .7253273    1.077331
                                   z5 |   1.021148   .0750868     0.28   0.776     .8840936    1.179448
                                COVID |   20640.04    57682.8     3.55   0.000     86.26899     4938176
                                z1tvc |   1.046752   .1074296     0.45   0.656     .8560201    1.279982
                                z2tvc |   .9999999   1.81e-07    -0.82   0.414     .9999995           1
                                z3tvc |   1.000792   .0008772     0.90   0.367     .9990739    1.002512
                                z4tvc |    1.00585   .0054271     1.08   0.280     .9952694    1.016544
                                z5tvc |   .9994563   .0024432    -0.22   0.824     .9946792    1.004256
                             COVIDtvc |   .6662193   .1057127    -2.56   0.010      .488149    .9092473
                                      |
                       Female#c.COVID |
                                   1  |   9.90e-47   8.81e-46   -11.90   0.000     2.62e-54    3.74e-39
                                      |
                    Female#c.COVIDtvc |
                                   1  |   41.14739   13.38244    11.43   0.000     21.75226    77.83593
                    -----------------------------------------------------------------------------------
                    
                    . *OPTION B
                    . stcrreg age80plus c.z1 c.z2 c.z3 c.z4 c.z5 c.COVID c.z1tvc c.z2tvc c.z3tvc c.z4tvc c.z5tvc c.COVIDtvc (c.COVID c.COVIDtvc)#i.Female, compete(dead) vce(cluster I
                    > D)
                    
                             Failure _d: d==1
                       Analysis time _t: stop
                      Enter on or after: time start
                            ID variable: ID
                    
                    Iteration 0:  Log pseudolikelihood = -235.13685  
                    Iteration 1:  Log pseudolikelihood =  -234.6232  
                    Iteration 2:  Log pseudolikelihood = -234.61934  
                    Iteration 3:  Log pseudolikelihood = -234.61931  
                    
                    Competing-risks regression                        No. of obs      =      6,956
                                                                      No. of subjects =        264
                    Failure event:    d == 1                          No. failed      =         45
                    Competing events: dead nonzero, nonmissing        No. competing   =         30
                                                                      No. censored    =        189
                    
                                                                      Wald chi2(15)   =     509.28
                    Log pseudolikelihood = -234.61931                 Prob > chi2     =     0.0000
                    
                                                            (Std. err. adjusted for 264 clusters in ID)
                    -----------------------------------------------------------------------------------
                                      |               Robust
                                   _t |        SHR   std. err.      z    P>|z|     [95% conf. interval]
                    ------------------+----------------------------------------------------------------
                            age80plus |    1.61644   .5135049     1.51   0.131     .8672673    3.012772
                                   z1 |   .3293982   .5720933    -0.64   0.523     .0109488    9.910005
                                   z2 |   .9999983   2.05e-06    -0.82   0.411     .9999943    1.000002
                                   z3 |   .9826947   .0166535    -1.03   0.303     .9505905    1.015883
                                   z4 |   .8845144   .0884361    -1.23   0.220     .7271089    1.075995
                                   z5 |   1.021126    .073619     0.29   0.772      .886566    1.176109
                                COVID |   16628.32   45079.21     3.58   0.000     81.89337     3376353
                                z1tvc |   1.046904   .1061023     0.45   0.651     .8582989    1.276953
                                z2tvc |   .9999999   1.83e-07    -0.81   0.419     .9999995           1
                                z3tvc |   1.000776   .0008713     0.89   0.373     .9990695    1.002485
                                z4tvc |   1.005841   .0053774     1.09   0.276     .9953567    1.016436
                                z5tvc |   .9994487   .0023928    -0.23   0.818       .99477    1.004149
                             COVIDtvc |   .6692391   .1043664    -2.58   0.010     .4929902    .9084987
                                      |
                       Female#c.COVID |
                                   1  |   1.45e-46   1.33e-45   -11.49   0.000     2.20e-54    9.58e-39
                                      |
                    Female#c.COVIDtvc |
                                   1  |   40.81373   13.59388    11.14   0.000      21.2469    78.40019
                    -----------------------------------------------------------------------------------
                    If Option A is the correct way, is the shr for women with COVID equal to 1.227999*9.90e-47 at _t=0, which then gets further multiplied by the Female#c.COVIDtvc shr (41.14739) for every additional period _t>0? If Option B is correct instead, is the shr for women with COVID equal to 1.45e-46 at _t=0, which then gets further multiplied by the Female#c.COVIDtvc shr (40.81373) for every additional period _t>0?

                    Also, in the stcurve CIF graph, the CIF for those with COVID==1 is about .8 by the end of the study period (32 months), whereas it is about .2 for those with COVID==0. Should I interpret that as the cumulative incidence of pain being about 4 times as high (.8/.2) by the end of the study period in those with COVID relative to those without COVID?
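Because the model is multiplicative in the SHRs, the SHR for a covariate profile is the product of the SHRs of every term that is nonzero for that profile. A minimal sketch for Option A, assuming COVIDtvc = COVID × _t: for a woman with COVID==1, the terms 1.Female, COVID, COVIDtvc, Female#c.COVID and Female#c.COVIDtvc are all nonzero, so relative to a man without COVID the SHR is the product of all five, with the two tvc terms raised to the power _t. It assumes the remaining covariates (age80plus, z1–z5, z1tvc–z5tvc) take the same values in both profiles, so they cancel. It only does the arithmetic; it does not say which option is the right specification.

```python
import math

# Point estimates (SHR) from the Option A output above
SHR = {
    "1.Female": 1.227999,
    "COVID": 20640.04,
    "COVIDtvc": 0.6662193,
    "Female#c.COVID": 9.90e-47,
    "Female#c.COVIDtvc": 41.14739,
}

def log_shr_women_covid(t):
    """Log SHR for Female=1, COVID=1 (so COVIDtvc = t) vs Female=0, COVID=0,
    other covariates held equal. Work on the log scale to avoid under/overflow."""
    constant = (math.log(SHR["1.Female"]) + math.log(SHR["COVID"])
                + math.log(SHR["Female#c.COVID"]))
    slope = math.log(SHR["COVIDtvc"]) + math.log(SHR["Female#c.COVIDtvc"])
    return constant + t * slope

for t in (0, 10, 20, 31):
    print(t, f"{math.exp(log_shr_women_covid(t)):.3g}")
```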

                    When I run stcurve, I get flat lines for COVID==1, for both Female==0 and Female==1. What might be the issue?


                    Code:
                     . stcurve, cif at(COVID=0 Female=0) at(COVID=0 Female=1) at(COVID=1 Female=0) at(COVID=1 Female=1) legend(pos(6) cols(2)) lcolor(navy khaki ebblue midgreen)
                     note: function evaluated at specified values of selected covariates and overall means of other covariates (if any).
                    [Attached graph: Graph_statalist_COVIDtvcfemale.png]

                    What have I done wrong? So grateful for all your guidance Prof. Schechter.
                    Sincerely, Sumedha

                    P.S. Regarding the much smaller sample size, I took a 1% random sample of the full data to debug and troubleshoot by trial and error, because estimation on the full data was taking too long each time. Sorry, I intended to mention that but forgot. Regarding coding dead==. for all waves prior to the final wave, I mistakenly thought I needed to do that to stset the data correctly. Good to know that is not the case.
                    Last edited by Sumedha Gupta; 24 Sep 2023, 12:37.



                    • #11
                      does the COVID shr estimate of 7.85115 imply that the subhazard of pain in those with COVID==1 is 7.85115*subhazard of pain for those without COVID (i.e. COVID==0)?
                      Yes, this is correct.

                      Furthermore, since the COVID#_t interaction (variable COVIDtvc) shr is .9690446, it means that over _t (i.e. every additional time period that goes by) the total COVID shr gets multiplied with COVID#_t interaction i.e. 7.85115 in period 0, in period _t=1 it becomes 7.85115*.9690446=7.6081145, in period _t=2 it becomes 7.85115*.9690446*.9690446=7.3726023, and so on and so forth?
                      No, this is not correct. The dependent variable in -stcox- (-stcrreg-) is the hazard (subhazard) of the outcome. So the hazard (subhazard) declines in something like the way you describe, but the hazard (subhazard) ratio between the two groups does not.

                      Although this is not strictly mathematically correct, you might think about this like a compound interest problem. If Alice makes a deposit into an account that pays 1.5% annual interest, and Bob makes a deposit into one that pays 1.0% annual interest, then the Alice:Bob balance ratio grows larger and larger over time, but the return ratio remains 1.5 in perpetuity. A (sub)hazard is analogous to a return, not a balance.
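The compound-interest analogy can be checked numerically. This sketch assumes, hypothetically, that Alice and Bob each deposit 100 and that interest compounds once a year; the ratio of the rates stays fixed while the ratio of the balances keeps rising.

```python
def balances(deposit, rate, years):
    """Year-end balances under annual compounding, for years 0..years."""
    return [deposit * (1 + rate) ** y for y in range(years + 1)]

# Hypothetical: both deposit 100; Alice earns 1.5% a year, Bob 1.0%.
alice = balances(100.0, 0.015, 30)
bob = balances(100.0, 0.010, 30)

return_ratio = 0.015 / 0.010                          # the same in every year
balance_ratio = [a / b for a, b in zip(alice, bob)]   # rises year after year

print(return_ratio)
print([round(r, 4) for r in balance_ratio[::10]])     # years 0, 10, 20, 30
```

In the analogy, the rates play the role of the (sub)hazards and the balances the role of accumulated quantities such as the CIF.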



                      • #12
                        Thank you Prof Clyde Schechter; I think what I remain confused about is what role the COVID#_t interaction is then playing here if it doesn't enter the CIF calculation (except that the COVID shr is now conditional on the COVID#_t interaction)? Also, to be sure I understand the stcurve CIF graph correctly: the CIF for those with COVID==1 is about .8 by the end of the study period (32 months), whereas it is about .2 for those with COVID==0. Should I interpret that as the cumulative incidence of pain being about 4 times as high (.8/.2) by the end of the study period in those with COVID relative to those without COVID? And is the CIF showing the 'balance' in your compound interest example, which grows over time despite the constant shr?

                        Will be very grateful for any guidance you may be able to offer on the interaction model specification and interpretation above as well 🙏.
                        Gratefully,
                        Sumedha
                        Last edited by Sumedha Gupta; 24 Sep 2023, 14:10.



                        • #13
                          Thank you Prof Clyde Schechter; I think what I remain confused about is what role the COVID#_t interaction is then playing here if it doesn't enter the CIF calculation (except that the COVID shr is now conditional on the COVID#_t interaction)?
                          It does enter the CIF calculation.

                          If you know calculus: the CIF is 1 minus the exponentiated integral of the negative of the subhazard over time. And the subhazard in these models is the base subhazard function (the subhazard function when all predictor variables are 0) * the product of the subhazard ratios specific to the values of the variables. So when we have a case where COVID = 1, the subhazard (not the subhazard ratio) is equal to the base subhazard * the product of the subhazard ratios of any other variables * the subhazard ratio of the COVID variable * the subhazard ratio of the COVID#_t interaction term raised to the power _t. And then the CIF is 1 minus the exponentiated integral of the negative of the subhazard from 0 to _t.
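The same description in symbols, as a sketch that assumes the linear predictor contains β_C·COVID + γ·COVIDtvc with COVIDtvc = COVID × t, so that exp(β_C) and exp(γ) are the reported SHRs for COVID and COVIDtvc, and x'β collects all other terms:

$$
\bar h(t \mid x) = \bar h_0(t)\,\exp\!\left(x'\beta + \beta_C\,\mathrm{COVID} + \gamma\,\mathrm{COVID}\cdot t\right)
$$

For COVID = 1 the last two terms contribute the factor $e^{\beta_C}\,(e^{\gamma})^{t} = \mathrm{SHR}_{\mathrm{COVID}} \times \mathrm{SHR}_{\mathrm{COVIDtvc}}^{\,t}$, and the CIF accumulates the subhazard over time:

$$
F(t \mid x) = 1 - \exp\!\left(-\int_0^t \bar h(s \mid x)\,ds\right)
$$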

                          Should I interpret that as the cumulative incidence of pain being about 4 times as high (.8/.2) by the end of the study period in those with COVID relative to those without COVID?
                          Well, that is an accurate interpretation of the facts at the end of the study. But that is just one point in time. The graph as a whole shows the growing probability of the outcome event from time 0 all the way to the end of the study.
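To see why a ratio read off at the last month is only one snapshot, here is a discrete-time sketch of the CIF formula above. It uses the quoted SHRs for COVID (7.85115) and COVIDtvc (.9690446), but the baseline subhazard `H0` is a HYPOTHETICAL constant chosen for illustration; the real curves depend on the model's estimated baseline, which is not shown in the thread. The subhazard is treated as constant on each interval (t-1, t], with COVIDtvc = t on that interval.

```python
import math

# SHRs quoted in the thread for COVID and COVIDtvc (= COVID x _t)
SHR_COVID = 7.85115
SHR_COVIDTVC = 0.9690446
# HYPOTHETICAL baseline subhazard per month (not taken from any output above)
H0 = 0.01

def cif(covid, last_t=31):
    """CIF at t = 1..last_t, with the subhazard constant on each (t-1, t]."""
    cum_subhazard, out = 0.0, []
    for t in range(1, last_t + 1):
        shr = SHR_COVID * SHR_COVIDTVC ** t if covid else 1.0
        cum_subhazard += H0 * shr              # integral of the subhazard up to t
        out.append(1.0 - math.exp(-cum_subhazard))
    return out

no_covid = cif(covid=False)
with_covid = cif(covid=True)
# Both curves rise over time; their ratio changes with t.
for t in (1, 10, 20, 31):
    print(t, round(no_covid[t - 1], 3), round(with_covid[t - 1], 3),
          round(with_covid[t - 1] / no_covid[t - 1], 2))
```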

                          And is the CIF showing the 'balance' in your compound interest example, which grows over time despite the constant shr?
                          Yes, loosely speaking.

                          My explanation about balance and return rate is not a precise analog of the present situation: the detailed mathematical relationships are different. I was just trying to illustrate in more familiar, less abstract terms how something could be growing at two different constant rates (in two different people/settings) and how the corresponding accumulated results over time would show an increasing ratio. Also, your situation is a bit more complicated because the subhazards here are not constant: they are explicitly varying with time, and in the case of the z* and COVID variables even the subhazard ratios differ between the times up to and following getting COVID.
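The switch at infection can be sketched per interval. This assumes, as described in #10, that COVIDtvc = COVID × _t, and that COVID becomes 1 from the interval ending at the (hypothetical) infection time onward. It leaves out the z* terms entirely. Under that coding the post-infection factor depends on analysis time _t, not on time since infection.

```python
# SHRs quoted in the thread; assumes COVIDtvc = COVID x _t
SHR_COVID = 7.85115
SHR_COVIDTVC = 0.9690446

def shr_vs_never_infected(t, infection_t):
    """SHR on interval (t-1, t] for a subject whose COVID switches to 1 at
    analysis time infection_t, vs. an otherwise identical never-infected subject."""
    if t < infection_t:
        return 1.0                          # COVID = 0, so COVIDtvc = 0 too
    # COVID = 1 and COVIDtvc = t: the factor uses analysis time t
    return SHR_COVID * SHR_COVIDTVC ** t

# Hypothetical subject infected at t = 12, followed to t = 31
profile = [shr_vs_never_infected(t, infection_t=12) for t in range(1, 32)]
print([round(x, 2) for x in profile])
```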

