
  • Control function approach with longitudinal binary endogenous regressor and survival time outcome.

    Dear All,
    I am wondering whether it is possible to apply the two-stage control function approach (as suggested by Prof. Jeff Wooldridge in #12 of the Statalist thread "ivpoisson with panel-data fixed effects") to examine the impact of having a COVID-19 infection on new onset of chronic pain (time to new diagnosis). The infection indicator is a binary endogenous variable and fully absorbing: once a patient has had COVID, they stay in that state (value of 1). Here is what my data looks like:

    Code:
    . stset stop, id(grpatidtreat) enter(start) failure(d=1) time0(start)
    
    Survival-time data settings
    
               ID variable: grpatidtreat
             Failure event: d==1
    Observed time interval: (start, stop]
         Enter on or after: time start
         Exit on or before: failure
    
    --------------------------------------------------------------------------
        700,185  total observations
              0  exclusions
    --------------------------------------------------------------------------
        700,185  observations remaining, representing
         26,147  subjects
          4,072  failures in single-failure-per-subject data
        700,185  total analysis time at risk and under observation
                                                    At risk from t =         0
                                         Earliest observed entry t =         0
                                              Last observed exit t =        31
    
    . dataex monthlydate grpatidtreat Female age80plus chf cad CumMonthsSAH nursing_visits deaths_rate newdeaths_rate cum_num_vacpct start pain dead COVID d t0 failti
    > me stop _st _d _t _t0  
    
    ----------------------- copy starting from the next line -----------------------
    
    
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(monthlydate grpatidtreat) byte(Female age80plus) float(chf cad CumMonthsSAH) double nursing_visits float(deaths_rate newdeaths_rate cum_num_vacpct start pain dead COVID d t0 failtime stop) byte(_st _d _t _t0)
    714 1 0 0 0 1         0 108734         0         0         0  0 0 0 0 0 0 31  1 1 0  1  0
    715 1 0 0 0 1         0 116834         0         0         0  1 0 0 0 0 0 31  2 1 0  2  1
    716 1 0 0 0 1         0 120565         0         0         0  2 0 0 0 0 0 31  3 1 0  3  2
    717 1 0 0 0 1         0 115212         0         0         0  3 0 0 0 0 0 31  4 1 0  4  3
    718 1 0 0 0 1         0 112542         0         0         0  4 0 0 0 0 0 31  5 1 0  5  4
    719 1 0 0 0 1         0 125570         0         0         0  5 0 0 0 0 0 31  6 1 0  6  5
    720 1 0 0 0 1         0 143446         0         0         0  6 0 0 0 0 0 31  7 1 0  7  6
    721 1 0 0 0 1         0 130334         0         0         0  7 0 0 0 0 0 31  8 1 0  8  7
    722 1 0 0 0 1         0 103759 .24440205 .24440205         0  8 0 0 0 0 0 31  9 1 0  9  8
    723 1 0 0 0 1         0  78091  5.556073  5.311671         0  9 0 0 0 0 0 31 10 1 0 10  9
    724 1 0 0 0 1  .8333333  82506  12.77408  7.218007         0 10 0 0 0 0 0 31 11 1 0 11 10
    725 1 0 0 0 1  .8333333  86526 16.896328  4.122248         0 11 0 0 0 0 0 31 12 1 0 12 11
    726 1 0 0 0 1  .8333333  87483  21.26298   4.36665         0 12 0 0 0 0 0 31 13 1 0 13 12
    727 1 0 0 0 1  .8333333  88849   26.4443  5.181324         0 13 0 0 0 0 0 31 14 1 0 14 13
    728 1 0 0 0 1  .8333333  89153 35.145016  8.700713         0 14 0 0 0 0 0 31 15 1 0 15 14
    729 1 0 0 0 1  .8333333  92720  50.34682 15.201808         0 15 0 0 0 0 0 31 16 1 0 16 15
    730 1 0 0 0 1  .8333333  85388   64.8643 14.517482         0 16 0 0 0 0 0 31 17 1 0 17 16
    731 1 0 0 0 1  .8333333 103097  95.91966 31.055355   1.28795 17 0 0 0 0 0 31 18 1 0 18 17
    732 1 0 0 0 1  .8333333 104865 116.92194 21.002283  8.336668 18 0 0 0 0 0 31 19 1 0 19 18
    733 1 0 0 0 1  .8333333  96550 136.58817  19.66622  20.84593 19 0 0 0 0 0 31 20 1 0 20 19
    734 1 0 0 0 1  .8333333  97024 146.26648  9.678321   41.1382 20 0 0 0 0 0 31 21 1 0 21 20
    735 1 0 0 0 1  .8333333  99512 150.60054  4.334063  64.93627 21 0 0 0 0 0 31 22 1 0 22 21
    736 1 0 0 0 1  .8333333 127138 157.65562  7.055073  76.24552 22 0 0 0 0 0 31 23 1 0 23 22
    737 1 0 0 0 1  .8333333 125215 161.61493  3.959313  83.28538 23 0 0 0 0 0 31 24 1 0 24 23
    738 1 0 0 0 1  .8333333 127337 167.98567  6.370747  89.34604 24 0 0 0 0 0 31 25 1 0 25 24
    739 1 0 0 0 1  .8333333 131623  183.6437 15.658025    96.881 25 0 0 0 0 0 31 26 1 0 26 25
    740 1 0 0 0 1  .8333333 127392    198.65 15.006286 102.72098 26 0 0 0 0 0 31 27 1 0 27 26
    741 1 0 0 0 1  .8333333 134953 208.18167   9.53168 111.11894 27 0 0 0 0 0 31 28 1 0 28 27
    742 1 0 0 0 1  .8333333 131271 253.08647   44.9048 120.57523 28 0 0 0 0 0 31 29 1 0 29 28
    743 1 0 0 0 1  .8333333 137164 264.32898 11.242495 131.78543 29 0 0 0 0 0 31 30 1 0 30 29
    744 1 0 0 0 1  .8333333      0 283.84854 19.519577 138.48784 30 0 0 0 0 0 31 31 1 0 31 30
    714 2 1 1 0 0         0  23152         0         0         0  0 0 1 0 0 0 14  1 1 0  1  0
    715 2 1 1 0 0         0  23589         0         0         0  1 0 1 0 0 0 14  2 1 0  2  1
    716 2 1 1 0 0         0  24512         0         0         0  2 0 1 0 0 0 14  3 1 0  3  2
    717 2 1 1 0 0         0  24150         0         0         0  3 0 1 0 0 0 14  4 1 0  4  3
    718 2 1 1 0 0         0  23678         0         0         0  4 0 1 0 0 0 14  5 1 0  5  4
    719 2 1 1 0 0         0  25964         0         0         0  5 0 1 0 0 0 14  6 1 0  6  5
    720 2 1 1 0 0         0  27860         0         0         0  6 0 1 0 0 0 14  7 1 0  7  6
    721 2 1 1 0 0         0  25949         0         0         0  7 0 1 0 0 0 14  8 1 0  8  7
    722 2 1 1 0 0         0  19128  .1559596  .1559596         0  8 0 1 0 0 0 14  9 1 0  9  8
    723 2 1 1 0 0 .16666667  14646 1.4348285  1.278869         0  9 0 1 0 0 0 14 10 1 0 10  9
    724 2 1 1 0 0 1.1666666  14962 3.5246875  2.089859         0 10 0 1 0 0 0 14 11 1 0 11 10
    725 2 1 1 0 0 1.1666666  15520  5.365011 1.8403236         0 11 0 1 0 0 0 14 12 1 0 12 11
    726 2 1 1 0 0 1.1666666  15734  9.607113 4.2421017         0 12 0 1 0 0 0 14 13 1 0 13 12
    727 2 1 1 0 0 1.1666666  16123 12.757497 3.1503844         0 13 0 1 0 0 0 14 14 1 0 14 13
    714 3 1 1 1 1         0  53780         0         0         0  0 0 0 0 0 0 31  1 1 0  1  0
    715 3 1 1 1 1         0  54204         0         0         0  1 0 0 0 0 0 31  2 1 0  2  1
    716 3 1 1 1 1         0  56357         0         0         0  2 0 0 0 0 0 31  3 1 0  3  2
    717 3 1 1 1 1         0  53457         0         0         0  3 0 0 0 0 0 31  4 1 0  4  3
    718 3 1 1 1 1         0  51336         0         0         0  4 0 0 0 0 0 31  5 1 0  5  4
    719 3 1 1 1 1         0  60125         0         0         0  5 0 0 0 0 0 31  6 1 0  6  5
    720 3 1 1 1 1         0  70577         0         0         0  6 0 0 0 0 0 31  7 1 0  7  6
    721 3 1 1 1 1         0  61177         0         0         0  7 0 0 0 0 0 31  8 1 0  8  7
    722 3 1 1 0 0         0  44959 1.1981796 1.1981796         0  8 0 0 0 0 0 31  9 1 0  9  8
    723 3 1 1 0 0        .2  32052 13.457814 12.259635         0  9 0 0 0 0 0 31 10 1 0 10  9
    724 3 1 1 0 0       1.2  35025  25.09231 11.634498         0 10 0 0 0 0 0 31 11 1 0 11 10
    725 3 1 1 0 0       1.2  36976  29.34672  4.254406         0 11 0 0 0 0 0 31 12 1 0 12 11
    726 3 1 1 0 0       1.2  38088  31.96882  2.622103         0 12 0 0 0 0 0 31 13 1 0 13 12
    727 3 1 1 0 0       1.2  37560 33.896328 1.9275063         0 13 0 0 0 0 0 31 14 1 0 14 13
    728 3 1 1 0 0       1.2  37355 35.789104 1.8927765         0 14 0 0 0 0 0 31 15 1 0 15 14
    729 3 1 1 0 0       1.2  38919   40.0956 4.3065004         0 15 0 0 0 0 0 31 16 1 0 16 15
    730 3 1 1 0 0       1.2  34855  53.13666 13.041057         0 16 0 0 0 0 0 31 17 1 0 17 16
    731 3 1 1 0 0       1.2  41124  84.72345  31.58679  1.435975 17 0 0 0 0 0 31 18 1 0 18 17
    732 3 1 1 0 0       1.2  38249  99.17107 14.447615  9.884704 18 0 0 0 0 0 31 19 1 0 19 18
    733 3 1 1 0 0       1.2  33831 104.97095  5.799884  23.81733 19 0 0 0 0 0 31 20 1 0 20 19
    734 3 1 1 0 0       1.2  33492  107.2284 2.2574399  46.73331 20 0 0 0 0 0 31 21 1 0 21 20
    735 3 1 1 0 0       1.2  32725 111.69118  4.462785  77.14936 21 0 0 0 0 0 31 22 1 0 22 21
    736 3 1 1 0 0       1.2  41687 116.74437  5.053192  97.47453 22 0 0 0 0 0 31 23 1 0 23 22
    737 3 1 1 0 0       1.2  41846  120.6341 3.8897424 107.47088 23 0 0 0 0 0 31 24 1 0 24 23
    738 3 1 1 0 0       1.2  43258 123.23885  2.604738 112.40738 24 0 0 0 0 0 31 25 1 0 25 24
    739 3 1 1 0 0       1.2  44937 127.09386  3.855013  118.4889 25 0 0 0 0 0 31 26 1 0 26 25
    740 3 1 1 0 0       1.2  43021 134.24821  7.154348 124.62994 26 0 0 0 0 0 31 27 1 0 27 26
    741 3 1 1 0 0       1.2  44950 145.44858 11.200375 134.55649 27 0 0 0 0 0 31 28 1 0 28 27
    742 3 1 1 0 0       1.2  43845 162.93506 17.486477  149.5779 28 0 0 0 0 0 31 29 1 0 29 28
    743 3 1 1 0 0       1.2  43655 181.98438 19.049318 165.26677 29 0 0 0 0 0 31 30 1 0 30 29
    744 3 1 1 0 0       1.2      0 196.90085 14.916468 174.72752 30 0 0 0 0 0 31 31 1 0 31 30
    714 4 1 0 0 0         0 108734         0         0         0  0 0 1 0 0 0 29  1 1 0  1  0
    715 4 1 0 0 0         0 116834         0         0         0  1 0 1 0 0 0 29  2 1 0  2  1
    716 4 1 0 0 0         0 120565         0         0         0  2 0 1 0 0 0 29  3 1 0  3  2
    717 4 1 0 0 0         0 115212         0         0         0  3 0 1 0 0 0 29  4 1 0  4  3
    718 4 1 0 0 0         0 112542         0         0         0  4 0 1 0 0 0 29  5 1 0  5  4
    719 4 1 0 0 0         0 125570         0         0         0  5 0 1 0 0 0 29  6 1 0  6  5
    720 4 1 0 0 0         0 143446         0         0         0  6 0 1 0 0 0 29  7 1 0  7  6
    721 4 1 0 0 0         0 130334         0         0         0  7 0 1 0 0 0 29  8 1 0  8  7
    722 4 1 0 0 0         0 103759 .24440205 .24440205         0  8 0 1 0 0 0 29  9 1 0  9  8
    723 4 1 0 0 0         0  78091  5.556073  5.311671         0  9 0 1 0 0 0 29 10 1 0 10  9
    724 4 1 0 0 0  .8333333  82506  12.77408  7.218007         0 10 0 1 0 0 0 29 11 1 0 11 10
    725 4 1 0 0 0  .8333333  86526 16.896328  4.122248         0 11 0 1 0 0 0 29 12 1 0 12 11
    726 4 1 0 0 0  .8333333  87483  21.26298   4.36665         0 12 0 1 0 0 0 29 13 1 0 13 12
    727 4 1 0 0 0  .8333333  88849   26.4443  5.181324         0 13 0 1 0 0 0 29 14 1 0 14 13
    728 4 1 0 0 0  .8333333  89153 35.145016  8.700713         0 14 0 1 0 0 0 29 15 1 0 15 14
    729 4 1 0 0 0  .8333333  92720  50.34682 15.201808         0 15 0 1 0 0 0 29 16 1 0 16 15
    730 4 1 0 0 0  .8333333  85388   64.8643 14.517482         0 16 0 1 0 0 0 29 17 1 0 17 16
    731 4 1 0 0 0  .8333333 103097  95.91966 31.055355   1.28795 17 0 1 0 0 0 29 18 1 0 18 17
    732 4 1 0 0 0  .8333333 104865 116.92194 21.002283  8.336668 18 0 1 0 0 0 29 19 1 0 19 18
    733 4 1 0 0 0  .8333333  96550 136.58817  19.66622  20.84593 19 0 1 0 0 0 29 20 1 0 20 19
    734 4 1 0 0 0  .8333333  97024 146.26648  9.678321   41.1382 20 0 1 0 0 0 29 21 1 0 21 20
    735 4 1 0 0 0  .8333333  99512 150.60054  4.334063  64.93627 21 0 1 0 0 0 29 22 1 0 22 21
    736 4 1 0 0 0  .8333333 127138 157.65562  7.055073  76.24552 22 0 1 0 0 0 29 23 1 0 23 22
    737 4 1 0 0 0  .8333333 125215 161.61493  3.959313  83.28538 23 0 1 0 0 0 29 24 1 0 24 23
    end
    format %tm monthlydate
    ------------------ copy up to and including the previous line ------------------
    Next, I run the control function approach:

    Code:
    . ***** FIRST STAGE
    .
    . reghdfe COVID CumMonthsSAH nursing_visits deaths_rate newdeaths_rate cum_num_vacpct, absorb(grpatidtreat monthlydate state, save) cluster(grpatidtreat monthlyda
    > te) residuals(resid)
    (dropped 409 singleton observations)
    (MWFE estimator converged in 6 iterations)
    Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
    
    HDFE Linear regression                            Number of obs   =    699,776
    Absorbing 3 HDFE groups                           F(   5,     30) =      18.97
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.4414
                                                      Adj R-squared   =     0.4200
    Number of clusters (grpatidtreat) =     25,738    Within R-sq.    =     0.0021
    Number of clusters (monthlydate) =         31     Root MSE        =     0.1216
    
                    (Std. err. adjusted for 31 clusters in grpatidtreat monthlydate)
    --------------------------------------------------------------------------------
                   |               Robust
             COVID | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ---------------+----------------------------------------------------------------
      CumMonthsSAH |  -.0023079   .0019432    -1.19   0.244    -.0062764    .0016606
    nursing_visits |   1.16e-08   9.70e-09     1.19   0.241    -8.22e-09    3.14e-08
       deaths_rate |   .0002108   .0000232     9.10   0.000     .0001634    .0002581
    newdeaths_rate |  -.0001266   .0000565    -2.24   0.033     -.000242   -.0000111
    cum_num_vacpct |  -.0000918   .0000713    -1.29   0.208    -.0002375    .0000539
             _cons |     .01118   .0037694     2.97   0.006     .0034818    .0188782
    --------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    ------------------------------------------------------+
      Absorbed FE | Categories  - Redundant  = Num. Coefs |
    --------------+---------------------------------------|
     grpatidtreat |     25738       25738           0    *|
      monthlydate |        31          31           0    *|
            state |        51           1          50     |
    ------------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    
    
    .                         predict double COVID_fe, r
    (409 missing values generated)
    
    . streg  Female age80plus Asian Black Hispanic chf cad cancer copd dm stroke htn oarth liver renal depress COVID COVID_fe, distribution(weibull)
    
             Failure _d: d==1
       Analysis time _t: stop
      Enter on or after: time start
            ID variable: grpatidtreat
    
    Fitting constant-only model:
    Iteration 0:  Log likelihood = -14499.714
    Iteration 1:  Log likelihood = -14499.673
    Iteration 2:  Log likelihood = -14499.673
    
    Fitting full model:
    Iteration 0:  Log likelihood = -14499.673  
    Iteration 1:  Log likelihood = -14192.215  
    Iteration 2:  Log likelihood = -13608.612  
    Iteration 3:  Log likelihood = -13586.715  
    Iteration 4:  Log likelihood = -13586.538  
    Iteration 5:  Log likelihood = -13586.538  
    
    Weibull PH regression
    
    No. of subjects =  25,738                              Number of obs = 699,776
    No. of failures =   3,746
    Time at risk    = 699,776
                                                           LR chi2(18)   = 1826.27
    Log likelihood = -13586.538                            Prob > chi2   =  0.0000
    
    ------------------------------------------------------------------------------
              _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          Female |   1.050703   .0362327     1.43   0.151     .9820351    1.124173
       age80plus |    3.12706   .1056879    33.73   0.000     2.926627    3.341219
           Asian |   .9366176   .0895084    -0.69   0.493     .7766347    1.129556
           Black |   1.121095    .061301     2.09   0.037     1.007161    1.247917
        Hispanic |   1.142516   .0581472     2.62   0.009      1.03405    1.262361
             chf |   1.239777   .0676932     3.94   0.000     1.113953    1.379813
             cad |   1.144291    .052385     2.94   0.003      1.04609    1.251711
          cancer |   .8817859   .0374588    -2.96   0.003     .8113413    .9583468
            copd |   1.243518   .0614885     4.41   0.000     1.128659    1.370067
              dm |   1.140213   .0422912     3.54   0.000     1.060265    1.226189
          stroke |   2.054509   .1585987     9.33   0.000     1.766035    2.390105
             htn |    1.03796   .0394549     0.98   0.327     .9634406    1.118244
           oarth |   1.081008    .045519     1.85   0.064     .9953747    1.174009
           liver |   1.246264   .1031747     2.66   0.008     1.059599    1.465814
           renal |   1.150626   .0515212     3.13   0.002     1.053951    1.256169
         depress |   1.922831   .0948028    13.26   0.000     1.745716    2.117915
           COVID |   .7349001     .12362    -1.83   0.067     .5284999    1.021908
        COVID_fe |   2.381679   .5697129     3.63   0.000     1.490289     3.80624
           _cons |   .0016569   .0001267   -83.74   0.000     .0014263    .0019247
    -------------+----------------------------------------------------------------
           /ln_p |    .127481   .0162371     7.85   0.000     .0956569    .1593051
    -------------+----------------------------------------------------------------
               p |   1.135963   .0184447                      1.100381    1.172696
             1/p |   .8803101   .0142937                      .8527361    .9087758
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline hazard.
    I have the following questions:

    1) Can I use a linear approximation (as I did above using reghdfe) for the first stage? If not, what could I use? Ideally I would apply a logit, but I am not sure whether using the residual from that is feasible.
    2) I probably need to bootstrap the standard errors from the second stage, and I can do that, but I first need to confirm whether this procedure is even correct.
    3) Ideally I would like to account for the multiple observations per patient in streg using the frailty() and shared(grpatidtreat) options. But, as shown below, it does not converge. What other options are there to account for multiple observations per individual?

    Code:
    . streg  Female age80plus Asian Black Hispanic chf cad cancer copd dm stroke htn oarth liver renal depress COVID_fe, distribution(weibull) frailty(gamma) shared(grpatidtreat)
    
             Failure _d: d==1
       Analysis time _t: stop
      Enter on or after: time start
            ID variable: grpatidtreat
    
    Fitting Weibull model ...
    
    Fitting constant-only model:
    Iteration 0:  Log likelihood = -19174.658  
    Iteration 1:  Log likelihood = -17394.015  
    Iteration 2:  Log likelihood = -14742.789  
    Iteration 3:  Log likelihood = -14520.083  
    Iteration 4:  Log likelihood = -14499.723  
    Iteration 5:  Log likelihood = -14499.673  
    Iteration 6:  Log likelihood = -14499.673  (not concave)
    Iteration 7:  Log likelihood = -14499.673  (not concave)
    Iteration 8:  Log likelihood = -14499.673  (not concave)
    Iteration 9:  Log likelihood = -14499.673  (not concave)
    Iteration 10: Log likelihood = -14499.673  (not concave)
    Iteration 11: Log likelihood = -14499.673  (not concave)
    Iteration 12: Log likelihood = -14499.673  (not concave)
    Iteration 13: Log likelihood = -14499.673  (not concave)
    Iteration 14: Log likelihood = -14499.673  (not concave)
    Iteration 15: Log likelihood = -14499.673  (not concave)
    Iteration 16: Log likelihood = -14499.673  (not concave)

    4) The hazard ratio on COVID is .7349001 (although not statistically significant at the 5% level). Does this imply that having COVID reduces the hazard to about 73% of what it would otherwise be? And how should I interpret the (statistically significant) hazard ratio of 2.381679 on COVID_fe, the residual from the first stage?

    Any guidance you may be able to offer will be superhelpful. Thank you in advance for your time and help.
    Sincerely,
    Sumedha



  • #2
    The short answer is yes, but you'd have to do some work. The answers to your questions are in bold below. For sources and more details, read the help file for my command, ivcloglog.

    When the second stage is linear, you can freely use a linear first stage. That's a classic 2SLS property. Unfortunately, this is no longer true when the second stage is nonlinear. Because your binary endogenous variable is the dependent variable of the first stage, you have to use a nonlinear first stage.
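
    One common way to get a nonlinear first stage for a binary endogenous regressor is a probit whose generalized residual serves as the control function. The sketch below is only an illustration under assumptions not stated in the thread: the instrument list is copied from the reghdfe call in #1, the exogenous covariates are a shortened subset, the patient and month fixed effects are omitted, and the absorbing nature of COVID is ignored. The variable names xb_hat and gres are hypothetical.

```stata
* First stage: probit of the binary endogenous regressor on the
* instruments and the exogenous covariates (both lists are illustrative)
probit COVID CumMonthsSAH nursing_visits deaths_rate newdeaths_rate ///
    cum_num_vacpct Female age80plus chf cad

* Linear index from the probit, estimation sample only
predict double xb_hat if e(sample), xb

* Generalized residual:
*   phi(xb)/Phi(xb)        when COVID == 1
*  -phi(xb)/(1 - Phi(xb))  when COVID == 0
generate double gres = cond(COVID == 1,          ///
    normalden(xb_hat) / normal(xb_hat),          ///
    -normalden(xb_hat) / normal(-xb_hat)) if !missing(xb_hat)
```

    Here gres would take the place of COVID_fe in the second stage. Standard errors reported by a second stage that treats gres as data do not account for the first-stage estimation.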

    My command ivcloglog provides a simple way to instrument a discrete-data proportional hazards model (namely, the Prentice and Gloeckler [1978] model) with the control function approach. It provides you with the appropriate standard errors, corrected for the first stage. However, it currently only supports a linear first stage. You could consider modifying ivcloglog to do a nonlinear first stage. If I have the time, I could also help you with this; it seems like it would be a simple change.
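
    If the two steps are run by hand instead, one way to get standard errors that reflect the first-stage estimation is to bootstrap both stages together, resampling whole patients. This is a rough sketch, assuming a probit first stage with a generalized residual and a cloglog second stage. The program name cf_twostep, the variables xb_hat and gres, and the shortened covariate lists are hypothetical and not taken from ivcloglog.

```stata
capture program drop cf_twostep
program define cf_twostep
    * Both stages are re-estimated on every bootstrap sample
    capture drop xb_hat
    capture drop gres
    probit COVID CumMonthsSAH deaths_rate newdeaths_rate cum_num_vacpct ///
        Female age80plus
    predict double xb_hat, xb
    generate double gres = cond(COVID == 1,      ///
        normalden(xb_hat) / normal(xb_hat),      ///
        -normalden(xb_hat) / normal(-xb_hat))
    * Second stage: discrete-time hazard with the control function added
    cloglog d i.stop Female age80plus COVID gres
end

* Resample patients (clusters), not individual person-months
bootstrap _b[COVID] _b[gres], reps(200) cluster(grpatidtreat) ///
    seed(12345): cf_twostep
```

    The number of replications and the seed are arbitrary choices for the illustration.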

    By the way, you probably have grouped-time data rather than continuous-time data (or a good approximation to it). That is, your data are probably grouped by week, month, etc. In that case, a discrete-data model is more appropriate. Rather than the continuous-time Weibull model, you should consider using a discrete-data model like the Prentice and Gloeckler (1978) model. (This is a flexibly parametric, discrete-data model that is an analog of the semi-parametric, continuous-time Cox model.)
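
    With person-month records like those in the dataex listing in #1, a model of this kind can be fit as a complementary log-log regression of the failure indicator on interval dummies plus covariates. A minimal sketch, assuming d is the monthly failure indicator and stop indexes the month at risk (both taken from the posted data); the covariate list is shortened for illustration and the endogeneity of COVID is not addressed here:

```stata
* Discrete-time proportional hazards: one row per patient-month at risk.
* i.stop lets the baseline hazard take a separate level in each month;
* months with no failures may be dropped by Stata.
cloglog d i.stop Female age80plus chf cad COVID, ///
    vce(cluster grpatidtreat) eform
```

    The eform option reports exponentiated coefficients, which can be read as hazard ratios under the proportional hazards assumption.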

    You can experiment with different distributions for the frailty term to see what works.
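
    As a hedged illustration of what such experimenting might look like: streg accepts an inverse-Gaussian frailty as an alternative to gamma, and xtcloglog fits a discrete-time model with a normally distributed patient-level random effect. The covariate lists below are shortened for illustration, and there is no guarantee that either model converges on these data.

```stata
* Same Weibull model with inverse-Gaussian instead of gamma frailty;
* -difficult- uses a different stepping rule in nonconcave regions
streg Female age80plus chf cad COVID, distribution(weibull) ///
    frailty(invgaussian) shared(grpatidtreat) difficult

* Discrete-time analogue with a normal random intercept per patient
xtset grpatidtreat
xtcloglog d i.stop Female age80plus chf cad COVID, re
```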

    As for your last question, the answer is irrelevant because your preliminary results are wrong since you need a nonlinear first stage for them to be valid.
