Dear All,
I am wondering whether it is possible to apply the two-stage control function approach (as suggested by Prof. Jeff Wooldridge in #12 of the Statalist thread "ivpoisson with panel-data fixed effects") to examine the impact of a COVID-19 infection on the new onset of chronic pain (time to new diagnosis). COVID is a binary endogenous variable and is fully absorbing: once a patient has had COVID, they remain in that state (COVID = 1) for the rest of follow-up. Here is what my data look like:
Code:
```
. stset stop, id(grpatidtreat) enter(start) failure(d=1) time0(start)

Survival-time data settings

           ID variable: grpatidtreat
         Failure event: d==1
Observed time interval: (start, stop]
     Enter on or after: time start
     Exit on or before: failure

--------------------------------------------------------------------------
    700,185  total observations
          0  exclusions
--------------------------------------------------------------------------
    700,185  observations remaining, representing
     26,147  subjects
      4,072  failures in single-failure-per-subject data
    700,185  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        31

. dataex monthlydate grpatidtreat Female age80plus chf cad CumMonthsSAH nursing_visits deaths_rate newdeaths_rate cum_num_vacpct start pain dead COVID d t0 failtime stop _st _d _t _t0
```
Code:
```
* Example generated by -dataex-. For more info, type help dataex
clear
input float(monthlydate grpatidtreat) byte(Female age80plus) float(chf cad CumMonthsSAH) double nursing_visits float(deaths_rate newdeaths_rate cum_num_vacpct start pain dead COVID d t0 failtime stop) byte(_st _d _t _t0)
714 1 0 0 0 1 0 108734 0 0 0 0 0 0 0 0 0 31 1 1 0 1 0
715 1 0 0 0 1 0 116834 0 0 0 1 0 0 0 0 0 31 2 1 0 2 1
716 1 0 0 0 1 0 120565 0 0 0 2 0 0 0 0 0 31 3 1 0 3 2
717 1 0 0 0 1 0 115212 0 0 0 3 0 0 0 0 0 31 4 1 0 4 3
718 1 0 0 0 1 0 112542 0 0 0 4 0 0 0 0 0 31 5 1 0 5 4
719 1 0 0 0 1 0 125570 0 0 0 5 0 0 0 0 0 31 6 1 0 6 5
720 1 0 0 0 1 0 143446 0 0 0 6 0 0 0 0 0 31 7 1 0 7 6
721 1 0 0 0 1 0 130334 0 0 0 7 0 0 0 0 0 31 8 1 0 8 7
722 1 0 0 0 1 0 103759 .24440205 .24440205 0 8 0 0 0 0 0 31 9 1 0 9 8
723 1 0 0 0 1 0 78091 5.556073 5.311671 0 9 0 0 0 0 0 31 10 1 0 10 9
724 1 0 0 0 1 .8333333 82506 12.77408 7.218007 0 10 0 0 0 0 0 31 11 1 0 11 10
725 1 0 0 0 1 .8333333 86526 16.896328 4.122248 0 11 0 0 0 0 0 31 12 1 0 12 11
726 1 0 0 0 1 .8333333 87483 21.26298 4.36665 0 12 0 0 0 0 0 31 13 1 0 13 12
727 1 0 0 0 1 .8333333 88849 26.4443 5.181324 0 13 0 0 0 0 0 31 14 1 0 14 13
728 1 0 0 0 1 .8333333 89153 35.145016 8.700713 0 14 0 0 0 0 0 31 15 1 0 15 14
729 1 0 0 0 1 .8333333 92720 50.34682 15.201808 0 15 0 0 0 0 0 31 16 1 0 16 15
730 1 0 0 0 1 .8333333 85388 64.8643 14.517482 0 16 0 0 0 0 0 31 17 1 0 17 16
731 1 0 0 0 1 .8333333 103097 95.91966 31.055355 1.28795 17 0 0 0 0 0 31 18 1 0 18 17
732 1 0 0 0 1 .8333333 104865 116.92194 21.002283 8.336668 18 0 0 0 0 0 31 19 1 0 19 18
733 1 0 0 0 1 .8333333 96550 136.58817 19.66622 20.84593 19 0 0 0 0 0 31 20 1 0 20 19
734 1 0 0 0 1 .8333333 97024 146.26648 9.678321 41.1382 20 0 0 0 0 0 31 21 1 0 21 20
735 1 0 0 0 1 .8333333 99512 150.60054 4.334063 64.93627 21 0 0 0 0 0 31 22 1 0 22 21
736 1 0 0 0 1 .8333333 127138 157.65562 7.055073 76.24552 22 0 0 0 0 0 31 23 1 0 23 22
737 1 0 0 0 1 .8333333 125215 161.61493 3.959313 83.28538 23 0 0 0 0 0 31 24 1 0 24 23
738 1 0 0 0 1 .8333333 127337 167.98567 6.370747 89.34604 24 0 0 0 0 0 31 25 1 0 25 24
739 1 0 0 0 1 .8333333 131623 183.6437 15.658025 96.881 25 0 0 0 0 0 31 26 1 0 26 25
740 1 0 0 0 1 .8333333 127392 198.65 15.006286 102.72098 26 0 0 0 0 0 31 27 1 0 27 26
741 1 0 0 0 1 .8333333 134953 208.18167 9.53168 111.11894 27 0 0 0 0 0 31 28 1 0 28 27
742 1 0 0 0 1 .8333333 131271 253.08647 44.9048 120.57523 28 0 0 0 0 0 31 29 1 0 29 28
743 1 0 0 0 1 .8333333 137164 264.32898 11.242495 131.78543 29 0 0 0 0 0 31 30 1 0 30 29
744 1 0 0 0 1 .8333333 0 283.84854 19.519577 138.48784 30 0 0 0 0 0 31 31 1 0 31 30
714 2 1 1 0 0 0 23152 0 0 0 0 0 1 0 0 0 14 1 1 0 1 0
715 2 1 1 0 0 0 23589 0 0 0 1 0 1 0 0 0 14 2 1 0 2 1
716 2 1 1 0 0 0 24512 0 0 0 2 0 1 0 0 0 14 3 1 0 3 2
717 2 1 1 0 0 0 24150 0 0 0 3 0 1 0 0 0 14 4 1 0 4 3
718 2 1 1 0 0 0 23678 0 0 0 4 0 1 0 0 0 14 5 1 0 5 4
719 2 1 1 0 0 0 25964 0 0 0 5 0 1 0 0 0 14 6 1 0 6 5
720 2 1 1 0 0 0 27860 0 0 0 6 0 1 0 0 0 14 7 1 0 7 6
721 2 1 1 0 0 0 25949 0 0 0 7 0 1 0 0 0 14 8 1 0 8 7
722 2 1 1 0 0 0 19128 .1559596 .1559596 0 8 0 1 0 0 0 14 9 1 0 9 8
723 2 1 1 0 0 .16666667 14646 1.4348285 1.278869 0 9 0 1 0 0 0 14 10 1 0 10 9
724 2 1 1 0 0 1.1666666 14962 3.5246875 2.089859 0 10 0 1 0 0 0 14 11 1 0 11 10
725 2 1 1 0 0 1.1666666 15520 5.365011 1.8403236 0 11 0 1 0 0 0 14 12 1 0 12 11
726 2 1 1 0 0 1.1666666 15734 9.607113 4.2421017 0 12 0 1 0 0 0 14 13 1 0 13 12
727 2 1 1 0 0 1.1666666 16123 12.757497 3.1503844 0 13 0 1 0 0 0 14 14 1 0 14 13
714 3 1 1 1 1 0 53780 0 0 0 0 0 0 0 0 0 31 1 1 0 1 0
715 3 1 1 1 1 0 54204 0 0 0 1 0 0 0 0 0 31 2 1 0 2 1
716 3 1 1 1 1 0 56357 0 0 0 2 0 0 0 0 0 31 3 1 0 3 2
717 3 1 1 1 1 0 53457 0 0 0 3 0 0 0 0 0 31 4 1 0 4 3
718 3 1 1 1 1 0 51336 0 0 0 4 0 0 0 0 0 31 5 1 0 5 4
719 3 1 1 1 1 0 60125 0 0 0 5 0 0 0 0 0 31 6 1 0 6 5
720 3 1 1 1 1 0 70577 0 0 0 6 0 0 0 0 0 31 7 1 0 7 6
721 3 1 1 1 1 0 61177 0 0 0 7 0 0 0 0 0 31 8 1 0 8 7
722 3 1 1 0 0 0 44959 1.1981796 1.1981796 0 8 0 0 0 0 0 31 9 1 0 9 8
723 3 1 1 0 0 .2 32052 13.457814 12.259635 0 9 0 0 0 0 0 31 10 1 0 10 9
724 3 1 1 0 0 1.2 35025 25.09231 11.634498 0 10 0 0 0 0 0 31 11 1 0 11 10
725 3 1 1 0 0 1.2 36976 29.34672 4.254406 0 11 0 0 0 0 0 31 12 1 0 12 11
726 3 1 1 0 0 1.2 38088 31.96882 2.622103 0 12 0 0 0 0 0 31 13 1 0 13 12
727 3 1 1 0 0 1.2 37560 33.896328 1.9275063 0 13 0 0 0 0 0 31 14 1 0 14 13
728 3 1 1 0 0 1.2 37355 35.789104 1.8927765 0 14 0 0 0 0 0 31 15 1 0 15 14
729 3 1 1 0 0 1.2 38919 40.0956 4.3065004 0 15 0 0 0 0 0 31 16 1 0 16 15
730 3 1 1 0 0 1.2 34855 53.13666 13.041057 0 16 0 0 0 0 0 31 17 1 0 17 16
731 3 1 1 0 0 1.2 41124 84.72345 31.58679 1.435975 17 0 0 0 0 0 31 18 1 0 18 17
732 3 1 1 0 0 1.2 38249 99.17107 14.447615 9.884704 18 0 0 0 0 0 31 19 1 0 19 18
733 3 1 1 0 0 1.2 33831 104.97095 5.799884 23.81733 19 0 0 0 0 0 31 20 1 0 20 19
734 3 1 1 0 0 1.2 33492 107.2284 2.2574399 46.73331 20 0 0 0 0 0 31 21 1 0 21 20
735 3 1 1 0 0 1.2 32725 111.69118 4.462785 77.14936 21 0 0 0 0 0 31 22 1 0 22 21
736 3 1 1 0 0 1.2 41687 116.74437 5.053192 97.47453 22 0 0 0 0 0 31 23 1 0 23 22
737 3 1 1 0 0 1.2 41846 120.6341 3.8897424 107.47088 23 0 0 0 0 0 31 24 1 0 24 23
738 3 1 1 0 0 1.2 43258 123.23885 2.604738 112.40738 24 0 0 0 0 0 31 25 1 0 25 24
739 3 1 1 0 0 1.2 44937 127.09386 3.855013 118.4889 25 0 0 0 0 0 31 26 1 0 26 25
740 3 1 1 0 0 1.2 43021 134.24821 7.154348 124.62994 26 0 0 0 0 0 31 27 1 0 27 26
741 3 1 1 0 0 1.2 44950 145.44858 11.200375 134.55649 27 0 0 0 0 0 31 28 1 0 28 27
742 3 1 1 0 0 1.2 43845 162.93506 17.486477 149.5779 28 0 0 0 0 0 31 29 1 0 29 28
743 3 1 1 0 0 1.2 43655 181.98438 19.049318 165.26677 29 0 0 0 0 0 31 30 1 0 30 29
744 3 1 1 0 0 1.2 0 196.90085 14.916468 174.72752 30 0 0 0 0 0 31 31 1 0 31 30
714 4 1 0 0 0 0 108734 0 0 0 0 0 1 0 0 0 29 1 1 0 1 0
715 4 1 0 0 0 0 116834 0 0 0 1 0 1 0 0 0 29 2 1 0 2 1
716 4 1 0 0 0 0 120565 0 0 0 2 0 1 0 0 0 29 3 1 0 3 2
717 4 1 0 0 0 0 115212 0 0 0 3 0 1 0 0 0 29 4 1 0 4 3
718 4 1 0 0 0 0 112542 0 0 0 4 0 1 0 0 0 29 5 1 0 5 4
719 4 1 0 0 0 0 125570 0 0 0 5 0 1 0 0 0 29 6 1 0 6 5
720 4 1 0 0 0 0 143446 0 0 0 6 0 1 0 0 0 29 7 1 0 7 6
721 4 1 0 0 0 0 130334 0 0 0 7 0 1 0 0 0 29 8 1 0 8 7
722 4 1 0 0 0 0 103759 .24440205 .24440205 0 8 0 1 0 0 0 29 9 1 0 9 8
723 4 1 0 0 0 0 78091 5.556073 5.311671 0 9 0 1 0 0 0 29 10 1 0 10 9
724 4 1 0 0 0 .8333333 82506 12.77408 7.218007 0 10 0 1 0 0 0 29 11 1 0 11 10
725 4 1 0 0 0 .8333333 86526 16.896328 4.122248 0 11 0 1 0 0 0 29 12 1 0 12 11
726 4 1 0 0 0 .8333333 87483 21.26298 4.36665 0 12 0 1 0 0 0 29 13 1 0 13 12
727 4 1 0 0 0 .8333333 88849 26.4443 5.181324 0 13 0 1 0 0 0 29 14 1 0 14 13
728 4 1 0 0 0 .8333333 89153 35.145016 8.700713 0 14 0 1 0 0 0 29 15 1 0 15 14
729 4 1 0 0 0 .8333333 92720 50.34682 15.201808 0 15 0 1 0 0 0 29 16 1 0 16 15
730 4 1 0 0 0 .8333333 85388 64.8643 14.517482 0 16 0 1 0 0 0 29 17 1 0 17 16
731 4 1 0 0 0 .8333333 103097 95.91966 31.055355 1.28795 17 0 1 0 0 0 29 18 1 0 18 17
732 4 1 0 0 0 .8333333 104865 116.92194 21.002283 8.336668 18 0 1 0 0 0 29 19 1 0 19 18
733 4 1 0 0 0 .8333333 96550 136.58817 19.66622 20.84593 19 0 1 0 0 0 29 20 1 0 20 19
734 4 1 0 0 0 .8333333 97024 146.26648 9.678321 41.1382 20 0 1 0 0 0 29 21 1 0 21 20
735 4 1 0 0 0 .8333333 99512 150.60054 4.334063 64.93627 21 0 1 0 0 0 29 22 1 0 22 21
736 4 1 0 0 0 .8333333 127138 157.65562 7.055073 76.24552 22 0 1 0 0 0 29 23 1 0 23 22
737 4 1 0 0 0 .8333333 125215 161.61493 3.959313 83.28538 23 0 1 0 0 0 29 24 1 0 24 23
end
format %tm monthlydate
```
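The stset output above describes single-failure-per-subject data split into monthly (start, stop] records. Because the approach relies on COVID being fully absorbing, a quick consistency check of the split records may be worth running before any modelling. The following is only a minimal sketch in Python (not Stata), assuming the records have been exported as hypothetical (id, start, stop, COVID, d) tuples; the helper name check_records is illustrative and not part of any package.

```python
from itertools import groupby


def check_records(rows):
    """Check split survival records (one row per patient-interval).

    rows: iterable of (id, start, stop, covid, d) tuples, a hypothetical
    export of grpatidtreat, start, stop, COVID and d. Returns a list of
    (id, problem) pairs; an empty list means every check passed.
    """
    problems = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # by patient, then start
    for pid, grp in groupby(ordered, key=lambda r: r[0]):
        recs = list(grp)
        for prev, cur in zip(recs, recs[1:]):
            # Consecutive intervals should chain: next start == previous stop
            if cur[1] != prev[2]:
                problems.append((pid, "gap or overlap between intervals"))
            # Absorbing state: once COVID == 1 it must never return to 0
            if prev[3] == 1 and cur[3] == 0:
                problems.append((pid, "COVID returns from 1 to 0"))
        # Single-failure data: a failure (d == 1) may only be on the last record
        if any(r[4] == 1 for r in recs[:-1]):
            problems.append((pid, "failure before the last record"))
    return problems


if __name__ == "__main__":
    toy = [
        (1, 0, 1, 0, 0), (1, 1, 2, 1, 0), (1, 2, 3, 1, 1),  # valid patient
        (2, 0, 1, 1, 0), (2, 1, 2, 0, 0),                   # COVID reverts to 0
    ]
    print(check_records(toy))  # [(2, 'COVID returns from 1 to 0')]
```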
Next, to run the control function approach:
Code:
```
. ***** FIRST STAGE
.
. reghdfe COVID CumMonthsSAH nursing_visits deaths_rate newdeaths_rate cum_num_vacpct, absorb(grpatidtreat monthlydate state, save) cluster(grpatidtreat monthlydate) residuals(resid)
(dropped 409 singleton observations)
(MWFE estimator converged in 6 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.

HDFE Linear regression                            Number of obs   =    699,776
Absorbing 3 HDFE groups                           F(   5,     30) =      18.97
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.4414
                                                  Adj R-squared   =     0.4200
Number of clusters (grpatidtreat) =     25,738    Within R-sq.    =     0.0021
Number of clusters (monthlydate) =         31    Root MSE        =     0.1216

                 (Std. err. adjusted for 31 clusters in grpatidtreat monthlydate)
--------------------------------------------------------------------------------
               |               Robust
         COVID | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
  CumMonthsSAH |  -.0023079   .0019432    -1.19   0.244    -.0062764    .0016606
nursing_visits |   1.16e-08   9.70e-09     1.19   0.241    -8.22e-09    3.14e-08
   deaths_rate |   .0002108   .0000232     9.10   0.000     .0001634    .0002581
newdeaths_rate |  -.0001266   .0000565    -2.24   0.033     -.000242   -.0000111
cum_num_vacpct |  -.0000918   .0000713    -1.29   0.208    -.0002375    .0000539
         _cons |     .01118   .0037694     2.97   0.006     .0034818    .0188782
--------------------------------------------------------------------------------

Absorbed degrees of freedom:
------------------------------------------------------+
  Absorbed FE | Categories  - Redundant  = Num. Coefs |
--------------+---------------------------------------|
 grpatidtreat |     25738       25738           0    *|
  monthlydate |        31          31           0    *|
        state |        51           1          50     |
------------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. predict double COVID_fe, r
(409 missing values generated)

. streg Female age80plus Asian Black Hispanic chf cad cancer copd dm stroke htn oarth liver renal depress COVID COVID_fe, distribution(weibull)

        Failure _d: d==1
  Analysis time _t: stop
 Enter on or after: time start
       ID variable: grpatidtreat

Fitting constant-only model:
Iteration 0:  Log likelihood = -14499.714
Iteration 1:  Log likelihood = -14499.673
Iteration 2:  Log likelihood = -14499.673

Fitting full model:
Iteration 0:  Log likelihood = -14499.673
Iteration 1:  Log likelihood = -14192.215
Iteration 2:  Log likelihood = -13608.612
Iteration 3:  Log likelihood = -13586.715
Iteration 4:  Log likelihood = -13586.538
Iteration 5:  Log likelihood = -13586.538

Weibull PH regression

No. of subjects =     25,738                           Number of obs =    699,776
No. of failures =      3,746
Time at risk    =    699,776
                                                       LR chi2(18)   =    1826.27
Log likelihood = -13586.538                            Prob > chi2   =     0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      Female |   1.050703   .0362327     1.43   0.151     .9820351    1.124173
   age80plus |    3.12706   .1056879    33.73   0.000     2.926627    3.341219
       Asian |   .9366176   .0895084    -0.69   0.493     .7766347    1.129556
       Black |   1.121095    .061301     2.09   0.037     1.007161    1.247917
    Hispanic |   1.142516   .0581472     2.62   0.009      1.03405    1.262361
         chf |   1.239777   .0676932     3.94   0.000     1.113953    1.379813
         cad |   1.144291    .052385     2.94   0.003      1.04609    1.251711
      cancer |   .8817859   .0374588    -2.96   0.003     .8113413    .9583468
        copd |   1.243518   .0614885     4.41   0.000     1.128659    1.370067
          dm |   1.140213   .0422912     3.54   0.000     1.060265    1.226189
      stroke |   2.054509   .1585987     9.33   0.000     1.766035    2.390105
         htn |    1.03796   .0394549     0.98   0.327     .9634406    1.118244
       oarth |   1.081008    .045519     1.85   0.064     .9953747    1.174009
       liver |   1.246264   .1031747     2.66   0.008     1.059599    1.465814
       renal |   1.150626   .0515212     3.13   0.002     1.053951    1.256169
     depress |   1.922831   .0948028    13.26   0.000     1.745716    2.117915
       COVID |   .7349001     .12362    -1.83   0.067     .5284999    1.021908
    COVID_fe |   2.381679   .5697129     3.63   0.000     1.490289     3.80624
       _cons |   .0016569   .0001267   -83.74   0.000     .0014263    .0019247
-------------+----------------------------------------------------------------
       /ln_p |    .127481   .0162371     7.85   0.000     .0956569    .1593051
-------------+----------------------------------------------------------------
           p |   1.135963   .0184447                      1.100381    1.172696
         1/p |   .8803101   .0142937                      .8527361    .9087758
------------------------------------------------------------------------------
Note: _cons estimates baseline hazard.
```
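For intuition, the two steps behind the reghdfe first stage and the streg second stage above can be sketched in the simplest setting: a linear outcome model with a continuous endogenous regressor and simulated data. This is a hypothetical, stdlib-only Python illustration, not a reproduction of the models above; the data-generating process, the coefficient values, and the helper names ols and simulate are all assumptions. In a nonlinear model such as a Weibull hazard, adding the first-stage residual removes endogeneity bias only under additional assumptions, so the sketch shows the mechanics of the two steps rather than validating the specification.

```python
import random


def ols(X, y):
    """OLS coefficients solving (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    a = [row + [v] for row, v in zip(xtx, xty)]  # augmented matrix [X'X | X'y]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * w for u, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, beta=1.0, seed=1):
    """Hypothetical DGP: v confounds x and y; z shifts x but not y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)                          # excluded instrument
        vi = rng.gauss(0.0, 1.0)                          # unobserved confounder
        xi = 0.8 * zi + vi + rng.gauss(0.0, 1.0)          # endogenous regressor
        yi = beta * xi + 1.5 * vi + rng.gauss(0.0, 1.0)   # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()

# Naive: regress y on x only (ignores that v drives both x and y)
b_naive = ols([[1.0, xi] for xi in x], y)[1]

# Stage 1: regress the endogenous x on the instrument and keep the residual
g = ols([[1.0, zi] for zi in z], x)
v_hat = [xi - (g[0] + g[1] * zi) for xi, zi in zip(x, z)]

# Stage 2: add the first-stage residual as a control function
b_cf = ols([[1.0, xi, vh] for xi, vh in zip(x, v_hat)], y)

# The naive slope is pulled away from beta = 1 by v; the CF slope should be close to 1
print(f"naive slope: {b_naive:.3f}   control-function slope: {b_cf[1]:.3f}")
```

In this linear case the control function estimate of the coefficient on x coincides numerically with 2SLS using z as the instrument, and the t statistic on the residual is a regression-based test of whether x is exogenous.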
I have the following questions:
1) Can I use a linear approximation (a linear probability model, as I did above with reghdfe) for the first stage? If not, what could I use instead? Ideally I would fit a logit, but I am not sure whether using the residual from a logit is valid.
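On the logit question: when the first stage is a binary-response model, two residuals that appear in the control function literature are the response residual COVID − Λ(x′γ̂) from a logit and the probit generalized residual, E[v | COVID, x] for a latent standard-normal error v. The sketch below only shows how each would be computed from a first-stage linear index x′γ̂ (written xb); it is a hypothetical, stdlib-only Python illustration that fits no model, and it does not settle whether either residual gives consistent estimates in the streg second stage, which depends on further assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)


def norm_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    """Phi(t); the erfc form keeps precision in the lower tail."""
    return 0.5 * math.erfc(-t / SQRT2)


def norm_sf(t):
    """1 - Phi(t), computed directly to avoid cancellation in the upper tail."""
    return 0.5 * math.erfc(t / SQRT2)


def logit_response_residual(d, xb):
    """d - Lambda(xb): observed 0/1 indicator minus the logit fitted probability."""
    return d - 1.0 / (1.0 + math.exp(-xb))


def probit_generalized_residual(d, xb):
    """E[v | d, x] when d = 1[xb + v > 0] and v ~ N(0, 1)."""
    if d == 1:
        return norm_pdf(xb) / norm_cdf(xb)    # phi(xb) / Phi(xb)
    return -norm_pdf(xb) / norm_sf(xb)        # -phi(xb) / (1 - Phi(xb))


for d, xb in [(1, -1.5), (0, -1.5), (1, 0.0), (0, 0.0)]:
    print(d, xb,
          round(logit_response_residual(d, xb), 4),
          round(probit_generalized_residual(d, xb), 4))
```

Both residuals average to zero given x when the first-stage model is correctly specified, which is what allows them to play the role of the linear first-stage residual.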
2) I probably need to bootstrap the standard errors of the second stage. I can do that, but first I need to confirm whether this procedure is correct at all.
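Regarding the bootstrap: because COVID_fe is an estimated regressor, the bootstrap accounts for first-stage estimation error only if every replication re-estimates both stages. Resampling should also be by patient, so that each patient's monthly records stay together. In Stata, the bootstrap prefix's cluster() and idcluster() options are meant for this kind of resampling; idcluster() gives duplicated patients distinct identifiers, which matters when patient fixed effects are absorbed. The following is a generic, hypothetical Python sketch of the resampling logic only; cluster_bootstrap_se and pooled_mean are illustrative names, not part of any package.

```python
import random
import statistics


def cluster_bootstrap_se(clusters, estimator, reps=500, seed=2024):
    """Bootstrap standard error with whole clusters (patients) resampled.

    clusters  : list where each element holds all records of one patient
    estimator : function(list_of_clusters) -> float; for a two-step control
                function it must refit BOTH stages on every resample
    """
    rng = random.Random(seed)
    n = len(clusters)
    draws = []
    for _ in range(reps):
        # Draw n patients with replacement; a patient drawn twice appears twice
        resample = [clusters[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(resample))
    return statistics.stdev(draws)


def pooled_mean(clusters):
    """Toy estimator: mean over all records of all sampled patients."""
    values = [v for cluster in clusters for v in cluster]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical data: 100 patients with one record each, values 1..100
    patients = [[float(i)] for i in range(1, 101)]
    se = cluster_bootstrap_se(patients, pooled_mean, reps=2000)
    # Should be close to the analytic value sd / sqrt(n), about 2.89 here
    print(f"bootstrap SE: {se:.3f}")
```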
3) Ideally I would like to account for the multiple observations per patient in streg using the frailty(gamma) and shared(grpatidtreat) options. However, as shown below, the model does not converge. What other options are there to account for multiple observations per individual?
Code:
```
. streg Female age80plus Asian Black Hispanic chf cad cancer copd dm stroke htn oarth liver renal depress COVID_fe, distribution(weibull) frailty(gamma) shared(grpatidtreat)

        Failure _d: d==1
  Analysis time _t: stop
 Enter on or after: time start
       ID variable: grpatidtreat

Fitting Weibull model ...

Fitting constant-only model:
Iteration 0:   Log likelihood = -19174.658
Iteration 1:   Log likelihood = -17394.015
Iteration 2:   Log likelihood = -14742.789
Iteration 3:   Log likelihood = -14520.083
Iteration 4:   Log likelihood = -14499.723
Iteration 5:   Log likelihood = -14499.673
Iteration 6:   Log likelihood = -14499.673  (not concave)
Iteration 7:   Log likelihood = -14499.673  (not concave)
Iteration 8:   Log likelihood = -14499.673  (not concave)
Iteration 9:   Log likelihood = -14499.673  (not concave)
Iteration 10:  Log likelihood = -14499.673  (not concave)
Iteration 11:  Log likelihood = -14499.673  (not concave)
Iteration 12:  Log likelihood = -14499.673  (not concave)
Iteration 13:  Log likelihood = -14499.673  (not concave)
Iteration 14:  Log likelihood = -14499.673  (not concave)
Iteration 15:  Log likelihood = -14499.673  (not concave)
Iteration 16:  Log likelihood = -14499.673  (not concave)
```
4) The hazard ratio on COVID is .7349001 (although it is not statistically significant at the 5% level). Does this imply that having COVID reduces the hazard to about 73% of its level without COVID? And how should I interpret the (statistically significant) hazard ratio of 2.381679 on COVID_fe, the residual from the first stage?
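On interpretation: Stata's hazard-ratio output reports exp(b) for each coefficient b, with the standard error of the hazard ratio obtained by the delta method as HR·se(b); the z statistic, p-value, and confidence interval are computed for b itself. The reported numbers can therefore be reproduced from the hazard ratio and its standard error, as the hypothetical Python helper summarize_hr below does with the values from the streg output above. Read at face value, HR = .7349 means the hazard for COVID = 1 is about 0.73 times (roughly 26.5% lower than) the hazard for COVID = 0, holding the other covariates, including COVID_fe, fixed. The COVID_fe ratio refers to a one-unit change in the first-stage residual; in control function approaches, the test on the residual's coefficient is commonly read as a test of the null hypothesis that COVID is exogenous.

```python
import math
from statistics import NormalDist


def summarize_hr(hr, se_hr, level=0.95):
    """Recover log-scale quantities from a reported hazard ratio and its SE.

    Assumes, as in Stata's hazard-ratio output, that se_hr = hr * se(b)
    (delta method) and that z, p and the CI are computed for b = ln(hr).
    """
    b = math.log(hr)                 # log hazard ratio (the coefficient)
    se_b = se_hr / hr                # undo the delta-method scaling
    z = b / se_b
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    lo, hi = math.exp(b - crit * se_b), math.exp(b + crit * se_b)
    pct_change = (hr - 1.0) * 100.0  # % change in the hazard per 1-unit increase
    return {"b": b, "se_b": se_b, "z": z, "p": p,
            "ci": (lo, hi), "pct_change": pct_change}


# Values copied from the streg output above
for name, hr, se in [("COVID", 0.7349001, 0.12362),
                     ("COVID_fe", 2.381679, 0.5697129)]:
    s = summarize_hr(hr, se)
    print(f"{name:9s} b={s['b']:.4f} z={s['z']:.2f} p={s['p']:.3f} "
          f"CI=({s['ci'][0]:.4f}, {s['ci'][1]:.4f}) change={s['pct_change']:.1f}%")
```

Up to rounding, this reproduces the z statistics, p-values, and confidence limits shown in the streg table, which confirms that the hazard ratios are simply exponentiated coefficients.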
Any guidance you can offer would be very helpful. Thank you in advance for your time and help.
Sincerely,
Sumedha