Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Instrumental variable in Cox regression

    Hi all,

    Does anyone know how to incorporate an instrumental variable in stcox?
    I only know the ivregress command which does not take survival time into account.

    Thanks a lot,

    Marissa

  • #2
    Marissa:
    welcome to the list.
    The following Stata thread might be useful: http://www.stata.com/statalist/archi.../msg01250.html
    Besides, the web provides some interesting entries, as you can see by googling with the string -instrumental variable cox regression-.
    Kind regards,
    Carlo
    (Stata 18.0 SE)

    Comment


    • #3
      Hi Marissa,

      I, and many others, have been struggling with the same problem. If time is discrete, there is a very easy solution. Even if it is continuous, you can still solve the problem (see http://www.stata.com/statalist/archive/2013-11/msg00841.html). If you have competing risks, I believe you can use the -stcrprep- package to generate weights.

      The solution is to estimate your model as a Poisson regression.

      First show that Poisson regression is the same as Cox regression. Suppose time is discrete, you model a single risk and your data are in long format, i.e. one row per subject-time:
      1) -stset- your data.
      2a) -stcox covariate1 covariate2, nohr-
      2b) -streg covariate1 covariate2 i._t, dist(exp) nohr-
      2c) -poisson _d covariate1 covariate2 i._t if _st==1-
      --> these all yield the same results

      Now you can continue to use the command capable of instrumentation:
      3) -ivpoisson gmm _d covariate1 covariate2 i._t if _st==1, add-
      --> again, same results

      Suppose covariate2 is endogenous. Instrument it:
      4) -ivpoisson gmm _d covariate1 (covariate2 = instrument1) i._t if _st==1, add-

      Alternatively, you can find ways using -gsem-.


      Note that when you have a large dataset, these estimations can be very time-consuming.
      Good luck.

      Comment


      • #4
        Expanding on Bram's post, the following toy example shows that the MLE of the models (2a), (2b), (2c), (2d) are indeed the same.

        Code:
        webuse catheter, clear
        bysort patient (_t): keep if _n==1
        
        stset time, fail(infect) id(patient)
        stsplit, at(failures) r(rs)
        
        stcox i.female age, nohr breslow // 2a
        streg i.female age i.rs, dist(exp) nohr // 2b
        poisson _d i.female age i.rs // 2c
        xtpoisson _d i.female age, fe i(rs) // 2d
        Last edited by Andrea Discacciati; 12 Jul 2017, 14:12.

        Comment


        • #5
          I wonder whether the suite of - stteffects - , available in Stata 15, wouldn't fit your needs.
          Best regards,

          Marcos

          Comment


          • #6
            I believe that -stteffects- uses counterfactuals based on reweighting schemes, alike to matching. It does not provide suitable commands for exploiting exogenous variation.

            Comment


            • #7
              Thank you all for your valuable comments! In the mean time, I came up with the following code:
              Code:
              *In Stata one first needs to declare data to be time-to-event data using the stset command. The pseudo-observations at time 1 can now be computed and used as response for regression in a GMM approach.
              
              stset vitfup_years, failure(vitstat==1)
              
              * install stpsurv
              stpsurv, at(10)
              ivregress gmm pseudo (behandeling=okh_ziekenhuis), wmatrix(robust)
              
              * in the presence of competing risks
              stpci, at(10)
              ivregress gmm pseudo (behandeling=okh_ziekenhuis), wmatrix(robust)
              I compared these results with the results I obtained using your codes described above and it gives similar results, so that is great!


              Regarding the -stteffects-: I hadn't heard about this until now, so I had to look up some information. I am wondering whether this does the same as the ivregress/ivpoisson models described above. We hope to correct for unmeasured confounding by using an instrumental variable. Regarding the stteffects: I don't know if this works the same, as Bram also mentioned.


              Maybe something else to keep this discussion going: how do you judge your instrumental variable? When is an instrumental variable strong enough to be able to perform an instrumental variable analysis with valid results? I know that it is still possible to correct for other factors in the ivregress methods, however, my opinion is that when you have to correct for other factors, you're instrumental variable is not strong enough and the results may still suffer from the probability of unmeasured confounding.

              Curious to hear your answers

              Thanks, Marissa

              Comment


              • #8
                Just to clarify it, I underlined that - stteffects - may be considered as an option under the given scenario. However, this is not saying it is equal to IV regression.

                Under other circunstances, I mean, in a situation where I could use a IV model, I did some comparisons with teffects (not stteffects, because the model wasn't applicable to survival analysis) , and the (main) results went on a similar verge.
                Best regards,

                Marcos

                Comment


                • #9
                  That sounds good. I am definitely going to look into the stteffects further. Perhaps it may be a better solution for defining real treatments effects when there is no strong IV available in the dataset. Thanks for sharing your ideas!

                  Comment


                  • #10
                    The idea of pseudo-observations sounds interesting. Unfortunately, it is unclear to me how these are imputed and how the variance that you want to analyze is maintained...

                    With respect to the instruments: there is a lot of discussion about this. A rule of thumb is that your instruments yields an F-statistic of at least 10 in the first stage. Personally, I do not think this rule is very meaningful, because the F-statistic hinges on your specification and sample size as well.
                    Adding controls can always be useful to increase precision, and relatedly, to justify conditional IV independence. Imagine that women (in your example) are more likely to be treated, and perhaps also of being in a hospital. Then you would want the (instrumented) probability of receiving treatment, to take account of a patient's sex. Otherwise, you would mix up the covariance and get imprecise estimates.
                    Last edited by Bram Hogendoorn; 13 Jul 2017, 08:33.

                    Comment


                    • #11
                      Dear everyone,

                      this post (and other related likns) was quite useful, because I am having the same issues with IV approach in Cox survival analysis. I applied different proposals and got stuck with this option proposed by Andrea:

                      Originally posted by Andrea Discacciati View Post
                      Expanding on Bram's post, the following toy example shows that the MLE of the models (2a), (2b), (2c), (2d) are indeed the same.

                      Code:
                      webuse catheter, clear
                      bysort patient (_t): keep if _n==1
                      
                      stset time, fail(infect) id(patient)
                      stsplit, at(failures) r(rs)
                      
                      stcox i.female age, nohr breslow // 2a
                      streg i.female age i.rs, dist(exp) nohr // 2b
                      poisson _d i.female age i.rs // 2c
                      xtpoisson _d i.female age, fe i(rs) // 2d
                      As I want to use the ivpoisson gmm command, how would this look like? Instead of using xtpoisson, I replaced it by ivpoisson, but obviously some options are missing. This prcoedure works fine for me up to the poisson model, but afterwords not anymore.

                      Code:
                      stsplit, at(failures) r(rs)
                      stcox $startup_level $first_funding_level $investor_level celebrity_inv_75_all_10yrs, nohr breslow
                      streg $startup_level $first_funding_level $investor_level celebrity_inv_75_all_10yrs i.rs, dist(exp) nohr
                      poisson _d $startup_level $first_funding_level $investor_level celebrity_inv_75_all_10yrs i.rs
                      ivpoisson gmm _d $startup_level $first_funding_level $investor_level celebrity_inv_75_all_10yrs i.rs
                      I would be very happy if we could once and for all find a solution to the problem :-). By the way, I was also using ivreg2 and ivprobit as "robustness checks" and the coefficients sow similar results in terms of direction and significance.

                      Thanks and all the best,
                      Rike

                      Comment


                      • #12
                        Originally posted by Rike Bruchmann View Post
                        This prcoedure works fine for me up to the poisson model, but afterwords not anymore.
                        Next time, it would be great if you could provide some more details.
                        I assume you mean that -ivpoisson gem- won't converge (even without instruments). You're probably asking too much by trying to estimate one parameter per event time (i.rs). For example

                        Code:
                        webuse catheter, clear
                        bysort patient (_t): keep if _n==1
                        
                        stset time, fail(infect) id(patient)
                        stsplit, at(failures) r(rs)
                        
                        ivpoisson gmm _d i.female age i.rs, mult irr conv_maxiter(20) // convergence not achieved

                        You can model the baseline hazard using a parametric function (e.g. splines). Usually, this approach give very similar results in terms of HRs as compared with Cox

                        Code:
                        webuse catheter, clear
                        bysort patient (_t): keep if _n==1
                        
                        stset time, fail(infect) id(patient)
                        stsplit click, every(2)
                        
                        mkspline s = _t, cubic nknots(4)
                        
                        ivpoisson gmm _d i.female age s?, mult irr

                        Comment


                        • #13
                          Hi guys,

                          I am sorry to keep this topic ongoing, but I am still struggling with the subject (although you all tried to help me very well! I appreciate that). I tried different proposed methods for using an instrumental variable in survival analysis, but I am not sure whether I exactly understand the underlying statistics.

                          I ended up with the following code for using 'zkh_hoogsteok' as instrumental (endogenous) variable for 'behandeling':

                          Code:
                          *In Stata one first needs to declare data to be time-to-event data using the stset command. The pseudo-observations at time 1 can now be computed and used as response for regression in a GMM approach.
                          
                          stset vitfup_years, failure(vitstat==1)
                          
                          * install stpsurv
                          stpsurv, at(10)
                          ivregress gmm pseudo (behandeling=zkh_hoogsteok), wmatrix(robust)
                          However, I am not sure whether this method is the best one to use. Does anyone have experience with this method? It is the first time I am using it.

                          Furthermore, I would like to have my output similar to the output of 'normal' Cox regression, it would be nice if I can compare the results using an endogenous variable with results of not using an endogenous variable. So, is there any way that I can get hazard ratio's, or at least similarly interpretable estimates, while incorporating an endogenous variable in my analysis?

                          I tried to understand the codes described above, but I have to admit I still do not understand exactly what happens. My time to event data is in years, but they're not round numbers, so continuous and not discrete. I understand from the responses in this topic that the -gsem- command would be best for my data.

                          I found this link:
                          HTML Code:
                          https://www.stata.com/statalist/archive/2013-11/msg00846.html
                          , and I have read the STATA manual regarding the -gsem- code, but I think I lack some statistical knowledge here...

                          The code starts by the following:

                          Code:
                          stset ty, failure(event) id(id)
                          , however if I apply this to my data, I get an error with the message that I have multiple records with the same ty, in this case. I understand this error, since I have a lot of patients who were censored or died at the same time.
                          Can anyone tell me why this id has to be generated? What is exactly happening when using the following code?:

                          Code:
                          webuse brcancer,clear
                          rename censrec event
                          
                          /* Convert from days to years */
                          gen ty = rectime/365.25
                          stset ty, failure(event) id(id)
                          
                          /* stsplit to one year intervals */
                          stsplit tx , every(1)
                          replace event=0 if event==.
                          
                          /* calculate person-time in each interval */
                          gen ptime = 0 if ty< tx
                          replace ptime = 1 if  ty>=tx+1
                          replace ptime = ty - tx if ty> tx & ty< tx +1
                          
                          drop if tx == 7 /* no events */
                          
                          /* Piecewise exponential & Poisson: Same */
                          streg ibn.tx, nocons d(exponential)
                          poisson event ibn.tx, nocons exp(ptime)  irr
                          
                          /* Now GSEM linked equations */
                          gsem  ///
                          (event <- ibn.tx hormon, poisson exp(ptime) nocons) ///
                          (hormon <- x7, f(binomial) link(logit))
                          estat eform  event hormon
                          note: this code is taken from the above mentioned link.

                          Hope someone is able to help me, I am really getting stuck at this point.

                          Marissa

                          Comment


                          • #14
                            Hi Marissa and others,

                            Hope this helps.
                            At first, we open the data.
                            Code:
                            webuse brcancer,clear
                            rename censrec event
                            Variable "ty" gives the number of days during which the patient was observed. The observation period ends either with death or censoring (patient still alive). We convert "ty" from number of days to number of years.
                            Then, we tell Stata that we have survival data. Stata runs some checks to see if the data structure is compatible with survival analysis. Option -id()- is required for the command -stsplit-, later on.
                            Code:
                            /* Convert from days to years */
                            gen ty = rectime/365.25
                            stset ty, failure(event) id(id)
                            Now, we split the patient records into patient-year records. A patient gets a record for each year that they were observed. Note that variable "_t0" now indicates the record's start year and "_t" the record's end year.
                            Because of the splitting, the variable "event" is missing in newly created patient-year records. We set it to 0 in those years, because the patient did not die in those years.
                            Code:
                            /* stsplit to one year intervals */
                            stsplit tx , every(1)
                            replace event=0 if event==.
                            We create a new variable "ptime" that indicates the length of the patient-year record. Usually, this is one year, but it may be shorter for the patient's last year record, because that is when their observation period ended (either by death or censoring). Note that this could also have been achieved with (much) simpler code: gen ptime2 = _t - _t0.
                            Then, all patients who were observed at least 7 years, are dropped. Why? None of them died, so there exists no hazard rate at t=7.
                            Code:
                            /* calculate person-time in each interval */
                            gen ptime = 0 if ty< tx
                            replace ptime = 1 if  ty>=tx+1
                            replace ptime = ty - tx if ty> tx & ty< tx +1
                            drop if tx == 7 /* no events */
                            With the current data structure, it can be shown that a survival model with an exponential baseline hazard can be estimated in two ways. Either as a Cox model with the baseline hazard function parametrized as an exponential distribution, or as a Poisson model with piecewise-constant baseline hazards (with "pieces" defined at each time point).
                            Code:
                            /* Piecewise exponential & Poisson: Same */
                            streg ibn.tx, nocons d(exponential)
                            poisson event ibn.tx, nocons exp(ptime) irr
                            We can display these results also without exponentiation.
                            Code:
                            streg ibn.tx, nocons d(exponential) nohr
                            poisson event ibn.tx, nocons exp(ptime)
                            Finally, let's have a look at instrumental variables. Given that the survival model can be estimated using a Poisson regression, the command -ivpoisson- is at our disposal.
                            First we fit the same model as before, then a model including "hormon" as an exogenous variable, and lastly a model including "hormon" as an endogenous variable instrumented with "x2".
                            Code:
                            ivpoisson gmm event ibn.tx, exp(ptime) nocons add
                            ivpoisson gmm event hormon ibn.tx, exp(ptime) nocons add
                            ivpoisson gmm event ibn.tx (hormon = x2), exp(ptime) nocons add

                            Comment


                            • #15
                              Dear all

                              I hope all is well.
                              I am very very interested in your answer and post about: instrumental variable cox regression.

                              I tried to do:
                              Stset finaltime , failure ( gstatus)
                              poison _d covariate 1 covariate 2 i_t if _st==1

                              However I got the following :message:
                              _t: factor variable may not contain noninteger values.

                              Any idea how to solve this?

                              Comment

                              Working...
                              X