Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with conversion of SAS code to Stata - immortal time bias

    Dear All

    This code in SAS implements the clone-censor-weight method of dealing with immortal time bias
    The variables are as follows:

    Variables from underlying dataset
    time_exposure: Time from cohort entry (e.g., cancer diagnosis) to exposure (e.g., surgery) in days; missing if individual never has exposure during follow-up
    exposure: Binary variable that equals 1 if individual has exposure during follow-up and equals 0 if individual does not have exposure during follow-up
    time_outcome: Time from cohort entry to outcome (e.g., death) or loss-to-follow-up in days
    outcome: Binary indicator that equals 1 if individual experiences the outcome during follow-up and equals 0 if individual is censored (loss to follow-up or administrative)

    Variables created for clone-censor-weight method
    exposure_final: Binary indicator that equals 1 for exposed clones and 0 for unexposed clones; this variable is set by the analyst/programmer and not based on underlying data
    follow_up: Amount of follow-up, calculated as time from cohort entry until censoring due to treatment deviation, the outcome, or censoring
    outcome_intermediate: Binary indicator that equals 1 if individual dies at end of the follow-up interval and equals 0 if individual is censored (either due to treatment deviation or censoring); this variable is created in an intermediate step before intervals for censoring weights are created
    deviate_intermediate: Binary indicator that equals 1 if an individual deviated from their assigned exposure during the treatment assessment window; this variable is created in an intermediate step before intervals for censoring weights are created
    time1: Start of time interval – in this example, time is broken into months and the conditional probability of remaining uncensored is calculated for each month
    time2: End of time interval – in this example, either the end of the month, or the time within the month that an event, treatment deviation, or censoring occurs
    outcome_final: Binary variable indicating if outcome occurred during the time interval
    deviate_final: Binary variable indicating if individual deviated from treatment assignment during that interval

    Any help from experts in SAS and Stata would be much appreciated and code is below

    Regards
    Suhail

    Code:
    *Use a macro variable (cut) to specify the duration of the treatment assessment window. In this example, we use an 8-month treatment assessment window. Since our other timing variables are recorded in days, we convert months to days when defining the landmark macro variable;
    %let cut=8*(365/12);
     
    *Step 1: create dataset that allocates each treatment to a clone. In this dataset, there will be two observations for each individual – one where they are assigned to be treated/exposed and one where they are assigned to not receive treatment/unexposed;
    data clone_treatment;
          set underlying_data;
          row=_n_;
     
          *EVERYONE IS SET TO EXPOSED;
          exposure_final=1;
    *[1] If individuals die during the treatment assessment window (i.e., before or on the cut), then keep exposure_final set to 1 and their time and outcome the same as in the underlying data (regardless of actual exposure);
                if time_outcome<=&cut. then do;
                      outcome_intermediate=outcome;
                      follow_up=time_outcome;
                      deviate_intermediate=0;
                end;
     
    *[2] If they are exposed after the treatment assessment window (i.e., after the cut) or they are never exposed, censor at the end of the treatment assessment window (8 months) and flag as a deviation in assigned treatment status;
                else if time_exposure>(&cut.)|time_exposure=. then do;
                      outcome_intermediate=0;
                      follow_up=(&cut.);
                      deviate_intermediate=1;
                end;
     
    *[3] Otherwise, if they do not receive the exposure during the treatment assessment window (i.e., before or on the cut), then keep exposure_final set to 1 and their time and outcome the same as in the underlying data;
                else if time_exposure<=&cut.;
                      outcome_intermediate=outcome;
                      follow_up=time_outcome;
                      deviate_intermediate=0;
                end;
     
                id=row||"_1";
                output;
              
          *EVERYONE IS SET TO UNEXPOSED;
          exposure_final=0;
    *[1] If they are exposed during the treatment assessment window (i.e., before or on the cut), censor at time of exposure and flag as a deviation in assigned treatment status;
                if exposure=1 & time_exposure<=&cut. then do;
                      outcome_intermediate=0;
                      follow_up=time_exposure;
                      deviate_intermediate=1;
                end;
     
    *[2] Otherwise, if they do not receive the exposure during follow-up or if they are exposed after the treatment assessment window (i.e., after the cut) then keep exposure_final set to 0 and their time and outcome the same as in the underlying data;
                else if exposure=0 | (exposure=1 & time_exposure>&cut.);
                      outcome_intermediate=outcome;
                      follow_up=time_outcome;
                      deviate_intermediate=0;
                end;
     
                id=row||"_0";
                output;
    run;
     
    proc sort; by id; run;
     
    *Step 2: Create intervals for censoring weights -- note different intervals can be used or you can re-estimate weights every time someone deviates from treatment assignment. We chose 1 month intervals based on our data and study question.
     
    In this example, we apply monthly censoring weights for the 3-years of follow-up in our study. This means that for each observation created in Step 1 (two observations per individual) there will be up to 36 observations in the new dataset, one for each month of follow-up at the assigned treatment.;
    data clone_month;
          set clone_treatment;
     
    *Loop through each month of follow-up – 36 for our analysis of 3-year survival;
          do month=1 to 36;
                time1=month-1;
                time2=min(month, follow_up/(365.25/12));
     
    *[1] If the outcome occurred during that month/time interval, set the outcome and censoring/deviation variables to what was observed in Step 1(above);
                if month = ceil(follow_up/(365.25/12)) then do;
                      outcome_final=outcome_intermediate;
                      deviate_final=deviate_intermediate;
                end;
     
    *[2] Otherwise, if the outcome did not occur during that month/time interval, the individual remains uncensored and has no outcome (death);
                else do;
                      outcome_final=0;
                      deviate_final=0;
                end;
     
    *After determining outcome and deviation status for that month, output observation to new dataset;
                if month <= ceil(follow_up/(365.25/12)) then output;
          end;
    run;
     
     
    *Step 3: estimate inverse-probability of censoring weights using logistic regression models;
     
    *Assume all covariates are stored in a “covars” macro variable that is previously defined;
     
    proc genmod data= clone_month;
          by exposure_final;
          class month (ref="&cut.");
          model deviate_final(ref="1")=&covars. month /link=logit dist=bin;
          output out=censor_den p=den;
    run;
     
    proc genmod data=censor_den;
          by exposure_final;
          class month(ref="&cut.");
          model deviate_final(ref="1")= month /link=logit dist=bin;
          output out=censor_num p=num;
    run;
     
    data clone_final;
          set censor_num;
          retain n d;
          by id;
     
          if first.id then do;
                n=1;
    d=1;
          end;
     
          *Deviation weights;
          n=num*n;
          d=den*d;
          wt=n/d;
    run;
     
     
    *Step 4: run proportional hazards regression model to estimate the HR in the cloned population;
     
    *Note: Bootstrapping should be used to obtain appropriate confidence intervals;
     
    proc phreg data=clone_final;
          id id;
          weight wt;
          model (time1, time2)*outcome_final(0)=exposure_final/ties=efron;
    run;
    Last edited by Suhail Doi; 15 Mar 2025, 03:50. Reason: Edited title
    Regards
    Suhail Doi

  • #2
    I use DeepSeek to do this work. Below is the result it gives.

    Code:
    **Step 1: Create the clone dataset**
    * Use a local macro to specify the duration of the treatment assessment window.
    local cut = 8 * (365 / 12)
    
    * Step 1: Create dataset that allocates each treatment to a clone.
    gen row = _n
    
    * EVERYONE IS SET TO EXPOSED
    gen exposure_final = 1
    
    * [1] If individuals die during the treatment assessment window (i.e., before or on the cut), then keep exposure_final set to 1 and their time and outcome the same as in the underlying data (regardless of actual exposure).
    gen outcome_intermediate = outcome if time_outcome <= `cut'
    gen follow_up = time_outcome if time_outcome <= `cut'
    gen deviate_intermediate = 0 if time_outcome <= `cut'
    
    * [2] If they are exposed after the treatment assessment window (i.e., after the cut) or they are never exposed, censor at the end of the treatment assessment window (8 months) and flag as a deviation in assigned treatment status.
    replace outcome_intermediate = 0 if time_exposure > `cut' | missing(time_exposure)
    replace follow_up = `cut' if time_exposure > `cut' | missing(time_exposure)
    replace deviate_intermediate = 1 if time_exposure > `cut' | missing(time_exposure)
    
    * [3] Otherwise, if they do not receive the exposure during the treatment assessment window (i.e., before or on the cut), then keep exposure_final set to 1 and their time and outcome the same as in the underlying data.
    replace outcome_intermediate = outcome if time_exposure <= `cut'
    replace follow_up = time_outcome if time_exposure <= `cut'
    replace deviate_intermediate = 0 if time_exposure <= `cut'
    
    * Generate a unique ID for each observation.
    gen id = string(row) + "_1"
    
    * Save the dataset.
    save clone_treatment, replace
    
    * EVERYONE IS SET TO UNEXPOSED
    gen exposure_final_unexposed = 0
    
    * [1] If they are exposed during the treatment assessment window (i.e., before or on the cut), censor at time of exposure and flag as a deviation in assigned treatment status.
    replace outcome_intermediate = 0 if exposure == 1 & time_exposure <= `cut'
    replace follow_up = time_exposure if exposure == 1 & time_exposure <= `cut'
    replace deviate_intermediate = 1 if exposure == 1 & time_exposure <= `cut'
    
    * [2] Otherwise, if they do not receive the exposure during follow-up or if they are exposed after the treatment assessment window (i.e., after the cut), then keep exposure_final set to 0 and their time and outcome the same as in the underlying data.
    replace outcome_intermediate = outcome if exposure == 0 | (exposure == 1 & time_exposure > `cut')
    replace follow_up = time_outcome if exposure == 0 | (exposure == 1 & time_exposure > `cut')
    replace deviate_intermediate = 0 if exposure == 0 | (exposure == 1 & time_exposure > `cut')
    
    * Generate a unique ID for each observation in the unexposed group.
    gen id_unexposed = string(row) + "_0"
    
    * Append the unexposed group to the exposed group.
    append using clone_treatment
    
    * Save the final dataset.
    save final_clone_treatment, replace
    
    **Step 2: Create intervals for censoring weights**
    * Create monthly intervals for censoring weights.
    gen month = .
    gen time1 = .
    gen time2 = .
    gen outcome_final = .
    gen deviate_final = .
    
    local max_month = 36  * 3 years of follow-up, 36 months
    
    forvalues month = 1/`max_month' {
        replace time1 = `month' - 1
        replace time2 = min(`month', follow_up / (365.25 / 12))
    
        * [1] If the outcome occurred during that month/time interval, set the outcome and censoring/deviation variables to what was observed in Step 1.
        replace outcome_final = outcome_intermediate if `month' == ceil(follow_up / (365.25 / 12))
        replace deviate_final = deviate_intermediate if `month' == ceil(follow_up / (365.25 / 12))
    
        * [2] Otherwise, if the outcome did not occur during that month/time interval, the individual remains uncensored and has no outcome (death).
        replace outcome_final = 0 if `month' < ceil(follow_up / (365.25 / 12))
        replace deviate_final = 0 if `month' < ceil(follow_up / (365.25 / 12))
    
        * Output observations for each month.
        if `month' <= ceil(follow_up / (365.25 / 12)) {
            save clone_month, replace
        }
    }
    
    **Step 3: Estimate inverse-probability of censoring weights**
    * Estimate the denominator of the weights using logistic regression.
    logit deviate_final i.month `covars' if exposure_final == 1, vce(robust)
    predict den if exposure_final == 1
    
    logit deviate_final i.month `covars' if exposure_final == 0, vce(robust)
    predict den if exposure_final == 0
    
    * Estimate the numerator of the weights using logistic regression.
    logit deviate_final i.month if exposure_final == 1, vce(robust)
    predict num if exposure_final == 1
    
    logit deviate_final i.month if exposure_final == 0, vce(robust)
    predict num if exposure_final == 0
    
    * Calculate the weights.
    gen wt = num / den
    
    **Step 4: Run proportional hazards regression model**
    * Run the Cox proportional hazards model with weights.
    stset time2, failure(outcome_final) id(id)
    stcox exposure_final [pweight=wt], vce(robust) ties(efron)
    
    **Notes**
    - Ensure that all variable names (e.g., `time_outcome`, `time_exposure`, `outcome`, etc.) match those in the dataset.
    - If adjustments to the time intervals or weight calculation methods are needed, modify the code according to specific requirements.
    - It is recommended to use the Bootstrap method to calculate confidence intervals.

    Comment


    • #3
      I've not tested this code, but it's clear in advance that

      Code:
      local max_month = 36  * 3 years of follow-up, 36 months
      will fail as the intended comment needs to be separated:

      Code:
      local max_month = 36  
      * 3 years of follow-up, 36 months

      Comment


      • #4
        Thanks Chen and Nick
        I will try it out on sample data

        Note it gave
        Code:
        stset time2, failure(outcome_final) id(id)
        stcox exposure_final [pweight=wt], vce(robust) ties(efron)
        Should perhaps be
        Code:
        stset time2 [pweight=wt], failure(outcome_final) origin(time1) id(id)
        stcox exposure_final , vce(robust) breslow
        Also weights are not supported with efron and exactp
        Any thoughts about origin?

        Regards
        Suhail
        Last edited by Suhail Doi; 15 Mar 2025, 11:04.
        Regards
        Suhail Doi

        Comment

        Working...
        X