  • accounting for missing treatment with propensity score adjustment


    Hi,

    I am working on a project in which I am trying to assess the impact of smoking cessation on long-term mortality rates in a specific patient population. My issue is that, while I have complete data on whether the patients were smokers initially (1 if yes, 0 if no), I don't have complete data on whether they quit smoking within a given time frame. I have roughly 1100 smokers, and I have cessation data (1 if they quit, 0 if they did not) for roughly 800 of them. To be clear, I have long-term mortality data (censored) for the full 1100.

    Up until now, I have been using propensity score adjustment. My syntax has been of the form:

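    logit quitteryn varlist    // varlist = variables associated with the outcome of death
    predict quitterps          // predicted probability of quitting (the propensity score)
    stcox quitteryn quitterps  // Cox model adjusted for the propensity score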

    (I have been led to believe that this is the correct methodology).

    I have been trying to determine how best to account for the patients for whom I do not have treatment data (i.e. quitteryn == .). However, I'll be honest that I'm a bit lost, as this is my first time encountering this type of analysis. It would appear that IPW would provide a solution, but I am uncertain whether I need two layers of IPW: one that accounts for differences in the treatment received (i.e. quitteryn == 1 vs. quitteryn == 0) and another that accounts for the missing data (i.e. quitteryn != . vs. quitteryn == .). Also, if I do need to do this, I really have no idea what the syntax should be.

    I've seen IPW used in conjunction with stteffects and seem to have been able to make that code work. However, as my primary results are in the form of a hazard ratio, I would love to be able to incorporate the correction into a Cox model (i.e. stcox). Hazard ratios are simply the reporting standard being used.

    Any assistance or explanation (preferably along with Stata syntax) would be much appreciated! Thanks in advance.

    ~ Dave


  • #2
    You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    There is a massive literature on missing data. Your technique is certainly not what you want: you're using predicted values even where you have actual values. You could look at gsem's approaches to missing data, or at multiple imputation.
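
    As a rough illustration of the multiple-imputation route, here is a minimal sketch rather than a tested analysis. The covariates age, sex, and bmi and the variables time and died are hypothetical placeholders, and the dataset is assumed to contain only the baseline smokers. Including the event indicator and the Nelson-Aalen cumulative hazard in the imputation model is one common choice, not the only one.

    ```stata
    * Hypothetical names: time, died, age, sex, bmi. Replace with your own.
    stset time, failure(died)

    * Nelson-Aalen cumulative hazard, used as a predictor in the imputation model
    sts generate nacum = na

    * Declare the mi structure and the variable to be imputed
    mi set wide
    mi register imputed quitteryn

    * Impute the binary treatment indicator with a logistic imputation model
    mi impute logit quitteryn age sex bmi died nacum, add(20) rseed(12345)

    * Fit the Cox model in each imputed dataset and combine with Rubin's rules
    mi estimate, hr: stcox quitteryn age sex bmi
    ```

    The number of imputations (20) and the seed are arbitrary here, and the covariate-adjusted Cox model stands in for whatever adjustment strategy you settle on.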



    • #3
      1) I would recommend first examining the nature of the missingness. Data can be missing in different ways, and the appropriate solution differs accordingly. For instance, data can be MAR (missing at random), MCAR (missing completely at random), etc.
      2) You may also have observations that fall into the "incomplete spell" category. Duration/hazard models can naturally incorporate such spells.
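
      To make point 1 concrete, the sketch below first checks whether observed covariates predict missingness of quitteryn (if they do, MCAR is doubtful), and then, assuming MAR given those covariates, builds the two-layer inverse-probability weight asked about in the original post and passes it to stcox. The variables time, died, age, sex, and bmi are hypothetical placeholders, and this is an illustration under those assumptions, not a validated analysis.

      ```stata
      * Hypothetical names: time, died, age, sex, bmi. Baseline smokers only.
      generate byte observed = !missing(quitteryn)

      * Layer 1: probability that cessation status is observed.
      * Covariates that clearly predict -observed- argue against MCAR.
      logit observed age sex bmi
      predict p_obs, pr

      * Layer 2: probability of quitting, fit on those with observed status
      logit quitteryn age sex bmi
      predict p_quit, pr

      * Combined weight: (1 / Pr(observed)) x (1 / Pr(treatment actually received))
      generate double ipw = (1/p_obs) * cond(quitteryn == 1, 1/p_quit, 1/(1 - p_quit)) if observed

      * stcox takes weights from stset, so re-declare the data with pweights
      stset time if observed [pweight = ipw], failure(died)
      stcox quitteryn, vce(robust)
      ```

      The robust standard errors here treat the weights as known; bootstrapping the whole procedure is one way to account for the fact that they are estimated.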

