  • Multiple imputation on continuous outcome variable

    Hello,

    I have been running regression analyses on CFU outcomes that have a lower limit of detection (3 log10). I have a total of 134 observations. Of these, 23 are recorded as zero (not true zeros; they are below the detection limit) and 2 are missing. I would like to estimate the effect of treatment on CFU using a linear regression that accounts for clustering by pen.

    Basically, my goal is to impute values between zero and the lower limit of detection for those observations. Here are the steps I followed:

    mi set wide
    mi register regular treatment pen
    mi register imputed cfu
    gen ulimplog10=3
    gen llimplog10=0
    replace llimp = cfu if cfu> 0 & cfu< 99999999999
    replace ulimplog10 = cfu if cfu> 0 & cfu< 99999999999

    mi impute intreg cfuimputed b(1).treatment pen, add(20) rseed(1234) ll(llimplog10) ul(ulimplog10)

    mi estimate: mixed cfuimputed ib(1).treatment|| pen_id:

    mimrgns treatment, cmdmargins
    marginsplot, xdimension(treatment) recast(bar)

    The model converged nicely. I wonder whether this is a valid approach, especially including treatment and pen in the imputation process.

    Many thanks in advance

    Gizem




  • #2
    I do not follow this completely, partly because I have no clear idea what CFU, pen, etc. are, but also because your code is hard to follow: you seem to use abbreviated names inconsistently. For example, you register cfu as imputed but go on to impute cfuimputed. It also makes me nervous to see values such as 99999999999; code those as "hard" missing values, e.g., .a, .b, or .z. mi will not impute them, and you do not run the risk of having magic numbers mess up any estimation commands. If there are no missing values in treatment and pen, these variables should follow an equals sign in the mi impute command.
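    The syntax points above can be sketched as a hedged rewrite of the bounds and imputation lines from the original post. It assumes below-LOD results are stored as cfu == 0 and that the LOD is 3 on the log10 scale; it has not been run against the actual data. Because Stata sorts the missing value . above every number, the "< 99999999999" condition in the original appears to serve only as a guard against missing values, and !missing() or simply carrying cfu through says that directly; if 99999999999 is instead a stored code, recoding it to .a as suggested above is the cleaner fix. Unlike the original, the sketch leaves both bounds missing for the two genuinely missing observations instead of forcing them into [0, 3]; how mi impute intreg then treats missing bounds is worth checking in its manual entry.

    ```stata
    * Sketch: bounds for the interval-censored outcome (log10 scale, LOD = 3).
    * Assumes below-LOD results are stored as cfu == 0.
    * Detected values: lower == upper == cfu; below LOD: [0, 3];
    * genuinely missing cfu: both bounds stay missing.
    generate double llimplog10 = cfu
    generate double ulimplog10 = cond(cfu == 0, 3, cfu)

    * Complete predictors (treatment, pen) follow the equals sign.
    mi impute intreg cfuimputed = ib1.treatment pen, ll(llimplog10) ul(ulimplog10) add(20) rseed(1234)
    ```

    These lines would replace the four gen/replace lines and the mi impute line of the original; pen is kept exactly as in the post.
    
    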

    However, before jumping into syntax and technical details, let me remind you that imputing only the outcome variable is not recommended, especially if the imputation model does not include additional variables. For an explanation of why this is probably not a good idea, see Von Hippel (2007). Considering the additional potential problems that the clustered nature of the data might pose during the imputation process, you might really be better off just using complete case analysis here.

    Yet another problem is that the lower limit of detection arguably casts doubt on the linear model (xtmixed, which, by the way, is called mixed in modern Stata). One could argue that you should actually impute the "true" values for all observations that are at the lower limit of detection. Conversely, one might say that the linear model is a reasonable approximation; but then why not use a linear model or, perhaps better, predictive mean matching for the imputation? If I were to impute here (and, as explained above, I would really be reluctant), I would probably stick with one model for both the imputation and the analysis.
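    One way to act on the "one model" idea is to skip imputation and fit the interval-censored outcome directly. The Stata sketch below assumes bounds variables like llimplog10 and ulimplog10 from the original post (equal to cfu for detected values, [0, 3] for below-LOD values), with the two genuinely missing observations given missing bounds so that intreg drops them; in the original code they end up in [0, 3] instead. pen_id is taken from the mixed call, meintreg requires Stata 15 or later, and the commands are meant for the original data, not the mi set copy. It is an untested sketch, not a recommendation over the alternatives discussed here.

    ```stata
    * Sketch: analyze the censored outcome directly, with no imputation step.
    * Assumes llimplog10/ulimplog10 hold the bounds (equal for detected values,
    * 0 and 3 for below-LOD values, both missing for genuinely missing cfu).

    * Interval regression with pen-clustered standard errors
    intreg llimplog10 ulimplog10 ib1.treatment, vce(cluster pen_id)

    * Mixed-effects interval regression with a random intercept for pen,
    * mirroring the mixed model from the original post
    meintreg llimplog10 ulimplog10 ib1.treatment || pen_id:
    ```

    The second model keeps the random-intercept structure of the original mixed call, so the same model handles both the censoring and the clustering.
    
    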

    Given the small size of the dataset, I guess it should not take a lot of time to try different specifications. Play around a bit and see whether results are sensitive to the choice of the imputation model (and the choice to impute at all).

    I hope this helps.

    Best
    Daniel


    von Hippel, Paul. 2007. "Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data." Sociological Methodology 37: 83-117.



    • #3
      I should add that the "missing" values in the outcome depend on the outcome, and only on the outcome. That is, the reason those values are missing (or 0) is that they are too low to be detected. One could argue that this situation does not qualify as missing at random (MAR), the assumption underlying conventional imputation methods. Thus, if you want to impute those values, you should probably do so under a variety of plausible assumptions about the underlying distribution.
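      Imputing under several plausible assumptions can be sketched as a simple sensitivity loop in Stata. It adapts the commands from the original post (not run against the actual data) and varies a single assumption: the lowest value a below-LOD observation could take on the log10 scale. It assumes the data in memory are not yet mi set and contain cfu (0 = below LOD), treatment, pen and pen_id; the variable names ll_sens, ul_sens and cfu_imp and the floors 0, 1 and 2 are hypothetical choices for illustration.

      ```stata
      * Sketch: sensitivity of the treatment effect to the assumed floor of the
      * below-LOD interval. Assumes the data are NOT yet mi set.
      foreach floorval of numlist 0 1 2 {
          preserve
          * Below LOD: [floorval, 3]; detected: point value; missing: both missing
          generate double ll_sens = cond(cfu == 0, `floorval', cfu)
          generate double ul_sens = cond(cfu == 0, 3, cfu)
          mi set wide
          * Same seed in every run, so the runs differ in the floor, not the seed
          quietly mi impute intreg cfu_imp = ib1.treatment pen, ll(ll_sens) ul(ul_sens) add(20) rseed(1234)
          display as text _n "Assumed floor of the below-LOD interval: `floorval'"
          mi estimate: mixed cfu_imp ib1.treatment || pen_id:
          * Return to the original, non-mi data before the next floor
          restore
      }
      ```

      If the treatment estimates barely move across floors, the conclusions are probably not driven by how the censored values are handled; large shifts would suggest reporting the range rather than a single estimate.
      
      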

      Best
      Daniel



      • #4
        Thank you, Daniel, for your kind response! I apologize for my late acknowledgement. This was a great help! Yes, you were right: the missing values are MNAR. But based on my literature search, I figured I could use MI with MNAR data. Please correct me if I am wrong.


        Many thanks again! I highly appreciate your expertise and inputs.


        Gizem
