  • No Output, Infinite Cycling Wheel (Multiple Imputation Fixed Effects Logistic Regression)

    Need help troubleshooting a Stata issue. I've been running into dead ends the last few days.



    The issue:

    I am looking at the relationship between students achieving the X1 benchmark and two outcomes, a continuous outcome (Y1) and a binary outcome (Y2), using multiple regression and then logistic regression. I also need to do all of this twice: once with casewise deletion and once with multiple imputation.

    The end result should produce:

    A. Not Imputed (Casewise Deletion) OLS Regression: Y1 on X1

    B. Imputed (Multiple Imputation) OLS Regression: Y1 on X1

    C. Not Imputed (Casewise Deletion) Multiple Regression: Y1 on X1+X2+X3...

    D. Imputed (Multiple Imputation) Multiple Regression: Y1 on X1+X2+X3...

    E. Not Imputed (Casewise Deletion) Logit: Y2 on X1

    F. Imputed (Multiple Imputation) Logit: Y2 on X1

    G. Not Imputed (Casewise Deletion) Logit: Y2 on X1+X2+X3...

    H. Imputed (Multiple Imputation) Logit: Y2 on X1+X2+X3...




    Thus far:
    • A and B estimates are the same (as they should be)
    • C and D estimates are almost the same (as they should be)
    • E and F estimates are very different (something seems wrong, need help)
    • G estimates post fine and seem reasonable
    • H estimates never post, no error message either (loading wheel at bottom cycles for hours, need help)

  • #2
    David,
    welcome to this forum.
    Please read the FAQ on how to post more effectively. Thanks.
    With no quantitative details from your side, my surely not very helpful comment is: no wonder that E differs from F, as those models imply totally different procedures.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Got it, thank you.



      • #4
        To add to Carlo Lazzaro's advice, the fact that your results differ with vs without multiple imputation suggests that the analysis of complete cases does indeed bias your results. It is precisely the purpose of multiple imputation to reduce that bias. So when multiple imputation produces a substantial change in your results, you should be glad you did multiple imputation!

        That said, you would be well advised to scrutinize the outputs of E and F to make sure that the variables used in the analysis are exactly the same and that the only difference between them is due to multiple imputation.
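
        As a minimal sketch of that check: the names y2 and x1 below are placeholders for the actual outcome and benchmark variables, and the data are assumed to be already -mi set- with the imputations created.

        ```stata
        * Hypothetical names: y2 is the binary outcome, x1 the benchmark indicator.
        * Assumes the data are already mi set and the imputations have been added.

        * How many values are missing in each analysis variable (original data)?
        mi misstable summarize y2 x1

        * Model E: complete cases only, fitted on the original data (m = 0)
        mi xeq 0: logit y2 x1

        * Model F: the same command, pooled over the imputed data sets
        mi estimate: logit y2 x1
        ```

        Comparing the variable lists and the number of observations reported in the two outputs should show whether anything other than the imputation differs between them.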

        As for what is happening with H, you provide so little detail that it is not possible to advise. At what phase of the analysis do things appear to get stuck? During the imputation, or in the subsequent logistic regression?

        If it appears stalled during the imputation, consider also that multiple imputation is a computationally intensive process, particularly if some of the imputations are multinomial logistic imputations. The size of the data set is a factor as well. I have had some multiple imputations that have taken days to complete. Hours, unless you are running a small and simple problem, doesn't strike me as cause for concern. One possible way to speed it up, if you do have multinomial variables being imputed, is to simplify those to dichotomous (binary) variables, provided you can do so in a sensible way that preserves the overall meaning of the variables in the context of your problem. I would also note that, in my experience at least, if the wheel is still spinning, Stata is still working. You can also check, on a Windows system, by opening Task Manager and watching for "signs of life" in the Stata process. By "signs of life," I mean that when Stata is still running you will see reasonable amounts of CPU and memory allocated to it, and you will see those varying up and down from time to time. I imagine that Mac and Unix have something similar to Task Manager that allows you to see, in real time, the resource use of all your processes, and I suppose you could similarly get a sense of whether the process is still really running.

        If it appears hung during the logistic regression, then you have to consider how large a problem you are running. How many observations? How many variables? And of course, you then have to multiply that by the number of imputations, which must run consecutively. In general, logistic regressions run pretty quickly, but if the problem is large enough, it, too, could take hours.
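
        One rough way to gauge this, sketched under assumptions: time the model on a single imputed data set and scale up. The variable names, the plain -logit- command, and the figure of 20 imputations are all placeholders, not details from the original post.

        ```stata
        * Hypothetical names: y2, x1, x2, x3 stand in for the actual variables.
        * Assumes mi data with at least one imputation already created.
        timer clear 1
        timer on 1
        mi xeq 1: logit y2 x1 x2 x3      // fit the model on imputation 1 only
        timer off 1
        timer list 1                     // elapsed seconds are left in r(t1)

        * Rough total if, say, 20 imputations are fitted one after another
        display "Approximate total seconds: " r(t1) * 20
        ```

        If a single fit already takes many minutes, hours for the full -mi estimate- run would not be surprising.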

        Added: If the delay appears to be while you are doing the imputation, you could kill the process, and then add the -dots- option to the -mi impute- command. That way you will see a new dot added to the Results window as each imputed data set is created. The data sets will all take roughly the same amount of time to impute, with only a little variation. So if things have been zipping along, with dots coming at more or less regular intervals, and then there is a long period with no more dots coming, then it probably really is stuck. And if the dots keep coming at roughly regular (if long) intervals, then you know you just have to wait for it to finish.
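
        A minimal sketch of what that might look like. The imputation model here is invented for illustration: x2 (binary) and x3 (continuous) are assumed to have missing values, y2 and x1 are assumed complete, and the mi style, number of imputations, and seed are arbitrary.

        ```stata
        * Hypothetical setup; substitute the real variables and imputation model.
        mi set wide
        mi register imputed x2 x3

        * dots prints one dot per completed imputation in the Results window
        mi impute chained (logit) x2 (regress) x3 = y2 x1, add(20) rseed(12345) dots

        * mi estimate also accepts dots, so the analysis stage can be watched too
        mi estimate, dots: logit y2 x1 x2 x3
        ```
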
        Last edited by Clyde Schechter; 20 Jun 2022, 17:32.
