  • -mlogit- in multiple imputation with chained equations

    I'm trying to run -mi impute chained- on a fairly large data set with a large number of variables. The code is too long to post here, but I think the following excerpt includes everything relevant:
    Code:
    //    IDENTIFY REGULAR VARIABLES AND VARIABLES TO IMPUTE
    mi set mlong
    mi set M = 1 // UNTIL WE GET IT WORKING, THEN M = 50
    
    ds /*several_dozen_variables*/
    local for_pmm `r(varlist)'
    
    ds /*another_bunch_of_count_variables*/
    local for_poisson `r(varlist)'
    
    ds /*a_few_dichotomies*/  
    local for_logit `r(varlist)'
    
    ds work_status maritalstatus
    local for_mlogit `r(varlist)'
    
    /*
    DEFINITIONS OF LOCAL MACROS regular passive imputed HERE
    */
    
    mi register regular `regular'
    mi register passive `passive'
    mi register imputed `imputed'
    
    
    mi impute chained ///
        (pmm,  knn(1) noisily) `for_pmm' ///
        (poisson, noisily iterate(100)) `for_poisson' ///
        (mlogit, augment noisily iterate(100)) `for_mlogit' ///
        (logit, augment noisily iterate(100)) `for_logit'  ///
        = `regular', augment report replace force
    Stata runs through things just fine until it gets to the -mlogit- models. It fits a number of them successfully for both the marital status (3 levels) and work status (4 levels) variables (output too long to show here). But then it aborts with the following output:

    Code:
    Running mlogit on data from iteration 1, m=1:
    
    
    note: ethnicityethn3 omitted because of collinearity
    too few categories
    error occurred during imputation of income cage_score sdsworkyessq001 sdssq002 sdssq003 qol_total phq2_score v4_ptsd_level
    medical_conditions_after_911 rescue_occasions work_status maritalstatus ethn3a trainingsq001 trainingsq002 on m = 1
    r(148);
    I don't know what to make of "too few categories," and I don't know how to proceed to troubleshoot it. Clearly, both marital status and work status have more than 2 levels: they have already been used as the outcomes of mlogit earlier. I suppose that in some of the iterations it might happen that only one or two of the outcome levels are actually instantiated in the estimation sample (though it's a bit surprising as none of the outcomes is particularly rare):
    Code:
    . mi xeq 0: tab1 maritalstatus work_status
    
    m=0 data:
    -> tab1 maritalstatus work_status
    
    -> tabulation of maritalstatus  
    
                 maritalstatus |      Freq.     Percent        Cum.
    ---------------------------+-----------------------------------
                 Never Married |        603       14.54       14.54
            Married/Cohabiting |      3,054       73.63       88.16
    Widowed/Divorced/Separated |        491       11.84      100.00
    ---------------------------+-----------------------------------
                         Total |      4,148      100.00
    
    -> tabulation of work_status  
    
             work_status |      Freq.     Percent        Cum.
    ---------------------+-----------------------------------
         Working (FT/PT) |      2,639       64.55       64.55
                Disabled |        457       11.18       75.73
                 Retired |        821       20.08       95.82
    Unempl/Retired/Other |        171        4.18      100.00
    ---------------------+-----------------------------------
                   Total |      4,088      100.00
    But I can't even start to troubleshoot this, because the output doesn't even tell me which of these outcome variables is implicated in the problem.

    Any thoughts on how I can figure this out?
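
    One possible way to narrow this down, offered as a sketch rather than a tested fix: the -dryrun- option of -mi impute chained- lists the conditional specification for each imputed variable without imputing anything, and -mi misstable summarize- reports the missing-value counts in m=0. Since -mi impute chained- by default imputes variables from the most observed to the least observed, the two together may indicate which -mlogit- outcome was being fitted when the error occurred. The sketch assumes the locals from the excerpt above are still defined in the same do-file run; the per-method options are trimmed for brevity.

    ```stata
    * Sketch only: assumes `for_pmm', `for_poisson', `for_mlogit',
    * `for_logit' and `regular' are defined as in the excerpt above.

    * 1. Show the conditional models that would be used, without imputing.
    mi impute chained ///
        (pmm, knn(1)) `for_pmm' ///
        (poisson) `for_poisson' ///
        (mlogit, augment) `for_mlogit' ///
        (logit, augment) `for_logit' ///
        = `regular', dryrun

    * 2. Missing-value counts for the two mlogit outcomes in the original data.
    mi misstable summarize `for_mlogit'
    ```

    Comparing the order implied by the missing-value counts with the last -mlogit- output that did display should suggest which outcome was next in line.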



  • #2
    A while back, I replicated this problem (http://www.statalist.org/forums/foru...gories-problem see post #4) but I am not sure that the solution applies here. To summarize, if the omitted variable (ethnicityethn3 in your case) has positive values for only one level of your outcome, and missing values for the rest, then this will result in the error.
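
    A quick check for that pattern might look like the sketch below. It assumes the omitted term corresponds to a variable literally named ethnicityethn3, as printed in the collinearity note; if that is a generated indicator, substitute the underlying variable.

    ```stata
    * Sketch only: ethnicityethn3 is taken from the collinearity note in the
    * output above and is an assumption about the actual variable name.
    * Cross-tabulate each mlogit outcome against it in the original data (m=0),
    * showing missing values as their own row and column.
    foreach y in work_status maritalstatus {
        mi xeq 0: tabulate `y' ethnicityethn3, missing
    }
    ```

    If the nonmissing values of ethnicityethn3 fall within a single outcome level, the situation described above applies.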



    • #3
      Andrew,

      Thanks for the suggestion, but I've checked and that isn't happening in this data set.



      • #4
        Just thought I'd let everybody know that, following a suggestion from Rich Goldstein, I've resolved this problem by using Patrick Royston's -ice- rather than -mi-.



        • #5
          Interesting. Any ideas as to why ice works better? Or is this just one of those things you take on blind faith so long as it works?
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          Stata Version: 17.0 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam



          • #6
            I will say that this kind of thing has happened to me more than once, and I am in conversation with Yulia Marchenko about a particular example. I start with -mi-, but if it fails for non-obvious reasons I move to -ice-.



            • #7
              I have no idea whether these issues are related, but I was having problems imputing count variables using -mi impute chained- with either the nbreg or poisson method. I was in contact with Stata technical support; they replicated the problems and indicated there was a bug. I've not seen any updates that would suggest this has been resolved. Perhaps -ice- is more robust?
