-mlogit- in multiple imputation with chained equations

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

10 Aug 2016, 10:39

I'm trying to run -mi impute chained- on a fairly large data set with a large number of variables. The code is too long to post here, but I think the following excerpt includes everything relevant:

Code:

//    IDENTIFY REGULAR VARIABLES AND VARIABLES TO IMPUTE
mi set mlong
mi set M = 1 // UNTIL WE GET IT WORKING, THEN M = 50

ds /*several_dozen_variables*/
local for_pmm `r(varlist)'

ds /*another_bunch_of_count_variables*/
local for_poisson `r(varlist)'

ds /*a_few_dichotomies*/  
local for_logit `r(varlist)'

ds work_status maritalstatus
local for_mlogit `r(varlist)'

/*
DEFINITIONS OF LOCAL MACROS regular passive imputed HERE
*/

mi register regular `regular'
mi register passive `passive'
mi register imputed `imputed'


mi impute chained ///
    (pmm,  knn(1) noisily) `for_pmm' ///
    (poisson, noisily iterate(100)) `for_poisson' ///
    (mlogit, augment noisily iterate(100)) `for_mlogit' ///
    (logit, augment noisily iterate(100)) `for_logit'  ///
    = `regular', augment report replace force

Stata runs through things just fine until it gets to the -mlogit- models. It then performs a number of them successfully for both the marital status (3 levels) and work status (4 levels) variables. (Output too long to show here.) But then it aborts with the following output:

Code:

Running mlogit on data from iteration 1, m=1:


note: ethnicityethn3 omitted because of collinearity
too few categories
error occurred during imputation of income cage_score sdsworkyessq001 sdssq002 sdssq003 qol_total phq2_score v4_ptsd_level
medical_conditions_after_911 rescue_occasions work_status maritalstatus ethn3a trainingsq001 trainingsq002 on m = 1
r(148);

I don't know what to make of "too few categories," and I don't know how to proceed to troubleshoot it. Clearly, both marital status and work status have more than 2 levels: they have already been used as the outcomes of mlogit earlier. I suppose that in some of the iterations it might happen that only one or two of the outcome levels are actually instantiated in the estimation sample (though it's a bit surprising as none of the outcomes is particularly rare):

Code:

. mi xeq 0: tab1 maritalstatus work_status

m=0 data:
-> tab1 maritalstatus work_status

-> tabulation of maritalstatus  

             maritalstatus |      Freq.     Percent        Cum.
---------------------------+-----------------------------------
             Never Married |        603       14.54       14.54
        Married/Cohabiting |      3,054       73.63       88.16
Widowed/Divorced/Separated |        491       11.84      100.00
---------------------------+-----------------------------------
                     Total |      4,148      100.00

-> tabulation of work_status  

         work_status |      Freq.     Percent        Cum.
---------------------+-----------------------------------
     Working (FT/PT) |      2,639       64.55       64.55
            Disabled |        457       11.18       75.73
             Retired |        821       20.08       95.82
Unempl/Retired/Other |        171        4.18      100.00
---------------------+-----------------------------------
               Total |      4,088      100.00

But I can't even start to troubleshoot this, because the output doesn't even tell me which of these outcome variables is implicated in the problem.

Any thoughts on how I can figure this out?
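
One rough way to probe the suspicion raised above (that some outcome levels drop out of the estimation sample) is to tabulate the two -mlogit- outcomes among observations that have no missing values on the regular predictors. This is only a sketch: it assumes the local macro `regular' from the excerpt is still defined in the session, nmiss_regular is a made-up variable name, and the result only approximates the sample that -mi impute chained- actually uses in any given iteration.

```stata
* Sketch only: assumes the local macro `regular' from the excerpt above is
* still defined. nmiss_regular is a hypothetical helper variable.
preserve
mi extract 0, clear                      // keep only the original (m=0) data
egen nmiss_regular = rowmiss(`regular')  // count missing regular predictors per row
tab1 work_status maritalstatus if nmiss_regular == 0
restore
```

If either table shows fewer than three populated levels, that outcome would be the first candidate for the "too few categories" message.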


Andrew Musau

Join Date: Oct 2014

Posts: 10194
#2

10 Aug 2016, 17:30

A while back, I replicated this problem (http://www.statalist.org/forums/foru...gories-problem see post #4) but I am not sure that the solution applies here. To summarize, if the omitted variable (ethnicityethn3 in your case) has positive values for only one level of your outcome, and missing values for the rest, then this will result in the error.
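
The condition described above could be checked with a cross-tabulation in the original data. This is a hedged sketch: it assumes that ethn3a (which appears in the list of imputed variables in the error message) is the variable behind the omitted ethnicityethn3 term, which the thread does not confirm.

```stata
* Sketch only: ethn3a is assumed to be the variable behind the omitted
* ethnicityethn3 term; substitute the actual predictor if that guess is wrong.
mi xeq 0: tabulate ethn3a work_status, missing
mi xeq 0: tabulate ethn3a maritalstatus, missing
```

A row whose nonmissing counts all fall in a single outcome column would match the pattern described in the linked thread.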
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#3

10 Aug 2016, 18:37

Andrew,

Thanks for the suggestion, but I've checked and that isn't happening in this data set.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

11 Aug 2016, 16:21

Just thought I'd let everybody know that, following a suggestion from Rich Goldstein, I've resolved this problem by using Patrick Royston's -ice- rather than -mi-.
Richard Williams

Join Date: Apr 2014

Posts: 4987
#5

11 Aug 2016, 16:26

Interesting. Any ideas as to why ice works better? Or is this just one of those things you take on blind faith so long as it works?

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#6

11 Aug 2016, 19:31

I will say that this kind of thing has happened to me more than once, and I am in conversation with Yulia Marchenko about a particular example. I start with -mi-, but if it fails for non-obvious reasons I move to -ice-.
Brad Anderson

Join Date: Sep 2014

Posts: 70
#7

12 Aug 2016, 08:17

I have no idea if these issues are related, but I was having problems imputing count variables using -mi impute chained- with either the nbreg or poisson method. I was in contact with the technical support folks at Stata; they replicated the problems and indicated there was a bug. I've not seen any updates that would suggest this has been resolved. Perhaps -ice- is more robust?