I'm trying to run -mi impute chained- on a fairly large data set with a large number of variables. The code is too long to post here, but I think the following excerpt includes everything relevant:
Stata runs through things just fine until it gets to the -mlogit- models. It then fits a number of them successfully for both the marital status (3 levels) and work status (4 levels) variables. (Output too long to show here.) But then it aborts with the following output:
I don't know what to make of "too few categories," and I don't know how to proceed to troubleshoot it. Clearly, both marital status and work status have more than 2 levels: they have already been used as the outcomes of mlogit earlier. I suppose that in some of the iterations it might happen that only one or two of the outcome levels are actually instantiated in the estimation sample (though it's a bit surprising as none of the outcomes is particularly rare):
But I can't even start to troubleshoot this, because the output doesn't even tell me which of these outcome variables is implicated in the problem.
Any thoughts on how I can figure this out?
Code:
// IDENTIFY REGULAR VARIABLES AND VARIABLES TO IMPUTE
mi set mlong
mi set M = 1 // UNTIL WE GET IT WORKING, THEN M = 50
ds /*several_dozen_variables*/
local for_pmm `r(varlist)'
ds /*another_bunch_of_count_variables*/
local for_poisson `r(varlist)'
ds /*a_few_dichotomies*/
local for_logit `r(varlist)'
ds work_status maritalstatus
local for_mlogit `r(varlist)'
/*
DEFINITIONS OF LOCAL MACROS regular passive imputed HERE
*/
mi register regular `regular'
mi register passive `passive'
mi register imputed `imputed'
mi impute chained ///
    (pmm, knn(1) noisily) `for_pmm' ///
    (poisson, noisily iterate(100)) `for_poisson' ///
    (mlogit, augment noisily iterate(100)) `for_mlogit' ///
    (logit, augment noisily iterate(100)) `for_logit' ///
    = `regular', augment report replace force
Code:
Running mlogit on data from iteration 1, m=1:
note: ethnicityethn3 omitted because of collinearity
too few categories
error occurred during imputation of income cage_score sdsworkyessq001 sdssq002 sdssq003 qol_total phq2_score v4_ptsd_level medical_conditions_after_911 rescue_occasions work_status maritalstatus ethn3a trainingsq001 trainingsq002 on m = 1
r(148);
Code:
. mi xeq 0: tab1 maritalstatus work_status

m=0 data:
-> tab1 maritalstatus work_status

-> tabulation of maritalstatus

             maritalstatus |      Freq.     Percent        Cum.
---------------------------+-----------------------------------
             Never Married |        603       14.54       14.54
        Married/Cohabiting |      3,054       73.63       88.16
Widowed/Divorced/Separated |        491       11.84      100.00
---------------------------+-----------------------------------
                     Total |      4,148      100.00

-> tabulation of work_status

         work_status |      Freq.     Percent        Cum.
---------------------+-----------------------------------
     Working (FT/PT) |      2,639       64.55       64.55
            Disabled |        457       11.18       75.73
             Retired |        821       20.08       95.82
Unempl/Retired/Other |        171        4.18      100.00
---------------------+-----------------------------------
               Total |      4,088      100.00
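
The tabulations above use every observation in m=0, so they cannot show whether an outcome level drops out of a smaller estimation sample. A narrower check might look like the sketch below. It rests on an assumption that has not been confirmed here: that the sample shrinks because some of the variables in `regular' have missing values (the -force- option lets the imputation proceed in that case). The variable name n_miss_regular is invented for this sketch, and the code is meant to run in the same do-file, after the local macro regular has been defined.

```stata
* Sketch only: tabulate each mlogit outcome among observations
* with no missing values in the regular (predictor) variables.
preserve
mi extract 0, clear                        // plain, non-mi copy of the original (m = 0) data
egen n_miss_regular = rowmiss(`regular')   // hypothetical helper: count of missing regular variables per observation
foreach v in work_status maritalstatus {
    tabulate `v' if n_miss_regular == 0    // are all outcome levels still present?
}
restore                                    // return to the mi-set data
```

If a level disappears or becomes very sparse in these restricted tabulations, that would at least be consistent with the "too few categories" guess, though it would not by itself identify which model triggered the error.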