mlogit warns observations completely determined and Standard errors questionable.

arma ayat

Join Date: Jan 2020

Posts: 12
#1

27 Jan 2020, 12:15

Hi all,
I'm running mlogit on data with 4807 transactions and 23 variables (6 continuous, the rest categorical). Stata's mlogit does not converge: it reports "not concave", yet the log likelihood stops changing after 10 or 11 iterations. This seems strange to me. I also fit the same multinomial logistic regression in R (via RStudio) with multinom and mnlogit, and both converge. When I compare the R results with those from Stata's mlogit, the log likelihood and the estimated coefficients are the same in both programs, but Stata's mlogit warns "observations completely determined" and "Standard errors questionable", while R converges and reports standard errors. I have no idea why this happens in Stata or how to fix it. Please help me solve this problem; I need help urgently because my time is limited.

I'm confused about which results and which package to trust. Can I use the Stata results, given that the coefficients and log likelihood match those from R?
I have attached my Stata code and pictures of the results from Stata and R.

global ylist method
* Continuous (as annotated in the post): Age hh_size end_cash_bal log_Amount EaseCC EaseDC CostCC
* Levels: work 2, marital 4, gender 2, educ 4, region 8, hh_income 7, merch_accep_cash 2, merch_accep_card 2,
* importance_attribute_payment 8, cc_hasbal 2, cc_reward 2, cc_ratio 2, cc_revolver 2, dc_free 2
global xlist Age hh_size i.work i.marital i.gender i.educ i.region i.hh_income /*
*/ i.merch_accep_cash i.merch_accep_card i.spend /*
*/ i.importance_attribute_payment i.cc_hasbal i.cc_reward i.cc_ratio i.cc_revolver i.dc_free /*
*/ end_cash_bal log_Amount EaseCC EaseDC CostCC CostDC RecordCC RecordDC

mlogit $ylist $xlist [pw=ind_weight], baseoutcome(1)

The outcome "method" has three levels: Cash, Credit Card, and Debit Card.

Last edited by arma ayat; 27 Jan 2020, 12:19.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

27 Jan 2020, 14:28

Arma,

Please read the FAQ. We ask people not to post screenshots of results, as they are frequently hard or impossible to read.

You should familiarize yourself with the issue of perfect prediction in regular logistic regression (i.e. for binary outcomes). Basically, if some combination of categorical covariates contains only successes or only failures, then the maximum likelihood estimate doesn't exist. For example, say you are trying to estimate the odds ratio for heart failure (HF) by gender. Imagine that by sheer chance, 50% of the women and 100% of the men in your sample have heart failure. There is no maximum likelihood estimate for the effect of male gender on HF: the odds of HF among the men are infinite, so the odds ratio is infinity. In logistic regression, you have to drop cases where there's perfect prediction. The same is true of multinomial logistic regression, only the problem is possibly amplified as the outcome now has multiple categories.
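
One way to look for perfect prediction in a model like the one in post #1 is to cross-tabulate the outcome against each categorical covariate and look for empty or nearly empty cells. The following is only a sketch: it assumes the variable names from post #1, and a zero cell in a two-way table is a candidate cause, not proof, since the problem can also come from combinations of covariates.

```stata
* Cross-tabulate suspect covariates against the outcome.
* A zero cell means that covariate level never occurs with that outcome.
tabulate merch_accep_cash method, missing
tabulate merch_accep_card method, missing

* Same check for several other categorical covariates at once
foreach v of varlist work marital gender educ region hh_income {
    tabulate `v' method, missing
}
```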

If you examine the model coefficients from R, are any of them missing, or do they have missing standard errors? Can you compare the number of retained observations between the programs? Are any of the odds ratios suspiciously large? If both programs have the same coefficients and the same number of observations, then you have the same problem in both software packages. This article by Cook et al indicated that as of 2015, neither Stata nor R warned about perfect prediction in multinomial logit, and that researchers needed to be careful. (I'm not sure if they meant R in general, or just some R packages; they didn't state which R packages were used.)
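
The comparison suggested above can be sketched as follows, assuming $ylist and $xlist hold valid variable lists for the model in post #1. Note that mlogit reports relative-risk ratios (the rrr option) where binary logit would report odds ratios. The stored result e(N_cd), the number of completely determined observations, is taken from the stored results listed in the mlogit documentation; treat it as an assumption if your Stata version does not return it.

```stata
* Refit and display relative-risk ratios; look for implausibly large values
mlogit $ylist $xlist [pw=ind_weight], baseoutcome(1) rrr

* Number of observations used, to compare with what R reports
display e(N)

* Number of completely determined observations (assumed to be stored by mlogit)
display e(N_cd)
```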

R and Stata do use different criteria to declare convergence, so it's possible R may have declared convergence where Stata did not. In general, I'd refrain from altering Stata's convergence criteria for most regression models. If Stata refused to declare convergence, chances are there's a data problem. From the snippet of output that I can see, I do wonder if you manually limited the number of iterations without reporting it in your syntax (e.g. by adding the iterate(60) option).
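
To see whether Stata actually declared convergence or simply stopped at an iteration cap, the stored results can be inspected after fitting. A minimal sketch, again assuming $ylist and $xlist are valid; the value 60 is only the example mentioned above, not a recommendation.

```stata
* Fit with an explicit cap on the number of iterations
mlogit $ylist $xlist [pw=ind_weight], baseoutcome(1) iterate(60)

* 1 if Stata declared convergence, 0 otherwise
display e(converged)

* Number of iterations actually performed
display e(ic)
```

If e(converged) is 0 and e(ic) equals the cap, the reported estimates are just where the optimizer stopped, which is consistent with the "not concave" messages described in post #1.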

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
arma ayat

Join Date: Jan 2020

Posts: 12
#3

30 Jan 2020, 05:27

Thank you so much for your help. I appreciate your explanations.
I found that two variables in my dataset, merch_accep_card (2 categories) and merch_accep_cash (2 categories), take the value 1 for most observations. This is what stops mlogit from converging and triggers the "observations completely determined" warning. When I drop these two variables from the independent variables, mlogit converges.
But I need to examine the effects of these two variables. How can I handle this problem, so that mlogit converges and I can still estimate the effects of these two variables?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

30 Jan 2020, 12:52

One answer is that you need a bigger sample so that you can get enough cases where either or both of those variables are zero. With the data you have, you simply can't estimate the effect of those variables being zero on the outcomes as you defined them. Alternatively, you can use Bayesian estimation with a vague prior. All that this does, however, is give you estimated effects that are going to be pretty close to the prior.
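
For reference, the Bayesian route could look roughly like this in Stata 15 or later, using the bayes prefix with its default (vague) normal priors. This is a sketch under assumptions: $ylist and $xlist are valid variable lists, the seed is arbitrary, and the sampling weights from post #1 are left out because the bayes prefix may not accept pweights (check help bayes for your version).

```stata
* Bayesian multinomial logit with the default vague normal priors.
* Assumption: no sampling weights, unlike the model in post #1.
bayes, rseed(12345): mlogit $ylist $xlist, baseoutcome(1)

* Tighter priors can be requested with normalprior(#), which sets the
* standard deviation of the default normal priors; 3 is purely illustrative.
* bayes, normalprior(3) rseed(12345): mlogit $ylist $xlist, baseoutcome(1)
```

With so few zeros in the two merchant variables, the posterior for their coefficients will largely reflect whichever prior is chosen, which is the caveat raised above.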

Then again, you might want to think about why those variables are there. I would guess that you're trying to estimate a person's choice of payment (e.g. cash, card, etc.), and your covariates include whether the merchant accepts cash and whether they accept cards. How many merchants don't accept cards these days? It can't be that many in a lot of developed countries. It may be even more so for cash. Do those variables make sense to include at all? Maybe they do, but maybe you want to think about this.

That paragraph doesn't help you much, I guess. If you have a good reason to look at this question with those variables, then you may be stuck unless you collapse some outcome categories, or you collapse those two independent variables into one (e.g. accepts both vs. doesn't accept one or the other).
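
Both kinds of collapsing can be sketched in a few lines. The new variable names (accepts_both, pays_by_card) are hypothetical, and the code assumes that the two merchant variables are coded 0/1 and that method is coded 1 = Cash, 2 = Credit Card, 3 = Debit Card; adjust to the actual coding.

```stata
* Collapse the two covariates: 1 if the merchant accepts both cash and
* cards, 0 if it refuses at least one of them.
generate byte accepts_both = (merch_accep_cash == 1 & merch_accep_card == 1) ///
    if !missing(merch_accep_cash, merch_accep_card)

* Check that every outcome occurs at both levels before refitting
tabulate accepts_both method

* Collapse the outcome instead: cash vs. any card (coding is an assumption)
recode method (1 = 0 "Cash") (2 3 = 1 "Card"), generate(pays_by_card)
tabulate pays_by_card merch_accep_card
```

Either table should have no empty cells before the collapsed variable goes into a model; if a cell is still empty, the same warning is likely to come back.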

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#5

30 Jan 2020, 13:25

How many merchants don't accept cards these days? It can't be that many in a lot of developed countries.

While I agree with the thrust of what Weiwen Ng is saying, and endorse his recommendations, this quote leads me to believe that Weiwen has never been to Philadelphia, PA.