Out-of-Sample Prediction using Multinomial Logit Model

Minchul Park

Join Date: Jun 2019

Posts: 58
#1

20 Nov 2020, 00:19

Hi, I have a question about out-of-sample prediction using a multinomial logit model.

<Situation>
I have a cross-sectional data set covering eleven countries.
Let the dependent variable be y (a nominal variable taking three values: 1, 2, and 3), and suppose that we have only two regressors, x1 and x2.

<What I want to show>
The response pattern of the choice probability of alternative one to changes in x1 is different in the 11th country.

<What I have done>
First, I ran the multinomial logit regression using only the first ten countries, as below (for simplicity, I omit the details of the options).

Code:

mlogit y x1 x2 if 1 <= country_id & country_id <= 10

Second, I predicted the choice probability of alternative one in the 11th country, using the previous estimates, for each value of x1 (from 0 to 5).

Code:

keep if country_id == 11
margins, at(x1=(0(1)5)) predict(outcome(1))

Third, I computed the sample means of a dummy variable for alternative one.

Code:

gen y1 = (y == 1)
summarize y1 if x1 == 0
summarize y1 if x1 == 1
.
.
summarize y1 if x1 == 5

Finally, I compared the predicted choice probability of alternative one to the sample means at each value of x1
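
The comparison described in these four steps can also be sketched without dropping any observations. This is only an illustration, assuming x1 takes the integer values 0 to 5; the variable names p1 and obs1 are made up for the example. After mlogit, predict fills in probabilities for every observation, including those outside the estimation sample.

```stata
* Fit on the first ten countries only
mlogit y x1 x2 if inrange(country_id, 1, 10)

* Predicted probability of outcome 1 for all observations (in and out of sample)
predict p1, outcome(1)

* Indicator for an observed choice of alternative one (hypothetical name)
generate byte obs1 = (y == 1) if !missing(y)

* Mean predicted probability next to the observed share in country 11, by x1
tabstat p1 obs1 if country_id == 11, by(x1) statistics(mean n)
```

Note that this averages the predictions over the observations that actually have each value of x1, whereas margins with at() sets x1 to each value for every observation, so the two sets of numbers need not agree exactly.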

The results show that the predicted choice probabilities are quite different from the sample means at each value of x1.
Here, can I conclude that the response of the choice probability is different in the 11th country? (Since we cannot predict the choice probability in the 11th country well using the other countries.)

For simplicity, my explanation may be too brief.
I am sorry if it is hard to follow.

Thank you for taking the time to read this question.

Last edited by Minchul Park; 20 Nov 2020, 00:23.
Tags: predict margins mlogit
Joseph Coveney

Join Date: Apr 2014

Posts: 4449
#2

20 Nov 2020, 01:18

Originally posted by Minchul Park

can I conclude that the response of the choice probability is different in the 11th country? (Since we cannot predict the choice probability in the 11th country well using the other countries.)

Either that, or the eleventh country's choice probabilities don't actually differ, and your model is missing one or more important explanatory variables that, if included, would have enabled the fitted model to predict the eleventh country's choices well.
Minchul Park

Join Date: Jun 2019

Posts: 58
#3

20 Nov 2020, 04:49

Joseph Coveney Thank you for your response! The answer is very informative. Thank you again.
Joseph Coveney

Join Date: Apr 2014

Posts: 4449
#4

20 Nov 2020, 08:13

I have a question for you.

What you went through above seems like a lot of work to show that the regression coefficients for the two predictors for all of the other ten countries don't predict the choice profile for the eleventh.

My question: could you accomplish the same objective (or an equivalent objective) in the following manner?

Code:

mlogit y c.x2 ib11.country_id##i.x1

and scanning down the list of coefficients (and their standard errors) for the interaction terms, which by virtue of Stata's factor notation are the differences (log-odds ratios) from country eleven's intercept coefficient for each of the nonredundant outcome choices.

Maybe even compute a test statistic as a summary of the lack of generalization to country eleven from the other ten, using something like

Code:

testparm i.country_id#i.x1

immediately after fitting the multinomial logistic regression model above.
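
As a possible follow-up, predicted probabilities can put the coefficient differences on the probability scale. The lines below are only a sketch, assuming x1 takes a small number of integer values (0 to 5 in the original post) and that every country-by-x1 cell is populated; otherwise margins may report some cells as not estimable.

```stata
* Interaction model with country 11 as the base level
mlogit y c.x2 ib11.country_id##i.x1

* Joint test of all country-by-x1 interaction terms across the outcome equations
testparm i.country_id#i.x1

* Average predicted probability of outcome 1, by country and level of x1
margins country_id#x1, predict(outcome(1))
```

The margins table then shows, for each level of x1, how country eleven's predicted probability of alternative one compares with that of each other country.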
Minchul Park

Join Date: Jun 2019

Posts: 58
#5

20 Nov 2020, 08:40

Joseph Coveney Your advice is very clever. I had not thought of this approach. I ran the regression and found statistically significant differences. However, I ran into a problem: some variables are omitted. If I can solve this problem, this approach is exactly what I was looking for. Thank you very much.
Minchul Park

Join Date: Jun 2019

Posts: 58
#6

20 Nov 2020, 08:47

Joseph Coveney I want to ask an additional question. What do you think about the code below?

Code:

gen d11 = (country_id == 11)
mlogit y i.x1 c.x2 d11#i.x1

That is, we just assume that the other ten countries have the same coefficients.
Of course, the code you recommended is more general, but I am facing an omission problem.

Thank you.

Last edited by Minchul Park; 20 Nov 2020, 08:55.
Joseph Coveney

Join Date: Apr 2014

Posts: 4449
#7

20 Nov 2020, 23:06

I think that you mean

Code:

mlogit y i.d11 i.x1 c.x2 i.d11#i.x1

or more succinctly

Code:

mlogit y c.x2 i.d11##i.x1

That you have an omission problem would give me pause. I would wonder whether the dataset is adequate to assess similarity of profile of choices between countries across levels of x1.
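
One way to check whether the data can support such a comparison is to look for empty or sparse cells, which are a common (though not the only) reason for interaction terms to be omitted. A minimal sketch, assuming d11 was generated as in #6:

```stata
* Number of observations per country at each level of x1
tabulate country_id x1

* Outcome counts at each level of x1, country 11 versus the other ten
bysort d11: tabulate x1 y
```

If country eleven has no observations at some level of x1, or no one choosing a given alternative at that level, the corresponding interaction terms cannot be estimated from these data.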