
  • Multinomial Logistic Regression - Interpretation Method

    Hey everyone! I have a question about which interpretation should be used for research purposes. In binary logistic regression with the command "logit y x1 x2 x3", we can interpret a positive/negative sign as increasing/decreasing the relative probability of being in y=1. According to the German book "Datenanalyse mit Stata" by Ulrich Kohler and Frauke Kreuter, this method can't be used for multinomial logistic regression. But this is the only book I have seen that mentions this. The reason, according to them, is that in multinomial logistic regression a positive sign, for example, doesn't have to mean "increasing probability". Is that true? I am asking because according to http://www.ats.ucla.edu/stat/stata/dae/mlogit.htm we can interpret the log odds as in binary logistic regression:
    A one-unit increase in the variable write is associated with a .058 decrease in the relative log odds of being in general program vs. academic program

  • #2
    Hi Anshul,

    That is indeed correct in the context of a multinomial logit. Intuitively, if you have three possible options (let's just say A, B, and C), and A is the reference group, then when calculating marginal effects, the change in probability associated with B depends on the change in probability associated with C. In other words, if the coefficient on some variable, X, is positive for both options B and C, then the probability of picking one of the two cannot be considered independently of the other! After taking derivatives, the actual 'formula' for the marginal effect of X on the probability of option C is: (PC)[(BC)(1 - PC) - (BB)(PB)], where BC is the coefficient for variable X in option C, PC is the starting probability for option C, BB is the coefficient for variable X in option B, and PB is the starting probability for option B. (This is easily extended to more than two choices, but it is easiest to explain with just the two.)
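    As a quick numeric check of that formula (the values are made up: PB = PC = 1/3, BB = 2.0, BC = 0.5), the marginal effect on C's probability comes out negative even though BC is positive:

    Code:
    * hypothetical values: PB = PC = 1/3, BB = 2.0, BC = 0.5
    display (1/3)*(0.5*(1 - 1/3) - 2.0*(1/3))    // -.11111111, despite BC > 0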

    Hope this helps.

    Josh

    Comment


    • #3
      Thanks Joshua D Merfeld!! So that website http://www.ats.ucla.edu/stat/stata/dae/mlogit.htm is not correct. But if I use odds ratios, can they be interpreted the same way as in binary logistic regression?
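      As a syntax aside: -mlogit- can report the exponentiated coefficients (relative risk ratios) directly via the -rrr- option. A minimal sketch with placeholder names (y, x1, x2 are hypothetical):

      Code:
      * report relative risk ratios instead of log-odds coefficients
      mlogit y x1 x2, base(1) rrr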

      Comment


      • #4
        But I don't understand why books are giving wrong interpretations... I am now reading "Microeconometrics Using Stata" by A. Colin Cameron and Pravin K. Trivedi (2010, revised edition). On page 500 they write:
        Thus ^b(j) can be viewed as parameters of a binary logit model between alternative j and alternative 1 (the base). So a positive coefficient from mlogit means that as the regressor increases, we are more likely to choose alternative j than alternative 1. This interpretation will vary with the base category and is clearly most useful when there is a natural base category.
        Do you think I should use the relative risk ratio interpretation instead? Is that good to interpret if we also have an interaction term between two continuous variables?

        Comment


        • #5
          Hi Anshul,

          In your second post just above (with the quote from Cameron and Trivedi), note that they are talking about a comparison between alternative j and alternative 1. This interpretation is correct. The same goes for the website you linked: they are talking about the relative odds of "being in general program vs. academic program", where academic program is the base outcome.

          Have a look at some of the graphs using -margins- on the UCLA webpage, specifically the very first graph below the line "Sometimes, a couple of plots can convey a good deal of information. Below, we plot the predicted probabilities against the writing score by the level of ses for different levels of the outcome variable." You will notice that the marginal effect of writing score on the probability of being in the general group is first positive and then negative! (The marginal effect is the slope of the line, not the value of the line itself.) For SES=1, increasing writing scores increase the probability of being in the general group until a score of about 45~47; after that, increasing writing scores decrease that probability. And also note that the coefficient on writing score is positive.
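          If you want to produce that kind of plot yourself, the general recipe is -margins- over a grid of the continuous variable followed by -marginsplot-. A rough sketch using the variable names from the UCLA example (prog, write, ses; this assumes their hsbdemo dataset is in memory, and the grid values are just illustrative):

          Code:
          mlogit prog write i.ses, base(2)                      // academic (prog==2) as the base
          margins ses, at(write=(30(5)70)) predict(outcome(1))  // outcome(1) = general program
          marginsplot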

          Hope this helps,

          Josh

          Comment


          • #6

            Now I am confused... Because Cameron and Trivedi do say "we are more likely to choose alternative j than alternative 1" if there is a positive sign on the log odds (^b), which to me is the same as "increases the probability of being in alternative j relative to alternative 1". This is the same as in binary logistic regression. But if you say this is right, then it contradicts Ulrich Kohler and Frauke Kreuter, who, as mentioned in #1, say about the log odds that in
            multinomial logistic regression a positive sign as example doesn't have to mean "increasing probability
            For my research I have three categories in the dependent variable, female_occupation, male_occupation, and mix_occupation, where mix_occupation is the reference category, and I have 3 categorical independent variables and 1 continuous one (income; maybe the continuous one would be replaced by an interaction with age (also continuous)). So what exactly do I say, if I use the mlogit command (log odds), when I am only interested in whether these variables have an effect? For example, female occupation vs. mix occupation (the reference category): what if the independent variable x1 (a categorical variable with categories a, b, c; focus on c) has a positive sign and x2 (a continuous variable) has a positive sign?

            Sorry for asking again, I didn't really get it...

            Comment


            • #7
              Josh Merfeld has given you two posts explaining why multinomial regression is different. I can't think of a way to explain it more clearly than he has. But perhaps the best thing is to move from the realm of theory to your data. Why don't you calculate the predicted probabilities of each of your three outcomes at various levels of your predictor variables, and just look at them so you can see what is going on in your data? The -margins- command will do this for you quickly and easily. Don't forget to include a predict option for each of your outcome levels in the command. And if you want to directly see the marginal effects, there is -margins, dydx()-.
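              A minimal sketch of what that looks like, with placeholder names (y, x1, x2 are hypothetical; x1 categorical, x2 continuous):

              Code:
              mlogit y i.x1 x2, base(1)
              * predicted probability of each outcome at each level of x1
              margins x1, predict(outcome(1)) predict(outcome(2)) predict(outcome(3))
              * average marginal effect of x2 on the probability of outcome 2
              margins, dydx(x2) predict(outcome(2))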

              Comment


              • #8
                Don't forget to include a predict option for each of your outcome levels in the command.
                Just as a sidelight, if you have Stata 14 you don't need to include a predict option for each outcome. That happens by default in Stata 14.
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                Stata Version: 17.0 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam

                Comment


                • #9
                  Just tried it, and Richard is right. That is now the default behavior in Stata 14. Makes a great command even better! Interestingly, the help file for -mlogit postestimation margins- does not say that, and in fact it even gives an example with multiple -predict()- options specified.

                  Comment


                  • #10
                    Anshul,

                     When comparing some category to the base outcome, a positive coefficient always means you are more likely to be in the comparison category than in the base category. However, when talking about the probability of being in a given category, the marginal effect (that is, the effect on the probability of being in a specific group of increasing the independent variable by one) does not have to be positive. This is because, in the former example, we are only comparing the two categories to one another. In the latter example, on the other hand, we are comparing the probability of being in the comparison category not just to the base outcome (relative to which it will always be higher if the coefficient is positive) but also to the probability of being in all the other comparison groups.

                     Let's assume we have groups A, B, and C. A is the base outcome. If the coefficient on variable X is positive for group B, then increasing X will always make you more likely to be in B than in A. However, increasing X does not necessarily make you more likely to be in group B overall, because the increase in the probability of being in group C may be even greater than the increase for group B. That means the marginal effect of X may actually be negative for group B (because at this point you are more likely to be in group C), even though you are still more likely to be in group B than in group A.
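                     If it helps, you can watch this happen in simulated data. A sketch (the data-generating process and all numbers are made up): the true coefficient on x for group B is positive, but the effect for group C is much stronger, so the marginal effect of x on the probability of B comes out negative.

                     Code:
                     clear
                     set seed 12345
                     set obs 5000
                     gen x  = rnormal()
                     * true model, relative to base A: coefficient on x is 0.5 for B and 2.0 for C
                     gen pA = 1/(1 + exp(0.5*x) + exp(2.0*x))
                     gen pB = exp(0.5*x)*pA
                     gen u  = runiform()
                     gen y  = cond(u < pA, 1, cond(u < pA + pB, 2, 3))    // 1=A, 2=B, 3=C
                     mlogit y x, base(1)                   // coefficient on x for outcome 2 (B) is positive
                     margins, dydx(x) predict(outcome(2))  // yet the average marginal effect for B is negative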

                    Comment


                    • #11
                      Firstly, thank you all very much for helping me! So, I ran my multinomial logistic regression and the output is below (the reference category for occupational status is "normal worker" and for level of employment "full-time"):
                      Code:
                      mlogit occupation i.occupational_status i.flexible_working_time i.level_of_employment wage_per_hour, base(3)
                      
                      Iteration 0:   log likelihood = -26971.582  
                      Iteration 1:   log likelihood = -24004.907  
                      Iteration 2:   log likelihood = -23896.055  
                      Iteration 3:   log likelihood = -23895.044  
                      Iteration 4:   log likelihood = -23895.043  
                      
                      Multinomial logistic regression                   Number of obs   =      24552
                                                                        LR chi2(12)     =    6153.08
                                                                        Prob > chi2     =     0.0000
                      Log likelihood = -23895.043                       Pseudo R2       =     0.1141
                      
                      ------------------------------------------------------------------------------------------
                                    occupation |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------------------+----------------------------------------------------------------
                      female_occupation        |
                           occupational_status |
                                        2.boss |  -.1784459   .0408285    -4.37   0.000    -.2584683   -.0984235
                        3.decision making body |  -.1117161   .0642268    -1.74   0.082    -.2375983     .014166
                                               |
                         flexible_working_time |
                                         1.yes |  -.6404859   .0361008   -17.74   0.000    -.7112421   -.5697297
                                               |
                           level_of_employment |
                                     2. 50-89% |   .7454996   .0388484    19.19   0.000     .6693581     .821641
                                      3. < 50% |   .9684125   .0525276    18.44   0.000     .8654603    1.071365
                                               |
                                 wage_per_hour |  -.0256214    .001027   -24.95   0.000    -.0276342   -.0236085
                                         _cons |   1.118981   .0486696    22.99   0.000     1.023591    1.214372
                      -------------------------+----------------------------------------------------------------
                      male_occupation          |
                           occupational_status |
                                        2.boss |   .0576462   .0371688     1.55   0.121    -.0152033    .1304956
                        3.decision making body |  -.0528364   .0549858    -0.96   0.337    -.1606065    .0549337
                                               |
                         flexible_working_time |
                                         1.yes |  -.2255912   .0346991    -6.50   0.000    -.2936003   -.1575822
                                               |
                           level_of_employment |
                                     2. 50-89% |  -1.545974   .0530296   -29.15   0.000     -1.64991   -1.442038
                                      3. < 50% |  -1.906788   .0912916   -20.89   0.000    -2.085716     -1.72786
                                               |
                                 wage_per_hour |  -.0081834     .00088    -9.30   0.000    -.0099082   -.0064585
                                         _cons |   .8206965   .0440631    18.63   0.000     .7343344    .9070587
                      -------------------------+----------------------------------------------------------------
                      mix_occupation           |  (base outcome)
                      ------------------------------------------------------------------------------------------
                      Now I ran -margins- as you recommended, here for level of employment; the first table is for female occupation, the second for male occupation, and the third for mix occupation:

                      Code:
                      margins level_of_employment, atmeans predict (outcome(female_occupation))
                      margins level_of_employment, atmeans predict (outcome(male_occupation))
                      margins level_of_employment, atmeans predict (outcome(mix_occupation))
                      Code:
                      Outcome: female_occupation
                      
                      ------------------------------------------------------------------------------------
                                          |            Delta-method
                                          |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      --------------------+---------------------------------------------------------------
                      level_of_employment |
                             1. full_time |   .2181253   .0033033    66.03   0.000      .211651    .2245997
                                2. 50-89% |   .5228222   .0073726    70.91   0.000     .5083722    .5372722
                                 3. < 50% |   .5957369   .0107463    55.44   0.000     .5746746    .6167992
                      
                      Outcome: male_occupation
                      
                      --------------------+---------------------------------------------------------------
                      level_of_employment |
                             1. full_time |   .4604296     .00395   116.56   0.000     .4526878    .4681715
                                2. 50-89% |   .1115929   .0045611    24.47   0.000     .1026534    .1205324
                                 3. < 50% |   .0709297   .0054996    12.90   0.000     .0601507    .0817086
                      
                      Outcome: mix_occupation
                      
                      --------------------+---------------------------------------------------------------
                      level_of_employment |
                             1. full_time |    .321445   .0037423    85.90   0.000     .3141103    .3287797
                                2. 50-89% |   .3655849   .0071116    51.41   0.000     .3516464    .3795233
                                 3. < 50% |   .3333334   .0103956    32.06   0.000     .3129583    .3537085
                      ------------------------------------------------------------------------------------
                      Now to the interpretation:

                      1. According to the log odds interpretation:
                      Being in a female occupation is more likely relative to a mix occupation if a worker has a level of employment of 50-89% (same for <50%)
                      ==> because of the positive sign of the log odds

                      Being in a male occupation is less likely relative to a mix occupation if a worker has a level of employment of 50-89% (same for <50%).
                      ==> because of the negative sign of the log odds

                      2. According to margins:
                      The average probability of being in a female occupation with a level of employment of 50-89% is 52.28%

                      The average probability of being in a male occupation with a level of employment of 50-89% is 11.16%

                      The average probability of being in a mix occupation with a level of employment of 50-89% is 36.56%

                      ==> Because the effects in mlogit are significant (highly significant), we can compare all three occupations, whereas in 1. there can only be a comparison to the reference category "mix occupation". So I can conclude here: female occupation has a higher average probability than male occupation and mix occupation, and mix occupation has a higher average probability than male occupation, if a worker has a 50-89% level of employment.

                      My thoughts:

                      a) If we have categorical variables in the multinomial logistic regression, I can be sure that the sign of the log odds says: positive sign = higher probability and negative sign = lower probability. So I don't need margins?

                      b) Can margins be useful for continuous independent variables? I couldn't check it, because if I do
                      Code:
                       margins wage_per_hour, atmeans predict (outcome(1))
                      it says "wage_per_hour not found in list of covariates".

                      c) Can margins be useful for interaction terms? Especially for an interaction between two continuous variables?


                      Sorry for the bad output; I tried to change it so it looks good, but it didn't work... I really apologize for that!

                      Last edited by Anshul Anand; 08 Jun 2015, 16:04.

                      Comment


                      • #12
                        My thought

                        a) If we have categorical variables in the multinomial logistic regression, I can be sure that the sign of the log odds says: positive sign = higher probability and negative sign = lower probability.
                        That is not correct reasoning. In your data it happens to work out that way, looking at the -margins-. But without the -margins- results you would not be able to make this statement because in general, the association between sign of the coefficient and change in outcome probability does not hold. Please re-read Joshua Merfeld's carefully crafted explanations.
                        b) Can margins be useful for continuous independent variables?
                        Yes, it can be used for continuous independent variables, but the syntax is different. You have to select particular values of the variable (I don't know what representative values of wage_per_hour would be in this data; that's up to you). Just for the sake of an example, let's say that interesting low, medium, and high values of wages per hour are 15, 30, and 60. Then you would code this as:
                        Code:
                        margins, at(wage_per_hour = (15 30 60)) atmeans predict(outcome(1)) predict(outcome(2)) ///
                             predict(outcome(3))
                        // YOU CAN LEAVE OFF THE -predict()- OPTIONS IF YOU ARE RUNNING VERSION 14
                        c) Can margins be useful for interaction terms? Especially for an interaction between two continuous variables?
                        Yes, you have to specify the values you want to test at for both of the continuous variables using at() options. But I don't see any interaction variables at all in your model, of any kind. If you plan to add some variables to your model and also want to include an interaction term between, say, continuous variables x1 and x2, be sure to enter it in the regression command as c.x1##c.x2. The c.'s are needed to prevent Stata from treating x1 and x2 as discrete. And by using the ## notation, Stata will also automatically enter the main effects of x1 and x2 in the model. If you do this in your command, the -margins- command will do everything right when you get margins for the various combinations of x1 and x2 you specify. [What you should emphatically not do is generate your own interaction variable (e.g., -gen x1x2 = x1*x2-) and enter that: Stata would not know that it's an interaction term, and the margins would be computed incorrectly.]
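                        A sketch of what that could look like (y, x1, x2, x3 and the at() values are all hypothetical here):

                        Code:
                        mlogit y c.x1##c.x2 i.x3, base(1)
                        * predicted probability of outcome 1 over a grid of x1-x2 combinations
                        margins, at(x1=(10 20 30) x2=(1 2 3)) atmeans predict(outcome(1))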

                        Comment


                        • #13
                          You'll notice that the sum of the three margins for full-time (.218+.460+.321), for 50-89% (.523+.112+.366), and for less than 50% (.596+.071+.333) each equals one! That is, you have calculated the predicted probability of being in each group, at the means of all the other variables, for full-time==1, 50-89%==1, and <50%==1, respectively. You have not calculated the MARGINAL effect of being full-time, 50-89%, etc. There is no way for the probability of being in a group to be negative.

                          If I remember my margins commands correctly (which may be doubtful!), if you want to compute the MARGINAL effect, as opposed to the predicted probabilities, you want to use -margins, dydx(wage_per_hour) atmeans predict(outcome(#))-. This will give you the marginal effect of a one-unit increase in wage_per_hour from its mean, with all the other variables held constant at their means, for outcome #. This one can have a different sign from its coefficient.
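                          For instance (assuming outcome 1 corresponds to female_occupation, as in your earlier -margins- calls):

                          Code:
                          margins, dydx(wage_per_hour) atmeans predict(outcome(1))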

                          On a slight aside, also note that if you change the base outcome, the coefficients can switch signs, as well, so this should help your intuition regarding why the signs of the coefficients themselves are not always interesting in a multinomial logit, but the marginal effects usually are.

                          Comment


                          • #14
                            Thank you very much for your answers! Now I understand it much better. But there is one thing: because I have three categorical independent variables, I can't talk about "increasing X". So for a categorical independent variable I use -margins X, atmeans predict(outcome(#))- and for a continuous independent variable I use -margins, dydx(X) atmeans predict(outcome(#))-, so that I can make statements about all three occupations?

                            Comment


                            • #15
                              Again, if you are looking for marginal effects, then dydx(X) is the way to get them. You can do it with both continuous and categorical variables; the latter enter the regression as a set of mutually exclusive dummy variables. For example:
                              Code:
                               margins, dydx(level_of_employment wage_per_hour) predict(outcome(1))
                              should give you the marginal effects of both level_of_employment and wage_per_hour for outcome 1.

                              Comment
