Multinomial Logit when dependent variable can be positive, negative, or zero

Diego Gomes

Join Date: Oct 2018
Posts: 5


24 Oct 2018, 22:41

Hi guys,

I have a categorical variable with four options: (i) no planning, (ii) only health planning, (iii) only financial planning, and (iv) both types of planning. I want to run a multinomial logit model using wealth as an independent variable. Wealth in my data is measured as assets minus debts, so it can be positive, negative, or zero. I then use the following command:

Code:

. mlogit genplan wealth if (year == 2012) [pweight=rwtresp], baseoutcome(1)

Multinomial logistic regression                   Number of obs   =       9601
                                                  Wald chi2(3)    =     131.26
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood =  -45455497                 Pseudo R2       =     0.0494

--------------------------------------------------------------------------------
               |               Robust
       genplan |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
No_Planning    |  (base outcome)
---------------+----------------------------------------------------------------
Only_Health    |
        wealth |  -7.17e-07   3.48e-07    -2.06   0.039    -1.40e-06   -3.55e-08
         _cons |  -1.112479    .076825   -14.48   0.000    -1.263053   -.9619045
---------------+----------------------------------------------------------------
Only_Financial |
        wealth |   1.73e-06   1.75e-07     9.93   0.000     1.39e-06    2.08e-06
         _cons |  -1.018951   .0575423   -17.71   0.000    -1.131732   -.9061704
---------------+----------------------------------------------------------------
Both_Plannings |
        wealth |   1.73e-06   1.74e-07     9.94   0.000     1.39e-06    2.07e-06
         _cons |   .0092796   .0505876     0.18   0.854    -.0898703    .1084296
--------------------------------------------------------------------------------

As you can see, the coefficients are zero. From this result, my understanding is that wealth has almost no effect.

However, I then run a second regression using a categorical variable for wealth, where I group wealth into four categories based on the quartiles of the distribution. Here are the results:

Code:

. mlogit genplan i.q4wealth if (year == 2012) [pweight=rwtresp], baseoutcome (1)

Multinomial logistic regression                   Number of obs   =       9601
                                                  Wald chi2(9)    =    1032.85
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood =  -44175787                 Pseudo R2       =     0.0762

--------------------------------------------------------------------------------
               |               Robust
       genplan |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
No_Planning    |  (base outcome)
---------------+----------------------------------------------------------------
Only_Health    |
      q4wealth |
 2nd Quartile  |  -.1316657   .1147428    -1.15   0.251    -.3565574     .093226
 3rd Quartile  |  -.4551799   .1646513    -2.76   0.006    -.7778905   -.1324693
 4th Quartile  |  -.1941912   .2018469    -0.96   0.336     -.589804    .2014215
               |
         _cons |  -1.129302   .0685136   -16.48   0.000    -1.263586   -.9950174
---------------+----------------------------------------------------------------
Only_Financial |
      q4wealth |
 2nd Quartile  |   1.141276   .1107899    10.30   0.000     .9241312     1.35842
 3rd Quartile  |   1.777783    .116215    15.30   0.000     1.550006    2.005561
 4th Quartile  |   2.314921   .1310731    17.66   0.000     2.058022    2.571819
               |
         _cons |  -1.593136   .0850319   -18.74   0.000    -1.759795   -1.426476
---------------+----------------------------------------------------------------
Both_Plannings |
      q4wealth |
 2nd Quartile  |   .9827859   .0802326    12.25   0.000     .8255329    1.140039
 3rd Quartile  |   1.727817    .088295    19.57   0.000     1.554762    1.900872
 4th Quartile  |   2.604634   .1040449    25.03   0.000      2.40071    2.808559
               |
         _cons |   -.604249   .0567623   -10.65   0.000    -.7155011   -.4929969
--------------------------------------------------------------------------------

Now the results are quite different. It is clear that the wealthier groups are more likely to do financial planning only or both types of planning. This result is reasonable and expected in my opinion.

Does anybody know why the results differ so much between the two regressions? Am I missing something?

Thanks!


Tags: categorical, logit, multinomial

Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#2

24 Oct 2018, 23:32

Originally posted by Diego Gomes

As you can see, the coefficients are zero.

No, they're not. Divide your wealth variable by 10⁷, refit your model, and report back to the list.
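
A minimal sketch of that rescaling, assuming the variable and weight names from the original post. The name wealth2 is only a placeholder for the rescaled variable, and the rest of the command is copied from the first regression.

```stata
* Rescale net wealth from dollars to units of ten million dollars (1e7).
* wealth2 is a placeholder name for the rescaled variable.
generate double wealth2 = wealth / 1e7

* Refit the same model with the rescaled predictor
mlogit genplan wealth2 if (year == 2012) [pweight=rwtresp], baseoutcome(1)
```

Only the wealth coefficients, their standard errors, and their confidence limits should change (each by a factor of 1e7); the constants, z statistics, and fit statistics should be the same as before.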

Diego Gomes

Join Date: Oct 2018
Posts: 5

25 Oct 2018, 10:19

Hi Joseph,

Thanks for the reply. I created the wealth2 variable. Here is the result of your request:

Code:

. mlogit genplan wealth2 if (year == 2012) [pweight=rwtresp], baseoutcome(1)

Multinomial logistic regression                   Number of obs   =       9601
                                                  Wald chi2(3)    =     131.26
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood =  -45455497                 Pseudo R2       =     0.0494

--------------------------------------------------------------------------------
               |               Robust
       genplan |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
No_Planning    |  (base outcome)
---------------+----------------------------------------------------------------
Only_Health    |
       wealth2 |  -7.172241   3.478485    -2.06   0.039    -13.98995   -.3545361
         _cons |  -1.112479    .076825   -14.48   0.000    -1.263053   -.9619045
---------------+----------------------------------------------------------------
Only_Financial |
       wealth2 |   17.34878   1.746351     9.93   0.000     13.92599    20.77156
         _cons |  -1.018951   .0575423   -17.71   0.000    -1.131732   -.9061704
---------------+----------------------------------------------------------------
Both_Plannings |
       wealth2 |   17.29581   1.740808     9.94   0.000     13.88389    20.70773
         _cons |   .0092796   .0505876     0.18   0.854    -.0898703    .1084296
--------------------------------------------------------------------------------

Now the coefficients are reasonable to me. Do you think it is good practice to normalize continuous variables before running logits (like subtracting the mean)?

Thanks!


Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#4

25 Oct 2018, 20:02

Originally posted by Diego Gomes

Do you think it is good practice to normalize continuous variables before running logits (like subtracting the mean)?

Mean centering of predictors is topical: see this recent thread for some advice from list members.
Diego Gomes

Join Date: Oct 2018

Posts: 5
#5

25 Oct 2018, 21:09

Thank you! I'll have a look. If I have any new questions about it, I'll reply to you.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

26 Oct 2018, 13:27

Originally posted by Diego Gomes

Hi Joseph,

Thanks for the reply. I created the wealth2 variable. Here is the result of your request:

Code:

. mlogit genplan wealth2 if (year == 2012) [pweight=rwtresp], baseoutcome(1)

Multinomial logistic regression                   Number of obs   =       9601
                                                  Wald chi2(3)    =     131.26
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood =  -45455497                 Pseudo R2       =     0.0494

--------------------------------------------------------------------------------
               |               Robust
       genplan |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
No_Planning    |  (base outcome)
---------------+----------------------------------------------------------------
Only_Health    |
       wealth2 |  -7.172241   3.478485    -2.06   0.039    -13.98995   -.3545361
         _cons |  -1.112479    .076825   -14.48   0.000    -1.263053   -.9619045
---------------+----------------------------------------------------------------
Only_Financial |
       wealth2 |   17.34878   1.746351     9.93   0.000     13.92599    20.77156
         _cons |  -1.018951   .0575423   -17.71   0.000    -1.131732   -.9061704
---------------+----------------------------------------------------------------
Both_Plannings |
       wealth2 |   17.29581   1.740808     9.94   0.000     13.88389    20.70773
         _cons |   .0092796   .0505876     0.18   0.854    -.0898703    .1084296
--------------------------------------------------------------------------------

Now the coefficients are reasonable to me. Do you think it is good practice to normalize continuous variables before running logits (like subtracting the mean)?

Thanks!

I agree with Joseph that the wisdom and acceptability of mean centering varies by topic. I'd just add that you want wealth to be on some sort of reasonable scale. I assume wealth was originally denominated in dollars, so each additional dollar of net wealth was associated with a decrease of 7.17e-07 in the log odds of engaging in only health planning (relative to no planning). That's a bit hard to interpret. Joseph had you redenominate wealth in units of 1e7 dollars, which is ten million dollars. Maybe that's also a bit odd! However, it did clearly demonstrate that the coefficients on wealth treated continuously were not zero. Mean centering may be irrelevant for this problem if there isn't a natural and easily understood value of mean wealth. If the distribution of wealth is heavily right-skewed, then most people don't have anywhere near the mean level of wealth.
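
To see why the rescaling changes the reported coefficients but not the fit, it may help to write the log odds of outcome j against the base outcome as a linear function of wealth w and substitute the rescaled variable. The symbols below (w, c, alpha_j, beta_j) are notation introduced here for illustration, with c = 10^7 in this thread.

```latex
\begin{aligned}
\log\frac{\Pr(y=j\mid w)}{\Pr(y=1\mid w)}
  &= \alpha_j + \beta_j\,w
   = \alpha_j + (c\,\beta_j)\,\tilde w,
  \qquad \tilde w = \frac{w}{c},\\
\tilde\beta_j &= c\,\beta_j,\qquad
\operatorname{se}(\tilde\beta_j) = c\,\operatorname{se}(\beta_j),\qquad
z_j = \frac{\tilde\beta_j}{\operatorname{se}(\tilde\beta_j)}
    = \frac{\beta_j}{\operatorname{se}(\beta_j)}.
\end{aligned}
```

With c = 10^7, the Only_Health coefficient of -7.17e-07 becomes about -7.17, while the constants, z statistics, p-values, and log pseudolikelihood stay the same, which matches the two sets of output posted in this thread.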

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Diego Gomes

Join Date: Oct 2018

Posts: 5
#7

26 Oct 2018, 13:33

Thanks, Weiwen!

Yes, the distribution is very skewed, and I agree with your point. I was planning just to put the wealth variable on a different scale (like dividing by 100,000). Do you have any other suggestions?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

26 Oct 2018, 13:40

Originally posted by Diego Gomes

Thanks, Weiwen!

Yes, the distribution is very skewed, and I agree with your point. I was planning just to put the wealth variable on a different scale (like dividing by 100,000). Do you have any other suggestions?

Not really; this isn't my field. I would probably choose some sort of sensible increment of wealth, and $100k seems sensible enough to me. Also, the -margins- command can really help you interpret the log odds by turning them into predicted probabilities.
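
A minimal sketch of that approach, assuming the variable and weight names from the original post. The name wealth100k is hypothetical, and outcome(#4) assumes Both_Plannings is the fourth category of genplan, as the output tables suggest.

```stata
* Express net wealth in units of 100,000 dollars (wealth100k is a hypothetical name)
generate double wealth100k = wealth / 100000

* Same model as in the original post, with the rescaled predictor
mlogit genplan wealth100k if (year == 2012) [pweight=rwtresp], baseoutcome(1)

* Replay the estimates as relative-risk ratios per 100,000 dollars of net wealth
mlogit, rrr

* Predicted probability of the fourth outcome (assumed to be Both_Plannings)
* at net wealth of 0, 100k, ..., 500k dollars
margins, at(wealth100k = (0(1)5)) predict(outcome(#4))
```

Changing outcome(#4) to another category gives the corresponding probabilities for that outcome, which is usually easier to read than coefficients on the log-odds scale.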
