  • Test of Independence: Continuous IV; Categorical DV

    Hi,
    I have spent hours trawling through the deepest depths of Google trying to find a test that will do this for me. Eventually I found that the test of significance contained in an ologit output would work for the ordinal categorical DVs (it's reported in z scores, though as I understand it they aren't standard z scores), but it's no good for the categorical DVs that are not ordinal.

    My data is based on a survey of behaviours. The IV is Age and the dependent variables include

    Ordinal - "How Frequently Do You Take Care of the Kids While Your Partner Is Sick?" (Categories: Never, Sometimes, etc.)
    Non-Ordinal - "Who Has the Final Say Regarding Spending Time with Relations?" (You, Partner, Both, etc.)

    I've been stumped on this for weeks. I've never used a forum like this before (preferring to be a parasite searching older posts!), so I really hope you can help!

    Cheers,
    Seán

  • #2
    You can do this with multinomial logistic regression, which assumes no ordering of the categories. Type help mlogit in Stata.
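
    A minimal sketch of what that looks like, assuming age is your continuous IV and using final_say as a hypothetical name for the non-ordinal DV (substitute your actual variable names):

    Code:
    * final_say: hypothetical non-ordinal DV (e.g. 1=You, 2=Partner, 3=Both)
    * age: continuous independent variable
    mlogit final_say age
    * the LR chi2 in the output header is the global test of independence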
    Richard T. Campbell
    Emeritus Professor of Biostatistics and Sociology
    University of Illinois at Chicago



    • #3
      Hi Dick,
      Thanks a lot for your speedy reply.

      Can you explain a little more about the test of significance done in a multinomial logit? Like the ologit, it's reported in z-scores because (I think) it assumes a normal distribution, but as far as I can make out it isn't a standard z-test. Is that correct?

      Is there any way to perform this test independently of the multinomial logit regression? All I want is the test of significance/independence, so it would be great if there were a neater way to get that than a whole regression output. This is more of an aesthetic point, so no worries if not.

      Also, please note I mistakenly posted this topic twice: the first time the "server encountered an error" and I assumed it hadn't posted. The link to the other post is here:
      http://www.statalist.org/forums/foru...categorical-dv
      Feel free to post your reply on that page; it might keep things neater.

      Thanks for your help,
      Seán.



      • #4
        Mlogit results are reported in the same way as simple two-category logistic results -- as increments to log odds, or as odds ratios if you specify that option in mlogit. Because you have >2 categories, one of them is chosen as the reference category and results are reported as contrasts to that category. The Stata help file, and particularly the manual (to which you have access via the help file), explain this more completely and clearly than I have here.
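
        For example, a minimal sketch using the help-file dataset (the same data as in the runs below): the rrr option reports exponentiated coefficients (relative-risk ratios) rather than log-odds increments.

        Code:
        * load the help-file example data and report relative-risk ratios
        webuse sysdsn1
        mlogit insure age male nonwhite i.site, rrr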

        For your purposes, however, the reference category is irrelevant. All you need is the global chi-square test, which is the same regardless of the base category chosen. You can see that if you run the examples shown in the mlogit help file, as below. The first model uses category 1 as the reference category and the second model uses category 2. In both cases, the likelihood ratio chi-square is 42.99 (twice the difference between the null and final log likelihoods in the iteration log: 2*(555.85446 - 534.36165) = 42.99). That statistic is a global test of whether any of the independent variables allow you to distinguish either of the categories included in the model from the base category. If that statistic is not significant, you can conclude that your set of independent variables, be they categorical or continuous, is independent of the outcome.

        People are often confused by mlogit because, to interpret the results, you have to keep track of what contrasts are being estimated and in what form they are being reported. The Stata output makes this pretty clear.

        Code:
        . webuse sysdsn1
        (Health insurance data)
        
        . numlabel insure, add
        
        . tab insure
        
              insure |      Freq.     Percent        Cum.
        -------------+-----------------------------------
        1. Indemnity |        294       47.73       47.73
          2. Prepaid |        277       44.97       92.69
         3. Uninsure |         45        7.31      100.00
        -------------+-----------------------------------
               Total |        616      100.00
        
        . mlogit insure age male nonwhite i.site
        
        Iteration 0:   log likelihood = -555.85446  
        Iteration 1:   log likelihood = -534.67443  
        Iteration 2:   log likelihood = -534.36284  
        Iteration 3:   log likelihood = -534.36165  
        Iteration 4:   log likelihood = -534.36165  
        
        Multinomial logistic regression                   Number of obs   =        615
                                                          LR chi2(10)     =      42.99
                                                          Prob > chi2     =     0.0000
        Log likelihood = -534.36165                       Pseudo R2       =     0.0387
        
        ------------------------------------------------------------------------------
              insure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        1__Indemnity |  (base outcome)
        -------------+----------------------------------------------------------------
        2__Prepaid   |
                 age |   -.011745   .0061946    -1.90   0.058    -.0238862    .0003962
                male |   .5616934   .2027465     2.77   0.006     .1643175    .9590693
            nonwhite |   .9747768   .2363213     4.12   0.000     .5115955    1.437958
                     |
                site |
                  2  |   .1130359   .2101903     0.54   0.591    -.2989296    .5250013
                  3  |  -.5879879   .2279351    -2.58   0.010    -1.034733   -.1412433
                     |
               _cons |   .2697127   .3284422     0.82   0.412    -.3740222    .9134476
        -------------+----------------------------------------------------------------
        3__Uninsure  |
                 age |  -.0077961   .0114418    -0.68   0.496    -.0302217    .0146294
                male |   .4518496   .3674867     1.23   0.219     -.268411     1.17211
            nonwhite |   .2170589   .4256361     0.51   0.610    -.6171725     1.05129
                     |
                site |
                  2  |  -1.211563   .4705127    -2.57   0.010    -2.133751   -.2893747
                  3  |  -.2078123   .3662926    -0.57   0.570    -.9257327     .510108
                     |
               _cons |  -1.286943   .5923219    -2.17   0.030    -2.447872   -.1260134
        ------------------------------------------------------------------------------
        
        . mlogit insure age male nonwhite i.site, base(2)
        
        Iteration 0:   log likelihood = -555.85446  
        Iteration 1:   log likelihood = -534.67443  
        Iteration 2:   log likelihood = -534.36284  
        Iteration 3:   log likelihood = -534.36165  
        Iteration 4:   log likelihood = -534.36165  
        
        Multinomial logistic regression                   Number of obs   =        615
                                                          LR chi2(10)     =      42.99
                                                          Prob > chi2     =     0.0000
        Log likelihood = -534.36165                       Pseudo R2       =     0.0387
        
        ------------------------------------------------------------------------------
              insure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        1__Indemnity |
                 age |    .011745   .0061946     1.90   0.058    -.0003962    .0238862
                male |  -.5616934   .2027465    -2.77   0.006    -.9590693   -.1643175
            nonwhite |  -.9747768   .2363213    -4.12   0.000    -1.437958   -.5115955
                     |
                site |
                  2  |  -.1130359   .2101903    -0.54   0.591    -.5250013    .2989296
                  3  |   .5879879   .2279351     2.58   0.010     .1412433    1.034733
                     |
               _cons |  -.2697127   .3284422    -0.82   0.412    -.9134476    .3740222
        -------------+----------------------------------------------------------------
        2__Prepaid   |  (base outcome)
        -------------+----------------------------------------------------------------
        3__Uninsure  |
                 age |   .0039489   .0115994     0.34   0.734    -.0187855    .0266832
                male |  -.1098438   .3651883    -0.30   0.764    -.8255998    .6059122
            nonwhite |  -.7577178   .4195759    -1.81   0.071    -1.580071    .0646357
                     |
                site |
                  2  |  -1.324599   .4697954    -2.82   0.005    -2.245381   -.4038165
                  3  |   .3801756   .3728188     1.02   0.308    -.3505358    1.110887
                     |
               _cons |  -1.556656   .5963286    -2.61   0.009    -2.725438    -.387873
        ------------------------------------------------------------------------------
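
        On the aesthetic point in #3 -- wanting just the test rather than the whole regression output -- one way (a sketch, not the only one) is to fit the model quietly and pull the stored results: e(chi2) and e(df_m) hold the LR chi-square and its degrees of freedom, and chi2tail() turns them into the p-value.

        Code:
        * fit the model without printing the coefficient table
        quietly mlogit insure age male nonwhite i.site
        * report only the global LR test of independence
        display "LR chi2(" e(df_m) ") = " %6.2f e(chi2) ", p = " %6.4f chi2tail(e(df_m), e(chi2))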
        Richard T. Campbell
        Emeritus Professor of Biostatistics and Sociology
        University of Illinois at Chicago



        • #5
          Hi Richard,
          Thanks for your reply. I understand what you have told me; however, it does not solve my problem exactly. Rather than looking at the global chi2 for the whole model, I am only interested in the significance of individual elements in the model.
          The regression here is a means to an end, whereby it gives me the only way I know of to test whether there is a significant relationship between a continuous IV and a categorical DV. That being the case, I will be doing each regression separately, just to keep the results clean and separate in the output.
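
          Something like this sketch is what I have in mind (final_say and decide_money are hypothetical names for the non-ordinal DVs):

          Code:
          * one model per categorical DV, keeping only the global test against age
          foreach dv in final_say decide_money {
              quietly mlogit `dv' age
              display "`dv': LR chi2(" e(df_m) ") = " %6.2f e(chi2) ", p = " %6.4f chi2tail(e(df_m), e(chi2))
          }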

          Does what I plan make sense?

          Thanks,
          Seán



          • #6
            It looks like you have two threads running simultaneously on this, one with Clyde and one with me. Yes, as Clyde says, you can do this one variable at a time, just as though you were doing multiple tests of independence using simple chi-square tests for categorical data. Many analysts would be uncomfortable with this: what you have when you are done is a set of essentially descriptive statistics, because the p values for the various tests are not independent of each other and don't mean what you think they mean. With regard to Clyde's suggestion of doing it "backwards" via ANOVA, that would work, but if there is a clear dependent variable I wouldn't do it that way, although that judgment reflects my own particular history and tastes.
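
            For what it's worth, the "backwards" ANOVA version would be a one-way ANOVA of age across the categories of the DV (again using final_say as a hypothetical variable name); the F test asks whether mean age differs by category:

            Code:
            * one-way ANOVA of the continuous IV across the categories of the DV
            oneway age final_say, tabulate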
            Richard T. Campbell
            Emeritus Professor of Biostatistics and Sociology
            University of Illinois at Chicago



            • #7
              Hi Richard,
              Thanks for your input on this. I really didn't expect it to become so complicated, so I do appreciate your patience. It seems neither approach is perfect, but also that ANOVA would be less unorthodox and easier to explain/stand over, so I'll probably go with that. I think association will still be valuable, even without directionality.

              Appreciate your time on this,
              Seán
