  • Test of Independence: Continuous IV; Categorical DV

    Hi,
    I have spent hours trawling through the deepest depths of Google trying to find a test that will do this for me. Eventually I found that the test of significance contained in an ologit output would work for the ordinal categorical DVs (it's reported in z scores, though as I understand it they aren't standard z scores), but it's no good for the categorical DVs that are not ordinal.

    My data is based on a survey of behaviours. The IV is Age and the dependent variables include

    Ordinal - "How Frequently Do You Take Care of the Kids While Your Partner Is Sick?" (Categories: Never, Sometimes, etc.)
    Non-Ordinal - "Who Has the Final Say Regarding Spending Time with Relations?" (You, Partner, Both, etc.)

    I've been stumped on this for weeks. I've never used a forum like this before (preferring to be a parasite searching older posts!), so I really hope you can help!

    Cheers,
    Seán

  • #2
    You can do this with multinomial logistic regression, which assumes no ordering of the categories. Type help mlogit in Stata.
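
    A minimal sketch of what that looks like, assuming age is your continuous IV and using final_say as a hypothetical name for the non-ordinal DV (substitute your actual variable names):

    Code:
    * final_say: hypothetical non-ordinal DV (e.g. 1=You, 2=Partner, 3=Both)
    * age: continuous independent variable
    mlogit final_say age
    * the LR chi2 in the output header is the global test of independence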
    Richard T. Campbell
    Emeritus Professor of Biostatistics and Sociology
    University of Illinois at Chicago



    • #3
      Hi Dick,
      Thanks a lot for your speedy reply.

      Can you explain a little more about the test of significance done in a multinomial logit? Like the ologit, it's reported in z-scores because (I think) it assumes a normal distribution, but as far as I can make out it isn't a standard z-test. Is that correct?

      Is there any way to perform this test independently of the multinomial logit regression? All I want is the test of significance/independence, so it would be great if there were a neater way to get that than a whole regression output. This is more of an aesthetic point, so no worries if not.

      Also, please note I mistakenly posted this topic twice: the first time the "server encountered an error" and I assumed it hadn't posted. The link to the other post is here:
      http://www.statalist.org/forums/foru...categorical-dv
      Feel free to post your reply on that page; it might keep things neater.

      Thanks for your help,
      Seán.



      • #4
        Mlogit results are reported in the same way as simple two-category logistic results -- as increments to log odds, or as odds ratios if you specify that option in mlogit. Because you have >2 categories, one of them is chosen as the reference category and results are reported as contrasts to that category. The Stata help file, and particularly the manual (to which you have access via the help file), explain this more completely and clearly than I have here.
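
        For example, a minimal sketch using the help-file dataset (the same data as in the runs below): the rrr option reports exponentiated coefficients (relative-risk ratios) rather than log-odds increments.

        Code:
        * load the help-file example data and report relative-risk ratios
        webuse sysdsn1
        mlogit insure age male nonwhite i.site, rrr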

        For your purposes, however, the reference category is irrelevant. All you need is the global chi-square test, which is the same regardless of the base category chosen. You can see that if you run the examples shown in the mlogit help file, as below. The first model uses category 1 as the reference category and the second model uses category 2. In both cases, the likelihood ratio chi-square is 42.99 (twice the difference between the null and final log likelihoods in the iteration log: 2*(555.85446 - 534.36165) = 42.99). That statistic is a global test of whether any of the independent variables allow you to distinguish either of the categories included in the model from the base category. If that statistic is not significant, you can conclude that your set of independent variables, be they categorical or continuous, is independent of the outcome.

        People are often confused by mlogit because, to interpret the results, you have to keep track of what contrasts are being estimated and in what form they are being reported. The Stata output makes this pretty clear.

        Code:
        . webuse sysdsn1
        (Health insurance data)
        
        . numlabel insure, add
        
        . tab insure
        
              insure |      Freq.     Percent        Cum.
        -------------+-----------------------------------
        1. Indemnity |        294       47.73       47.73
          2. Prepaid |        277       44.97       92.69
         3. Uninsure |         45        7.31      100.00
        -------------+-----------------------------------
               Total |        616      100.00
        
        . mlogit insure age male nonwhite i.site
        
        Iteration 0:   log likelihood = -555.85446  
        Iteration 1:   log likelihood = -534.67443  
        Iteration 2:   log likelihood = -534.36284  
        Iteration 3:   log likelihood = -534.36165  
        Iteration 4:   log likelihood = -534.36165  
        
        Multinomial logistic regression                   Number of obs   =        615
                                                          LR chi2(10)     =      42.99
                                                          Prob > chi2     =     0.0000
        Log likelihood = -534.36165                       Pseudo R2       =     0.0387
        
        ------------------------------------------------------------------------------
              insure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        1__Indemnity |  (base outcome)
        -------------+----------------------------------------------------------------
        2__Prepaid   |
                 age |   -.011745   .0061946    -1.90   0.058    -.0238862    .0003962
                male |   .5616934   .2027465     2.77   0.006     .1643175    .9590693
            nonwhite |   .9747768   .2363213     4.12   0.000     .5115955    1.437958
                     |
                site |
                  2  |   .1130359   .2101903     0.54   0.591    -.2989296    .5250013
                  3  |  -.5879879   .2279351    -2.58   0.010    -1.034733   -.1412433
                     |
               _cons |   .2697127   .3284422     0.82   0.412    -.3740222    .9134476
        -------------+----------------------------------------------------------------
        3__Uninsure  |
                 age |  -.0077961   .0114418    -0.68   0.496    -.0302217    .0146294
                male |   .4518496   .3674867     1.23   0.219     -.268411     1.17211
            nonwhite |   .2170589   .4256361     0.51   0.610    -.6171725     1.05129
                     |
                site |
                  2  |  -1.211563   .4705127    -2.57   0.010    -2.133751   -.2893747
                  3  |  -.2078123   .3662926    -0.57   0.570    -.9257327     .510108
                     |
               _cons |  -1.286943   .5923219    -2.17   0.030    -2.447872   -.1260134
        ------------------------------------------------------------------------------
        
        . mlogit insure age male nonwhite i.site, base(2)
        
        Iteration 0:   log likelihood = -555.85446  
        Iteration 1:   log likelihood = -534.67443  
        Iteration 2:   log likelihood = -534.36284  
        Iteration 3:   log likelihood = -534.36165  
        Iteration 4:   log likelihood = -534.36165  
        
        Multinomial logistic regression                   Number of obs   =        615
                                                          LR chi2(10)     =      42.99
                                                          Prob > chi2     =     0.0000
        Log likelihood = -534.36165                       Pseudo R2       =     0.0387
        
        ------------------------------------------------------------------------------
              insure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        1__Indemnity |
                 age |    .011745   .0061946     1.90   0.058    -.0003962    .0238862
                male |  -.5616934   .2027465    -2.77   0.006    -.9590693   -.1643175
            nonwhite |  -.9747768   .2363213    -4.12   0.000    -1.437958   -.5115955
                     |
                site |
                  2  |  -.1130359   .2101903    -0.54   0.591    -.5250013    .2989296
                  3  |   .5879879   .2279351     2.58   0.010     .1412433    1.034733
                     |
               _cons |  -.2697127   .3284422    -0.82   0.412    -.9134476    .3740222
        -------------+----------------------------------------------------------------
        2__Prepaid   |  (base outcome)
        -------------+----------------------------------------------------------------
        3__Uninsure  |
                 age |   .0039489   .0115994     0.34   0.734    -.0187855    .0266832
                male |  -.1098438   .3651883    -0.30   0.764    -.8255998    .6059122
            nonwhite |  -.7577178   .4195759    -1.81   0.071    -1.580071    .0646357
                     |
                site |
                  2  |  -1.324599   .4697954    -2.82   0.005    -2.245381   -.4038165
                  3  |   .3801756   .3728188     1.02   0.308    -.3505358    1.110887
                     |
               _cons |  -1.556656   .5963286    -2.61   0.009    -2.725438    -.387873
        ------------------------------------------------------------------------------
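
        On the aesthetic point in #3 -- wanting just the test rather than the whole regression output -- one way (a sketch, not the only one) is to fit the model quietly and pull the stored results: e(chi2) and e(df_m) hold the LR chi-square and its degrees of freedom, and chi2tail() turns them into the p-value.

        Code:
        * fit the model without printing the coefficient table
        quietly mlogit insure age male nonwhite i.site
        * report only the global LR test of independence
        display "LR chi2(" e(df_m) ") = " %6.2f e(chi2) ", p = " %6.4f chi2tail(e(df_m), e(chi2))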
        Richard T. Campbell
        Emeritus Professor of Biostatistics and Sociology
        University of Illinois at Chicago



        • #5
          Hi Richard,
          Thanks for your reply. I understand what you have told me; however, it does not solve my problem exactly. Rather than looking at the global chi2 for the whole model, I am only interested in the significance of individual elements in the model.
          The regression here is a means to an end, whereby it gives me the only way I know of to test whether there is a significant relationship between a continuous IV and a categorical DV. That being the case, I will be doing each regression separately, just to keep the results clean and separate in the output.
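
          Something like this sketch is what I have in mind (final_say and decide_money are hypothetical names for the non-ordinal DVs):

          Code:
          * one model per categorical DV, keeping only the global test against age
          foreach dv in final_say decide_money {
              quietly mlogit `dv' age
              display "`dv': LR chi2(" e(df_m) ") = " %6.2f e(chi2) ", p = " %6.4f chi2tail(e(df_m), e(chi2))
          }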

          Does what I plan make sense?

          Thanks,
          Seán



          • #6
            It looks like you have two threads running simultaneously on this, one with Clyde and one with me. Yes, as Clyde says, you can do this one variable at a time, just as though you were doing multiple tests of independence using simple chi-square tests for categorical data. Many analysts would be uncomfortable with this: what you have when you are done is a set of essentially descriptive statistics, because the p values for the various tests are not independent of each other and don't mean what you think they mean. With regard to Clyde's suggestion of doing it "backwards" via ANOVA, that would work, but if there is a clear dependent variable I wouldn't do it that way, although that judgment reflects my own particular history and tastes.
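
            For what it's worth, the "backwards" ANOVA version would be a one-way ANOVA of age across the categories of the DV (again using final_say as a hypothetical variable name); the F test asks whether mean age differs by category:

            Code:
            * one-way ANOVA of the continuous IV across the categories of the DV
            oneway age final_say, tabulate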
            Richard T. Campbell
            Emeritus Professor of Biostatistics and Sociology
            University of Illinois at Chicago



            • #7
              Hi Richard,
              Thanks for your input on this. I really didn't expect it to become so complicated, so I do appreciate your patience. It seems neither approach is perfect, but also that ANOVA would be less unorthodox and easier to explain/stand over, so I'll probably go with that. I think association will still be valuable, even without directionality.

              Appreciate your time on this,
              Seán
