  • Using Wald test to determine whether categorical x categorical interaction is needed in binary logit with cluster robust VCE

    Hello,

    This question is related to another question I found on this forum.

    Like the author of that question, I am comparing two binary logistic regression models, both of which specify vce(cluster) to account for clustering at the classroom level. In my case, I am trying to determine whether I should include a categorical x categorical interaction of my two categorical independent variables. One of them (Phase) has three levels (0, 1, 2); the other (Semester) has two levels (0, 1).

    Because of the clustering, I can't use a likelihood-ratio chi2 test (--lrtest--) to test the significance of the interaction, so I am trying to follow the advice to use a Wald test instead. However, I'm not sure whether I have run and interpreted the Wald test correctly: if I'm not mistaken, my code does not test the overall significance of the interaction in the model but only the differences relative to the base (reference) level.

    Code:
    . logit DFW  i.PHASE_  i.SEMESTER_, vce(cluster STRM_SECT) allbase nolog or
    
    . estimates store a
    
    . logit DFW  i.PHASE_  i.SEMESTER_  PHASE_#SEMESTER_, vce(cluster STRM_SECT) allbase nolog or
    
    . test a
    a not found
    r(111);
    
    . test 1.PHASE_#1.SEMESTER_ 2.PHASE_#1.SEMESTER_
    
     ( 1)  [DFW]1.PHASE_#1.SEMESTER_ = 0
     ( 2)  [DFW]2.PHASE_#1.SEMESTER_ = 0
    
               chi2(  2) =    0.41
             Prob > chi2 =    0.8165
    I see that the model Wald chi2 is larger with the interaction: Wald chi2[5] = 49.58 versus Wald chi2[3] = 20.15 without it. The pseudo R2 also increased (although it's still quite low): 0.0174 versus 0.0172. The coefficient on Phase 3 is no longer significant.
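
    For comparison, here is a sketch of two other ways the same joint hypothesis might be requested. It assumes the interaction model is refit exactly as above. --testparm-- takes a varlist instead of individual coefficient names, and --contrast-- reports a joint test for the interaction term. With only two non-base interaction coefficients, both should be testing the same two restrictions as the --test-- call above, so I would expect them to agree with chi2(2) = 0.41, although I have not confirmed that here.

    ```stata
    * Refit the interaction model (same specification as above)
    logit DFW i.PHASE_ i.SEMESTER_ i.PHASE_#i.SEMESTER_, vce(cluster STRM_SECT) nolog or

    * Joint Wald test that every non-base interaction coefficient is zero
    testparm i.PHASE_#i.SEMESTER_

    * The same joint hypothesis expressed as an interaction contrast
    contrast PHASE_#SEMESTER_
    ```

    If that is right, the joint test is a test of the interaction as a whole, not only of one cell against the base level.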

    After reading another forum post (I seem to have lost the link, sorry!), I also explored the AIC and BIC (n=2412). I don't know how applicable they are in this instance, but I saw that the model without the interaction has slightly lower values for both: AIC 1867.56 versus 1871.09, and BIC 1890.71 versus 1905.82.

    Code:
    . * clear past estimates
    . est clear
    
    . * Model 0: Intercept only
    . quietly logit DFW, vce(cluster STRM_SECT) or
    . est store M0 
    
    . * Model 1: PHASE added
    . quietly logit DFW i.PHASE_, vce(cluster STRM_SECT) or
    . est store M1
    
    . * Model 2: PHASE + SEMESTER
    . quietly logit DFW i.PHASE_ i.SEMESTER_, vce(cluster STRM_SECT) or
    . est store M2
    
    . * Model 3: PHASE + SEMESTER + PHASE#SEMESTER
    . quietly logit DFW i.PHASE_ i.SEMESTER_ i.PHASE_#i.SEMESTER_, vce(cluster STRM_SECT) or
    . est store M3
    
    .  * Table of results 
    .  est table M0 M1 M2 M3, stats(chi2 df N aic bic rank) star(.05 .01 .001) eform varwidth(24) style(nolines) 
    
    ------------------------------------------------------------------------------------------
                    Variable        M0              M1              M2              M3        
    ------------------------------------------------------------------------------------------
                      PHASE_  
                    Phase 2                     .79809394       .85430332       .92232566     
                    Phase 3                      .4898581**     .47872228***    .49277439     
                              
                   SEMESTER_  
                     Spring                                     1.6415463***      1.83125**   
                              
            PHASE_#SEMESTER_  
             Phase 2#Spring                                                     .82446964     
                              
            PHASE_#SEMESTER_  
             Phase 3#Spring                                                     .93510388     
                              
                       _cons    .15351506***    .20309051***    .16518016***    .15699659***  
    ------------------------------------------------------------------------------------------
                        chi2                     8.546371       20.150958       49.579609     
                          df                                                                  
                           N         2412            2412            2412            2412     
                         aic    1894.0142       1880.7794       1867.5571       1871.0909     
                         bic    1899.8024        1898.144         1890.71       1905.8202     
                        rank            1               3               4               6     
    ------------------------------------------------------------------------------------------
                                                         legend: * p<.05; ** p<.01; *** p<.001
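
    As a check on where the aic and bic rows come from, here is a minimal sketch that recomputes them from the stored results. It assumes the usual definitions AIC = -2*ll + 2*k and BIC = -2*ll + k*ln(N), with k taken from e(rank) and N from e(N). Those definitions do reproduce the table values when I work them through by hand. Since vce(cluster) makes e(ll) a log pseudolikelihood, I would treat the numbers as descriptive only.

    ```stata
    * Recompute AIC and BIC for the two competing models from stored e() results
    foreach m in M2 M3 {
        quietly estimates restore `m'
        scalar k = e(rank)                       // number of estimated parameters
        display "`m': AIC = " %10.4f (-2*e(ll) + 2*k) ///
                "   BIC = " %10.4f (-2*e(ll) + k*ln(e(N)))
    }
    ```

    Because M3 has two more parameters than M2 (rank 6 versus 4), the BIC penalty grows by 2*ln(2412), about 15.6, which accounts for most of the BIC gap between the two models.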
    All in all, would I be correct in concluding that I don't need the interaction in the model? Is there another way I should be using the Wald test? (If so, what is the correct syntax?)

    Thanks in advance for your help!