  • Clustered Mlogit missing Wald Chi

    Dear Statalisters,
    My name is: Sindra Sharma

    This is my first question on Statalist so apologies if there are errors in the form.

    I am running mlogit with the vce(cluster) option.

    My sample size is small (205 observations). I am trying to cluster my standard errors at the village level (8 clusters). When I add vce(cluster), I do not get a Wald chi2 value. Why would this be? The problem does not occur without the cluster option.

    Thanks in advance.

  • #2
    There are several possibilities, and had you shown us the exact command you gave and the exact output that Stata gave you in response, we could probably be more helpful.

    But let me make a guess. You have only 8 clusters. So if you have 6 or more variables in your model (and if you have categorical variables represented by multiple indicators, then each indicator counts as a separate variable), you have exhausted your degrees of freedom and there will be no overall model test. (Depending on how many levels your outcome variable has, the maximum number of variables you can include before exhausting your degrees of freedom can be even smaller.)

    That said, you will still see Wald tests of the individual variables in the regression table output, and you can test various combinations of the variables using the -test- command.
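
As a minimal sketch of that last point, the block below fits a clustered mlogit and then runs joint Wald tests by hand. The variable names (outcome, x1, x2, group, village) are hypothetical placeholders, not taken from the original poster's data.

```stata
* Hypothetical names: outcome, x1, x2, group, village
mlogit outcome x1 x2 i.group, vce(cluster village)

* Joint Wald test that x1 and x2 have zero coefficients in every equation
test x1 x2

* Joint Wald test of all indicators generated by the factor variable group
testparm i.group
```

If a test asks for more constraints than the cluster-robust VCE can support, Stata may drop some of them and report that in the output, so check the reported degrees of freedom.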

    If that isn't your problem, then please copy your command and Stata's output from the Stata Results window and paste them directly into a code block here on the forum. (See FAQ for how to create a code block.) You will get more specific advice then.

    Also, the norm in this community is to use our real first and last names for our user names. So, please use the Contact Us button to ask the Forum administrator to change your user name to Sindra Sharma. Thanks!

    • #3
      Dear Clyde,
      Thanks for your reply. It was extremely useful.

      Also, please accept my apologies for responding so late; I had thought the question had gone unanswered. Once again, thanks for your help.

      • #4
        Hello Sindra and Clyde,

        I have a similar issue: I get a missing value for chi2 (Stata/SE 17.0). I ran two regressions. In Regression 1 I get a chi2 value and can check whether my model is statistically significant. When I run the regression with the vce(cluster variable) option (Regression 2), I get a missing value for chi2 and cannot check whether my model is statistically significant. However, Regression 2 seems to be the better model, as more variables are significant.

        My question is: how do I know which model is better?

        Thank you very much for your help!


        logit TERMINATED CSRRQ FRQ_111_w i.HOSTILE i.MULTIBID TOEHOLD i.TENDER i.INDUSTRY i.STOCK SIZE_w sdCFO_w GROWTH_w RoE_w MTB_Refi
        > nitiv_w LEVERAGE_w

        Iteration 0: log likelihood = -63.815899
        Iteration 1: log likelihood = -33.030832
        Iteration 2: log likelihood = -31.019907
        Iteration 3: log likelihood = -30.875754
        Iteration 4: log likelihood = -30.875427
        Iteration 5: log likelihood = -30.875427

        Logistic regression                                     Number of obs =    101
                                                                LR chi2(14)   =  65.88
                                                                Prob > chi2   = 0.0000
        Log likelihood = -30.875427                             Pseudo R2     = 0.5162

        ---------------------------------------------------------------------------------
             TERMINATED | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        ----------------+----------------------------------------------------------------
                  CSRRQ |  -.2842261   .4313439    -0.66   0.510    -1.129645    .5611925
              FRQ_111_w |  -6.437695   2.955846    -2.18   0.029    -12.23105   -.6443431
              1.HOSTILE |   4.362367   .8989699     4.85   0.000     2.600419    6.124316
             1.MULTIBID |   1.363963   .9316441     1.46   0.143    -.4620259    3.189952
                TOEHOLD |  -.0056767   .0302713    -0.19   0.851    -.0650075     .053654
               1.TENDER |  -1.955211    1.28334    -1.52   0.128    -4.470511     .560089
             1.INDUSTRY |  -.9345645   1.039687    -0.90   0.369    -2.972314    1.103185
                1.STOCK |   .4648789   .9889564     0.47   0.638     -1.47344    2.403198
                 SIZE_w |   .2414528   .3883149     0.62   0.534    -.5196304    1.002536
                sdCFO_w |  -.0005694   .0040501    -0.14   0.888    -.0085074    .0073686
               GROWTH_w |   .7304614   .5897109     1.24   0.215    -.4253508    1.886274
                  RoE_w |   -1.01383   1.041253    -0.97   0.330    -3.054649    1.026988
        MTB_Refinitiv_w |  -.0839447   .1699159    -0.49   0.621    -.4169738    .2490845
             LEVERAGE_w |   .3295281   .3912153     0.84   0.400    -.4372398    1.096296
                  _cons |  -7.899894    8.08393    -0.98   0.328    -23.74411    7.944318
        ---------------------------------------------------------------------------------

        . logit TERMINATED CSRRQ FRQ_111_w i.HOSTILE i.MULTIBID TOEHOLD i.TENDER i.INDUSTRY i.STOCK SIZE_w sdCFO_w GROWTH_w RoE_w MTB_Refi
        > nitiv_w LEVERAGE_w, vce(cluster YEAR)

        Iteration 0: log pseudolikelihood = -63.815899
        Iteration 1: log pseudolikelihood = -33.030832
        Iteration 2: log pseudolikelihood = -31.019907
        Iteration 3: log pseudolikelihood = -30.875754
        Iteration 4: log pseudolikelihood = -30.875427
        Iteration 5: log pseudolikelihood = -30.875427

        Logistic regression                                     Number of obs =    101
                                                                Wald chi2(4)  =      .
                                                                Prob > chi2   =      .
        Log pseudolikelihood = -30.875427                       Pseudo R2     = 0.5162

                                             (Std. err. adjusted for 6 clusters in YEAR)
        ---------------------------------------------------------------------------------
                        |               Robust
             TERMINATED | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        ----------------+----------------------------------------------------------------
                  CSRRQ |  -.2842261   .5091825    -0.56   0.577    -1.282205    .7137533
              FRQ_111_w |  -6.437695   3.720289    -1.73   0.084    -13.72933    .8539377
              1.HOSTILE |   4.362367   .8443637     5.17   0.000     2.707445     6.01729
             1.MULTIBID |   1.363963   .9642846     1.41   0.157    -.5260001    3.253926
                TOEHOLD |  -.0056767   .0153461    -0.37   0.711    -.0357546    .0244011
               1.TENDER |  -1.955211   1.163918    -1.68   0.093    -4.236449    .3260267
             1.INDUSTRY |  -.9345645   1.173526    -0.80   0.426    -3.234634    1.365505
                1.STOCK |   .4648789   .6301557     0.74   0.461    -.7702034    1.699961
                 SIZE_w |   .2414528   .4762599     0.51   0.612    -.6919995    1.174905
                sdCFO_w |  -.0005694   .0037698    -0.15   0.880    -.0079581    .0068193
               GROWTH_w |   .7304614   .3415955     2.14   0.032     .0609466    1.399976
                  RoE_w |   -1.01383   .5258354    -1.93   0.054    -2.044449    .0167879
        MTB_Refinitiv_w |  -.0839447   .1709889    -0.49   0.623    -.4190767    .2511874
             LEVERAGE_w |   .3295281   .1735415     1.90   0.058     -.010607    .6696632
                  _cons |  -7.899894   9.893131    -0.80   0.425    -27.29007    11.49029
        ---------------------------------------------------------------------------------



        • #5
          [Image: Regression1.png]
          [Image: Regression2.png]

          The first screenshot is regression 1 and the second screenshot is regression 2.

          • #6
            The lrtest command does not allow comparing models fitted with a robust VCE.
            My Stata code is below:


            . * compare models

            quietly logit TERMINATED CSRRQ FRQ_111_w i.HOSTILE i.MULTIBID TOEHOLD i.TENDER i.INDUSTRY i.STOCK SIZE_w sdCFO_w GROWTH_w RoE_w MTB_Refinitiv_w LEVERAGE_w
            est store M1

            quietly logit TERMINATED CSRRQ FRQ_111_w i.HOSTILE i.MULTIBID TOEHOLD i.TENDER i.INDUSTRY i.STOCK SIZE_w sdCFO_w GROWTH_w RoE_w MTB_Refinitiv_w LEVERAGE_w, vce(cluster YEAR)
            est store M2

            lrtest M2 M1
            LR test likely invalid for models with robust VCE
            r(498);



            • #7
              Not getting a chi square test for the model with -vce(cluster year)- is precisely because you have only 6 clusters defined by year. Since the number of explanatory variables in the model greatly exceeds that, no model chi square test is possible. There aren't enough degrees of freedom.
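
One way to see the degrees-of-freedom limit is through the usual sandwich form of the cluster-robust variance estimator. This is a sketch in my own notation, not something from the thread, and it omits Stata's finite-sample multiplier. Here G is the number of clusters, \hat{H} is the Hessian of the log likelihood at the estimates, and s_g is the score vector summed over the observations in cluster g.

```latex
\hat{V}_{\text{cluster}} = \hat{H}^{-1}\left(\sum_{g=1}^{G} s_g s_g^{\top}\right)\hat{H}^{-1},
\qquad
\sum_{g=1}^{G} s_g = 0 \ \text{at the maximum likelihood estimate}
\;\Longrightarrow\;
\operatorname{rank}\bigl(\hat{V}_{\text{cluster}}\bigr) \le G - 1 .
```

With G = 6 year clusters the rank is at most 5, while the model has 14 slope coefficients, so a Wald test that all 14 are jointly zero cannot be computed.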

              Why do you care? Is it really part of your research goals to test the null hypothesis that all those coefficients are simultaneously zero? That's a pretty unusual situation to be in. If that hypothesis test isn't a specific part of your research goals, then just ignore the absence of the model chi square. It's a test of a hypothesis that nobody cares about. If it is a specific goal of your research to test that hypothesis, then you can't use the clustered standard errors. Either use the regular standard errors, or if that is unacceptable, go bootstrap.
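
A minimal sketch of the bootstrap route, meant to be run from a do-file. The right-hand side is copied from the commands quoted earlier in the thread; the number of replications and the seed are arbitrary assumptions, and with only 101 observations some replications may fail to converge.

```stata
* Right-hand side taken from the commands quoted earlier in the thread
local rhs CSRRQ FRQ_111_w i.HOSTILE i.MULTIBID TOEHOLD i.TENDER i.INDUSTRY ///
    i.STOCK SIZE_w sdCFO_w GROWTH_w RoE_w MTB_Refinitiv_w LEVERAGE_w

* reps() and seed() values are illustrative choices, not recommendations
logit TERMINATED `rhs', vce(bootstrap, reps(1000) seed(12345))
```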

              As for comparing the models, as you have seen, you cannot do a likelihood ratio test with robust or clustered standard errors. I suggest using AIC or BIC. -estat ic- after each regression will give you those statistics. That said, any of these tests is, in my opinion, a second-rate way to compare logistic models even when all of them are available. I would instead compare the discrimination (-lroc-) and calibration (-estat gof, group(10)-) of the two models to see which is more suitable for purpose.
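
A minimal sketch of those post-estimation commands, meant to be run from a do-file (the local macro and the /// line continuation do not work when typed line by line). The right-hand side is copied from the commands quoted earlier in the thread. The same three commands can follow each model being compared.

```stata
* Right-hand side taken from the commands quoted earlier in the thread
local rhs CSRRQ FRQ_111_w i.HOSTILE i.MULTIBID TOEHOLD i.TENDER i.INDUSTRY ///
    i.STOCK SIZE_w sdCFO_w GROWTH_w RoE_w MTB_Refinitiv_w LEVERAGE_w

quietly logit TERMINATED `rhs'
estat ic                  // AIC and BIC
lroc, nograph             // area under the ROC curve (discrimination)
estat gof, group(10)      // Hosmer-Lemeshow test (calibration)
```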

              • #8
                Dear Clyde,

                Thank you for your very helpful comment. I used AIC and BIC, and the results clearly showed that regression 2 is better. However, when I use the estat gof, group(10) command, I receive exactly the same result for both regressions (see below). From this, I would conclude that both models are equally good at predicting the outcome across the deciles? However, as I have to reject H0, do I have to conclude that the regression is not a good fit? Or could I still go with regression 2, as the overall number of observations is low?

                quietly logit TERMINATED CSRRQ FRQ_111_w i.HOSTILE i.MULTIBID TOEHOLD i.TENDER i.INDUSTRY i.STOCK SIZE_w sdCFO_w GROWTH_w RoE_w
                > MTB_Refinitiv_w LEVERAGE_w, vce(robust)

                . est store M3

                . estat gof, group(10)
                note: obs collapsed on 10 quantiles of estimated probabilities.

                Goodness-of-fit test after logistic model
                Variable: TERMINATED

                Number of observations = 101
                Number of groups = 10
                Hosmer–Lemeshow chi2(8) = 16.28
                Prob > chi2 = 0.0385

                Thank you again!

                • #9
                  Sorry, I forgot while responding that your models differed only in the presence or absence of cluster-robust standard errors. Neither -lroc- nor -estat gof- will distinguish these because they reflect only the predicted values, not the standard errors, and those two models produce the same predicted values.
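
A minimal sketch of how one could check this directly, meant to be run from a do-file. The right-hand side is copied from the commands quoted earlier in the thread; p_plain and p_clust are hypothetical names for new variables.

```stata
* Right-hand side taken from the commands quoted earlier in the thread
local rhs CSRRQ FRQ_111_w i.HOSTILE i.MULTIBID TOEHOLD i.TENDER i.INDUSTRY ///
    i.STOCK SIZE_w sdCFO_w GROWTH_w RoE_w MTB_Refinitiv_w LEVERAGE_w

* Model with ordinary standard errors
quietly logit TERMINATED `rhs'
predict double p_plain if e(sample), pr

* Same model with cluster-robust standard errors
quietly logit TERMINATED `rhs', vce(cluster YEAR)
predict double p_clust if e(sample), pr

* The coefficients are the same, so the fitted probabilities should agree
assert reldif(p_plain, p_clust) < 1e-8 if !missing(p_plain, p_clust)
```

If the assert passes silently, the two models give the same fitted probabilities, and any statistic built only from those probabilities will be identical too.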

                  In fact, I overlooked the most important thing regarding choice of standard errors here: you do not have nearly enough clusters to use -vce(cluster)-, regardless of what any test comparing the two models says. Cluster-robust (and ordinary robust) standard errors are large-sample statistics. While there is no universal agreement on how many clusters are needed, nobody thinks that 6 would suffice to produce valid results. I apologize for wasting your time on -estat ic-, -lroc-, and -estat gof-, when your question has such a simple and direct answer. You simply can't use -vce(cluster)- here.

                  • #10
                    Thank you, Clyde! No apologies necessary. You helped me a lot, and I learned something new.
