  • Wald test - how to use it properly?

    Dear statalisters,

    My question is about the Wald test and how to use it properly.

    I have fit some mixed linear regression models, like the following.

    Code:
    webuse auto
    mixed price mpg weight i.rep78 || foreign:

    HTML Code:
    Performing EM optimization: 
    
    Performing gradient-based optimization: 
    
    Iteration 0:   log likelihood = -628.14785  
    Iteration 1:   log likelihood = -628.14785  
    
    Computing standard errors:
    
    Mixed-effects ML regression                     Number of obs     =         69
    Group variable: foreign                         Number of groups  =          2
    
                                                    Obs per group:
                                                                  min =         21
                                                                  avg =       34.5
                                                                  max =         48
    
                                                    Wald chi2(3)      =      63.90
    Log likelihood = -628.14785                     Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |   17.59927   75.57848     0.23   0.816    -130.5318    165.7304
          weight |   3.387514   .6326358     5.35   0.000     2.147571    4.627458
           rep78 |    206.673   321.9493     0.64   0.521    -424.3361    837.6821
           _cons |  -4599.017   3437.865    -1.34   0.181    -11337.11    2139.075
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    foreign: Identity            |
                      var(_cons) |    2526074    2889011      268501.8    2.38e+07
    -----------------------------+------------------------------------------------
                   var(Residual) |    4338805     751500       3089856     6092590
    ------------------------------------------------------------------------------
    LR test vs. linear model: chibar2(01) = 8.00          Prob >= chibar2 = 0.0023

    Code:
    mixed price mpg weight i.rep78 || foreign: , coeflegend
    HTML Code:
    Output left out on purpose
    Can I use the Wald test (as presented below) to say something about whether rep78 category 2 affects the price more than rep78 category 3 or category 4, and so on? Or can I only use the Wald test to exclude variables from the regression model?

    Code:
    test _b[price:mpg] = _b[price:mpg]+_b[price:2.rep78] = _b[price:mpg]+_b[price:3.rep78] = _b[price:mpg]+_b[price:4.rep78], mtest(bon)
    HTML Code:
     ( 1)  - [price]2.rep78 = 0
     ( 2)  - [price]3.rep78 = 0
     ( 3)  - [price]4.rep78 = 0
    
    ---------------------------------------
           |        chi2     df       p
    -------+-------------------------------
      (1)  |        0.12      1     1.0000 #
      (2)  |        0.35      1     1.0000 #
      (3)  |        0.19      1     1.0000 #
    -------+-------------------------------
      all  |        0.49      3     0.9222
    ---------------------------------------
             # Bonferroni-adjusted p-values
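
    For example, would pairwise contrasts like the sketch below be a legitimate way to compare the categories directly with each other? (The _b[price:...] names come from the -coeflegend- output above; the choice of pairs is just illustrative.)

    Code:
    * sketch: pairwise Wald contrasts among the rep78 categories
    test (_b[price:2.rep78] = _b[price:3.rep78]) ///
         (_b[price:2.rep78] = _b[price:4.rep78]) ///
         (_b[price:3.rep78] = _b[price:4.rep78]), mtest(bonferroni)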

    I would prefer to avoid likelihood-ratio testing.



    Best regards

  • #2
    Asymptotically, as the sample size grows very large, the Wald and LR chi-square statistics have the same distribution. So in very large samples it really doesn't matter which one you use: you will get, to a very high degree of approximation, the same result either way.

    In small samples (and for this purpose, the sample of 74 in auto.dta is small) they can disagree. It can be shown that the Wald test always gives a higher value than the LR test, so, at the same critical values of the chi-square distribution, the Wald test is less conservative than the LR test about rejecting the null hypothesis. But a deeper analysis shows that the Wald statistic is actually a monotone function of the LR statistic. That means that, in principle, one could "calibrate" the critical value used for hypothesis testing to a larger value and achieve the same results as the LR test while keeping the same statistical power. The catch is that calculating the actual calibration is too difficult to be of practical use. See https://www.google.com/search?q=Wald...hrome&ie=UTF-8 for details.
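
    To see the two side by side in this example, one could test the rep78 terms both ways. A minimal sketch, assuming the model from #1 (-lrtest- needs both models fit by ML, which is -mixed-'s default):

    Code:
    webuse auto, clear
    mixed price mpg weight i.rep78 || foreign:
    estimates store full
    testparm i.rep78                    // Wald test of the rep78 terms
    mixed price mpg weight || foreign:  // refit without rep78
    lrtest full .                       // LR test of the same restriction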

    Bear in mind that for small samples, neither the LR nor the Wald test is actually giving exactly correct results.

    Finally, stop and consider whether you really should be doing a significance test at all. The American Statistical Association now recommends that the practice be discontinued. See https://www.tandfonline.com/doi/full...5.2019.1583913. In nearly all situations, it is better to focus on actual effect estimates and the degree of precision associated with them than to test whether the effects "are zero" (which is a misinterpretation of statistical significance anyway).

    • #3
      Thank you very much for the very thorough explanation.

      Would ANOVA with Bonferroni-adjusted testing be a more appropriate model (given that I need to do the testing)?

      Code:
      anova price mpg c.weight##i.rep78, nocons

      HTML Code:
      . anova price mpg c.weight##i.rep78, nocons
      
                               Number of obs =         69    R-squared     =  0.9685
                               Root MSE      =    1582.77    Adj R-squared =  0.9457
      
                        Source | Partial SS         df         MS        F    Prob>F
                  -------------+----------------------------------------------------
                         Model |  3.083e+09         29   1.063e+08     42.44  0.0000
                               |
                           mpg |  1.761e+08         20   8806592.3      3.52  0.0004
                        weight |   81003349          1    81003349     32.33  0.0000
                         rep78 |   77731094          4    19432773      7.76  0.0001
                  rep78#weight |   92131628          4    23032907      9.19  0.0000
                               |
                      Residual |  1.002e+08         40   2505175.4  
                  -------------+----------------------------------------------------
                         Total |  3.183e+09         69    46133227
      Code:
      test (1.rep78#c.weight = 2.rep78#c.weight = 3.rep78#c.weight), mtest(bonferroni)
      HTML Code:
      . test (1.rep78#c.weight = 2.rep78#c.weight = 3.rep78#c.weight), mtest(bonferroni)
      
       ( 1)  1b.rep78#co.weight - 2.rep78#c.weight = 0
       ( 2)  1b.rep78#co.weight - 3.rep78#c.weight = 0
      
      ---------------------------------------
             |    F(df,40)     df       p
      -------+-------------------------------
        (1)  |        3.04      1     0.1779 #
        (2)  |        5.51      1     0.0478 #
      -------+-------------------------------
        all  |        3.01      2     0.0606
      ---------------------------------------
               # Bonferroni-adjusted p-values
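
      Alternatively, would -margins- with a built-in Bonferroni adjustment do the same job? A sketch on the same -anova- fit (this option combination is my guess at the idiomatic way to compare the weight slopes across rep78 groups):

      Code:
      * sketch: pairwise comparisons of the weight slope across rep78 levels
      margins rep78, dydx(weight) pwcompare(effects) mcompare(bonferroni)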

      • #4
        Bonferroni correction is equally applicable whether the underlying model is ANOVA or -mixed-. The question is whether it is appropriate or necessary here. If this particular suite of tests is precisely the suite of hypothesis tests that you planned to run ahead of time, and you do not intend to do any more, then it would be appropriate.

        If, however, you are also going to do other tests, then it is debatable whether the correction should be based on the total number of tests done. And then other questions arise: what about tests that you did "just for fun" but didn't present in your results? What about tests that another person on the project did "just for fun" that also didn't make it into the report, and that perhaps you don't even know about? Bonferroni correction was designed for use when there is a pre-specified plan of multiple tests and that plan is scrupulously followed. Its applicability in other scenarios (which, in my experience, are far more common) is questionable.

        For my part, I have handled this kind of problem by not doing any corrections for multiple tests but making a clear statement to the reader that the results shown are uncorrected: make of them what you will. For that matter, I don't really do much hypothesis testing any more; I prefer to emphasize effect sizes, effect differences, and estimates of their uncertainty (standard errors or CIs) over p-values and null hypothesis tests. This approach is now preferred by the American Statistical Association as well. See https://www.tandfonline.com/doi/full...5.2019.1583913.
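
        In practice, that can be as simple as reporting the estimated differences with their confidence intervals rather than just the test statistics. A sketch, assuming the -mixed- model from #1 (the [price] equation name appears in its -coeflegend- output):

        Code:
        * sketch: estimated category 2 vs. 3 price difference, with its CI
        lincom [price]2.rep78 - [price]3.rep78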

        • #5
          We had prespecified these hypothesis tests in our protocol, but I suddenly became very unsure about the Wald test and its assumptions.
