  • Regression doesn't produce the same results as ANOVA?

    Hello,

    Recently, as an explanatory aid, I analyzed data using both a regression approach and an ANOVA approach.

    Code:
    . regress depress i.sex##i.educ
    
          Source |       SS           df       MS      Number of obs   =       100
    -------------+----------------------------------   F(7, 92)        =      1.91
           Model |  15.2473617         7  2.17819453   Prob > F        =    0.0761
        Residual |  104.712638        92  1.13818085   R-squared       =    0.1271
    -------------+----------------------------------   Adj R-squared   =    0.0607
           Total |      119.96        99  1.21171717   Root MSE        =    1.0669
    
    ------------------------------------------------------------------------------
         depress |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           1.sex |   .5714286   .7362003     0.78   0.440    -.8907288    2.033586
                 |
            educ |
              2  |   .7797619    .458281     1.70   0.092    -.1304237    1.689948
              3  |   .2751323   .4524954     0.61   0.545    -.6235627    1.173827
              4  |   .2987013    .515818     0.58   0.564    -.7257579     1.32316
                 |
        sex#educ |
            1 2  |   .4202381   .9039089     0.46   0.643    -1.375003    2.215479
            1 3  |  -.0251323   .8094949    -0.03   0.975    -1.632859    1.582595
            1 4  |   -.012987   .8989211    -0.01   0.989    -1.798322    1.772348
                 |
           _cons |   4.428571   .4032335    10.98   0.000     3.627715    5.229428
    ------------------------------------------------------------------------------
    
    . anova depress i.sex##i.educ
    
                             Number of obs =        100    R-squared     =  0.1271
                             Root MSE      =    1.06686    Adj R-squared =  0.0607
    
                      Source | Partial SS         df         MS        F    Prob>F
                  -----------+----------------------------------------------------
                       Model |  15.247362          7   2.1781945      1.91  0.0761
                             |
                         sex |  6.7709298          1   6.7709298      5.95  0.0166
                        educ |  8.0498893          3   2.6832964      2.36  0.0768
                    sex#educ |  .63825576          3   .21275192      0.19  0.9051
                             |
                    Residual |  104.71264         92   1.1381809  
                  -----------+----------------------------------------------------
                       Total |     119.96         99   1.2117172
    As you can see, the model F-statistic is identical. However, the test statistic (and significance) for sex differs: the regression suggests that sex is non-significant (p = 0.440), while the ANOVA suggests that it is significant (p = 0.0166). Given that ANOVA and regression are supposed to be equivalent, I'm unsure how to answer the research question of whether sex is a predictor of depression.

    Could anyone give me an intuitive explanation for why this happens?

    Cheers,

    David.

  • #2
    In neither the -anova- nor the -regress- version of your analysis can you use the output in the sex row of the table to determine whether sex is a predictor of depression. That's because you have an interaction model. To run that test, you need to test the joint significance of sex and the sex#educ interaction terms.

    Here's an example using the built-in auto.dta dataset. It shows that -regress- and -anova- agree on that kind of test, even though here, as in your example, the row of output for the foreign variable differs between the two:

    Code:
    . clear*
    
    . sysuse auto
    (1978 Automobile Data)
    
    . 
    . keep if rep78 >= 3 // ELIMINATE ZERO CELLS
    (10 observations deleted)
    
    . 
    . regress price i.foreign##i.rep78
    
          Source |       SS           df       MS      Number of obs   =        59
    -------------+----------------------------------   F(5, 53)        =      0.44
           Model |  19070228.2         5  3814045.63   Prob > F        =    0.8204
        Residual |   462156727        53  8719938.25   R-squared       =    0.0396
    -------------+----------------------------------   Adj R-squared   =   -0.0510
           Total |   481226956        58  8297016.48   Root MSE        =      2953
    
    -------------------------------------------------------------------------------
            price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
          foreign |
         Foreign  |  -1778.407   1797.111    -0.99   0.327    -5382.955     1826.14
                  |
            rep78 |
               4  |  -725.5185   1136.593    -0.64   0.526    -3005.235    1554.198
               5  |  -2402.574   2164.008    -1.11   0.272    -6743.024    1937.876
                  |
    foreign#rep78 |
       Foreign#4  |   2158.296   2273.185     0.95   0.347    -2401.136    6717.728
       Foreign#5  |   3866.574   2925.484     1.32   0.192    -2001.204    9734.352
                  |
            _cons |   6607.074   568.2963    11.63   0.000     5467.216    7746.932
    -------------------------------------------------------------------------------
    
    . test 1.foreign 1.foreign#4.rep78 1.foreign#5.rep78
    
     ( 1)  1.foreign = 0
     ( 2)  1.foreign#4.rep78 = 0
     ( 3)  1.foreign#5.rep78 = 0
    
           F(  3,    53) =    0.62
                Prob > F =    0.6026
    
    . 
    . anova price i.foreign##i.rep78
    
                             Number of obs =         59    R-squared     =  0.0396
                             Root MSE      =    2952.95    Adj R-squared = -0.0510
    
                      Source | Partial SS         df         MS        F    Prob>F
               --------------+----------------------------------------------------
                       Model |   19070228          5   3814045.6      0.44  0.8204
                             |
                     foreign |  395125.95          1   395125.95      0.05  0.8322
                       rep78 |  3385563.3          2   1692781.7      0.19  0.8241
               foreign#rep78 |   16312126          2   8156062.8      0.94  0.3988
                             |
                    Residual |  4.622e+08         53   8719938.3  
               --------------+----------------------------------------------------
                       Total |  4.812e+08         58   8297016.5  
    
    . test 1.foreign 1.foreign#4.rep78 1.foreign#5.rep78
    
     ( 1)  1.foreign = 0
     ( 2)  1.foreign#4.rep78 = 0
     ( 3)  1.foreign#5.rep78 = 0
    
           F(  3,    53) =    0.62
                Prob > F =    0.6026
    In the regress output that you show, the statistics in the sex row pertain not to the effect of sex overall, but to the effect of sex only among those at the base level of educ (educ = 1 in your output). To be honest, I have no idea what the F and p-value statistics in the anova output refer to: I haven't used -anova- in well over a decade now. What I do know is that -anova- parameterizes the model differently from the way it is done in -regress-, and, in particular, in the presence of interactions, you get apparently discrepant results like that. But when you do meaningful hypothesis tests, such as the joint test of sex and sex#educ, you will find that the results of -test- are the same either way.
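
    To make the point about the sex row concrete, here is a sketch of the -regress- parameterization, read off your coefficient table (base levels sex = 0 and educ = 1; the Greek symbols are my own notation, not anything Stata prints):

    \[
    E[\text{depress} \mid \text{sex}=s,\ \text{educ}=k] = \beta_0 + \beta_{\text{sex}}\,s + \gamma_k + \delta_k\,s, \qquad s \in \{0,1\},\quad \gamma_1 = \delta_1 = 0
    \]

    The estimated sex difference at education level k is therefore \(\beta_{\text{sex}} + \delta_k\). Plugging in your coefficients:

    \[
    \begin{aligned}
    k=1:&\quad 0.57143 \\
    k=2:&\quad 0.57143 + 0.42024 = 0.99167 \\
    k=3:&\quad 0.57143 - 0.02513 = 0.54630 \\
    k=4:&\quad 0.57143 - 0.01299 = 0.55844
    \end{aligned}
    \]

    The t-test in the sex row of -regress- addresses only the first of these four differences.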

    I haven't really answered your question, because I don't really remember what the -anova- outputs mean anymore, but perhaps it is enough that I pointed out that you are looking at a statistic that isn't what you apparently thought it is.
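
    For your own model, the analogous joint test could be sketched as follows. This assumes your data are in memory with the variable names from your output (depress, sex, educ), with sex coded 0/1 and educ coded 1 to 4, as your coefficient table suggests. I have not run it on your data.

    ```stata
    * Refit the interaction model from post #1
    regress depress i.sex##i.educ

    * Joint test that sex has no effect at any level of educ:
    * the sex coefficient and all three sex#educ coefficients are zero
    test 1.sex 1.sex#2.educ 1.sex#3.educ 1.sex#4.educ

    * Same hypothesis, letting -testparm- expand the factor-variable terms
    testparm i.sex i.sex#i.educ

    * Sex difference (1 vs. 0) estimated separately at each level of educ
    margins educ, dydx(sex)
    ```

    The first two commands test the same four coefficients, so they should report the same F on 4 and 92 degrees of freedom.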


    • #3
      Continuing with Clyde's point, I think that in the ANOVA model as presented, the main effect for sex tests whether the mean of the dependent variable differs among the values of sex. Unlike regression, ANOVA doesn't seem to control for the levels of education in that main effect test.

      Moreover, the F test is a simultaneous test that, for example, the mean depression score does not differ among levels of education - that is, it's a simultaneous test that all 4 groups have the same mean level of depression. If the F test rejects the null, it could be that one group differs from the other three, it could be that all 4 groups differ, or anything in between. In contrast, in regression, we test for differences relative to the base value of a categorical variable, which is a t-test (hence, the t statistic). P-values are, of course, p-values.
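
      As a sketch of that distinction, here is a one-factor version using the variable names from post #1. Dropping sex is my own simplification to keep the comparison clean, and I have not run this on the actual data.

      ```stata
      * One-factor model: each educ row is a t-test against the base level (educ = 1)
      regress depress i.educ

      * Simultaneous F-test that all four educ groups share the same mean
      testparm i.educ

      * -anova- reports that same simultaneous test in its educ row
      anova depress educ
      ```

      With a single factor, the F from -testparm- should match both the overall model F from -regress- and the educ row of -anova-.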

      A lot of people, myself included, aren't that familiar with ANOVA anymore, to be honest. But a) you're right, the fundamental goal is not that different from regression, and b) you are in fact testing different things, but as Clyde demonstrated, when you manually go and test the same thing, you get the same result.
      Please use the code delimiters to show code and results - use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

      Please use the command -dataex- to show a representative sample of data; it is installed already if you have Stata 14.2 or 15.1, else you can install it by typing

      Code:
      ssc install dataex



      • #4
        Here's another way to think about it: You can use -contrast- commands after your -regress- command to generate the same F-tests that -anova- reports.

        Code:
        clear*
        sysuse auto
        keep if rep78 >= 3 // ELIMINATE ZERO CELLS
        regress price i.foreign##i.rep78
        * Use -contrast- commands to get the F-tests -anova- reports
        contrast foreign
        contrast rep78
        contrast foreign#rep78
        anova price i.foreign##i.rep78
        HTH.
        --
        Bruce Weaver
        Email: bweaver@lakeheadu.ca
        Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
        Stata version: 15.1 IC (Windows)

