This is a cross post from stackexchange, see: http://stats.stackexchange.com/quest...es-stata-and-r
I have a question about how Stata and R differ in the way they compute ANOVAs. I ran exactly the same ANOVA in both packages, but curiously get a different F-statistic for one of the predictors. I'm not too familiar with Stata, but as far as I understand it, I am doing a Type II SS ANOVA in both.
To understand my output, this is my model:
The outcome variable is a continuous variable called vertrauen (= trust).
Predictor 1 is a 2-level factor called trustee in R and Goodguy in Stata.
Predictor 2 is also a 2-level factor, called Group in R and uw in Stata.
This is the R output:
Code:
> m2 <- lm(vertrauen ~ trustee * Group, data = RTG.UWD.short.50)
> Anova(m2, type = "2")
Anova Table (Type II tests)

Response: vertrauen
              Sum Sq Df F value    Pr(>F)
trustee       2.4928  1 24.5497 1.367e-05 ***
Group         0.0030  1  0.0292    0.8651
trustee:Group 0.1137  1  1.1200    0.2963
Residuals     4.0617 40
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This is the Stata output:
Code:
. anova vertrauen uw Goodguy uw#Goodguy

Number of obs =      44    R-squared     = 0.3912
Root MSE      = .318658    Adj R-squared = 0.3455

     Source | Partial SS    df          MS        F  Prob>F
------------+----------------------------------------------
      Model |  2.6095358     3   .86984526     8.57  0.0002
            |
         uw |  .00296733     1   .00296733     0.03  0.8651
    Goodguy |  1.2981586     1   1.2981586    12.78  0.0009
 uw#Goodguy |  .11373073     1   .11373073     1.12  0.2963
            |
   Residual |  4.0617062    40   .10154266
------------+----------------------------------------------
      Total |   6.671242    43   .15514516
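As you can see, the F-statistics for the Group (uw) main effect and for the Group (uw) x trustee (Goodguy) interaction are the same, but they differ for the trustee (Goodguy) main effect: in R it is almost twice as high as in Stata. I tried changing the order of the predictors and the reference levels, but that didn't change my R output.
Does anyone know what causes the difference in the F-statistic here? I'm really puzzled by it; I expected it to be the same.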
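One thing that may be worth checking, sketched here on simulated data rather than on the data set above: Stata labels its column "Partial SS", which is often equated with Type III sums of squares, whereas the R call requests Type II. In an unbalanced design the two can give different main-effect tests once an interaction is in the model, while the interaction test itself stays the same. The base-R sketch below uses made-up cell counts (19/15/3/7, chosen only so that the margins are 22/22 and 34/10) and a made-up outcome. The variable names mirror the post; everything else is an assumption.

```r
# Simulated stand-in data; cell counts and outcome are assumptions,
# not the values from the post.
set.seed(42)
d <- data.frame(
  trustee = factor(rep(c("bad", "good", "bad", "good"), times = c(19, 15, 3, 7))),
  Group   = factor(rep(c(1, 2), times = c(34, 10)))
)
d$vertrauen <- 0.3 + 0.4 * (d$trustee == "good") + rnorm(nrow(d), sd = 0.3)

full     <- lm(vertrauen ~ trustee * Group, data = d)
additive <- lm(vertrauen ~ trustee + Group, data = d)
no_trust <- lm(vertrauen ~ Group, data = d)
mse_full <- deviance(full) / df.residual(full)

# Type II: SS(trustee | Group), tested against the full model's residual MS.
ss2 <- deviance(no_trust) - deviance(additive)
F2  <- ss2 / mse_full

# Type III-style: refit with sum-to-zero contrasts, then drop each term
# while keeping all others, including the interaction.
full_sum <- lm(vertrauen ~ trustee * Group, data = d,
               contrasts = list(trustee = "contr.sum", Group = "contr.sum"))
t3  <- drop1(full_sum, scope = . ~ ., test = "F")
ss3 <- t3["trustee", "Sum of Sq"]
F3  <- t3["trustee", "F value"]

# The interaction SS is the same under both schemes.
ss_int2 <- deviance(additive) - deviance(full)
ss_int3 <- t3["trustee:Group", "Sum of Sq"]

print(c(typeII_F = F2, typeIII_F = F3))
print(c(interaction_SS_II = ss_int2, interaction_SS_III = ss_int3))
```

If the same pattern holds for the real data, comparing Stata's table with a Type III table from R (fitted with sum-to-zero contrasts) would be a more like-for-like check than comparing it with the Type II table.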
Here is the Stata output without the interaction:
Code:
. anova vertrauen uw Goodguy

Number of obs =      44    R-squared     = 0.3741
Root MSE      = .319124    Adj R-squared = 0.3436

     Source | Partial SS    df          MS        F  Prob>F
------------+----------------------------------------------
      Model |   2.495805     2   1.2479025    12.25  0.0001
            |
         uw |  .00296733     1   .00296733     0.03  0.8653
    Goodguy |  2.4928377     1   2.4928377    24.48  0.0000
            |
   Residual |   4.175437    41   .10183993
------------+----------------------------------------------
      Total |   6.671242    43   .15514516
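And here is the R output without the interaction:
Code:
> m2.4 <- lm(vertrauen ~ trustee + Group, data = RTG.UWD.short.50)
> Anova(m2.4)
Anova Table (Type II tests)

Response: vertrauen
          Sum Sq Df F value    Pr(>F)
trustee   2.4928  1 24.4780 1.328e-05 ***
Group     0.0030  1  0.0291    0.8653
Residuals 4.1754 41
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
It's the same, so the difference must have something to do with how the two packages incorporate the interaction term.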
I also tried to manually compute the interaction term and found something interesting:
Here is the R output:
Code:
> RTG.UWD.short.50$interaction <- as.numeric(RTG.UWD.short.50$trustee) * as.numeric(RTG.UWD.short.50$Group)
> m2.7 <- lm(vertrauen ~ trustee + Group + interaction, data = RTG.UWD.short.50)
> Anova(m2.7)
Anova Table (Type II tests)

Response: vertrauen
            Sum Sq Df F value    Pr(>F)
trustee     1.2982  1 12.7844 0.0009316 ***
Group       0.0030  1  0.0292 0.8651282
interaction 0.1137  1  1.1200 0.2962617
Residuals   4.0617 40
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
And here is the Stata output:
Code:
. gen interaction=uw*Goodguy
. anova vertrauen uw Goodguy interaction

Number of obs =      44    R-squared     = 0.3912
Root MSE      = .318658    Adj R-squared = 0.3455

      Source | Partial SS    df          MS        F  Prob>F
-------------+----------------------------------------------
       Model |  2.6095358     3   .86984526     8.57  0.0002
             |
          uw |   .0399785     1    .0399785     0.39  0.5339
     Goodguy |  2.3984067     1   2.3984067    23.62  0.0000
 interaction |  .11373073     1   .11373073     1.12  0.2963
             |
    Residual |  4.0617062    40   .10154266
-------------+----------------------------------------------
       Total |   6.671242    43   .15514516
Thus it seems that there is a difference in how R and Stata compute the interaction. The R output with the manually computed interaction matches the automatically computed interaction output in Stata.
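On the hand-built product term, one general point may be relevant (again a sketch on simulated data, not a diagnosis of the specific numbers above). In a model that contains two variables and their product, the coefficient on the first variable is its effect where the second variable equals zero, so the main-effect tests depend on how the variables are coded, while the test of the product term does not. as.numeric() on a two-level R factor returns 1 and 2, whereas uw and Goodguy in Stata have minimum 0 and maximum 1 in the summary table of this post. The data and cell counts below are assumptions made up for illustration.

```r
# Simulated stand-in data; cell counts and outcome are assumptions.
set.seed(1)
uw      <- rep(c(0, 1), times = c(34, 10))
Goodguy <- rep(c(0, 1, 0, 1), times = c(19, 15, 3, 7))
y <- 0.3 + 0.4 * Goodguy + rnorm(44, sd = 0.3)

# Product term built from 0/1 codes.
fit01 <- lm(y ~ uw + Goodguy + I(uw * Goodguy))

# Same variables shifted to 1/2 codes, mimicking as.numeric() on a factor.
uw12      <- uw + 1
Goodguy12 <- Goodguy + 1
fit12 <- lm(y ~ uw12 + Goodguy12 + I(uw12 * Goodguy12))

# For 1-df terms the squared t statistic is the partial F statistic.
F01 <- coef(summary(fit01))[-1, "t value"]^2
F12 <- coef(summary(fit12))[-1, "t value"]^2
names(F01) <- names(F12) <- c("uw", "Goodguy", "product")

print(rbind(coded_0_1 = F01, coded_1_2 = F12))
print(c(rss_0_1 = deviance(fit01), rss_1_2 = deviance(fit12)))
```

Both fits have the same residual sum of squares and the same F for the product term; only the tests of the two lower-order terms move with the coding, which is why a hand-built product is not a drop-in replacement for a factor interaction.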
And finally the descriptives from R:
Code:
> describe(RTG.UWD.short.50$vertrauen)
RTG.UWD.short.50$vertrauen
n missing unique Info Mean
44 0 43 1 0.5046
> describe(RTG.UWD.short.50$Group)
RTG.UWD.short.50$Group
n missing unique
44 0 2
1 (34, 77%), 2 (10, 23%)
> describe(RTG.UWD.short.50$trustee)
RTG.UWD.short.50$trustee
n missing unique
44 0 2
bad (22, 50%), good (22, 50%)
and from Stata:
Code:
. sum vertrauen uw Goodguy

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   vertrauen |         44    .5045969     .3938847   .000998          1
          uw |         44    .2272727     .4239151         0          1
     Goodguy |         44          .5     .5057805         0          1
