  • Generating difference in means

    Hi all, I'm trying to solve what is likely a fairly easy problem, but it is tripping me up. I want to record the difference in mean gpa when each of several variables is 0 vs. 1. For example, if I were doing this by hand, I could just run something like this and subtract the two means myself:

    Code:
    sum gpa if tutor == 0
    sum gpa if tutor == 1
    
    sum gpa if attendance == 0
    sum gpa if attendance == 1
    
    sum gpa if question == 0
    sum gpa if question == 1
    Any ideas for simplifying/automating this summarizing and subtracting of the means?

  • #2
    There are two fairly simple ways. One is to do a t-test and disregard all of the output other than the mean difference. This works only when you are comparing exactly two categories; with three or more, you can't use it. The other simple way is to do a regression: the coefficient of the grouping variable will be the mean difference. The following illustrates both approaches (after separately calculating the means along the lines you show in #1).

    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    .
    . summ mpg if foreign == 0
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
             mpg |         52    19.82692    4.743297         12         34
    
    . summ mpg if foreign == 1
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
             mpg |         22    24.77273    6.611187         14         41
    
    .
    . ttest mpg, by(foreign)
    
    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
    ---------+--------------------------------------------------------------------
    Domestic |      52    19.82692     .657777    4.743297    18.50638    21.14747
     Foreign |      22    24.77273     1.40951    6.611187    21.84149    27.70396
    ---------+--------------------------------------------------------------------
    Combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
    ---------+--------------------------------------------------------------------
        diff |           -4.945804    1.362162               -7.661225   -2.230384
    ------------------------------------------------------------------------------
        diff = mean(Domestic) - mean(Foreign)                         t =  -3.6308
    H0: diff = 0                                     Degrees of freedom =       72
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0005          Pr(T > t) = 0.9997
    
    .
    . regress mpg i.foreign
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     13.18
           Model |  378.153515         1  378.153515   Prob > F        =    0.0005
        Residual |  2065.30594        72  28.6848048   R-squared       =    0.1548
    -------------+----------------------------------   Adj R-squared   =    0.1430
           Total |  2443.45946        73  33.4720474   Root MSE        =    5.3558
    
    ------------------------------------------------------------------------------
             mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         foreign |
        Foreign  |   4.945804   1.362162     3.63   0.001     2.230384    7.661225
           _cons |   19.82692   .7427186    26.70   0.000     18.34634    21.30751
    ------------------------------------------------------------------------------
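    The mean differences are the -4.945804 in the diff row of the ttest output and the 4.945804 coefficient on Foreign in the regress output. I put them in bold face in the original post, but the actual Stata output does not distinguish them that way.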

    Notice that the two approaches show different signs. That's because the t-test calculates Domestic - Foreign, whereas the regression approach calculates Foreign - Domestic.
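
    Applied to the variables in #1, the regression approach can be wrapped in a loop. This is only a sketch: it assumes that gpa, tutor, attendance, and question exist under exactly those names and that each grouping variable is coded 0/1.

    Code:
    * Sketch only: variable names are taken from #1 and assumed to be coded 0/1
    foreach v of varlist tutor attendance question {
        quietly regress gpa i.`v'
        * _b[1.`v'] is the coefficient on level 1, i.e. mean(gpa | 1) - mean(gpa | 0)
        display "`v': " %9.4f _b[1.`v']
    }

    If you would rather use ttest, it leaves the two group means behind in r(mu_1) and r(mu_2), so r(mu_2) - r(mu_1) after ttest gpa, by(`v') should give the same 1-minus-0 difference.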


    • #3
      Thanks Clyde!
