  • difference-of-means test for overlapping groups

    I need to perform a difference-of-means test for overlapping groups, that is, groups in which an individual can belong to group A, group B, both, or neither. Using the nlsw88.dta dataset, for example, I want to test whether the mean wage for married individuals is statistically significantly different from the mean wage for individuals living in the South. The tricky part is that some individuals (602 in the table below) are both married and living in the South.

    Is the following approach using regress, suest, and test appropriate? If not, is there a better alternative?

    Code:
    . sysuse nlsw88
    (NLSW, 1988 extract)
    
    . tab married south // note overlap
    
               |    lives in south
       married |         0          1 |     Total
    -----------+----------------------+----------
        single |       464        340 |       804
       married |       840        602 |     1,442
    -----------+----------------------+----------
         Total |     1,304        942 |     2,246
    
    
    . quietly regress wage married
    
    . estimates store eq1
    
    . quietly regress wage south
    
    . estimates store eq2
    
    . suest eq1 eq2, coeflegend
    
    Simultaneous results for eq1, eq2
    
                                                    Number of obs     =      2,246
    
    ------------------------------------------------------------------------------
                 |      Coef.  Legend
    -------------+----------------------------------------------------------------
    eq1_mean     |
         married |  -.4887873  _b[eq1_mean:married]
           _cons |   8.080765  _b[eq1_mean:_cons]
    -------------+----------------------------------------------------------------
    eq1_lnvar    |
           _cons |   3.499106  _b[eq1_lnvar:_cons]
    -------------+----------------------------------------------------------------
    eq2_mean     |
           south |  -1.514791  _b[eq2_mean:south]
           _cons |   8.402271  _b[eq2_mean:_cons]
    -------------+----------------------------------------------------------------
    eq2_lnvar    |
           _cons |   3.483747  _b[eq2_lnvar:_cons]
    ------------------------------------------------------------------------------
    
    . test [eq1_mean]married + [eq1_mean]_cons = [eq2_mean]south + [eq2_mean]_cons
    
     ( 1)  [eq1_mean]married + [eq1_mean]_cons - [eq2_mean]south - [eq2_mean]_cons = 0
    
               chi2(  1) =   17.47
             Prob > chi2 =    0.0000
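    One possible refinement, offered as a sketch rather than a correction: the same single-restriction Wald test can be run with lincom, which also reports the estimated difference in means, its standard error, and a confidence interval. The block re-creates the estimates from the log above so it runs on its own; the coefficient names are taken from the coeflegend output.

```stata
* Sketch: the -test- above rewritten with -lincom- (same nlsw88 data).
sysuse nlsw88, clear

* Fit both regressions on the full sample and combine them with -suest-,
* which estimates the cross-equation covariance from the shared observations.
quietly regress wage married
estimates store eq1
quietly regress wage south
estimates store eq2
quietly suest eq1 eq2

* Mean wage of married individuals minus mean wage of those in the South.
lincom (_b[eq1_mean:married] + _b[eq1_mean:_cons]) - (_b[eq2_mean:south] + _b[eq2_mean:_cons])
```

    Because lincom reports a z statistic after suest, squaring it should reproduce the chi2(1) value from test: both are Wald tests of the same single restriction.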

    David Radwin
    Senior Researcher, California Competes
    californiacompetes.org
    Pronouns: He/Him

  • #2
    For posterity's sake, I think I have an answer to my own question. A statistician colleague suggested a solution using the statistical package SUDAAN, which I was able to recreate in Stata.

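    It is essentially the same test as the example above: the estimated difference in means is identical (.7045), and 4.18 squared is about 17.47, the chi-square statistic from the first approach. The code is slightly simpler, and it yields a t-statistic (with 1,781 design degrees of freedom) instead of a chi-square statistic, so the p-value is very similar but not identical.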
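    A quick arithmetic check of that comparison is sketched below. It uses only the rounded statistics printed in the two logs (t = 4.18 on 1,781 design degrees of freedom in the lincom output below, chi2(1) = 17.47 in the first post), so it needs no data in memory.

```stata
* Squared t-statistic from the -lincom- output; compare with chi2(1) = 17.47.
display 4.18^2

* Two-sided p-value for t = 4.18 with 1,781 design degrees of freedom.
display 2*ttail(1781, 4.18)

* p-value for the chi2(1) statistic of 17.47 from the first post.
display chi2tail(1, 17.47)
```

    The two p-values should agree closely; the remaining gap mostly reflects the t versus chi-square reference distribution and the rounding of the printed statistics.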

    Code:
    . sysuse nlsw88
    (NLSW, 1988 extract)
    
    . svyset _n
    
          pweight: <none>
              VCE: linearized
      Single unit: missing
         Strata 1: <one>
             SU 1: <observations>
            FPC 1: <zero>
    
    . svy: regress wage if married == 1
    (running regress on estimation sample)
    
    Survey: Linear regression
    
    Number of strata   =         1                  Number of obs     =      1,442
    Number of PSUs     =     1,442                  Population size   =      1,442
                                                    Design df         =      1,441
                                                    F(   0,   1441)   =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.0000
    
    ------------------------------------------------------------------------------
                 |             Linearized
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   7.591978   .1421835    53.40   0.000     7.313069    7.870887
    ------------------------------------------------------------------------------
    
    . estimates store eq1
    
    . svy: regress wage if south == 1
    (running regress on estimation sample)
    
    Survey: Linear regression
    
    Number of strata   =         1                  Number of obs     =        942
    Number of PSUs     =       942                  Population size   =        942
                                                    Design df         =        941
                                                    F(   0,    941)   =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.0000
    
    ------------------------------------------------------------------------------
                 |             Linearized
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |    6.88748   .1721331    40.01   0.000     6.549671    7.225289
    ------------------------------------------------------------------------------
    
    . estimates store eq2
    
    . suest eq1 eq2
    
    Simultaneous survey results for eq1, eq2
    
    Number of strata   =         1                  Number of obs     =      1,782
    Number of PSUs     =     1,782                  Population size   =      1,782
                                                    Design df         =      1,781
    
    ------------------------------------------------------------------------------
                 |             Linearized
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    eq1          |
           _cons |   7.591978   .1421741    53.40   0.000     7.313132    7.870824
    -------------+----------------------------------------------------------------
    eq2          |
           _cons |    6.88748     .17209    40.02   0.000     6.549961       7.225
    ------------------------------------------------------------------------------
    
    . lincom [eq1]_cons - [eq2]_cons, noci
    
     ( 1)  [eq1]_cons - [eq2]_cons = 0
    
    -----------------------------------------------------
                 |      Coef.   Std. Err.      t    P>|t|
    -------------+---------------------------------------
             (1) |   .7044979   .1685593     4.18   0.000
    -----------------------------------------------------
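    A third route to the same comparison, sketched as an alternative rather than as the SUDAAN method above: stack one row per person per group membership, so someone who is both married and in the South contributes two rows, then cluster the standard errors on a person identifier so those two rows are not treated as independent. The variables id and grp are hypothetical, created only for this sketch.

```stata
* Sketch: stacked-data version of the overlapping-groups comparison.
sysuse nlsw88, clear
generate long id = _n                 // hypothetical person identifier

* Duplicate every row; within each person, copy 1 stands for "married"
* and copy 2 stands for "lives in the South".
expand 2
bysort id: generate byte grp = _n

* Keep a copy only if the person belongs to that group; people in
* neither group drop out entirely, people in both keep two rows.
keep if (grp == 1 & married == 1) | (grp == 2 & south == 1)

* One coefficient per group mean; clustering on id accounts for the overlap.
regress wage ibn.grp, noconstant vce(cluster id)

* Married mean minus South mean.
lincom 1.grp - 2.grp
```

    The point estimate should equal the .7045 difference reported above. The standard error may differ slightly in later decimals because regress applies its own small-sample adjustment to the clustered variance.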

    Disclosure: SUDAAN is produced by RTI International, my employer.
