  • Exact matching and cell means

    Hi all,

    I'm trying to understand the difference between cell means and treatment-effects estimation on discrete variables (exact matching), similar to the example here: https://blog.stata.com/2016/08/16/ex...on-adjustment/

    The difference between cell means on a single "treatment" variable is the same as the coefficient from regressing the outcome on that variable, as can be seen here:
    Code:
    cls
    clear all
    sysuse auto
    drop if inlist(rep78,1,2)
    
    . reg price i.foreign
    
          Source |       SS           df       MS      Number of obs   =        64
    -------------+----------------------------------   F(1, 62)        =      0.08
           Model |  701899.735         1  701899.735   Prob > F        =    0.7772
        Residual |   538613171        62  8687309.21   R-squared       =    0.0013
    -------------+----------------------------------   Adj R-squared   =   -0.0148
           Total |   539315071        63  8560556.68   Root MSE        =    2947.4
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |
        Foreign  |   220.4913   775.7051     0.28   0.777    -1330.121    1771.104
           _cons |    6164.19   454.7974    13.55   0.000     5255.063    7073.318
    ------------------------------------------------------------------------------
    
    . margins r.foreign
    
    Contrasts of adjusted predictions
    Model VCE    : OLS
    
    Expression   : Linear prediction, predict()
    
    ------------------------------------------------
                 |         df           F        P>F
    -------------+----------------------------------
         foreign |          1        0.08     0.7772
                 |
     Denominator |         62
    ------------------------------------------------
    
    ------------------------------------------------------------------------
                           |            Delta-method
                           |   Contrast   Std. Err.     [95% Conf. Interval]
    -----------------------+------------------------------------------------
                   foreign |
    (Foreign vs Domestic)  |   220.4913   775.7051     -1330.121    1771.104
    ------------------------------------------------------------------------
    
    . 
    . table foreign, c(mean price)
    
    -----------------------
     Car type | mean(price)
    ----------+------------
     Domestic |     6,164.2
      Foreign |     6,384.7
    -----------------------
    
    . di 6384.7 - 6164.2
    220.5
    If we have more than one variable on which we wish to match, regressing on the full set of interactions gives the same point estimate as exact matching (the standard errors differ):
    Code:
    . reg price i.foreign##i.rep78
    
          Source |       SS           df       MS      Number of obs   =        59
    -------------+----------------------------------   F(5, 53)        =      0.44
           Model |  19070228.2         5  3814045.63   Prob > F        =    0.8204
        Residual |   462156727        53  8719938.25   R-squared       =    0.0396
    -------------+----------------------------------   Adj R-squared   =   -0.0510
           Total |   481226956        58  8297016.48   Root MSE        =      2953
    
    -------------------------------------------------------------------------------
            price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
          foreign |
         Foreign  |  -1778.407   1797.111    -0.99   0.327    -5382.955     1826.14
                  |
            rep78 |
               4  |  -725.5185   1136.593    -0.64   0.526    -3005.235    1554.198
               5  |  -2402.574   2164.008    -1.11   0.272    -6743.024    1937.876
                  |
    foreign#rep78 |
       Foreign#4  |   2158.296   2273.185     0.95   0.347    -2401.136    6717.728
       Foreign#5  |   3866.574   2925.484     1.32   0.192    -2001.204    9734.352
                  |
            _cons |   6607.074   568.2963    11.63   0.000     5467.216    7746.932
    -------------------------------------------------------------------------------
    
    . margins r.foreign
    
    Contrasts of predictive margins
    Model VCE    : OLS
    
    Expression   : Linear prediction, predict()
    
    ------------------------------------------------
                 |         df           F        P>F
    -------------+----------------------------------
         foreign |          1        0.13     0.7172
                 |
     Denominator |         53
    ------------------------------------------------
    
    ------------------------------------------------------------------------
                           |            Delta-method
                           |   Contrast   Std. Err.     [95% Conf. Interval]
    -----------------------+------------------------------------------------
                   foreign |
    (Foreign vs Domestic)  |  -399.0574   1095.717     -2596.787    1798.672
    ------------------------------------------------------------------------
    
    . teffects nnmatch (price) (foreign), ematch(rep78) vce(iid)
    
    Treatment-effects estimation                   Number of obs      =         59
    Estimator      : nearest-neighbor matching     Matches: requested =          1
    Outcome model  : matching                                     min =          2
    Distance metric: Mahalanobis                                  max =         27
    ----------------------------------------------------------------------------------------
                     price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
    ATE                    |
                   foreign |
    (Foreign vs Domestic)  |  -399.0574   943.6028    -0.42   0.672    -2248.485     1450.37
    ----------------------------------------------------------------------------------------
    However, I thought that this should be the same as the difference in cell means by both foreign and rep78, yet it is not:
    Code:
    . table foreign rep78, c(mean price) col
    
    ----------------------------------------------
              |         Repair Record 1978        
     Car type |       3        4        5    Total
    ----------+-----------------------------------
     Domestic | 6,607.1  5,881.6  4,204.5  6,308.8
      Foreign | 4,828.7  6,261.4  6,292.7  6,070.1
    ----------------------------------------------
    
    . di 6070.1 - 6308.8
    -238.7
    My question is: what is the difference, then? Why are these not the same?
    Last edited by Ariel Karlinsky; 15 Nov 2018, 06:30.

  • #2
    What your question boils down to is how to interpret the coefficient of a 0/1 indicator in a regression model. If the indicator is the only regressor in the model, its coefficient is the average difference in the dependent variable between the category for which the indicator is 1 and the category for which it is 0 (i.e., the reference category). With more than one regressor, you need to adjust the definition to include the phrase "after controlling for the other independent variables in the model". In fact, you can say that this is the main justification for regression analysis. So you cannot just do a straight comparison of means, ignoring the other regressors in the model.
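
    To make "after controlling for" concrete, here is a minimal sketch (not part of the original posts) that could be run on the same data. It assumes the data preparation from #1 and additionally drops observations with missing rep78, which the regression excludes anyway. The idea is that the Total column of the two-way table compares two means that are each weighted by that car type's own mix of rep78 values, whereas margins and teffects average the within-cell Foreign vs Domestic differences over the pooled distribution of rep78.
    Code:
    * sketch only: same sample as the 59-observation models in #1
    sysuse auto, clear
    drop if inlist(rep78,1,2) | missing(rep78)
    
    * how rep78 is distributed within each car type (row percentages)
    tabulate foreign rep78, row
    
    * fully interacted model, as in #1
    quietly regress price i.foreign##i.rep78
    
    * Foreign-minus-Domestic difference within each rep78 cell
    margins rep78, dydx(foreign)
    
    * the same differences averaged over the pooled rep78 distribution
    margins, dydx(foreign)
    From the coefficients reported in #1, the within-cell differences work out to -1778.407 for rep78 = 3, -1778.407 + 2158.296 = 379.889 for rep78 = 4, and -1778.407 + 3866.574 = 2088.167 for rep78 = 5. The last command should reproduce the -399.0574 contrast from #1, because it weights those three differences by the share of all cars in each rep78 cell rather than by each car type's own shares.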
