Applying ttest for treatment and control group after matching process.

Konstantin Hofmann

Join Date: Jul 2018
Posts: 18

02 Sep 2018, 07:43

Hello guys,

For my thesis, I am using a combination of matching and DiD.

To show the matching quality after the matching step, I want to run "ttest" to see whether the differences in the covariates between the treatment and the control group are significant.

Depending on which covariates I add to the analysis, the sample size decreases (as not every variable is observed in every year of my data).

My question is: how do I run a ttest on the "reduced" sample?

To make my point clear, I will try to show you the problem.

I am running this code to get my estimates...

Code:

set more off

* variable lists used in the regression below
global exact welle_* 

global xvars age age2 mig foreign lifesat lifesat2 uni voctrain labinc labinc2 pgerwzeit pgerwzeit2 nounemp pgexpft pgexpft2 ///
        kids_* badhlth medhlth goodhlth pgpsbil2_1 pgpsbil2_2 pgpsbil2_3 pgpsbil2_5 d11106 own rel3 mardur yearmar_2 yearmar_3

* run the weighted regression separately for men (female==0) and women (female==1)
foreach x in 0 1 {
    preserve
    keep if female==`x'
    foreach y in lifesat {
        qui reg d1`y' treat $xvars $exact [weight=w_treat] 
        est store m`x'_1
        * number of treated observations in the estimation sample
        qui count if treat==1 & e(sample)
        estadd scalar obs =r(N)
    }
    restore
}

This gives me the following output:

Code:

. foreach x in 0 1 {
  2.         preserve
  3.         keep if female==`x'
  4.         foreach y in lifesat {
  5.                 qui reg d1`y' treat $xvars $exact [weight=w_treat] 
  6.                 est store m`x'_1
  7.                 qui count if treat==1 & e(sample)
  8.                 estadd scalar obs =r(N)
  9.         }
 10.         restore
 11. }
(70,644 observations deleted)

added scalar:
                e(obs) =  457
(62,173 observations deleted)

added scalar:
                e(obs) =  538

So there are 457 men and 538 women in the treated group when all covariates are included in the analysis.

Now, when I run the ttest with the following code (example for men, i.e. female==0):

Code:

global xvars_wobula1 age lifesat mig foreign labinc uni 

foreach x in 0 {
    preserve
    keep if female==`x'
    foreach var of varlist $xvars_wobula1 {
        estpost ttest `var', by (treat)
    }
    restore
}

I will only post the output for the age variable (I only used those 6 variables to test that I get the code right; the final code for the ttest will include all covariates).

Code:

             |      e(b)   e(count)      e(se)       e(t)    e(df_t)     e(p_l)       e(p)     e(p_u) 
-------------+----------------------------------------------------------------------------------------
         age |  5.766025      64830   .3456512   16.68163      64828          1   2.41e-62   1.20e-62 

             |    e(N_1)    e(mu_1)     e(N_2)    e(mu_2) 
-------------+--------------------------------------------
         age |     64046   43.77368        784   38.00765

This is correct, as 784 is the size of the treatment group BEFORE applying the matching step with all covariates, and 64046 is the size of the control group, also before applying the matching step.

But how do I run the ttest on the reduced sample after matching? (In this case, 457 treated men, as shown above.)

Thanks in advance


Rich Goldstein

Join Date: Mar 2014

Posts: 4459
#2

02 Sep 2018, 08:28

A t-test, or any other significance test, is not a good idea here, as the results might be statistically significant just because of the sample size.

There is literature on how to assess balance instead; use -search- to find and download user-written commands for this. Two examples are -covbal- (which has some literature references to help) and -pbalchk-.
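
To illustrate the sample-size point: a t statistic grows with the number of observations, while a standardized difference does not. The lines below are only a minimal sketch using Stata's shipped auto data, with foreign standing in for the treatment indicator and mpg for a covariate (both are assumptions for illustration, not variables from this thread). The formula is one common definition of the standardized difference; -covbal- and -pbalchk- may use a different denominator.

```stata
* Sketch on the shipped auto data; foreign is a stand-in for treat
sysuse auto, clear

* mean and variance of the covariate in the "treated" group
quietly summarize mpg if foreign == 1
scalar m1 = r(mean)
scalar v1 = r(Var)

* mean and variance of the covariate in the "control" group
quietly summarize mpg if foreign == 0
scalar m0 = r(mean)
scalar v0 = r(Var)

* standardized difference: mean difference over the pooled standard deviation
scalar stddiff = (m1 - m0) / sqrt((v1 + v0) / 2)
display "standardized difference in mpg: " %6.3f stddiff
```

Unlike e(t) in the ttest output, this quantity has no sample size in it, so it can be compared between a sample of 64,000 and one of 1,000.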

Konstantin Hofmann

Join Date: Jul 2018
Posts: 18

02 Sep 2018, 09:49

Thanks for the quick answer. Those commands are very helpful, but the point of my question remains the same.

There are 2 cases...

(1) When I am not including any covariates in the analysis

sample sizes for men

Treated: 539
Control: 64046

(2) When I am including the covariates

Treated: 457
Control: 49881

With my approach (t-test) I got the difference in the covariates for (1). My final goal is to get the difference in the covariates for (2).

Both commands that you mentioned helped me only in part with my problem.

for -covbal-

Code:

             |             Treated             |             Control             |        Balance      
             |      Mean   Variance   Skewness |      Mean   Variance   Skewness |  Std-diff  Var-ratio
-------------+---------------------------------+---------------------------------+----------------------
         age |  40.83807   69.31144   .1243609 |  40.83824   69.16416   .2332763 | -.0000194    1.00213
     lifesat |  6.533917   3.402904  -.6970284 |  6.534025   3.395478  -.9139077 | -.0000586   1.002187
         mig |  .1728665   .1432972   1.730261 |   .172882   .1429967   1.730118 |  -.000041   1.002102
   foreigner |   .107221   .0959346   2.539021 |  .1072159   .0957226   2.539107 |  .0000163   1.002214
      labinc |  38035.69   5.84e+08   2.161632 |  38059.43   5.88e+08   2.300851 | -.0009805   .9917155
         uni |  .2407002   .1831644   1.213074 |  .2407512   .1827937   1.212748 | -.0001191   1.002028
--------------------------------------------------------------------------------------------------------

and for -pbalchk-

Code:

               Mean in treated   Mean in Untreated   Standardised diff.
----------------------------------------------------------------------
         age |           40.84              40.84               -0.000
     lifesat |            6.53               6.53               -0.000
         mig |            0.17               0.17               -0.000
   foreigner |            0.11               0.11                0.000
      labinc |        38035.69           38059.43               -0.001
         uni |            0.24               0.24               -0.000
----------------------------------------------------------------------

These are the results for case (2). Looking at "age", 40.84 is the weighted mean for the 457 treated men and their matched controls. But my goal is to get the mean for the 49881 men in the control group before matching.

I know that I can use the option to not use weights and get those results, but that would represent (1) again and not (2), as shown here:

Code:

               Mean in treated   Mean in Untreated   Standardised diff.
----------------------------------------------------------------------
         age |           40.91              45.01               -0.467
     lifesat |            6.41               7.11               -0.391
         mig |            0.18               0.21               -0.073
   foreigner |            0.12               0.17               -0.150
      labinc |        33934.12           37486.18               -0.118
         uni |            0.22               0.25               -0.074
----------------------------------------------------------------------

Sorry if this is confusing. Basically, I just want to get the difference in the covariates for case (2) before matching.
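
One possible way to get at case (2), sketched here under assumptions and not necessarily the solution that was eventually used: mark the observations that entered the regression with e(sample) and restrict the unweighted comparison to them. The example runs on Stata's shipped auto data, where rep78 has missing values and so shrinks the estimation sample; the variable name insample is hypothetical.

```stata
* Sketch on the shipped auto data: rep78 has missing values, so the
* regression uses fewer observations than the full data set
sysuse auto, clear
quietly regress price foreign mpg rep78

* flag the observations that were actually used in the regression
generate byte insample = e(sample)
count if insample

* unweighted comparison, restricted to the estimation sample
ttest mpg if insample, by(foreign)
```

In the code from the first post, the analogous step would be to generate the flag directly after the `qui reg` line and add `if insample` to the `estpost ttest` call. It would be worth checking with `count if insample & treat==0` that the flagged control group has the expected 49881 observations.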


Konstantin Hofmann

Join Date: Jul 2018

Posts: 18
#4

02 Sep 2018, 13:04

I managed to solve my question. Thank you very much for the answer; those two commands are very helpful for testing the matching quality.