  • Applying ttest for treatment and control group after matching process.

    Hello guys,

    For my thesis, I am using a combination of matching and DiD.

    As I want to show the matching quality after my matching step, I want to run "ttest" to see whether the differences in the covariates between the treatment and the control group are significant.

    Depending on which covariates I add to the analysis, the sample size decreases (as not every variable is observed in every year of my data).

    My question is: how do I run a ttest for the "reduced" sample size?

    To make my point clear, I will try to show you the problem.

    I am running this code to get my estimates:

    Code:
    set more off
     
    global exact welle_* 
    
    global xvars age age2 mig foreign lifesat lifesat2 uni voctrain labinc labinc2 pgerwzeit pgerwzeit2 nounemp pgexpft pgexpft2 ///
            kids_* badhlth medhlth goodhlth pgpsbil2_1 pgpsbil2_2 pgpsbil2_3 pgpsbil2_5 d11106 own rel3 mardur yearmar_2 yearmar_3
    
    foreach x in 0 1 {
        preserve
        keep if female==`x'
        foreach y in lifesat {
            qui reg d1`y' treat $xvars $exact [weight=w_treat] 
            est store m`x'_1
            qui count if treat==1 & e(sample)
            estadd scalar obs =r(N)
        }
        restore
    }
    This gives me the following output:

    Code:
    . foreach x in 0 1 {
      2.         preserve
      3.         keep if female==`x'
      4.         foreach y in lifesat {
      5.                 qui reg d1`y' treat $xvars $exact [weight=w_treat] 
      6.                 est store m`x'_1
      7.                 qui count if treat==1 & e(sample)
      8.                 estadd scalar obs =r(N)
      9.         }
     10.         restore
     11. }
    (70,644 observations deleted)
    
    added scalar:
                    e(obs) =  457
    (62,173 observations deleted)
    
    added scalar:
                    e(obs) =  538
    So there are 457 men and 538 women in the treated group when all covariates are included in the analysis.

    Now I run the ttest with the following code (example for men, i.e. female==0):

    Code:
    global xvars_wobula1 age lifesat mig foreign labinc uni 
    
    foreach x in 0 {
        preserve
        keep if female==`x'
        foreach var of varlist $xvars_wobula1 {
            estpost ttest `var', by (treat)
        }
        restore
    }
    I will only post the output for the age variable (I used only those 6 variables to check that I get the code right; the final code for the ttest will include all covariates):

    Code:
                 |      e(b)   e(count)      e(se)       e(t)    e(df_t)     e(p_l)       e(p)     e(p_u) 
    -------------+----------------------------------------------------------------------------------------
             age |  5.766025      64830   .3456512   16.68163      64828          1   2.41e-62   1.20e-62 
    
                 |    e(N_1)    e(mu_1)     e(N_2)    e(mu_2) 
    -------------+--------------------------------------------
             age |     64046   43.77368        784   38.00765
    This is correct, as 784 is the size of the treatment group BEFORE applying the matching step with all covariates, and 64046 is the size of the control group, also before the matching step.

    But how do I run the ttest for the reduced sample size after matching? (In this case, the 457 men shown above.)

    Thanks in advance

  • #2
    A t-test, or any other significance test, is not a good idea here, as the results might be statistically significant just because of the sample size.

    There is literature on how to assess balance instead; use -search- to find and download the relevant commands. Here are two examples: -covbal- (whose help file has some literature references) and -pbalchk-.
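
    For reference, the kind of measure such balance commands report is a standardised difference, which, unlike a t statistic, does not grow with the sample size. A common definition is sketched below; this is an assumption for illustration, and the exact denominator each command uses should be checked in its help file:

    ```latex
    d = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\left(s_T^2 + s_C^2\right)/2}}
    ```

    Here \bar{x}_T and \bar{x}_C are the covariate means in the treated and control groups, and s_T^2 and s_C^2 are the corresponding variances. No group size appears in the formula, so a large control group does not by itself make the difference look "significant".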


    • #3
      Thanks for the quick answer. Those commands are very helpful, but the point of my question remains the same.

      2 cases...

      (1) When i am not including any covariates in the analysis

      sample sizes for men

      Treated: 539
      Control: 64046

      (2) When I am including the covariates

      Treated: 457
      control: 49881

      With my approach (t-test) I got the differences in the covariates for (1). My final goal is to get the differences in the covariates for (2).

      Both commands that you mentioned helped me only in part with my problem.

      For -covbal-:

      Code:
                   |             Treated             |             Control             |        Balance      
                   |      Mean   Variance   Skewness |      Mean   Variance   Skewness |  Std-diff  Var-ratio
      -------------+---------------------------------+---------------------------------+----------------------
               age |  40.83807   69.31144   .1243609 |  40.83824   69.16416   .2332763 | -.0000194    1.00213
           lifesat |  6.533917   3.402904  -.6970284 |  6.534025   3.395478  -.9139077 | -.0000586   1.002187
               mig |  .1728665   .1432972   1.730261 |   .172882   .1429967   1.730118 |  -.000041   1.002102
         foreigner |   .107221   .0959346   2.539021 |  .1072159   .0957226   2.539107 |  .0000163   1.002214
            labinc |  38035.69   5.84e+08   2.161632 |  38059.43   5.88e+08   2.300851 | -.0009805   .9917155
               uni |  .2407002   .1831644   1.213074 |  .2407512   .1827937   1.212748 | -.0001191   1.002028
      --------------------------------------------------------------------------------------------------------
      and for -pbalchk-

      Code:
                     Mean in treated   Mean in Untreated   Standardised diff.
      ----------------------------------------------------------------------
               age |           40.84              40.84               -0.000
           lifesat |            6.53               6.53               -0.000
               mig |            0.17               0.17               -0.000
         foreigner |            0.11               0.11                0.000
            labinc |        38035.69           38059.43               -0.001
               uni |            0.24               0.24               -0.000
      ----------------------------------------------------------------------
      These are the results for case (2). Looking at "age", 40.84 is the mean for the 457 treated men. But my goal is to get the mean for the 49881 men in the control group before matching.

      I know that I can use the option to not use weights and get those results, but that would represent (1) again and not (2), as shown here:

      Code:
                     Mean in treated   Mean in Untreated   Standardised diff.
      ----------------------------------------------------------------------
               age |           40.91              45.01               -0.467
           lifesat |            6.41               7.11               -0.391
               mig |            0.18               0.21               -0.073
         foreigner |            0.12               0.17               -0.150
            labinc |        33934.12           37486.18               -0.118
               uni |            0.22               0.25               -0.074
      ----------------------------------------------------------------------
      Sorry if this is confusing. Basically, I just want to get the differences in the covariates in case (2) before matching.
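
      One possible way to get an unweighted comparison on the reduced sample is to flag the estimation sample right after the regression and restrict the test to it. This is only a sketch: it assumes the globals and variables from the first post, and insample is a made-up variable name used for illustration.

      ```stata
      * Sketch only: assumes the globals and variables from the first post
      * (xvars, exact, xvars_wobula1, w_treat, d1lifesat, treat, female).
      preserve
      keep if female==0
      * Same regression as in the first post; it defines the reduced sample
      qui reg d1lifesat treat $xvars $exact [weight=w_treat]
      * insample is a hypothetical name: 1 if the observation was used above
      gen byte insample = e(sample)
      * Same count as in the first post, where it gave 457 treated men
      count if insample & treat==1
      * Unweighted t-tests, restricted to the reduced sample
      foreach var of varlist $xvars_wobula1 {
          ttest `var' if insample, by(treat)
      }
      restore
      ```

      Because the t-tests carry no weights, they compare the raw treated and control groups, but only among observations that have all covariates available.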


      • #4
        I managed to solve my problem. Thank you very much for the answer; those two commands are very helpful for testing the matching quality.
