  • Difference in Difference with only three clusters (one untreated, two treated)

    Hi there!

    I am currently working on my dissertation, which studies how a school-based initiative affects students' mental health. However, I only have students' mental health data for three school districts over six years (4 pre-treatment and 2 post-treatment years). The policy is adopted at the district level, and two of the three school districts adopted the initiative in the same year. I have about 1,500 observations per district-year (about 30,000 observations in total), and the data are repeated cross-sections.

    I would like to use a difference-in-differences strategy. However, I am not sure how to obtain correct standard errors. Given only three clusters, it would be incorrect to rely on cluster-robust standard errors. One possibility is to cluster at the grade-district level, which gives 3 x 4 = 12 clusters (I do not have information on which schools the students attended). However, that is still a very small number of clusters. Does anyone know how to handle the standard errors in my situation? Maybe a wild bootstrap at the grade-district level? Thanks.
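
    For reference, the wild cluster bootstrap idea mentioned above could be sketched roughly as follows. This is only a sketch under assumptions: it relies on the community-contributed boottest command (not part of official Stata), it assumes the installed version of boottest works after areg (if not, regress with i.district gives the same point estimates), it borrows the variable names mentalhealth, treat, district and grade from the regression shown later in this thread, and gd is a hypothetical name for the grade-by-district identifier. With only three districts, two of them treated, even these p-values should be read with caution.

    ```stata
    * Assumption: boottest is installed, e.g. via
    *   ssc install boottest

    * gd is a hypothetical name for the grade-by-district cluster identifier
    egen gd = group(grade district)

    * District-level clustering (3 clusters)
    areg mentalhealth treat i.age i.grade i.sex i.race i.year, absorb(district) vce(cluster district)

    * Wild cluster bootstrap test of H0: coefficient on treat = 0.
    * Webb six-point weights are often suggested when clusters are very few.
    boottest treat, weighttype(webb) reps(9999) seed(12345) nograph

    * Same test with grade-by-district clustering (12 clusters)
    areg mentalhealth treat i.age i.grade i.sex i.race i.year, absorb(district) vce(cluster gd)
    boottest treat, weighttype(webb) reps(9999) seed(12345) nograph
    ```

    boottest reports a bootstrap p-value and confidence set for treat rather than a standard error, so the comparison with the analytic results is best made on p-values.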






  • #2
    Wang:
    welcome to this forum.
    Your number of clusters is actually limited.
    You may want to compare default vs bootstrapped standard errors.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo Lazzaro,

      Thank you for replying so promptly. Yes, the number of clusters is very limited. Due to data restrictions, I cannot post the data here, but I tried the cluster-robust, robust, and bootstrapped standard errors. Below, I report only the standard errors on the treatment variable (treat). The bootstrapped standard error appears to be much closer to Stata's robust (non-clustered) standard error than to the cluster-robust one. If this is true, do you think I should report the robust standard errors? Thanks.

      Code:
      areg mentalhealth treat i.age i.grade i.sex i.race  i.year, absorb(district) cluster(district)
      Std. Err.=.0050719
      t=-4.43
      p=0.047
      
      areg mentalhealth treat i.age i.grade i.sex i.race  i.year, absorb(district) robust
      Std. Err.=.0149898
      t=-1.50
      p=0.134
      
      
      areg mentalhealth treat i.age i.grade i.sex i.race  i.year, absorb(district) vce(bootstrap)
      Std. Err.=.0141416
      t=-1.59
      p=0.112
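
      As a side note on where these p-values come from: with vce(cluster), Stata refers the t statistic to a t distribution with G - 1 degrees of freedom, where G is the number of clusters, so here there are only 2 degrees of freedom. The reported p-values can be checked by hand with a short sketch like the one below. Using the normal distribution for the robust case is an approximation to a t distribution with a very large number of degrees of freedom.

      ```stata
      * Cluster-robust case: t = -4.43 with G - 1 = 2 degrees of freedom
      display 2*ttail(2, 4.43)      // approximately 0.047

      * 5% two-sided critical value with 2 degrees of freedom (versus about 1.96 in large samples)
      display invttail(2, 0.025)    // approximately 4.30

      * Robust case: t = -1.50, normal approximation to a large-df t
      display 2*normal(-1.50)       // approximately 0.134

      * Bootstrap case: Stata reports a normal-based z statistic, z = -1.59
      display 2*normal(-1.59)       // approximately 0.112
      ```

      The cluster-robust test therefore only just clears the 5% threshold, because the critical value with 2 degrees of freedom is about 4.30.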



      • #4
        Wang:
        unfortunately, this is not the way you're expected to post your results: in all likelihood, your model has many predictors, not just one, so the full output should be shown.
        See the following toy-example:
        Code:
        . use "https://www.stata-press.com/data/r16/nlswork.dta"
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . areg ln_wage c.age##c.age, absorb(idcode)
        
        Linear regression, absorbing indicators         Number of obs     =     28,510
        Absorbed variable: idcode                       No. of categories =      4,710
                                                        F(   2,  23798)   =    1451.88
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.6659
                                                        Adj R-squared     =     0.5998
                                                        Root MSE          =     0.3025
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .0539076   .0028078    19.20   0.000     .0484041    .0594112
                     |
         c.age#c.age |  -.0005973   .0000465   -12.84   0.000    -.0006885   -.0005061
                     |
               _cons |    .639913   .0408906    15.65   0.000     .5597649    .7200611
        ------------------------------------------------------------------------------
        F test of absorbed indicators: F(4709, 23798) = 8.739         Prob > F = 0.000
        
        . areg ln_wage c.age##c.age, absorb(idcode) vce(cluster idcode)
        
        Linear regression, absorbing indicators         Number of obs     =     28,510
        Absorbed variable: idcode                       No. of categories =      4,710
                                                        F(   2,   4709)   =     423.60
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.6659
                                                        Adj R-squared     =     0.5998
                                                        Root MSE          =     0.3025
        
                                     (Std. Err. adjusted for 4,710 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .0539076   .0047139    11.44   0.000     .0446661    .0631492
                     |
         c.age#c.age |  -.0005973   .0000788    -7.58   0.000    -.0007517   -.0004429
                     |
               _cons |    .639913   .0683166     9.37   0.000     .5059806    .7738454
        ------------------------------------------------------------------------------
        
        . areg ln_wage c.age##c.age, absorb(idcode) vce(bootstrap)
        (running areg on estimation sample)
        
        Bootstrap replications (50)
        ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
        ..................................................    50
        
        Linear regression, absorbing indicators         Number of obs     =     28,510
        Absorbed variable: idcode                       No. of categories =      4,710
                                                        Replications      =         50
                                                        Wald chi2(2)      =    2504.84
                                                        Prob > chi2       =     0.0000
                                                        R-squared         =     0.6659
                                                        Adj R-squared     =     0.5998
                                                        Root MSE          =     0.3025
        
        ------------------------------------------------------------------------------
                     |   Observed   Bootstrap                         Normal-based
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .0539076   .0029272    18.42   0.000     .0481704    .0596449
                     |
         c.age#c.age |  -.0005973    .000048   -12.45   0.000    -.0006914   -.0005033
                     |
               _cons |    .639913    .043419    14.74   0.000     .5548133    .7250127
        ------------------------------------------------------------------------------
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)
