  • Bootstrapping a user-written program that has regressions with different samples

    Hello,

    My question is: if you write a program containing regressions that are run on different samples, does Stata bootstrap only on the smallest sample (or the intersection of the samples) used in the program? Here is my problem in detail:

    I am interested in the "difference" generated in the program below:

    Code:
    * year_entry = year - years_teaching
    gen young=(years_teaching<=2)
    
    cap program drop diff
    program diff, rclass
    // 1. The effect of the policy on resid0
    areg resid0 incentivized_instrument i.year i.years_teaching, ab(year_entry)
    local effect_resid0=_b[incentivized_instrument]
    
    // 2. The effect of the policy on other outcomes
    areg `1' incentivized_instrument i.year i.years_teaching, ab(year_entry)
    local effect_other=_b[incentivized_instrument]
    
    // 3. Cross-sectional relationships
    reg `1' VA if years_teaching<=2
    local corr=_b[VA]
    
    return scalar difference=`corr'*`effect_resid0'-`effect_other'
    end
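
    The three estimation samples can be compared directly by flagging e(sample) after each regression. This is only a sketch: it assumes fresid (the outcome passed to diff in the bootstrap call) stands in for the program's argument, and the indicator names in_step1, in_step2 and in_step3 are made up for illustration.

    ```stata
    * Sketch only: flag the estimation sample of each regression used in diff
    quietly areg resid0 incentivized_instrument i.year i.years_teaching, absorb(year_entry)
    generate byte in_step1 = e(sample)   // hypothetical indicator name

    quietly areg fresid incentivized_instrument i.year i.years_teaching, absorb(year_entry)
    generate byte in_step2 = e(sample)   // hypothetical indicator name

    quietly regress fresid VA if years_teaching<=2
    generate byte in_step3 = e(sample)   // hypothetical indicator name

    * Cross-tabulate the first and third samples, then count the overlap of all three
    tabulate in_step1 in_step3
    count if in_step1 & in_step2 & in_step3
    ```

    If the count for in_step3 is close to the "Number of obs" that bootstrap reports, that would suggest the resampling pool follows the third regression rather than the full data.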

    In this program, the sample of the third regression is different from the other two. See the number of strata by young and my bootstrap below:

    Code:
    . egen iden=group(year young year_entry)
    . egen num_strata=nvals(iden),by(young)
    . tab num_strata young
    
               |         young
    num_strata |         0          1 |     Total
    -----------+----------------------+----------
            27 |         0     29,311 |    29,311
           331 |    86,749          0 |    86,749
    -----------+----------------------+----------
         Total |    86,749     29,311 |   116,060
    
    
    
    set seed 1231
    bootstrap difference_fresid=r(difference), saving(bsdata,replace) strata(year year_entry young) reps(1000): diff fresid
    
    Bootstrap results
    Number of strata   =        24                  Number of obs     =     18,345
                                                    Replications      =      1,000
    
          command:  diff2 fresid
    difference_~d:  r(difference)
    
    -----------------------------------------------------------------------------------
                      |   Observed   Bootstrap                         Normal-based
                      |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
    difference_fresid |   .0307855          .        .       .            .           .
    -----------------------------------------------------------------------------------

    But the bootstrap output reports 24 strata, slightly fewer than the 27 strata in the sample "young==1". My guess is that when the regressions use different samples, Stata's bootstrap only draws observations from the smallest sample (or the intersection of all the samples?) in my program. This would also explain why I get the same difference in every bootstrap sample (and therefore no standard error). Any idea why this happens and how I can get around it? I am not sure how the bootstrap command works internally.
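
    One thing that may be worth checking, as a sketch rather than a confirmed fix: the bootstrap documentation describes a nodrop option that prevents observations outside e(sample) from being dropped before resampling. Because diff ends with the regression restricted to years_teaching<=2, the e(sample) left behind would be that restricted sample, which seems consistent with the 18,345 observations reported. The seed, strata and other options below are copied from the post; the /// continuation assumes the lines are run from a do-file.

    ```stata
    * Diagnostic: which sample does the last regression inside diff leave in e()?
    diff fresid
    count if e(sample)

    * Sketch only: rerun with nodrop so observations outside e(sample)
    * stay in the data that bootstrap resamples from
    set seed 1231
    bootstrap difference_fresid=r(difference), nodrop ///
        strata(year year_entry young) reps(1000) saving(bsdata, replace): diff fresid
    ```

    Whether this restores variation across replications depends on the data; the count from e(sample) at least shows which observations bootstrap would otherwise keep.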

    Thank you!

    Last edited by Yiren Ding; 22 Mar 2019, 09:42. Reason: bootstrap