  • Bootstrap panel with lag variables

    Hi guys,

    I am trying to replicate a two-step procedure with the bootstrap. It is essentially a 2SLS in which the second stage uses the first-stage residuals instead of the fitted values.
    My panel dataset looks as follows:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(idatc3 Year y_firststage y_second)
     1 2004 16.353962  0
     1 2005 16.340631  1
     1 2006 16.286907  0
     1 2007 16.273083  0
     1 2008 16.255836  0
     1 2009 15.925304  0
     1 2010 15.859792  0
     2 2004 18.146786  0
     2 2005 18.185635  1
     2 2006 18.199041  0
     2 2007 18.194345  0
     2 2008 18.213171  0
     2 2009 18.345274  0
     2 2010 18.226433  0
     3 2004 13.705626  3
     3 2005  15.13613  7
     3 2006 16.208109  1
     3 2007 16.591629  3
     3 2008 16.917368  0
     3 2009 17.030642  2
     3 2010 17.209492  5
     4 2004  16.50014 25
     4 2005 16.745003  8
     4 2006  16.75969  6
     4 2007 17.048187 14
     4 2008 17.244335  2
     4 2009 17.343176  0
     4 2010 17.933966  2
     5 2004 17.868816  0
     5 2005 17.693306  0
     5 2006 17.696056  2
     5 2007 17.748852  4
     5 2008  17.76949  0
     5 2009 17.838858  0
     5 2010  17.91857  0
     6 2004  19.07523  0
     6 2005 18.995157  0
     6 2006 18.959908  0
     6 2007 18.898798  1
     6 2008  18.66919  0
     6 2009 18.642822  0
     6 2010 18.255278  0
     7 2004 22.090456  1
     7 2005 22.101965  0
     7 2006 22.116083  2
     7 2007  22.14667  0
     7 2008   22.0739  2
     7 2009  22.06895  0
     7 2010  21.95814  0
     8 2004 17.097506  2
     8 2005  16.42637  2
     8 2006  16.32611  0
     8 2007 16.537579  0
     8 2008 16.272715  0
     8 2009 16.364277  0
     8 2010  15.86466  0
     9 2004 18.293467  2
     9 2005  18.54712  0
     9 2006 18.760092  0
     9 2007 17.585657  0
     9 2008  15.65171  1
     9 2009 15.918745  0
     9 2010 18.139769  0
    10 2004  20.34576  0
    10 2005  20.38664  2
    10 2006  20.52285  1
    10 2007  20.07849  0
    10 2008 19.609467  0
    10 2009  19.61134  1
    10 2010  19.63593  0
    11 2004 15.633678  0
    11 2005 16.186264  0
    11 2006  16.36874  2
    11 2007 16.868628  0
    11 2008 16.864525  0
    11 2009  16.66895  0
    11 2010 16.688057  0
    12 2004 18.850607  2
    12 2005 18.661394  0
    12 2006 18.685404  0
    12 2007 18.935335  0
    12 2008 19.289694  0
    12 2009 19.490005  0
    12 2010 18.965286  0
    13 2004 15.167782  1
    13 2005 15.652504  0
    13 2006 16.465668  0
    13 2007 16.760597  0
    13 2008 16.878334  0
    13 2009  17.17612  0
    13 2010  18.00114  0
    14 2004 19.100143  2
    14 2005 18.629805  0
    14 2006  18.49677  0
    14 2007  18.35715  0
    14 2008 18.183216  0
    14 2009 18.108658  0
    14 2010  18.07056  0
    15 2004 17.956966  0
    15 2005 18.202034  0
    end
    Now I would like to replicate the sample estimates obtained by:
    Code:
    use "/Users/federiconutarelli/Desktop/DB_atc3.dta", clear
    rename tot_count trials
    quietly xtreg y L.major_recalls_norm average_age_prodbyatc3 avg_prd_sq mean_agefirm_byatc mean_agefirm_squared hhi share_expired share_patented i.Year, fe vce(cluster idatc3)
    predict residuals, e
    xtpoisson trials y residuals average_age_prodbyatc3 avg_prd_sq mean_agefirm_byatc mean_agefirm_squared hhi share_expired share_patented i.Year, fe vce(robust)
    with a bootstrap procedure as follows:

    Code:
    use "/Users/federiconutarelli/Desktop/DB_atc3.dta", clear
    rename tot_count trials
    quietly xtreg y L.major_recalls_norm average_age_prodbyatc3 avg_prd_sq mean_agefirm_byatc mean_agefirm_squared hhi share_expired share_patented i.Year, fe vce(cluster idatc3)
    predict residuals, e
    xtpoisson trials y residuals average_age_prodbyatc3 avg_prd_sq mean_agefirm_byatc mean_agefirm_squared hhi share_expired share_patented i.Year, fe vce(robust)
    keep if e(sample)
            
    capture program drop onebootrep
    program define onebootrep
        preserve // data are restored each time the program terminates
            quietly xtreg y L.major_recalls_norm average_age_prodbyatc3 avg_prd_sq mean_agefirm_byatc mean_agefirm_squared hhi share_expired share_patented i.Year, fe vce(cluster newid)
            cap drop residuals
            predict residuals, e
            quietly xtpoisson trials y residuals average_age_prodbyatc3 avg_prd_sq mean_agefirm_byatc mean_agefirm_squared hhi share_expired share_patented i.Year, fe vce(robust)
    end
    
    
    tsset, clear
    gen newid = idatc3
    xtset newid Year
    
    capture bootstrap _b _se, cluster(idatc3) idcluster(newid) seed(1234) reps(200) saving(coeffse, replace) nodots nowarn: onebootrep
    The problem is that I do not get the expected results (i.e. the sample coefficients differ from the bootstrapped coefficients). I have read about issues with using lagged variables in a bootstrap. Can someone please help me?

    Thanks,

    Federico

  • #2
    Can you show the output you are obtaining, both with and without the bootstrap?
    That may provide more hints about what is happening.
    Fernando



    • #3
      FernandoRios
      Thanks for the reply.
      Here are the outputs:

      Code:
      ***********OUTPUT OF THE PROCEDURE IN SAMPLE
      
      . xtpoisson trials y residuals average_age_prodbyatc3 avg_prd_sq mean_agefirm_byatc mean_agefirm_
      > squared hhi share_expired share_patented i.Year, fe vce(robust)
      note: 16 groups (96 obs) dropped because of all zero outcomes
      
      Iteration 0:   log pseudolikelihood = -662.80877  
      Iteration 1:   log pseudolikelihood = -536.13751  
      Iteration 2:   log pseudolikelihood = -531.42644  
      Iteration 3:   log pseudolikelihood = -531.38266  
      Iteration 4:   log pseudolikelihood = -531.38265  
      
      Conditional fixed-effects Poisson regression    Number of obs      =       492
      Group variable: idatc3                          Number of groups   =        82
      
                                                      Obs per group: min =         6
                                                                     avg =       6.0
                                                                     max =         6
      
                                                      Wald chi2(14)      =    165.22
      Log pseudolikelihood  = -531.38265              Prob > chi2        =    0.0000
      
                                                 (Std. Err. adjusted for clustering on idatc3)
      ----------------------------------------------------------------------------------------
                             |               Robust
                      trials |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------------+----------------------------------------------------------------
                           y |   -.013124   1.117276    -0.01   0.991    -2.202944    2.176696
                   residuals |   .6477263   1.169505     0.55   0.580    -1.644462    2.939914
      average_age_prodbyatc3 |   .0808345   .1003047     0.81   0.420    -.1157592    .2774281
                  avg_prd_sq |  -.0020083   .0047398    -0.42   0.672    -.0112981    .0072816
          mean_agefirm_byatc |   .1352506   .1185617     1.14   0.254     -.097126    .3676272
        mean_agefirm_squared |  -.0022593   .0013686    -1.65   0.099    -.0049418    .0004232
                         hhi |  -.0051958   1.332918    -0.00   0.997    -2.617668    2.607276
               share_expired |   .6503668   1.769393     0.37   0.713     -2.81758    4.118313
              share_patented |  -1.808324   2.323622    -0.78   0.436    -6.362538    2.745891
                             |
                        Year |
                       2006  |  -.1769479   .2048165    -0.86   0.388    -.5783808     .224485
                       2007  |  -.0474872   .2311147    -0.21   0.837    -.5004637    .4054892
                       2008  |  -.7286098   .2858094    -2.55   0.011    -1.288786   -.1684336
                       2009  |  -1.156776   .2913193    -3.97   0.000    -1.727751   -.5858004
                       2010  |  -1.710206   .3746907    -4.56   0.000    -2.444586   -.9758254
      ----------------------------------------------------------------------------------------
      
      
      *********OUTPUT OF THE BOOTSTRAP PROCEDURE
      . sum _all
      
          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
        trials_b_y |       197    .4664661    5.548068  -23.92481   30.99598
      trials_b_r~s |       197    .2881713    5.496946  -30.20622   23.99414
      trials_b_a~3 |       197    -.090372      .40962  -3.346032   1.435056
      trials_b_a~q |       197    .0030964    .0224439  -.0614444   .2029107
      trials_b_m~c |       197    .0934635    .3281871  -1.029551   1.047479
      -------------+--------------------------------------------------------
      trials_b_m~d |       197   -.0015979    .0041745  -.0145401   .0120159
      trials_b_hhi |       197   -.4095349    6.528062  -46.62569   31.10167
      trials_b_s.. |       197    2.912103    5.836847  -8.312901   40.47133
      trials_b~ted |       197    .1897212    11.11164  -47.92975   95.60609
            _bs_10 |       197           0           0          0          0
      -------------+--------------------------------------------------------
            _bs_11 |       197    .1498571    .4014745  -1.452932   2.290584
            _bs_12 |       197   -.4702015    .5076519  -2.117427   2.205099
            _bs_13 |       197   -.9370973    .7370731  -4.303524   2.829172
            _bs_14 |       197   -1.623705    1.190134  -6.909701   4.450798
       trials_se_y |       197    2.723217    4.100187   .4866089   43.13839
      -------------+--------------------------------------------------------
      trials_se_~s |       197    2.733679    4.093435   .5141646   43.16332
      trials_se_~3 |       197    .2628804    .2760101    .093289   3.278431
      trials_se_~q |       197    .0136616    .0131062   .0049866   .1413421
      trials_se_~c |       197    .2859746    .1024594   .0938284   .8021792
      trials_se_.. |       197    .0036541    .0013485   .0011378   .0100671
      -------------+--------------------------------------------------------
      trials_se_~i |       197    3.170534    5.084672   .7052852   53.22807
      trials_se_.. |       197     4.15096    4.999714   1.775544     66.724
      trials_s~ted |       197    5.717689    12.73756   1.683599   165.0252
            _bs_24 |       197           0           0          0          0
            _bs_25 |       197    .2748566    .3226996   .0978945   3.827058
      -------------+--------------------------------------------------------
            _bs_26 |       197    .3775587    .3191886   .1369121   2.745712
            _bs_27 |       197    .4823589    .4690307    .185301    4.18394
            _bs_28 |       197     .703846     .772362   .2322593   7.645568
      Thanks again!

      Federico



      • #4
        An update: it seems that bsample is creating newid incorrectly. I checked this by generating bootstrap samples as follows:

        Code:
        use "/Users/federiconutarelli/Desktop/DB_atc3.dta", clear
        rename tot_count trials
        
        local varx "idatc3 Year y trials major_recalls_norm average_age_prodbyatc3 avg_prd_sq mean_agefirm_byatc mean_agefirm_squared hhi share_expired share_patented"
        keep `varx'
        
        set more off
        set seed 1234
        global nrepl 20
        
        forval i=1/$nrepl {
            capture bsample, cluster(idatc3) idcluster(newid) strata(idatc3)
            if (_rc > 0) {   // error
                cd "/Users/federiconutarelli/Desktop/error"
                local ername = "error" + string(trunc(runiform() * 1e5))  // kludgy unique filename
                save "`ername'"
            }
            else {
                cd "/Users/federiconutarelli/Desktop/sample"
                local fname = "sample" + "`i'"
                save "`fname'"
            }
        }
        After the 14th iteration the following error appears: "singleton cluster detected". As for the samples without errors, newid is inconsistent: sometimes it splits idatc3 (the cluster id) correctly (i.e. if an id is drawn three times, newid takes three different values), but sometimes it does not (i.e. it simply replicates the cluster id). I don't know why yet.
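For reference, the behaviour expected from idcluster() can be checked on a toy panel. The following is only a minimal sketch under stated assumptions, not a diagnosis of the problem above: the dataset is made up (variables id, year and x are hypothetical), strata() is left out on purpose, and preserve/restore is used so that every draw starts from the original data rather than from the previous bootstrap sample.

```stata
* Toy balanced panel: 5 clusters observed over 7 years (made-up data)
clear
set seed 1234
set obs 5
generate id = _n
expand 7
bysort id: generate year = 2003 + _n
generate x = rnormal()

forvalues i = 1/3 {
    preserve                               // keep the original data for the next draw
    bsample, cluster(id) idcluster(newid)  // resample whole clusters with replacement

    * Each resampled cluster should carry its own newid with all 7 years
    bysort newid: assert _N == 7
    isid newid year

    * Declaring the panel on newid lets L. work even if an id was drawn twice
    xtset newid year
    generate lag_x = L.x
    count if missing(lag_x)
    assert r(N) == 5                       // one missing lag per resampled cluster
    restore
}
```

If the assertions pass, newid is a unique identifier for each drawn copy of a cluster, and lags built after xtset newid year never run across two copies of the same original id.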

        Thanks again



        • #5
          I get this error (singleton cluster detected) when I run bsample too. How did you solve it?
